Revisiting negative selection algorithms.
Ji, Zhou; Dasgupta, Dipankar
2007-01-01
This paper reviews the progress of negative selection algorithms, an anomaly/change detection approach in Artificial Immune Systems (AIS). Starting from the initial model, we identify the fundamental characteristics of this family of algorithms and summarize their diversity. The main elements of the method, including data representation, coverage estimate, affinity measure, and matching rules, are discussed across the different variations. The various negative selection algorithms are also categorized according to several criteria. Relationships and possible combinations with other AIS or machine learning methods are discussed. Finally, the prospective development and applicability of negative selection algorithms, and their influence on related areas, are speculated upon based on this discussion.
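As a concrete illustration of the core scheme this family shares (generate detectors that fail to match any self sample, then flag anything a detector matches as non-self), here is a minimal real-valued sketch; the unit-hypercube representation, Euclidean affinity, fixed radii and detector count are illustrative assumptions rather than parameters taken from the paper.

```python
import numpy as np

def train_detectors(self_samples, n_detectors=200, self_radius=0.1, rng=None, max_tries=100_000):
    """Generate random detector centres in [0, 1]^d that lie farther than
    self_radius from every self sample (candidates matching self are discarded)."""
    rng = np.random.default_rng(rng)
    dim = self_samples.shape[1]
    detectors, tries = [], 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = rng.random(dim)
        if np.min(np.linalg.norm(self_samples - candidate, axis=1)) > self_radius:
            detectors.append(candidate)
    return np.array(detectors)

def is_nonself(samples, detectors, detector_radius=0.1):
    """A sample is flagged as non-self (anomalous) if any detector matches it."""
    dists = np.linalg.norm(samples[:, None, :] - detectors[None, :, :], axis=2)
    return (dists < detector_radius).any(axis=1)

# Toy usage: 'self' data clustered near the centre of the unit square.
rng = np.random.default_rng(0)
self_data = 0.5 + 0.05 * rng.standard_normal((500, 2))
detectors = train_detectors(self_data, rng=1)
print(is_nonself(np.array([[0.5, 0.5], [0.9, 0.1]]), detectors))
# [False  True]: the off-cluster point is matched by a detector and flagged as non-self.
```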
Kernel Extended Real-Valued Negative Selection Algorithm (KERNSA)
2013-06-01
…are discarded, which is similar to how T-cells function in the BIS. An unlabeled, future sample is considered non-self if any detector matches it…
Negative Selection Algorithm for Aircraft Fault Detection
NASA Technical Reports Server (NTRS)
Dasgupta, D.; KrishnaKumar, K.; Wong, D.; Berry, M.
2004-01-01
We investigated a real-valued Negative Selection Algorithm (NSA) for fault detection in man-in-the-loop aircraft operation. The detection algorithm uses body-axes angular rate sensor data exhibiting normal flight behavior patterns to probabilistically generate a set of fault detectors that can detect abnormalities (including faults and damage) in the behavior of the aircraft in flight. We performed experiments with datasets (collected under normal and various simulated failure conditions) using the NASA Ames man-in-the-loop high-fidelity C-17 flight simulator. The paper provides results of experiments with different datasets representing various failure conditions.
A real negative selection algorithm with evolutionary preference for anomaly detection
NASA Astrophysics Data System (ADS)
Yang, Tao; Chen, Wen; Li, Tao
2017-04-01
Traditional real negative selection algorithms (RNSAs) adopt the estimated coverage (c0) as the algorithm termination threshold and generate detectors randomly. With increasing dimensionality, the data samples tend to reside in a low-dimensional subspace, so that traditional detectors cannot effectively distinguish these samples. Furthermore, in high-dimensional feature space, c0 cannot accurately reflect the detector set's coverage of the nonself space, which can cause the algorithm to terminate prematurely when the number of detectors is still insufficient. These shortcomings make traditional RNSAs perform poorly in high-dimensional feature space. Based on the "evolutionary preference" theory in immunology, this paper presents a real negative selection algorithm with evolutionary preference (RNSAP). RNSAP utilizes the "unknown nonself space", the "low-dimensional target subspace" and "known nonself features" as evolutionary preferences to guide the generation of detectors, thus ensuring that the detectors cover the nonself space more effectively. In addition, RNSAP uses redundancy instead of c0 as the termination threshold, so that it can generate an adequate set of detectors at a proper convergence rate. Theoretical analysis and experimental results demonstrate that, compared to the classical RNSA (V-detector), RNSAP achieves a higher detection rate with fewer detectors and lower computing cost.
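For context on the c0 criterion discussed above, the usual probabilistic argument behind estimated-coverage termination (a simplified reading of V-detector-style schemes, not the authors' derivation) is that if the current detector set covers a fraction c of the nonself space and candidate points are drawn uniformly from it, then

```latex
P(\text{$m$ consecutive candidates fall inside existing detectors}) = c^{\,m},
```

so a common heuristic is to terminate only after roughly m >= 1/(1 - c0) consecutive covered candidates are observed (for example, c0 = 0.99 requires about 100 consecutive hits). In high-dimensional spaces this run-length estimate can saturate well before the nonself space is usefully covered, which is consistent with the premature-termination problem described above.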
Wang, Zhu; Ma, Shuangge; Wang, Ching-Yun
2015-09-01
In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero-inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider maximum likelihood function plus a penalty including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation-maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinated descent algorithm. Furthermore, statistical properties including the standard error formulae are provided. A simulation study shows that the new algorithm not only has more accurate or at least comparable estimation, but also is more robust than the traditional stepwise variable selection. The proposed methods are applied to analyze the health care demand in Germany using the open-source R package mpath. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
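For reference, the zero-inflated negative binomial likelihood that the penalized EM procedure above maximizes can be written as below; the notation (zero-inflation probability π_i, negative binomial mean μ_i and dispersion θ, and a generic penalty p_λ standing in for LASSO, SCAD or MCP) is standard, though the exact parameterization used in mpath may differ in detail.

```latex
P(Y_i = y_i) =
\begin{cases}
\pi_i + (1-\pi_i)\left(\dfrac{\theta}{\theta+\mu_i}\right)^{\theta}, & y_i = 0,\\[1.5ex]
(1-\pi_i)\,\dfrac{\Gamma(y_i+\theta)}{\Gamma(\theta)\,y_i!}
\left(\dfrac{\theta}{\theta+\mu_i}\right)^{\theta}
\left(\dfrac{\mu_i}{\theta+\mu_i}\right)^{y_i}, & y_i > 0,
\end{cases}
\qquad
\log\mu_i = \mathbf{x}_i^{\top}\boldsymbol\beta,\quad
\operatorname{logit}\pi_i = \mathbf{z}_i^{\top}\boldsymbol\gamma,
\\[1ex]
\max_{\boldsymbol\beta,\,\boldsymbol\gamma,\,\theta}\;
\sum_{i=1}^{n}\log P(Y_i = y_i)
\;-\;\sum_{j} p_{\lambda}\!\left(|\beta_j|\right)
\;-\;\sum_{k} p_{\lambda}\!\left(|\gamma_k|\right).
```

Each EM iteration then reduces to a penalized weighted negative binomial fit for β and a penalized logistic fit for γ, both solvable by coordinate descent as described in the abstract.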
NegGOA: negative GO annotations selection using ontology structure.
Fu, Guangyuan; Wang, Jun; Yang, Bo; Yu, Guoxian
2016-10-01
Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples: proteins that are known not to carry out a particular function. However, the Gene Ontology (GO) almost always only records that proteins do carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes tens of thousands of GO terms, and a protein is annotated with several (or dozens) of these terms. For these reasons, negative examples of a protein can greatly help distinguish true positive examples of the protein from such a large candidate GO space. In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, the available annotations and the potential additional annotations of a protein to choose its negative examples. We compare NegGOA with other negative example selection algorithms and find that NegGOA produces far fewer false negatives. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also achieves higher accuracy than these competing algorithms across various evaluation metrics. In addition, NegGOA suffers less from incomplete protein annotations than these competing methods. The Matlab and R code is available at https://sites.google.com/site/guoxian85/neggoa (contact: gxyu@swu.edu.cn). Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
Liu, Xing-Cai; He, Shi-Wei; Song, Rui; Sun, Yang; Li, Hao-Dong
2014-01-01
The railway freight center location problem is an important issue in railway freight transport programming. This paper focuses on the railway freight center location problem in an uncertain environment. Because the expected value model ignores the negative influence of disadvantageous scenarios, a robust optimization model is proposed that takes the expected cost and the deviation value of the scenarios as its objective. A cloud adaptive clonal selection algorithm (C-ACSA) is presented; it combines an adaptive clonal selection algorithm with the Cloud Model, which improves the convergence rate. The coding design and the process of the algorithm are described. Results on an example demonstrate that the model and algorithm are effective: compared with the expected value case, the number of disadvantageous scenarios in the robust model is reduced from 163 to 21, which shows that the robust model's result is more reliable.
An Evolutionary Algorithm to Generate Ellipsoid Detectors for Negative Selection
2005-03-21
…[the binding between an] antibody and an antigen is a function of several processes including electrostatic interactions, hydrogen bonding, van der Waals interaction, and others [20]…
Information Gain Based Dimensionality Selection for Classifying Text Documents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dumidu Wijayasekara; Milos Manic; Miles McQueen
2013-06-01
Selecting the optimal dimensions for various knowledge extraction applications is an essential component of data mining. Dimensionality selection techniques are utilized in classification applications to increase the classification accuracy and reduce the computational complexity. In text classification, where the dimensionality of the dataset is extremely high, dimensionality selection is even more important. This paper presents a novel genetic algorithm based methodology for dimensionality selection in text mining applications that utilizes information gain. The presented methodology uses the information gain of each dimension to dynamically change the mutation probability of chromosomes. Since the information gain is calculated a priori, the computational complexity is not affected. The presented method was tested on a specific text classification problem and compared with conventional genetic algorithm based dimensionality selection. The results show an improvement of 3% in the true positives and 1.6% in the true negatives over conventional dimensionality selection methods.
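For reference, the information gain used above to bias the mutation probabilities is the standard mutual information between the class label Y and a dimension (term) X, computed once before the genetic search starts:

```latex
IG(Y;X) \;=\; H(Y) - H(Y\mid X)
\;=\; -\sum_{y} P(y)\log_2 P(y)
\;+\; \sum_{x} P(x) \sum_{y} P(y\mid x)\log_2 P(y\mid x).
```

Because these values are computed a priori, making each dimension's mutation probability an increasing function of its information gain adds no per-generation cost, which is the point made in the abstract.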
Kherfi, Mohammed Lamine; Ziou, Djemel
2006-04-01
In content-based image retrieval, understanding the user's needs is a challenging task that requires integrating the user into the retrieval process. Relevance feedback (RF) has proven to be an effective tool for taking the user's judgement into account. In this paper, we present a new RF framework based on a feature selection algorithm that combines the advantages of a probabilistic formulation with those of using both positive examples (PE) and negative examples (NE). Through interaction with the user, our algorithm learns the importance the user assigns to image features, and then applies the results to define similarity measures that correspond better to the user's judgement. The use of NE allows images the user does not want to be discarded, thereby improving retrieval accuracy. The probabilistic formulation of the problem offers many advantages and opens the door to further modeling possibilities for good feature selection: it makes it possible to cluster the query data into classes, choose the probability law that best models each class, model missing data, and support queries with multiple PE and/or NE classes. The basic principle of our algorithm is to assign more importance to features with a high likelihood and to those which distinguish well between PE classes and NE classes. The proposed algorithm was validated both separately and in an image retrieval context, and the experiments show that it performs good feature selection and contributes to improving retrieval effectiveness.
Quantum Monte Carlo Simulation of Frustrated Kondo Lattice Models
NASA Astrophysics Data System (ADS)
Sato, Toshihiro; Assaad, Fakher F.; Grover, Tarun
2018-03-01
The absence of the negative sign problem in quantum Monte Carlo simulations of spin and fermion systems has different origins. World-line based algorithms for spins require positivity of matrix elements whereas auxiliary field approaches for fermions depend on symmetries such as particle-hole symmetry. For negative-sign-free spin and fermionic systems, we show that one can formulate a negative-sign-free auxiliary field quantum Monte Carlo algorithm that allows Kondo coupling of fermions with the spins. Using this general approach, we study a half-filled Kondo lattice model on the honeycomb lattice with geometric frustration. In addition to the conventional Kondo insulator and antiferromagnetically ordered phases, we find a partial Kondo screened state where spins are selectively screened so as to alleviate frustration, and the lattice rotation symmetry is broken nematically.
Wang, Yuliang; Zhang, Zaicheng; Wang, Huimin; Bi, Shusheng
2015-01-01
Cell image segmentation plays a central role in numerous biology studies and clinical applications. As a result, the development of cell image segmentation algorithms with high robustness and accuracy is attracting more and more attention. In this study, an automated cell image segmentation algorithm is developed to improve cell boundary detection and the segmentation of clustered cells for all cells in the field of view in negative phase contrast images. A new method which combines the thresholding method and an edge-based active contour method was proposed to optimize cell boundary detection. In order to segment clustered cells, the geographic peaks of cell light intensity were utilized to detect the number and locations of the clustered cells. In this paper, the working principles of the algorithms are described. The influence of the parameters in cell boundary detection and of the threshold value selection on the final segmentation results is investigated. Finally, the proposed algorithm is applied to the negative phase contrast images from different experiments and its performance is evaluated. Results show that the proposed method can achieve optimized cell boundary detection and highly accurate segmentation for clustered cells. PMID:26066315
Negative Example Selection for Protein Function Prediction: The NoGO Database
Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis
2014-01-01
Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state of the art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html). PMID:24922051
MultiMiTar: a novel multi objective optimization based miRNA-target prediction method.
Mitra, Ramkrishna; Bandyopadhyay, Sanghamitra
2011-01-01
Machine learning based miRNA-target prediction algorithms often fail to obtain a balanced prediction accuracy in terms of both sensitivity and specificity, due to the lack of a gold standard of negative examples, of miRNA-targeting-site context-specific relevant features, and of an efficient feature selection process. Moreover, all the sequence, structure and machine learning based algorithms are unable to distribute the true positive predictions preferentially at the top of the ranked list; hence the algorithms become unreliable for biologists. In addition, these algorithms fail to obtain a considerable combination of precision and recall for target transcripts that are translationally repressed at the protein level. In this article, we introduce an efficient miRNA-target prediction system, MultiMiTar, a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic based feature selection technique. The robust performance of the proposed method is mainly the result of using high quality negative examples and the selection of biologically relevant, miRNA-targeting-site context-specific features. The features are selected using a novel feature selection technique, AMOSA-SVM, which integrates the multi-objective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) with SVM. MultiMiTar is found to achieve a much higher Matthews correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 compared to the other target prediction methods on a completely independent test data set. The MCC and ACA values of these algorithms range from -0.269 to 0.155 and from 0.321 to 0.582, respectively. Moreover, MultiMiTar shows a more balanced result in terms of precision and sensitivity (recall) for the translationally repressed data set as compared to all the other existing methods. An important aspect is that the true positive predictions are distributed preferentially at the top of the ranked list, which makes MultiMiTar reliable for biologists. MultiMiTar is now available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm. The MultiMiTar software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
Cheng, Jerome; Hipp, Jason; Monaco, James; Lucas, David R; Madabhushi, Anant; Balis, Ulysses J
2011-01-01
Spatially invariant vector quantization (SIVQ) is a texture and color-based image matching algorithm that queries the image space through the use of ring vectors. In prior studies, the selection of one or more optimal vectors for a particular feature of interest required a manual process, with the user initially stochastically selecting candidate vectors and subsequently testing them upon other regions of the image to verify the vector's sensitivity and specificity properties (typically by reviewing a resultant heat map). In carrying out the prior efforts, the SIVQ algorithm was noted to exhibit highly scalable computational properties, where each region of analysis can take place independently of others, making a compelling case for the exploration of its deployment on high-throughput computing platforms, with the hypothesis that such an exercise will result in performance gains that scale linearly with increasing processor count. An automated process was developed for the selection of optimal ring vectors to serve as the predicate matching operator in defining histopathological features of interest. Briefly, candidate vectors were generated from every possible coordinate origin within a user-defined vector selection area (VSA) and subsequently compared against user-identified positive and negative "ground truth" regions on the same image. Each vector from the VSA was assessed for its goodness-of-fit to both the positive and negative areas via the use of the receiver operating characteristic (ROC) transfer function, with each assessment resulting in an associated area-under-the-curve (AUC) figure of merit. Use of the above-mentioned automated vector selection process was demonstrated in two use cases: first, to identify malignant colonic epithelium, and second, to identify soft tissue sarcoma. For both examples, a very satisfactory optimized vector was identified, as defined by the AUC metric. Finally, as an additional effort directed towards attaining high-throughput capability for the SIVQ algorithm, we demonstrated its successful incorporation with the MATrix LABoratory (MATLAB™) application interface. The SIVQ algorithm is suitable for automated vector selection settings and high throughput computation.
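The AUC-driven vector ranking described above can be sketched in a few lines; `match_score` below is a hypothetical stand-in for the SIVQ ring-vector matching operator, which is not reproduced here, and scikit-learn's `roc_auc_score` plays the role of the ROC transfer function.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def rank_candidate_vectors(candidates, positive_regions, negative_regions, match_score):
    """Rank candidate ring vectors by how well their match scores separate
    user-identified positive and negative ground-truth regions (higher AUC first)."""
    labels = np.r_[np.ones(len(positive_regions)), np.zeros(len(negative_regions))]
    scored = []
    for vec in candidates:
        scores = [match_score(vec, region)
                  for region in list(positive_regions) + list(negative_regions)]
        scored.append((roc_auc_score(labels, scores), vec))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored  # the first entry is the optimized vector by the AUC figure of merit
```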
Hoffmann, S
1992-12-01
A prospective evaluation was made of an algorithm for a selective use of throat swabs in patients with sore throat in general practice. The algorithm states that a throat swab should be obtained (a) in all children younger than 15 years; (b) in patients aged 15 years or more who have pain on swallowing and at least three of four signs (enlarged or hyperaemic tonsils; exudate; enlarged or tender angular lymph nodes; and a temperature > or = 38 degrees C); and (c) in adults aged 15-44 years with pain on swallowing and one or two of the four signs, but not both cough and coryza. Group A streptococci were found by laboratory culture in 30% of throat swabs from 1783 patients. Using these results as the reference, the algorithm was 95% sensitive and 26% specific, and assigned 80% of the patients to be swabbed. Its positive and negative predictive values in this setting were 36% and 92%, respectively. It is concluded that this algorithm may be useful in general practice.
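The reported predictive values follow directly from the stated sensitivity, specificity and the 30% culture-positive rate via Bayes' rule; a quick check of that arithmetic:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Compute PPV and NPV from sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)

ppv, npv = predictive_values(sensitivity=0.95, specificity=0.26, prevalence=0.30)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
# PPV = 35.5%, NPV = 92.4% -- consistent with the reported 36% and 92%
# (the small PPV difference comes from rounding the input rates to whole percentages).
```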
Absolute magnitude calibration using trigonometric parallax - Incomplete, spectroscopic samples
NASA Technical Reports Server (NTRS)
Ratnatunga, Kavan U.; Casertano, Stefano
1991-01-01
A new numerical algorithm is used to calibrate the absolute magnitude of spectroscopically selected stars from their observed trigonometric parallax. This procedure, based on maximum-likelihood estimation, can retrieve unbiased estimates of the intrinsic absolute magnitude and its dispersion even from incomplete samples suffering from selection biases in apparent magnitude and color. It can also make full use of low accuracy and negative parallaxes and incorporate censorship on reported parallax values. Accurate error estimates are derived for each of the fitted parameters. The algorithm allows an a posteriori check of whether the fitted model gives a good representation of the observations. The procedure is described in general and applied to both real and simulated data.
Using a Portfolio of Algorithms for Planning and Scheduling
NASA Technical Reports Server (NTRS)
Sherwood, Robert; Knight, Russell; Rabideau, Gregg; Chien, Steve; Tran, Daniel; Engelhardt, Barbara
2003-01-01
The Automated Scheduling and Planning Environment (ASPEN) software system, aspects of which have been reported in several previous NASA Tech Briefs articles, includes a subsystem that utilizes a portfolio of heuristic algorithms that work synergistically to solve problems. The nature of the synergy is that the algorithms' likelihoods of failure are negatively correlated: that is, when a combination of them is used to solve a problem, the probability that at least one of them will succeed is greater than it would be if the individual algorithms succeeded or failed independently of one another. In ASPEN, the portfolio of algorithms is used in a planning process of the iterative repair type, in which conflicts are detected and addressed one at a time until either no conflicts exist or a user-defined time limit has been exceeded. At each choice point (e.g., selection of conflict; selection of method of resolution of conflict; or choice of move, addition, or deletion) ASPEN makes a stochastic choice of a combination of algorithms from the portfolio. This approach makes it possible for the search to escape from looping and from solutions that are locally but not globally optimum.
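The portfolio effect can be stated precisely: writing F_i for the event that algorithm i fails on a given problem, negative correlation among the failures lowers the probability that every algorithm fails relative to the independent case, so

```latex
P(\text{at least one succeeds}) \;=\; 1 - P\Big(\bigcap_i F_i\Big)
\;>\; 1 - \prod_i P(F_i)
\quad \text{when the failures are negatively correlated.}
```

For instance, two algorithms that each succeed on half of the problem instances give a success probability of 3/4 if their failures are independent, but approach certain success if they tend to fail on disjoint sets of instances. This is a sketch of the reasoning, not a formula from the article.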
Secure relay selection based on learning with negative externality in wireless networks
NASA Astrophysics Data System (ADS)
Zhao, Caidan; Xiao, Liang; Kang, Shan; Chen, Guiquan; Li, Yunzhou; Huang, Lianfen
2013-12-01
In this paper, we formulate relay selection as a Chinese restaurant game. A secure relay selection strategy is proposed for a wireless network, where multiple source nodes send messages to their destination nodes via several relay nodes, which have different processing and transmission capabilities as well as security properties. The relay selection utilizes a learning-based algorithm for the source nodes to reach their best responses in the Chinese restaurant game. In particular, the relay selection takes into account the negative externality of relay sharing among the source nodes, which learn the capabilities and security properties of relay nodes according to the current signals and the signal history. Simulation results show that this strategy improves the user utility and the overall security performance in wireless networks. In addition, the relay selection strategy is robust against signal errors and against deviations of some users from the desired actions.
Overby, Casey Lynnette; Pathak, Jyotishman; Gottesman, Omri; Haerian, Krystl; Perotte, Adler; Murphy, Sean; Bruce, Kevin; Johnson, Stephanie; Talwalkar, Jayant; Shen, Yufeng; Ellis, Steve; Kullo, Iftikhar; Chute, Christopher; Friedman, Carol; Bottinger, Erwin; Hripcsak, George; Weng, Chunhua
2013-01-01
Objective: To describe a collaborative approach for developing an electronic health record (EHR) phenotyping algorithm for drug-induced liver injury (DILI). Methods: We analyzed types and causes of differences in DILI case definitions provided by two institutions (Columbia University and Mayo Clinic); harmonized two EHR phenotyping algorithms; and assessed the performance, measured by sensitivity, specificity, positive predictive value, and negative predictive value, of the resulting algorithm at three institutions, except that sensitivity was measured only at Columbia University. Results: Although these sites had the same case definition, their phenotyping methods differed by selection of liver injury diagnoses, inclusion of drugs cited in DILI cases, laboratory tests assessed, laboratory thresholds for liver injury, exclusion criteria, and approaches to validating phenotypes. We reached consensus on a DILI phenotyping algorithm and implemented it at three institutions. The algorithm was adapted locally to account for differences in populations and data access. Implementations collectively yielded 117 algorithm-selected cases and 23 confirmed true positive cases. Discussion: Phenotyping for rare conditions benefits significantly from pooling data across institutions. Despite the heterogeneity of EHRs and varied algorithm implementations, we demonstrated the portability of this algorithm across three institutions. The performance of this algorithm for identifying DILI was comparable with other computerized approaches to identify adverse drug events. Conclusions: Phenotyping algorithms developed for rare and complex conditions are likely to require adaptive implementation at multiple institutions. Better approaches are also needed to share algorithms. Early agreement on goals, data sources, and validation methods may improve the portability of the algorithms. PMID:23837993
Localization Algorithm with On-line Path Loss Estimation and Node Selection
Bel, Albert; Vicario, José López; Seco-Granados, Gonzalo
2011-01-01
RSS-based localization is considered a low-complexity algorithm with respect to other range techniques such as TOA or AOA. The accuracy of RSS methods depends on the suitability of the propagation models used for the actual propagation conditions. In indoor environments, in particular, it is very difficult to obtain a good propagation model. For that reason, we present a cooperative localization algorithm that dynamically estimates the path loss exponent by using RSS measurements. Since the energy consumption is a key point in sensor networks, we propose a node selection mechanism to limit the number of neighbours of a given node that are used for positioning purposes. Moreover, the selection mechanism is also useful to discard bad links that could negatively affect the performance accuracy. As a result, we derive a practical solution tailored to the strict requirements of sensor networks in terms of complexity, size and cost. We present results based on both computer simulations and real experiments with the Crossbow MICA2 motes showing that the proposed scheme offers a good trade-off in terms of position accuracy and energy efficiency. PMID:22163992
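The on-line path loss estimation mentioned above typically rests on the log-distance model RSS(d) = RSS(d0) - 10 n log10(d / d0); a minimal least-squares estimate of the exponent n from RSS measurements might look like the sketch below (the model and the variable names are generic assumptions, not the paper's exact procedure).

```python
import numpy as np

def estimate_path_loss_exponent(distances_m, rss_dbm, d0=1.0, rss_d0=None):
    """Fit RSS(d) = RSS(d0) - 10*n*log10(d/d0) by least squares; returns (n, RSS(d0))."""
    x = 10.0 * np.log10(np.asarray(distances_m, dtype=float) / d0)
    y = np.asarray(rss_dbm, dtype=float)
    if rss_d0 is None:
        slope, intercept = np.polyfit(x, y, 1)    # jointly fit exponent and reference power
        return -slope, intercept
    n = np.sum((rss_d0 - y) * x) / np.sum(x * x)  # reference power known: exponent only
    return n, rss_d0

# Synthetic check: true exponent 3.0, reference power -40 dBm at 1 m, 2 dB shadowing noise.
rng = np.random.default_rng(0)
d = rng.uniform(2.0, 30.0, 50)
rss = -40.0 - 10.0 * 3.0 * np.log10(d) + rng.normal(0.0, 2.0, 50)
n_hat, p0_hat = estimate_path_loss_exponent(d, rss)
print(round(n_hat, 2), round(p0_hat, 1))  # n_hat should come out close to 3.0
```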
Predicting Positive and Negative Relationships in Large Social Networks.
Wang, Guan-Nan; Gao, Hui; Chen, Lian; Mensah, Dennis N A; Fu, Yan
2015-01-01
In a social network, users hold and express positive and negative attitudes (e.g. support/opposition) towards other users. Those attitudes exhibit some kind of binary relationships among the users, which play an important role in social network analysis. However, some of those binary relationships are likely to be latent as the scale of the social network increases. The problem of predicting latent binary relationships has recently begun to draw researchers' attention. In this paper, we propose a machine learning algorithm for predicting positive and negative relationships in social networks inspired by structural balance theory and social status theory. More specifically, we show that when two users in the network have fewer common neighbors, the prediction accuracy of the relationship between them deteriorates. Accordingly, in the training phase, we propose a segment-based training framework to divide the training data into two subsets according to the number of common neighbors between users, and build a prediction model for each subset based on support vector machine (SVM). Moreover, to deal with large-scale social network data, we employ a sampling strategy that selects a small amount of training data while maintaining high prediction accuracy. We compare our algorithm with traditional algorithms and with their adaptive-boosting variants. Experimental results on typical data sets show that our algorithm can deal with large social networks and consistently outperforms other methods.
Qeli, Ermir; Omasits, Ulrich; Goetze, Sandra; Stekhoven, Daniel J; Frey, Juerg E; Basler, Konrad; Wollscheid, Bernd; Brunner, Erich; Ahrens, Christian H
2014-08-28
The in silico prediction of the best-observable "proteotypic" peptides in mass spectrometry-based workflows is a challenging problem. Being able to accurately predict such peptides would enable the informed selection of proteotypic peptides for targeted quantification of previously observed and non-observed proteins for any organism, with a significant impact for clinical proteomics and systems biology studies. Current prediction algorithms rely on physicochemical parameters in combination with positive and negative training sets to identify those peptide properties that most profoundly affect their general detectability. Here we present PeptideRank, an approach that uses a learning-to-rank algorithm for peptide detectability prediction from shotgun proteomics data, and that eliminates the need to select a negative dataset for the training step. A large number of different peptide properties are used to train ranking models in order to predict a ranking of the best-observable peptides within a protein. Empirical evaluation with rank accuracy metrics showed that PeptideRank complements existing prediction algorithms. Our results indicate that the best performance is achieved when it is trained on organism-specific shotgun proteomics data, and that PeptideRank is most accurate for short to medium-sized and abundant proteins, without any loss in prediction accuracy for the important class of membrane proteins. Targeted proteomics approaches have been gaining a lot of momentum and hold immense potential for systems biology studies and clinical proteomics. However, since only very few complete proteomes have been reported to date, for a considerable fraction of a proteome there is no experimental proteomics evidence that could guide the selection of the best-suited proteotypic peptides (PTPs), i.e. peptides that are specific to a given proteoform and that are repeatedly observed in a mass spectrometer. We describe a novel, rank-based approach for the prediction of the best-suited PTPs for targeted proteomics applications. By building on methods developed in the field of information retrieval (e.g. web search engines like Google's PageRank), we circumvent the delicate step of selecting positive and negative training sets and at the same time also more closely reflect the experimentalist's need to select e.g. the 5 most promising peptides for targeting a protein of interest. This approach allows PTPs to be predicted for not yet observed proteins or for organisms without prior experimental proteomics data, such as many non-model organisms. Copyright © 2014 Elsevier B.V. All rights reserved.
Ju, Bin; Qian, Yuntao; Ye, Minchao; Ni, Rong; Zhu, Chenxi
2015-01-01
Predicting what items will be selected by a target user in the future is an important function for recommendation systems. Matrix factorization techniques have been shown to achieve good performance on temporal rating-type data, but little is known about temporal item selection data. In this paper, we developed a unified model that combines Multi-task Non-negative Matrix Factorization and Linear Dynamical Systems to capture the evolution of user preferences. Specifically, user and item features are projected into latent factor space by factoring co-occurrence matrices into a common basis item-factor matrix and multiple factor-user matrices. Moreover, we represented both within and between relationships of multiple factor-user matrices using a state transition matrix to capture the changes in user preferences over time. The experiments show that our proposed algorithm outperforms the other algorithms on two real datasets, which were extracted from Netflix movies and Last.fm music. Furthermore, our model provides a novel dynamic topic model for tracking the evolution of the behavior of a user over time. PMID:26270539
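As a simplified illustration of the factorization step described above (plain non-negative matrix factorization of a single user-by-item co-occurrence matrix; the multi-task coupling across matrices and the linear dynamical system that tracks preference drift are not reproduced here), one might write:

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy binary user-by-item selection matrix (rows: users, columns: items).
rng = np.random.default_rng(0)
X = (rng.random((50, 30)) < 0.2).astype(float)

# Factor X ~ W @ H with non-negative user factors W and item factors H.
model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)   # shape (50, 5): latent user preferences
H = model.components_        # shape (5, 30): latent item factors

# Rank items not yet selected by user 0 according to reconstructed affinity.
scores = W[0] @ H
unseen = X[0] == 0
print(np.argsort(-(scores * unseen))[:5])  # top-5 recommended item indices
```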
NASA Astrophysics Data System (ADS)
Pietrzyk, Mariusz W.; Donovan, Tim; Brennan, Patrick C.; Dix, Alan; Manning, David J.
2011-03-01
Aim: To optimize automated classification of radiological errors during lung nodule detection from chest radiographs (CxR) using a support vector machine (SVM) run on the spatial frequency features extracted from the local background of selected regions. Background: The majority of unreported pulmonary nodules are visually detected but not recognized, as shown by the prolonged dwell time values at false-negative regions. Similarly, overestimated nodule locations capture substantial amounts of foveal attention. Spatial frequency properties of selected local backgrounds are correlated with human observer responses, either in terms of accuracy in indicating abnormality position or in the precision of visual sampling of the medical images. Methods: Seven radiologists participated in eye tracking experiments conducted under conditions of pulmonary nodule detection from a set of 20 postero-anterior CxR. The most dwelled-upon locations were identified and subjected to spatial frequency (SF) analysis. The image-based features of selected ROIs were extracted with the un-decimated Wavelet Packet Transform. An analysis of variance was run to select SF features, and an SVM scheme was implemented to classify false-negative and false-positive locations among all ROIs. Results: A relatively high overall accuracy was obtained for each individually developed Wavelet-SVM algorithm, with an average correct recognition ratio of over 90% for errors at all prolonged dwell locations. Conclusion: The preliminary results show that combined eye-tracking and image-based features can be used for automated detection of radiological error with SVM. The work is still in progress and not all analytical procedures have been completed, which might have an effect on the specificity of the algorithm.
Scholl, Zackary N.; Marszalek, Piotr E.
2013-01-01
The benefits of single molecule force spectroscopy (SMFS) clearly outweigh the challenges which include small sample sizes, tedious data collection and introduction of human bias during the subjective data selection. These difficulties can be partially eliminated through automation of the experimental data collection process for atomic force microscopy (AFM). Automation can be accomplished using an algorithm that triages usable force-extension recordings quickly with positive and negative selection. We implemented an algorithm based on the windowed fast Fourier transform of force-extension traces that identifies peaks using force-extension regimes to correctly identify usable recordings from proteins composed of repeated domains. This algorithm excels as a real-time diagnostic because it involves <30 ms computational time, has high sensitivity and specificity, and efficiently detects weak unfolding events. We used the statistics provided by the automated procedure to clearly demonstrate the properties of molecular adhesion and how these properties change with differences in the cantilever tip and protein functional groups and protein age. PMID:24001740
A reconsideration of negative ratings for network-based recommendation
NASA Astrophysics Data System (ADS)
Hu, Liang; Ren, Liang; Lin, Wenbin
2018-01-01
Recommendation algorithms based on bipartite networks have become increasingly popular, thanks to their accuracy and flexibility. Currently, many of these methods ignore users' negative ratings. In this work, we propose a method to exploit negative ratings for the network-based inference algorithm. We find that negative ratings play a positive role regardless of sparsity of data sets. Furthermore, we improve the efficiency of our method and compare it with the state-of-the-art algorithms. Experimental results show that the present method outperforms the existing algorithms.
NASA Astrophysics Data System (ADS)
Chen, Jun; Zhang, Xiangguang; Xing, Xiaogang; Ishizaka, Joji; Yu, Zhifeng
2017-12-01
Quantifying the diffuse attenuation coefficient of the photosynthetically available radiation (Kpar) can improve our knowledge of euphotic depth (Zeu) and biomass heating effects in the upper layers of oceans. An algorithm to semianalytically derive Kpar from remote sensing reflectance (Rrs) is developed for the global open oceans. This algorithm includes the following two parts: (1) a neural network model for deriving the diffuse attenuation coefficients (Kd) that considers the residual error in satellite Rrs, and (2) a three-band depth-dependent Kpar algorithm (TDKA) for describing the spectrally selective attenuation mechanism of underwater solar radiation in the open oceans. This algorithm is evaluated with both in situ PAR profile data and satellite images, and the results show that it can produce acceptable PAR profile estimations while clearly removing the impacts of satellite residual errors on Kpar estimations. Furthermore, the performance of the TDKA algorithm is evaluated by its applicability in Zeu derivation and in simulating the mean temperature within the mixed layer depth (TML), and the results show that it can significantly decrease the uncertainty in both compared with the classical chlorophyll-a concentration-based Kpar algorithm. Finally, the TDKA algorithm is applied in simulating biomass heating effects in the Sargasso Sea near Bermuda; with the new Kpar data it is found that the biomass heating effects can lead to a 3.4°C maximum positive difference in temperature in the upper layers but could result in a 0.67°C maximum negative difference in temperature in the deep layers.
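For context, Kpar and the euphotic depth Zeu referred to above are linked through the exponential attenuation of PAR with depth; under the common definition of Zeu as the depth at which PAR falls to 1% of its surface value, and assuming a depth-averaged attenuation coefficient (the TDKA algorithm refines this by letting Kpar vary with depth through its spectrally selective attenuation terms):

```latex
\mathrm{PAR}(z) \;=\; \mathrm{PAR}(0^{-})\,
\exp\!\Big(-\!\int_{0}^{z} K_{\mathrm{PAR}}(z')\,\mathrm{d}z'\Big),
\qquad
Z_{\mathrm{eu}} \;\approx\; \frac{\ln(100)}{\overline{K}_{\mathrm{PAR}}}
\;\approx\; \frac{4.6}{\overline{K}_{\mathrm{PAR}}}.
```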
Morita, Kenji; Jitsev, Jenia; Morrison, Abigail
2016-09-15
Value-based action selection has been suggested to be realized in the corticostriatal local circuits through competition among neural populations. In this article, we review theoretical and experimental studies that have constructed and verified this notion, and provide new perspectives on how the local-circuit selection mechanisms implement reinforcement learning (RL) algorithms and computations beyond them. The striatal neurons are mostly inhibitory, and lateral inhibition among them has been classically proposed to realize "Winner-Take-All (WTA)" selection of the maximum-valued action (i.e., 'max' operation). Although this view has been challenged by the revealed weakness, sparseness, and asymmetry of lateral inhibition, which suggest more complex dynamics, WTA-like competition could still occur on short time scales. Unlike the striatal circuit, the cortical circuit contains recurrent excitation, which may enable retention or temporal integration of information and probabilistic "soft-max" selection. The striatal "max" circuit and the cortical "soft-max" circuit might co-implement an RL algorithm called Q-learning; the cortical circuit might also similarly serve for other algorithms such as SARSA. In these implementations, the cortical circuit presumably sustains activity representing the executed action, which negatively impacts dopamine neurons so that they can calculate reward-prediction-error. Regarding the suggested more complex dynamics of striatal, as well as cortical, circuits on long time scales, which could be viewed as a sequence of short WTA fragments, computational roles remain open: such a sequence might represent (1) sequential state-action-state transitions, constituting replay or simulation of the internal model, (2) a single state/action by the whole trajectory, or (3) probabilistic sampling of state/action. Copyright © 2016. Published by Elsevier B.V.
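The "soft-max" selection and Q-learning computations invoked above take the following standard textbook form (the review maps them onto cortical and striatal circuitry only approximately):

```latex
P(a \mid s) \;=\; \frac{\exp\!\big(Q(s,a)/\tau\big)}{\sum_{a'} \exp\!\big(Q(s,a')/\tau\big)},
\qquad
Q(s,a) \leftarrow Q(s,a) + \alpha\,\delta,
\quad
\delta \;=\; r + \gamma \max_{a'} Q(s',a') - Q(s,a),
```

where δ is the reward-prediction-error thought to be signalled by dopamine neurons; SARSA replaces max over a' with Q(s',a') for the action actually taken, and a "max" (winner-take-all) selection corresponds to the τ → 0 limit of the soft-max.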
Prominent feature extraction for review analysis: an empirical study
NASA Astrophysics Data System (ADS)
Agarwal, Basant; Mittal, Namita
2016-05-01
Sentiment analysis (SA) research has increased tremendously in recent times. SA aims to determine the sentiment orientation of a given text as positive or negative polarity. The motivation for SA research is industry's need to learn users' opinions of their products from online portals, blogs, discussion boards, reviews and so on. Efficient features need to be extracted for machine-learning algorithms to achieve better sentiment classification. In this paper, various features such as unigrams, bi-grams and dependency features are initially extracted from the text. In addition, new bi-tagged features are also extracted that conform to predefined part-of-speech patterns. Furthermore, various composite features are created using these features. Information gain (IG) and minimum redundancy maximum relevancy (mRMR) feature selection methods are used to eliminate noisy and irrelevant features from the feature vector. Finally, machine-learning algorithms are used for classifying the review document into the positive or negative class. The effects of different categories of features are investigated on four standard data-sets, namely, movie review and product (book, DVD and electronics) review data-sets. Experimental results show that composite features created from prominent unigram and bi-tagged features perform better than other features for sentiment classification. mRMR is a better feature selection method compared with IG for sentiment classification. The Boolean Multinomial Naïve Bayes algorithm performs better than the support vector machine classifier for SA in terms of accuracy and execution time.
Comparison of trend analyses for Umkehr data using new and previous inversion algorithms
NASA Technical Reports Server (NTRS)
Reinsel, Gregory C.; Tam, Wing-Kuen; Ying, Lisa H.
1994-01-01
Ozone vertical profile Umkehr data for layers 3-9 obtained from 12 stations, using both previous and new inversion algorithms, were analyzed for trends. The trends estimated for the Umkehr data from the two algorithms were compared using two data periods, 1968-1991 and 1977-1991. Both nonseasonal and seasonal trend models were fitted. The overall annual trends are found to be significantly negative, of the order of -5% per decade, for layers 7 and 8 using both inversion algorithms. The largest negative trends occur in these layers under the new algorithm, whereas in the previous algorithm the most negative trend occurs in layer 9. The trend estimates, both annual and seasonal, are substantially different between the two algorithms mainly for layers 3, 4, and 9, where trends from the new algorithm data are about 2% per decade less negative, with less appreciable differences in layers 7 and 8. The trend results from the two data periods are similar, except for layer 3 where trends become more negative, by about -2% per decade, for 1977-1991.
Automated detection of ventricular pre-excitation in pediatric 12-lead ECG.
Gregg, Richard E; Zhou, Sophia H; Dubin, Anne M
2016-01-01
With increased interest in screening of young people for potential causes of sudden death, accurate automated detection of ventricular pre-excitation (VPE) or Wolff-Parkinson-White syndrome (WPW) in the pediatric resting ECG is important. Several recent studies have shown interobserver variability when reading screening ECGs and thus an accurate automated reading for this potential cause of sudden death is critical. We designed and tested an automated algorithm to detect pediatric VPE optimized for low prevalence. Digital ECGs with 12 leads or 15 leads (12-lead plus V3R, V4R and V7) were selected from multiple hospitals and separated into a testing and training database. Inclusion criterion was age less than 16 years. The reference for algorithm detection of VPE was cardiologist annotation of VPE for each ECG. The training database (n=772) consisted of VPE ECGs (n=37), normal ECGs (n=492) and a high concentration of conduction defects, RBBB (n=232) and LBBB (n=11). The testing database was a random sample (n=763). All ECGs were analyzed with the Philips DXL ECG Analysis algorithm for basic waveform measurements. Additional ECG features specific to VPE, mainly delta wave scoring, were calculated from the basic measurements and the average beat. A classifier based on decision tree bootstrap aggregation (tree bagger) was trained in multiple steps to select the number of decision trees and the 10 best features. The classifier accuracy was measured on the test database. The new algorithm detected pediatric VPE with a sensitivity of 78%, a specificity of 99.9%, a positive predictive value of 88% and negative predictive value of 99.7%. This new algorithm for detection of pediatric VPE performs well with a reasonable positive and negative predictive value despite the low prevalence in the general population. Copyright © 2016 Elsevier Inc. All rights reserved.
Detection of pneumonia using free-text radiology reports in the BioSense system.
Asatryan, Armenak; Benoit, Stephen; Ma, Haobo; English, Roseanne; Elkin, Peter; Tokars, Jerome
2011-01-01
Near real-time disease detection using electronic data sources is a public health priority. Detecting pneumonia is particularly important because it is the manifesting disease of several bioterrorism agents as well as a complication of influenza, including avian and novel H1N1 strains. Text radiology reports are available earlier than physician diagnoses and so could be integral to rapid detection of pneumonia. We performed a pilot study to determine which keywords present in text radiology reports are most highly associated with pneumonia diagnosis. Electronic radiology text reports from 11 hospitals from February 1, 2006 through December 31, 2007 were used. We created a computerized algorithm that searched for selected keywords ("airspace disease", "consolidation", "density", "infiltrate", "opacity", and "pneumonia"), differentiated between clinical history and radiographic findings, and accounted for negations and double negations; this algorithm was tested on a sample of 350 radiology reports. We used the algorithm to study 189,246 chest radiographs, searching for the keywords and determining their association with a final International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis of pneumonia. The main outcome measures were the performance of the search algorithm in finding keywords and the association of the keywords with a pneumonia diagnosis. In the sample of 350 radiographs, the search algorithm was highly successful in identifying the selected keywords (sensitivity 98.5%, specificity 100%). Analysis of the 189,246 radiographs showed that the keyword "pneumonia" was the strongest predictor of an ICD-9-CM diagnosis of pneumonia (adjusted odds ratio 11.8) while "density" was the weakest (adjusted odds ratio 1.5). In general, the most highly associated keyword present in the report, regardless of whether a less highly associated keyword was also present, was the best predictor of a diagnosis of pneumonia. Empirical methods may assist in finding radiology report keywords that are most highly predictive of a pneumonia diagnosis. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
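A minimal sketch of the kind of negation-aware keyword matching described above is given below; the real BioSense algorithm also separates clinical history from radiographic findings and handles double negations more carefully. The keyword list is taken from the abstract, while the negation cues and window size are illustrative assumptions.

```python
import re

KEYWORDS = ["airspace disease", "consolidation", "density",
            "infiltrate", "opacity", "pneumonia"]
NEGATION_CUES = ["no ", "without ", "negative for ", "free of "]      # illustrative
DOUBLE_NEGATION_CUES = ["cannot exclude", "not ruled out"]            # illustrative

def asserted_keywords(findings_text: str):
    """Return keywords asserted (i.e. not negated) in the findings section of a report."""
    text = findings_text.lower()
    found = []
    for kw in KEYWORDS:
        for m in re.finditer(re.escape(kw), text):
            window = text[max(0, m.start() - 30):m.start()]           # look back a few words
            negated = any(cue in window for cue in NEGATION_CUES)
            if negated and any(cue in window for cue in DOUBLE_NEGATION_CUES):
                negated = False                                       # crude double-negation check
            if not negated:
                found.append(kw)
                break
    return found

print(asserted_keywords("Findings: patchy right lower lobe opacity, no focal consolidation."))
# ['opacity']  (consolidation is negated, opacity is asserted)
```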
Kim, Hyunsoo; Park, Haesun
2007-06-15
Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. The software is available as supplementary material.
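One common way to write the sparsity-constrained NMF that the alternating non-negativity-constrained least squares procedure above solves is the following (here sparseness is imposed on the columns of the coefficient matrix H; the authors' exact penalty weights, and which factor is sparsified, may differ in detail):

```latex
\min_{W \ge 0,\; H \ge 0}\;
\tfrac{1}{2}\,\big\| A - W H \big\|_F^{2}
\;+\; \eta\,\| W \|_F^{2}
\;+\; \beta \sum_{j} \big\| H_{(:,j)} \big\|_1^{2},
```

where each alternating step (solve for H with W fixed, then for W with H fixed) is itself a non-negativity-constrained least squares problem, which is what makes the convergence argument tractable.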
NASA Technical Reports Server (NTRS)
Chidester, Thomas R.; Foushee, H. Clayton
1989-01-01
A full mission simulation research study was completed to assess the potential for selection along dimensions of personality. Using a selection algorithm described by Chidester (1987), captains were classified as fitting one of three profiles using a battery of personality assessment scales, and the performances of 23 crews led by captains fitting each profile were contrasted over a one and one-half day simulated trip. Crews led by captains fitting a Positive Instrumental Expressive profile (high achievement motivation and interpersonal skill) were consistently effective and made fewer errors. Crews led by captains fitting a Negative Communion profile (below average achievement motivation, negative expressive style, such as complaining) were consistently less effective and made more errors. Crews led by captains fitting a Negative Instrumental profile (high levels of Competitiveness, Verbal Aggressiveness, and Impatience and Irritability) were less effective on the first day but equal to the best on the second day. These results underscore the importance of stable personality variables as predictors of team coordination and performance.
Yakhelef, N; Audibert, M; Varaine, F; Chakaya, J; Sitienei, J; Huerga, H; Bonnet, M
2014-05-01
In 2007, the World Health Organization recommended introducing rapid Mycobacterium tuberculosis culture into the diagnostic algorithm of smear-negative pulmonary tuberculosis (TB). The aim was to assess the cost-effectiveness of introducing a rapid non-commercial culture method (thin-layer agar), together with Löwenstein-Jensen culture, to diagnose smear-negative TB at a district hospital in Kenya. Outcomes (number of true TB cases treated) were obtained from a prospective study evaluating the effectiveness of a clinical and radiological algorithm (conventional) against the alternative algorithm (conventional plus M. tuberculosis culture) in 380 smear-negative TB suspects. The costs of implementing each algorithm were calculated using a 'micro-costing' or 'ingredient-based' method. We then compared the cost and effectiveness of conventional vs. culture-based algorithms and estimated the incremental cost-effectiveness ratio. The costs of the conventional and culture-based algorithms per smear-negative TB suspect were respectively €39.5 and €144. The costs per confirmed and treated TB case were respectively €452 and €913. The culture-based algorithm led to diagnosis and treatment of 27 more cases for an additional cost of €1477 per case. Despite the increase in patients started on treatment thanks to culture, the relatively high cost of a culture-based algorithm will make it difficult for resource-limited countries to afford it.
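The incremental cost per additional case can be reconstructed from the figures quoted above (a back-of-the-envelope check; the published €1477 differs slightly because the per-suspect costs are rounded):

```python
n_suspects = 380
cost_conventional = 39.5 * n_suspects      # ~ EUR 15,010
cost_culture_based = 144.0 * n_suspects    # ~ EUR 54,720
extra_cases_treated = 27

icer = (cost_culture_based - cost_conventional) / extra_cases_treated
print(round(icer))  # ~ EUR 1471 per additional true TB case treated (reported: EUR 1477)
```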
Zhan, Xue-yan; Zhao, Na; Lin, Zhao-zhou; Wu, Zhi-sheng; Yuan, Rui-juan; Qiao, Yan-jiang
2014-12-01
The choice of algorithm for calibration set selection is one of the key factors in building a good NIR quantitative model. There are different algorithms for calibration set selection, such as the Random Sampling (RS) algorithm, the Conventional Selection (CS) algorithm, the Kennard-Stone (KS) algorithm and the Sample set Partitioning based on joint x-y distances (SPXY) algorithm. However, systematic comparisons among these algorithms are lacking. In the present paper, NIR quantitative models to determine the asiaticoside content in Centella total glucosides were established, for which 7 indexes were classified and selected, and the effects of the CS, KS and SPXY algorithms for calibration set selection on the accuracy and robustness of the NIR quantitative models were investigated. The accuracy indexes of the NIR quantitative models with calibration sets selected by the SPXY algorithm were significantly different from those with calibration sets selected by the CS or KS algorithm, while the robustness indexes, such as RMSECV and |RMSEP-RMSEC|, were not significantly different. Therefore, the SPXY algorithm for calibration set selection can improve the predictive accuracy of NIR quantitative models for determining the asiaticoside content in Centella total glucosides without significantly affecting the robustness of the models, which provides a reference for choosing the appropriate calibration set selection algorithm when NIR quantitative models are established for solid systems of traditional Chinese medicine.
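For readers unfamiliar with the calibration-set selection algorithms compared above, a minimal Kennard-Stone (KS) selection in plain NumPy is sketched below; SPXY follows the same greedy max-min scheme but computes the distances jointly on the spectra (x) and the reference values (y).

```python
import numpy as np

def kennard_stone(X, n_select):
    """Greedy Kennard-Stone selection: start from the two most distant samples, then
    repeatedly add the sample whose nearest already-selected neighbour is farthest."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    while len(selected) < n_select:
        remaining = [k for k in range(len(X)) if k not in selected]
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(nearest))])
    return selected

# Example: pick 10 calibration samples out of 100 spectra with 50 variables each.
rng = np.random.default_rng(0)
spectra = rng.random((100, 50))
print(kennard_stone(spectra, 10))
```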
HIV misdiagnosis in sub-Saharan Africa: performance of diagnostic algorithms at six testing sites.
Kosack, Cara S; Shanks, Leslie; Beelaert, Greet; Benson, Tumwesigye; Savane, Aboubacar; Ng'ang'a, Anne; Andre, Bita; Zahinda, Jean-Paul Bn; Fransen, Katrien; Page, Anne-Laure
2017-07-03
We evaluated the diagnostic accuracy of HIV testing algorithms at six programmes in five sub-Saharan African countries. In this prospective multisite diagnostic evaluation study (Conakry, Guinea; Kitgum, Uganda; Arua, Uganda; Homa Bay, Kenya; Douala, Cameroon and Baraka, Democratic Republic of Congo), samples from clients (five years of age or older) testing for HIV were collected and compared to a state-of-the-art algorithm from the AIDS reference laboratory at the Institute of Tropical Medicine, Belgium. The reference algorithm consisted of an enzyme-linked immuno-sorbent assay, a line-immunoassay, a single antigen-enzyme immunoassay and a DNA polymerase chain reaction test. Between August 2011 and January 2015, over 14,000 clients were tested for HIV at 6 HIV counselling and testing sites. Of those, 2786 (median age: 30; 38.1% males) were included in the study. Sensitivity of the testing algorithms ranged from 89.5% in Arua to 100% in Douala and Conakry, while specificity ranged from 98.3% in Douala to 100% in Conakry. Overall, 24 (0.9%) clients, and as many as 8 per site (1.7%), were misdiagnosed, with 16 false-positive and 8 false-negative results. Six false-negative specimens were retested with the on-site algorithm on the same sample and were found to be positive. Conversely, 13 false-positive specimens were retested: 8 remained false-positive with the on-site algorithm. The performance of algorithms at several sites failed to meet expectations and thresholds set by the World Health Organization, with unacceptably high rates of false results. Alongside the careful selection of rapid diagnostic tests and the validation of algorithms, strictly observing correct procedures can reduce the risk of false results. In the meantime, to identify false-positive diagnoses at initial testing, patients should be retested upon initiating antiretroviral therapy.
A Negative Selection Immune System Inspired Methodology for Fault Diagnosis of Wind Turbines.
Alizadeh, Esmaeil; Meskin, Nader; Khorasani, Khashayar
2017-11-01
High operational and maintenance costs represent major economic constraints in the wind turbine (WT) industry. These concerns have made investigation into fault diagnosis of WT systems an extremely important and active area of research. In this paper, an immune system (IS) inspired methodology for performing fault detection and isolation (FDI) of a WT system is proposed and developed. The proposed scheme is based on the self/nonself discrimination paradigm of a biological IS. Specifically, the negative selection mechanism [negative selection algorithm (NSA)] of the human body is utilized. In this paper, a hierarchical bank of NSAs is designed to detect and isolate both individual as well as simultaneously occurring faults common to WTs. A smoothing moving window filter is then utilized to further improve the reliability and performance of the FDI scheme. Moreover, the performance of our proposed scheme is compared with another state-of-the-art data-driven technique, namely support vector machines (SVMs), to demonstrate the superiority and advantages of our proposed NSA-based FDI scheme. Finally, a nonparametric statistical comparison test is implemented to evaluate our proposed methodology against the SVM under various fault severities.
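As a rough sketch of the negative selection mechanism this methodology builds on, the following minimal real-valued NSA generates random detectors, discards those matching the self (normal operation) samples, and flags a new sample as faulty when any surviving detector matches it. It is a generic fixed-radius sketch under assumed [0, 1]-scaled features, not the hierarchical bank of NSAs or the smoothing moving window filter proposed in the paper.

```python
import numpy as np

def train_detectors(self_samples, n_detectors, self_radius, dim, seed=None):
    """Keep only random candidate detectors that do not fall within
    self_radius of any self (normal operation) training sample."""
    rng = np.random.default_rng(seed)
    self_samples = np.asarray(self_samples, dtype=float)
    detectors = []
    while len(detectors) < n_detectors:
        candidate = rng.uniform(0.0, 1.0, size=dim)   # features assumed scaled to [0, 1]
        if np.linalg.norm(self_samples - candidate, axis=1).min() > self_radius:
            detectors.append(candidate)
    return np.array(detectors)

def is_faulty(sample, detectors, detection_radius):
    """A sample is flagged as non-self (faulty) if any detector matches it."""
    return bool(np.linalg.norm(detectors - np.asarray(sample), axis=1).min() <= detection_radius)
```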
A fast automatic recognition and location algorithm for fetal genital organs in ultrasound images.
Tang, Sheng; Chen, Si-ping
2009-09-01
Severe sex ratio imbalance at birth is now becoming an important issue in several Asian countries. Its leading immediate cause is prenatal sex-selective abortion following illegal sex identification by ultrasound scanning. In this paper, a fast automatic recognition and location algorithm for fetal genital organs is proposed as an effective method to help prevent ultrasound technicians from unethically and illegally identifying the sex of the fetus. This automatic recognition algorithm can be divided into two stages. In the 'rough' stage, a few pixels in the image, which are likely to represent the genital organs, are automatically chosen as points of interest (POIs) according to certain salient characteristics of fetal genital organs. In the 'fine' stage, a specifically supervised learning framework, which fuses an effective feature data preprocessing mechanism into the multiple classifier architecture, is applied to every POI. The basic classifiers in the framework are selected from three widely used classifiers: radial basis function network, backpropagation network, and support vector machine. The classification results of all the POIs are then synthesized to determine whether the fetal genital organ is present in the image, and to locate the genital organ within the positive image. Experiments were designed and carried out based on an image dataset comprising 658 positive images (images with fetal genital organs) and 500 negative images (images without fetal genital organs). The experimental results showed true positive (TP) and true negative (TN) results from 80.5% (265 from 329) and 83.0% (415 from 500) of samples, respectively. The average computation time was 453 ms per image.
The influence of negative training set size on machine learning-based virtual screening.
Kurczab, Rafał; Smusz, Sabina; Bojarski, Andrzej J
2014-01-01
The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. The impact of this rather neglected aspect of machine learning applications was examined for sets containing a fixed number of positive and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluation parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed in conjunction with some decreases in hit recall. The analysis of the dynamics of those variations allowed us to recommend an optimal composition of the training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with the SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of a particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.
Nguyen, Van Thi Thuy; Best, Susan; Pham, Hong Thang; Troung, Thi Xuan Lien; Hoang, Thi Thanh Ha; Wilson, Kim; Ngo, Thi Hong Hanh; Chien, Xuan; Lai, Kim Anh; Bui, Duc Duong; Kato, Masaya
2017-08-29
In Vietnam, HIV testing services had been available only at provincial and district health facilities, but not at primary health facilities. Consequently, access to HIV testing services had been limited, especially in rural areas. In 2012, Vietnam piloted decentralization and integration of HIV services at commune health stations (CHSs). As a part of this pilot, a three-rapid-test algorithm was introduced at CHSs. The objective of this study was to assess the performance of the three-rapid-test algorithm and the implementation of quality assurance measures to prevent misdiagnosis at primary health facilities. The three-rapid-test algorithm (Determine HIV-1/2, followed by ACON HIV 1/2 and DoubleCheckGold HIV 1&2 in parallel) was piloted at CHSs from August 2012 to December 2013. Commune health staff were trained to perform HIV testing. Specimens from CHSs were sent to the provincial confirmatory laboratory (PCL) for confirmatory and validation testing. Quality assurance measures were undertaken, including training, competency assessment, field technical assistance, supervision and monitoring, and external quality assessment (EQA). Data on HIV testing were collected from the testing logbooks at commune and provincial facilities. Descriptive analysis was conducted. Sensitivity and specificity of the rapid testing algorithm were calculated. A total of 1,373 people received HIV testing and counselling (HTC) at CHSs. Eighty people were diagnosed with HIV infection (5.8%). Of the 1,244 specimens reported as HIV negative at the CHSs, 755 were sent to the PCL and confirmed as negative, and all 80 specimens reported as HIV positive at the CHSs were confirmed as positive at the PCL. Forty-nine specimens that were reactive with Determine but negative with ACON and DoubleCheckGold at the CHSs were confirmed negative at the PCL. The results show this rapid test algorithm to be 100% sensitive and 100% specific. Of 21 CHSs that received two rounds of EQA panels, 20 CHSs submitted accurate results. Decentralization of HIV confirmatory testing to CHSs is feasible in Vietnam. The results obtained from this pilot provided strong evidence of the feasibility of HIV testing at primary health facilities. Quality assurance measures, including training, competency assessment, regular monitoring and supervision, and an EQA scheme, are essential for prevention of misdiagnosis.
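The decision logic of the piloted three-rapid-test algorithm, as described above, can be summarized in a short sketch. The handling of discordant confirmatory results (referral of the specimen to the provincial confirmatory laboratory) is an assumption for illustration; the boolean test-result arguments are hypothetical names.

```python
def three_rapid_test(determine_reactive, acon_reactive, doublecheck_reactive):
    """Sketch of the CHS algorithm: Determine HIV-1/2 as the screening test,
    followed by ACON HIV 1/2 and DoubleCheckGold HIV 1&2 run in parallel
    on reactive specimens."""
    if not determine_reactive:
        return "negative"                  # non-reactive screening test
    if acon_reactive and doublecheck_reactive:
        return "positive"                  # both confirmatory tests reactive
    if not acon_reactive and not doublecheck_reactive:
        return "negative"                  # screen reactive, both confirmations negative
    return "indeterminate: refer specimen to the confirmatory laboratory"
```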
A globally optimal k-anonymity method for the de-identification of health data.
El Emam, Khaled; Dankar, Fida Kamal; Issa, Romeo; Jonker, Elizabeth; Amyot, Daniel; Cogo, Elise; Corriveau, Jean-Pierre; Walker, Mark; Chowdhury, Sadrul; Vaillancourt, Regis; Roffey, Tyson; Bottomley, Jim
2009-01-01
Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Often legislative requirements to obtain consent are waived if the information collected or disclosed is de-identified. The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets. The authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for different values of k and suppression limits. Three information loss metrics were used for the comparison: precision, discernibility metric, and non-uniform entropy. Each algorithm's performance speed was also evaluated. The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution. For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
Performance evaluation of image segmentation algorithms on microscopic image data.
Beneš, Miroslav; Zitová, Barbara
2015-01-01
In our paper, we present a performance evaluation of image segmentation algorithms on microscopic image data. In spite of the existence of many algorithms for image data partitioning, there is still no universal 'best' method. Moreover, images of microscopic samples can vary in character and quality, which can negatively influence the performance of image segmentation algorithms. Thus, the issue of selecting a suitable method for a given set of image data is of great interest. We carried out a large number of experiments with a variety of segmentation methods to evaluate the behaviour of individual approaches on the testing set of microscopic images (cross-section images taken in three different modalities from the field of art restoration). The segmentation results were assessed by several indices used for measuring the output quality of image segmentation algorithms. Finally, the benefit of a segmentation combination approach is studied and the applicability of the achieved results to another representative of the microscopic data category, biological samples, is shown.
A Cancer Gene Selection Algorithm Based on the K-S Test and CFS.
Su, Qiang; Wang, Yina; Jiang, Xiaobing; Chen, Fuxue; Lu, Wen-Cong
2017-01-01
To address the challenging problem of selecting distinguished genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm first selects distinguished genes using the K-S test, and then uses CFS to select genes from those selected by the K-S test. We adopted support vector machines (SVM) as the classification tool and used the criterion of accuracy to evaluate the performance of the classifiers on the selected gene subsets. The proposed gene subset selection algorithm was compared with the K-S test, CFS, minimum-redundancy maximum-relevancy (mRMR), and ReliefF algorithms. The average experimental results of the aforementioned gene selection algorithms on 5 gene expression datasets demonstrate that, based on accuracy, the performance of the new K-S and CFS-based algorithm is better than that of the K-S test, CFS, mRMR, and ReliefF algorithms. The experimental results show that the K-S test-CFS gene selection algorithm is a very effective and promising approach compared to the K-S test, CFS, mRMR, and ReliefF algorithms.
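A hedged sketch of the two-step idea described above is given below: a per-gene two-sample K-S filter between the two classes, followed by a CFS-style merit that rewards high feature-class correlation and low feature-feature redundancy. The significance threshold and the greedy use of the merit function are assumptions, not the paper's exact settings.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_filter(X, y, alpha=0.01):
    """Keep genes whose expression differs between the two classes by the K-S test."""
    return [j for j in range(X.shape[1])
            if ks_2samp(X[y == 0, j], X[y == 1, j]).pvalue < alpha]

def cfs_merit(X, y, subset):
    """CFS-style merit: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```

A greedy forward search that adds, at each step, the K-S-selected gene maximizing cfs_merit approximates the CFS stage.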
Abràmoff, Michael D; Niemeijer, Meindert; Suttorp-Schulten, Maria S A; Viergever, Max A; Russell, Stephen R; van Ginneken, Bram
2008-02-01
To evaluate the performance of a system for automated detection of diabetic retinopathy in digital retinal photographs, built from published algorithms, in a large, representative, screening population. We conducted a retrospective analysis of 10,000 consecutive patient visits, specifically exams (four retinal photographs, two left and two right) from 5,692 unique patients from the EyeCheck diabetic retinopathy screening project imaged with three types of cameras at 10 centers. Inclusion criteria included no previous diagnosis of diabetic retinopathy, no previous visit to ophthalmologist for dilated eye exam, and both eyes photographed. One of three retinal specialists evaluated each exam as unacceptable quality, no referable retinopathy, or referable retinopathy. We then selected exams with sufficient image quality and determined presence or absence of referable retinopathy. Outcome measures included area under the receiver operating characteristic curve (number needed to miss one case [NNM]) and type of false negative. Total area under the receiver operating characteristic curve was 0.84, and NNM was 80 at a sensitivity of 0.84 and a specificity of 0.64. At this point, 7,689 of 10,000 exams had sufficient image quality, 4,648 of 7,689 (60%) were true negatives, 59 of 7,689 (0.8%) were false negatives, 319 of 7,689 (4%) were true positives, and 2,581 of 7,689 (33%) were false positives. Twenty-seven percent of false negatives contained large hemorrhages and/or neovascularizations. Automated detection of diabetic retinopathy using published algorithms cannot yet be recommended for clinical practice. However, performance is such that evaluation on validated, publicly available datasets should be pursued. If algorithms can be improved, such a system may in the future lead to improved prevention of blindness and vision loss in patients with diabetes.
Constructing better classifier ensemble based on weighted accuracy and diversity measure.
Zeng, Xiaodong; Wong, Derek F; Chao, Lidia S
2014-01-01
A weighted accuracy and diversity (WAD) method is presented, a novel measure used to evaluate the quality of the classifier ensemble, assisting in the ensemble selection task. The proposed measure is motivated by a commonly accepted hypothesis; that is, a robust classifier ensemble should not only be accurate but also different from every other member. In fact, accuracy and diversity are mutual restraint factors; that is, an ensemble with high accuracy may have low diversity, and an overly diverse ensemble may negatively affect accuracy. This study proposes a method to find the balance between accuracy and diversity that enhances the predictive ability of an ensemble for unknown data. The quality assessment for an ensemble is performed such that the final score is achieved by computing the harmonic mean of accuracy and diversity, where two weight parameters are used to balance them. The measure is compared to two representative measures, Kappa-Error and GenDiv, and two threshold measures that consider only accuracy or diversity, with two heuristic search algorithms, genetic algorithm, and forward hill-climbing algorithm, in ensemble selection tasks performed on 15 UCI benchmark datasets. The empirical results demonstrate that the WAD measure is superior to others in most cases.
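The final score described above, a harmonic mean of accuracy and diversity balanced by two weights, can be written as a weighted harmonic mean. The exact weighting scheme used by the authors may differ, so the sketch below only illustrates the general form.

```python
def wad_score(accuracy, diversity, w_acc=0.5, w_div=0.5):
    """Weighted harmonic mean of ensemble accuracy and diversity, both in [0, 1]."""
    if accuracy <= 0.0 or diversity <= 0.0:
        return 0.0
    return (w_acc + w_div) / (w_acc / accuracy + w_div / diversity)

# Example: wad_score(0.90, 0.40, w_acc=0.7, w_div=0.3) weights accuracy more heavily than diversity.
```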
Detecting negative selection on recurrent mutations using gene genealogy
2013-01-01
Whether or not a mutant allele in a population is under selection is an important issue in population genetics, and various neutrality tests have been invented so far to detect selection. However, detection of negative selection has been notoriously difficult, partly because negatively selected alleles are usually rare in the population and have little impact on either population dynamics or the shape of the gene genealogy. Recently, through studies of genetic disorders and genome-wide analyses, many structural variations were shown to occur recurrently in the population. Such “recurrent mutations” might be revealed as deleterious by exploiting the signal of negative selection in the gene genealogy enhanced by their recurrence. Motivated by the above idea, we devised two new test statistics. One is the total number of mutants at a recurrently mutating locus among sampled sequences, which is tested conditionally on the number of forward mutations mapped on the sequence genealogy. The other is the size of the most common class of identical-by-descent mutants in the sample, again tested conditionally on the number of forward mutations mapped on the sequence genealogy. To examine the performance of these two tests, we simulated recurrently mutated loci each flanked by sites with neutral single nucleotide polymorphisms (SNPs), with no recombination. Using neutral recurrent mutations as null models, we attempted to detect deleterious recurrent mutations. Our analyses demonstrated the high power of our new tests under constant population size, as well as their moderate power to detect selection in expanding populations. We also devised a new maximum parsimony algorithm that, given the states of the sampled sequences at a recurrently mutating locus and an incompletely resolved genealogy, enumerates mutation histories with a minimum number of mutations while partially resolving genealogical relationships when necessary. With their considerably high power to detect negative selection, our new neutrality tests may open new avenues for dealing with the population genetics of recurrent mutations as well as help identify some types of genetic disorders that may have escaped identification by currently existing methods.
Esteban, Santiago; Rodríguez Tablado, Manuel; Ricci, Ricardo Ignacio; Terrasa, Sergio; Kopitowski, Karin
2017-07-14
The implementation of electronic medical records (EMR) is becoming increasingly common. Error and data loss reduction, patient-care efficiency increases, decision-making assistance and facilitation of event surveillance are some of the many processes that EMRs help improve. In addition, they show a lot of promise in terms of data collection to facilitate observational epidemiological studies, and their use for this purpose has increased significantly over recent years. Even though the quantity and availability of the data are clearly improved thanks to EMRs, the problem of data quality remains. This is especially important when attempting to determine whether an event has actually occurred or not. We sought to assess the sensitivity, specificity, and agreement level of a codes-based algorithm for the detection of clinically relevant cardiovascular (CaVD) and cerebrovascular (CeVD) disease cases, using data from EMRs. Three family physicians from the research group selected clinically relevant CaVD and CeVD terms from the International Classification of Primary Care, Second Edition (ICPC-2), the ICD-10 2015 version and the SNOMED-CT 2015 Edition. These terms included signs, symptoms, diagnoses and procedures associated with CaVD and CeVD. Terms not related to symptoms, signs, diagnoses or procedures of CaVD or CeVD, as well as those describing incidental findings without clinical relevance, were excluded. The algorithm yielded a positive result if the patient had at least one of the selected terms in their medical records, as long as it was not recorded as an error. Otherwise, if no such terms were found, the patient was classified as negative. This algorithm was applied to a randomly selected sample of patients who were active in the hospital's HMO as of 1/1/2005, were 40-79 years old, had at least one year of seniority in the HMO and had at least one clinical encounter. Patients were thus classified into four groups: (1) negative patients; (2) patients with CaVD but without CeVD; (3) patients with CeVD but without CaVD; (4) patients with both diseases. To facilitate the validation process, a stratified sample was taken so that each of the groups represented approximately 25% of the sample. Manual chart review was used as the gold standard for assessing the algorithm's performance. One-third of the patients were assigned randomly to each reviewer (Cohen's kappa 0.91). Both coded and uncoded (free text) sections of the EMR were reviewed. This was done from the first clinical note present in the patient's chart to the last one registered prior to 1/1/2005. The performance of the algorithm was compared against manual chart review. It yielded high sensitivity (0.99, 95% CI 0.938-0.9971) and acceptable specificity (0.86, 95% CI 0.818-0.895) for detecting cases of CaVD and CeVD combined. A qualitative analysis of the false positives and false negatives was performed. We developed a simple algorithm, using only standardized and non-standardized coded terms within an EMR, that can properly detect clinically relevant events and symptoms of CaVD and CeVD. We believe that combining it with an analysis of the free text using an NLP approach would yield even better results.
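The codes-based rule described above reduces to a simple check over the coded entries of a chart. The sketch below is an illustration under assumed record fields (`code`, `recorded_in_error`); it is not the authors' implementation.

```python
def classify_patient(coded_entries, selected_terms):
    """Positive if at least one selected CaVD/CeVD term (ICPC-2, ICD-10 or
    SNOMED-CT code) appears in the chart and is not flagged as recorded in
    error; otherwise negative."""
    for entry in coded_entries:
        if entry["code"] in selected_terms and not entry.get("recorded_in_error", False):
            return "positive"
    return "negative"
```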
Two-level structural sparsity regularization for identifying lattices and defects in noisy images
Li, Xin; Belianinov, Alex; Dyck, Ondrej E.; ...
2018-03-09
This paper presents a regularized regression model with a two-level structural sparsity penalty applied to locate individual atoms in a noisy scanning transmission electron microscopy (STEM) image. In crystals, the locations of atoms are symmetric, condensed into a few lattice groups. Therefore, by identifying the underlying lattice in a given image, individual atoms can be accurately located. We propose to formulate the identification of the lattice groups as a sparse group selection problem. Furthermore, real atomic-scale images contain defects and vacancies, so atomic identification based solely on a lattice group may result in false positives and false negatives. To minimize error, the model includes an individual sparsity regularization in addition to the group sparsity for a within-group selection, which results in a regression model with a two-level sparsity regularization. We propose a modification of the group orthogonal matching pursuit (gOMP) algorithm with a thresholding step to solve the atom finding problem. The convergence and statistical analyses of the proposed algorithm are presented. The proposed algorithm is also evaluated through numerical experiments with simulated images. The applicability of the algorithm to the determination of atomic structures and the identification of imaging distortions and atomic defects was demonstrated using three real STEM images. In conclusion, we believe this is an important step toward automatic phase identification and assignment with the advent of genomic databases for materials.
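The two-level penalty described above (group sparsity over candidate lattice groups plus individual sparsity for within-group selection) is close in spirit to a sparse-group-lasso objective. The formulation below is a hedged paraphrase with assumed notation, not the paper's exact objective: y is the vectorized image, the columns of X encode candidate atom positions grouped by lattice, and β_g denotes the coefficients of group g.

```latex
\min_{\beta}\; \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2
\;+\; \lambda_1 \sum_{g \in \mathcal{G}} \lVert \beta_g \rVert_2
\;+\; \lambda_2 \lVert \beta \rVert_1
```

The group term selects a few lattice groups, while the elementwise l1 term allows individual atoms within a selected group to be dropped or added, which is how defects and vacancies can be accommodated.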
Rough sets and Laplacian score based cost-sensitive feature selection.
Yu, Shenglong; Zhao, Hong
2018-01-01
Cost-sensitive feature selection learning is an important preprocessing step in machine learning and data mining. Recently, most existing cost-sensitive feature selection algorithms are heuristic algorithms, which evaluate the importance of each feature individually and select features one by one. Obviously, these algorithms do not consider the relationship among features. In this paper, we propose a new algorithm for minimal cost feature selection called the rough sets and Laplacian score based cost-sensitive feature selection. The importance of each feature is evaluated by both rough sets and Laplacian score. Compared with heuristic algorithms, the proposed algorithm takes into consideration the relationship among features with locality preservation of Laplacian score. We select a feature subset with maximal feature importance and minimal cost when cost is undertaken in parallel, where the cost is given by three different distributions to simulate different applications. Different from existing cost-sensitive feature selection algorithms, our algorithm simultaneously selects out a predetermined number of "good" features. Extensive experimental results show that the approach is efficient and able to effectively obtain the minimum cost subset. In addition, the results of our method are more promising than the results of other cost-sensitive feature selection algorithms.
Empirical mode decomposition-based facial pose estimation inside video sequences
NASA Astrophysics Data System (ADS)
Qing, Chunmei; Jiang, Jianmin; Yang, Zhijing
2010-03-01
We describe a new pose-estimation algorithm via integration of the strength in both empirical mode decomposition (EMD) and mutual information. While mutual information is exploited to measure the similarity between facial images to estimate poses, EMD is exploited to decompose input facial images into a number of intrinsic mode function (IMF) components, which redistribute the effect of noise, expression changes, and illumination variations as such that, when the input facial image is described by the selected IMF components, all the negative effects can be minimized. Extensive experiments were carried out in comparisons to existing representative techniques, and the results show that the proposed algorithm achieves better pose-estimation performances with robustness to noise corruption, illumination variation, and facial expressions.
A Network Selection Algorithm Considering Power Consumption in Hybrid Wireless Networks
NASA Astrophysics Data System (ADS)
Joe, Inwhee; Kim, Won-Tae; Hong, Seokjoon
In this paper, we propose a novel network selection algorithm considering power consumption in hybrid wireless networks for vertical handover. CDMA, WiBro and WLAN networks are the candidate networks for this selection algorithm. The algorithm is composed of a power consumption prediction algorithm and a final network selection algorithm. The power consumption prediction algorithm estimates the expected lifetime of the mobile station based on the current battery level, traffic class and power consumption of each network interface card of the mobile station. If the expected lifetime of the mobile station in a certain network is not long enough compared to the handover delay, this particular network is removed from the candidate network list, thereby preventing unnecessary handovers in the preprocessing procedure. On the other hand, the final network selection algorithm consists of AHP (Analytic Hierarchical Process) and GRA (Grey Relational Analysis). The global factors of the network selection structure are QoS, cost and lifetime. If the user preference is lifetime, our selection algorithm selects the network that offers the longest service duration due to low power consumption. We also conduct simulations using the OPNET simulation tool. The simulation results show that the proposed algorithm provides longer lifetime in the hybrid wireless network environment.
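The preprocessing step described above can be sketched as a simple candidate filter: estimate the expected lifetime on each network from the current battery level and that interface's power draw for the current traffic class, and drop networks whose lifetime does not sufficiently exceed the handover delay. The power table, units and safety margin below are illustrative assumptions, not values from the paper.

```python
def filter_candidates(battery_mwh, traffic_class, networks, handover_delay_s, margin=10.0):
    """Return names of candidate networks whose expected lifetime exceeds
    margin * handover_delay_s; the others are removed before AHP/GRA ranking."""
    survivors = []
    for net in networks:
        power_mw = net["power_mw"][traffic_class]      # interface power draw for this traffic class
        lifetime_s = 3600.0 * battery_mwh / power_mw   # mWh / mW gives hours; convert to seconds
        if lifetime_s > margin * handover_delay_s:
            survivors.append(net["name"])
    return survivors

# Example candidate entry: {"name": "WLAN", "power_mw": {"streaming": 900.0, "background": 300.0}}
```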
Robust online tracking via adaptive samples selection with saliency detection
NASA Astrophysics Data System (ADS)
Yan, Jia; Chen, Xi; Zhu, QiuPing
2013-12-01
Online tracking has shown to be successful in tracking of previously unknown objects. However, there are two important factors which lead to drift problem of online tracking, the one is how to select the exact labeled samples even when the target locations are inaccurate, and the other is how to handle the confusors which have similar features with the target. In this article, we propose a robust online tracking algorithm with adaptive samples selection based on saliency detection to overcome the drift problem. To deal with the problem of degrading the classifiers using mis-aligned samples, we introduce the saliency detection method to our tracking problem. Saliency maps and the strong classifiers are combined to extract the most correct positive samples. Our approach employs a simple yet saliency detection algorithm based on image spectral residual analysis. Furthermore, instead of using the random patches as the negative samples, we propose a reasonable selection criterion, in which both the saliency confidence and similarity are considered with the benefits that confusors in the surrounding background are incorporated into the classifiers update process before the drift occurs. The tracking task is formulated as a binary classification via online boosting framework. Experiment results in several challenging video sequences demonstrate the accuracy and stability of our tracker.
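The saliency detector referenced above (image spectral residual analysis) is compact enough to sketch directly: subtract a smoothed log-amplitude spectrum from the original log-amplitude, recombine with the phase, and square the magnitude of the inverse transform. Filter sizes below are assumptions; this is a generic spectral-residual sketch, not the authors' full tracker.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(gray):
    """Spectral residual saliency map for a 2-D grayscale image (float array)."""
    f = np.fft.fft2(gray.astype(float))
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    residual = log_amp - uniform_filter(log_amp, size=3)         # spectral residual
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return gaussian_filter(saliency, sigma=2.5)                  # smooth the saliency map
```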
Reid, Aylin Y; St Germaine-Smith, Christine; Liu, Mingfu; Sadiq, Shahnaz; Quan, Hude; Wiebe, Samuel; Faris, Peter; Dean, Stafford; Jetté, Nathalie
2012-12-01
The objective of this study was to develop and validate coding algorithms for epilepsy using ICD-coded inpatient claims, physician claims, and emergency room (ER) visits. 720/2049 charts from 2003 and 1533/3252 charts from 2006 were randomly selected for review from 13 neurologists' practices as the "gold standard" for diagnosis. Epilepsy status in each chart was determined by 2 trained physicians. The optimal algorithm to identify epilepsy cases was developed by linking the reviewed charts with three administrative databases (ICD-9 and ICD-10 data from 2000 to 2008) including hospital discharges, ER visits and physician claims in a Canadian health region. Accepting chart review data as the gold standard, we calculated sensitivity, specificity, and positive and negative predictive values for each ICD-9 and ICD-10 administrative data algorithm (case definition). Of 18 algorithms assessed, the most accurate algorithm to identify epilepsy cases was "2 physician claims or 1 hospitalization in 2 years" coded as ICD-9 345 or ICD-10 G40/G41, and the most sensitive algorithm was "1 physician claim or 1 hospitalization or 1 ER visit in 2 years." Accurate and sensitive case definitions are available for research requiring the identification of epilepsy cases in administrative health data.
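The most accurate case definition reported above is easy to express as a rule over the codes observed for a patient within the two-year window. The sketch below is an illustration under assumed inputs (lists of ICD codes from physician claims and hospital discharges), not the registry linkage itself.

```python
def meets_case_definition(claim_codes, hospitalization_codes):
    """True if >= 2 physician claims or >= 1 hospitalization coded for epilepsy
    (ICD-9 345.x or ICD-10 G40/G41) within the two-year window covered by the inputs."""
    def is_epilepsy(code):
        return code.startswith("345") or code.startswith("G40") or code.startswith("G41")
    n_claims = sum(is_epilepsy(c) for c in claim_codes)
    n_hospitalizations = sum(is_epilepsy(c) for c in hospitalization_codes)
    return n_claims >= 2 or n_hospitalizations >= 1
```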
Liang, Ja-Der; Ping, Xiao-Ou; Tseng, Yi-Ju; Huang, Guan-Tarn; Lai, Feipei; Yang, Pei-Ming
2014-12-01
Recurrence of hepatocellular carcinoma (HCC) is an important issue despite effective treatments with tumor eradication. Identification of patients who are at high risk for recurrence may provide more efficacious screening and detection of tumor recurrence. The aim of this study was to develop recurrence predictive models for HCC patients who received radiofrequency ablation (RFA) treatment. From January 2007 to December 2009, 83 newly diagnosed HCC patients receiving RFA as their first treatment were enrolled. Five feature selection methods, including genetic algorithm (GA), simulated annealing (SA) algorithm, random forests (RF) and hybrid methods (GA+RF and SA+RF), were utilized for selecting an important subset of features from a total of 16 clinical features. These feature selection methods were combined with support vector machines (SVM) to develop predictive models with better performance. Five-fold cross-validation was used to train and test the SVM models. The developed SVM-based predictive models with hybrid feature selection methods and 5-fold cross-validation achieved average sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the ROC curve of 67%, 86%, 82%, 69%, 90%, and 0.69, respectively. The SVM-derived predictive model can help identify patients at high risk of recurrence, who should be closely followed up after complete RFA treatment.
Dynamic variable selection in SNP genotype autocalling from APEX microarray data.
Podder, Mohua; Welch, William J; Zamar, Ruben H; Tebbutt, Scott J
2006-11-30
Single nucleotide polymorphisms (SNPs) are DNA sequence variations, occurring when a single nucleotide--adenine (A), thymine (T), cytosine (C) or guanine (G)--is altered. Arguably, SNPs account for more than 90% of human genetic variation. Our laboratory has developed a highly redundant SNP genotyping assay consisting of multiple probes with signals from multiple channels for a single SNP, based on arrayed primer extension (APEX). This mini-sequencing method is a powerful combination of a highly parallel microarray with distinctive Sanger-based dideoxy terminator sequencing chemistry. Using this microarray platform, our current genotype calling system (known as SNP Chart) is capable of calling single SNP genotypes by manual inspection of the APEX data, which is time-consuming and exposed to user subjectivity bias. Using a set of 32 Coriell DNA samples plus three negative PCR controls as a training data set, we have developed a fully-automated genotyping algorithm based on simple linear discriminant analysis (LDA) using dynamic variable selection. The algorithm combines separate analyses based on the multiple probe sets to give a final posterior probability for each candidate genotype. We have tested our algorithm on a completely independent data set of 270 DNA samples, with validated genotypes, from patients admitted to the intensive care unit (ICU) of St. Paul's Hospital (plus one negative PCR control sample). Our method achieves a concordance rate of 98.9% with a 99.6% call rate for a set of 96 SNPs. By adjusting the threshold value for the final posterior probability of the called genotype, the call rate reduces to 94.9% with a higher concordance rate of 99.6%. We also reversed the two independent data sets in their training and testing roles, achieving a concordance rate up to 99.8%. The strength of this APEX chemistry-based platform is its unique redundancy having multiple probes for a single SNP. Our model-based genotype calling algorithm captures the redundancy in the system considering all the underlying probe features of a particular SNP, automatically down-weighting any 'bad data' corresponding to image artifacts on the microarray slide or failure of a specific chemistry. In this regard, our method is able to automatically select the probes which work well and reduce the effect of other so-called bad performing probes in a sample-specific manner, for any number of SNPs.
Algorithms for detecting antibodies to HIV-1: results from a rural Ugandan cohort.
Nunn, A J; Biryahwaho, B; Downing, R G; van der Groen, G; Ojwiya, A; Mulder, D W
1993-08-01
The aim was to evaluate an algorithm using two enzyme immunoassays (EIAs) for anti-HIV-1 antibodies in a rural African population and to assess alternative simplified algorithms. Sera obtained from 7895 individuals in a rural population survey were tested using an algorithm based on two different EIA systems: Recombigen HIV-1 EIA and Wellcozyme HIV-1 Recombinant. Alternative algorithms were assessed using negative or confirmed positive sera. None of the 227 sera classified as unequivocally negative by the two assays were positive by Western blot. Of 192 sera unequivocally positive by both assays, four were seronegative by Western blot. The possibility of technical error cannot be ruled out in three of these. One of the alternative algorithms assessed, which classified all borderline or discordant assay results as negative, had a specificity of 100% and a sensitivity of 98.4%. The cost of this algorithm is one-third that of the conventional algorithm. Our evaluation suggests that high specificity and sensitivity can be obtained without using Western blot and at a considerable reduction in cost.
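The simplified algorithm with 100% specificity and 98.4% sensitivity described above amounts to a two-line decision rule; the sketch below is an illustration with assumed result labels, not the laboratory protocol itself.

```python
def classify_serum(eia1_result, eia2_result):
    """Alternative algorithm: only concordant positive EIA results are reported
    positive; borderline or discordant results are classified as negative."""
    if eia1_result == "positive" and eia2_result == "positive":
        return "positive"
    return "negative"   # concordant negatives, borderline and discordant pairs
```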
NASA Astrophysics Data System (ADS)
Mechlem, Korbinian; Ehn, Sebastian; Sellerer, Thorsten; Pfeiffer, Franz; Noël, Peter B.
2017-03-01
In spectral computed tomography (spectral CT), the additional information about the energy dependence of attenuation coefficients can be exploited to generate material selective images. These images have found applications in various areas such as artifact reduction, quantitative imaging or clinical diagnosis. However, significant noise amplification on material decomposed images remains a fundamental problem of spectral CT. Most spectral CT algorithms separate the process of material decomposition and image reconstruction. Separating these steps is suboptimal because the full statistical information contained in the spectral tomographic measurements cannot be exploited. Statistical iterative reconstruction (SIR) techniques provide an alternative, mathematically elegant approach to obtaining material selective images with improved tradeoffs between noise and resolution. Furthermore, image reconstruction and material decomposition can be performed jointly. This is accomplished by a forward model which directly connects the (expected) spectral projection measurements and the material selective images. To obtain this forward model, detailed knowledge of the different photon energy spectra and the detector response was assumed in previous work. However, accurately determining the spectrum is often difficult in practice. In this work, a new algorithm for statistical iterative material decomposition is presented. It uses a semi-empirical forward model which relies on simple calibration measurements. Furthermore, an efficient optimization algorithm based on separable surrogate functions is employed. This partially negates one of the major shortcomings of SIR, namely high computational cost and long reconstruction times. Numerical simulations and real experiments show strongly improved image quality and reduced statistical bias compared to projection-based material decomposition.
A Hybrid Algorithm for Non-negative Matrix Factorization Based on Symmetric Information Divergence
Devarajan, Karthik; Ebrahimi, Nader; Soofi, Ehsan
2017-01-01
The objective of this paper is to provide a hybrid algorithm for non-negative matrix factorization based on a symmetric version of Kullback-Leibler divergence, known as intrinsic information. The convergence of the proposed algorithm is shown for several members of the exponential family such as the Gaussian, Poisson, gamma and inverse Gaussian models. The speed of this algorithm is examined and its usefulness is illustrated through some applied problems.
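For context, a commonly used symmetric version of the generalized Kullback-Leibler divergence, applied elementwise to an NMF approximation V ≈ WH, takes the form below; the paper's exact definition of intrinsic information may differ, so this is only an assumed illustration.

```latex
J(V \,\|\, WH) \;=\; D_{\mathrm{KL}}(V \,\|\, WH) + D_{\mathrm{KL}}(WH \,\|\, V)
\;=\; \sum_{i,j}\bigl(V_{ij} - (WH)_{ij}\bigr)\,\log\frac{V_{ij}}{(WH)_{ij}}
```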
Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
2015-01-01
The blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. The support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters for SVM and feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been proven that they could affect each other. We designed and implemented a genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, optimizing both SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than other methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play an important role in BBB penetration. Among those properties relevant to BBB penetration, lipophilicity could enhance BBB penetration while all the others are negatively correlated with it.
Rebmann, Terri
2005-12-01
Many US hospitals lack the capacity to house safely a surge of potentially infectious patients, increasing the risk of secondary transmission. Respiratory protection and negative-pressure rooms are needed to prevent transmission of airborne-spread diseases, but US hospitals lack available and/or properly functioning negative-pressure rooms. Creating new rooms or retrofitting existing facilities is time-consuming and expensive. Safe methods of managing patients with airborne-spread diseases and establishing temporary negative-pressure and/or protective environments were determined by a literature review. Relevant data were analyzed and synthesized to generate a response algorithm. Ideal patient management and placement guidelines, including instructions for choosing respiratory protection and creating temporary negative-pressure or other protective environments, were delineated. Findings were summarized in a treatment algorithm. The threat of bioterrorism and emerging infections increases health care's need for negative-pressure and/or protective environments. The algorithm outlines appropriate response steps to decrease transmission risk until an ideal protective environment can be utilized. Using this algorithm will prepare infection control professionals to respond more effectively during a surge of potentially infectious patients following a bioterrorism attack or emerging infectious disease outbreak.
Arbuthnot, Mary; Mooney, David P
2017-01-01
It is crucial to identify cervical spine injuries while minimizing ionizing radiation. This study analyzes the sensitivity and negative predictive value of a pediatric cervical spine clearance algorithm. We performed a retrospective review of all children <21 years old who were admitted following blunt trauma and underwent cervical spine clearance utilizing our institution's cervical spine clearance algorithm over a 10-year period. Age, gender, International Classification of Diseases 9th Edition diagnosis codes, presence or absence of a cervical collar on arrival, Injury Severity Score, and type of cervical spine imaging obtained were extracted from the trauma registry and electronic medical record. Descriptive statistics were used and the sensitivity and negative predictive value of the algorithm were calculated. Approximately 125,000 children were evaluated in the Emergency Department and 11,331 were admitted. Of the admitted children, 1023 patients arrived in a cervical collar without advanced cervical spine imaging and were evaluated using the cervical spine clearance algorithm. Algorithm sensitivity was 94.4% and the negative predictive value was 99.9%. There was one missed injury, a spinous process tip fracture in a teenager maintained in a collar. Our algorithm was associated with a low missed injury rate and low CT utilization rate, even in children <3 years old. Level of evidence: IV.
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.
Ni, Qianwu; Chen, Lei
2017-01-01
Correct prediction of protein structural class is beneficial to investigation of protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, based on various features, it is still a great challenge to select a proper classification algorithm and extract the essential features to participate in classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physiochemical features were adopted to represent features, and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features from the list one by one. For each feature set, the thirty-eight algorithms were executed on a dataset in which proteins were represented by the features in the set. The predicted classes yielded by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. Algorithms were then taken from this list one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and to models that adopt only the feature selection procedure or only the algorithm selection procedure. The feature selection and algorithm selection procedures are genuinely helpful for building an ensemble prediction model that can yield a better performance.
Hariri, Susan; Steinau, Martin; Rinas, Allen; Gargano, Julia W; Ludema, Christina; Unger, Elizabeth R; Carter, Alicia L; Grant, Kathy L; Bamberg, Melanie; McDermott, James E; Markowitz, Lauri E; Brewer, Noel T; Smith, Jennifer S
2012-01-01
HPV typing using formalin-fixed paraffin-embedded (FFPE) cervical tissue is used to evaluate HPV vaccine impact, but DNA yield and quality in FFPE specimens can negatively affect test results. This study aimed to evaluate 2 commercial assays for HPV detection and typing using FFPE cervical specimens. Four large North Carolina pathology laboratories provided FFPE specimens from 299 women ages 18 and older diagnosed with cervical disease from 2001 to 2006. For each woman, one diagnostic block was selected and unstained serial sections were prepared for DNA typing. Extracts from samples with residual lesion were used to detect and type HPV using parallel and serial testing algorithms with the Linear Array and LiPA HPV genotyping assays. LA and LiPA concordance was 0.61 for detecting any high-risk (HR) and 0.20 for detecting any low-risk (LR) types, with significant differences in marginal proportions for HPV16, 51, 52, and any HR types. Discordant results were most often LiPA-positive, LA-negative. The parallel algorithm yielded the highest prevalence of any HPV type (95.7%). HR type prevalence was similar using parallel (93.1%) and serial (92.1%) approaches. HPV16, 33, and 52 prevalence was slightly lower using the serial algorithm, but the median number of HR types per woman (1) did not differ by algorithm. Using the serial algorithm, HPV DNA was detected in >85% of invasive and >95% of pre-invasive lesions. The most common type was HPV16, followed by 52, 18, 31, 33, and 35; HPV16/18 was detected in 56.5% of specimens. Multiple HPV types were more common in lower grade lesions. We developed an efficient algorithm for testing and reporting results of two commercial assays for HPV detection and typing in FFPE specimens, and describe HPV type distribution in pre-invasive and invasive cervical lesions in a state-based sample prior to HPV vaccine introduction.
McTwo: a two-step feature selection algorithm based on maximal information coefficient.
Ge, Ruiquan; Zhou, Manli; Luo, Youxi; Meng, Qinghan; Mai, Guoqin; Ma, Dongli; Wang, Guoqing; Zhou, Fengfeng
2016-03-23
High-throughput bio-OMIC technologies are producing high-dimension data from bio-samples at an ever increasing rate, whereas the training sample number in a traditional experiment remains small due to various difficulties. This "large p, small n" paradigm in the area of biomedical "big data" may be at least partly solved by feature selection algorithms, which select only features significantly associated with phenotypes. Feature selection is an NP-hard problem. Due to the exponentially increased time requirement for finding the globally optimal solution, all the existing feature selection algorithms employ heuristic rules to find locally optimal solutions, and their solutions achieve different performances on different datasets. This work describes a feature selection algorithm based on a recently published correlation measurement, Maximal Information Coefficient (MIC). The proposed algorithm, McTwo, aims to select features associated with phenotypes, independently of each other, and achieving high classification performance of the nearest neighbor algorithm. Based on the comparative study of 17 datasets, McTwo performs about as well as or better than existing algorithms, with significantly reduced numbers of selected features. The features selected by McTwo also appear to have particular biomedical relevance to the phenotypes from the literature. McTwo selects a feature subset with very good classification performance, as well as a small feature number. So McTwo may represent a complementary feature selection algorithm for the high-dimensional biomedical datasets.
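The two-step filtering idea can be sketched roughly as follows: screen features by their association with the phenotype, then greedily keep features that are mutually near-independent and that improve a nearest-neighbour classifier. This is not the authors' implementation; in particular, mutual information is used here as a stand-in for MIC (a MIC library such as minepy could be substituted), and every cut-off is arbitrary.

```python
# Illustrative two-step filter in the spirit of McTwo (not the authors' code).
# Step 1: keep features strongly associated with the phenotype.
# Step 2: greedily add features that are weakly associated with those already
#         kept and that improve leave-one-out kNN accuracy.
# Mutual information is used as a stand-in for the MIC association measure.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=120, n_features=50, n_informative=5, random_state=1)

assoc = mutual_info_classif(X, y, random_state=1)
candidates = list(np.argsort(assoc)[::-1][:15])      # step 1: top-associated features

selected, best_acc = [], 0.0
for f in candidates:
    # step 2a: skip features redundant with what is already selected
    if selected and max(mutual_info_regression(X[:, selected], X[:, f],
                                               random_state=1)) > 0.5:
        continue
    trial = selected + [f]
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=1),
                          X[:, trial], y, cv=LeaveOneOut()).mean()
    if acc > best_acc:                               # step 2b: keep only if kNN improves
        selected, best_acc = trial, acc
print("selected features:", selected, "LOO kNN accuracy: %.3f" % best_acc)
```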
Youngs, Noah; Penfold-Brown, Duncan; Drew, Kevin; Shasha, Dennis; Bonneau, Richard
2013-05-01
Computational biologists have demonstrated the utility of using machine learning methods to predict protein function from an integration of multiple genome-wide data types. Yet, even the best performing function prediction algorithms rely on heuristics for important components of the algorithm, such as choosing negative examples (proteins without a given function) or determining key parameters. The improper choice of negative examples, in particular, can hamper the accuracy of protein function prediction. We present a novel approach for choosing negative examples, using a parameterizable Bayesian prior computed from all observed annotation data, which also generates priors used during function prediction. We incorporate this new method into the GeneMANIA function prediction algorithm and demonstrate improved accuracy of our algorithm over current top-performing function prediction methods on the yeast and mouse proteomes across all metrics tested. Code and Data are available at: http://bonneaulab.bio.nyu.edu/funcprop.html
Zhao, Yu-Xiang; Chou, Chien-Hsing
2016-01-01
In this study, a new feature selection algorithm, the neighborhood-relationship feature selection (NRFS) algorithm, is proposed for identifying rat electroencephalogram signals and recognizing Chinese characters. In these two applications, dependent relationships exist among the feature vectors and their neighboring feature vectors. Therefore, the proposed NRFS algorithm was designed to exploit this property. When applying the NRFS algorithm, unselected feature vectors have a high priority of being added into the feature subset if their neighboring feature vectors have been selected. In addition, selected feature vectors have a high priority of being eliminated if their neighboring feature vectors are not selected. In the experiments conducted in this study, the NRFS algorithm was compared with two feature selection algorithms. The experimental results indicated that the NRFS algorithm can extract the crucial frequency bands for identifying rat vigilance states and the crucial character regions for recognizing Chinese characters. PMID:27314346
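A toy version of the neighbourhood-priority rule might look like the following, where a candidate feature's score is its own relevance plus a bonus for each adjacent feature already selected. The bonus weight, subset size and relevance measure are assumptions for illustration, not the NRFS settings.

```python
# Toy sketch of a neighborhood-relationship selection rule (not the NRFS code):
# a feature's priority is its own relevance plus a bonus when adjacent features
# (e.g. neighboring frequency bands or pixel regions) are already selected.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=200, n_features=30, n_informative=6, random_state=2)
relevance = mutual_info_classif(X, y, random_state=2)

n_feat, n_select, bonus = X.shape[1], 10, 0.05   # illustrative parameters
selected = set()
while len(selected) < n_select:
    priority = relevance.copy()
    for f in range(n_feat):
        if f in selected:
            priority[f] = -np.inf                # never re-select
            continue
        neighbors = [g for g in (f - 1, f + 1) if 0 <= g < n_feat]
        priority[f] += bonus * sum(g in selected for g in neighbors)
    selected.add(int(np.argmax(priority)))
print("selected features:", sorted(selected))
```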
Phenotyping and Visualizing Infusion-Related Reactions for Breast Cancer Patients.
Sun, Deyu; Sarda, Gopal; Skube, Steven J; Blaes, Anne H; Khairat, Saif; Melton, Genevieve B; Zhang, Rui
2017-01-01
Infusion-related reactions (IRRs) are typical adverse events for breast cancer patients. Detecting IRRs and visualizing their occurrence associated with drug treatment would potentially assist clinicians to improve patient safety and help researchers model IRRs and analyze their risk factors. We developed and evaluated a phenotyping algorithm to detect IRRs for breast cancer patients. We also designed a visualization prototype to render IRR patients' medications, lab tests and vital signs over time. Compared with 42 randomly selected doses manually labeled by a domain expert, the sensitivity, positive predictive value, specificity, and negative predictive value of the algorithm are 69%, 60%, 79%, and 85%, respectively. Using the algorithm, an incidence of 6.4% of patients and 1.8% of doses for docetaxel, 8.7% and 3.2% for doxorubicin, 10.4% and 1.2% for paclitaxel, and 16.1% and 1.1% for trastuzumab was identified retrospectively. The incidences estimated are consistent with related studies.
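The four reported figures follow directly from a 2x2 confusion matrix between the algorithm's dose-level flags and the expert labels; a minimal helper is shown below with made-up counts rather than the study's 42 labeled doses.

```python
# Sensitivity, PPV, specificity and NPV from a 2x2 confusion matrix.
# The counts below are placeholders, not the study's expert-labeled doses.
def diagnostic_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": tp / (tp + fn),
        "ppv":         tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "npv":         tn / (tn + fn),
    }

print(diagnostic_metrics(tp=20, fp=5, tn=70, fn=5))
```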
Community Detection in Signed Networks: the Role of Negative ties in Different Scales
Esmailian, Pouya; Jalili, Mahdi
2015-01-01
Extracting the community structure of complex network systems has many applications, from engineering to biology and the social sciences. Many algorithms exist to discover the community structure of networks. However, community detection has been significantly under-explored for networks with positive and negative links compared to unsigned ones. Trying to fill this gap, we measured the quality of partitions by introducing a Map Equation for signed networks. It is based on the assumption that negative relations weaken positive flow from a node towards a community; thus, external (internal) negative ties increase the probability of staying inside (escaping from) a community. We further extended the Constant Potts Model, providing a map spectrum for signed networks. Accordingly, a partition is selected by balancing between abridgment and expatiation of a signed network. Most importantly, the multi-scale spectrum of signed networks revealed how informative negative ties are at different scales, and quantified the topological placement of negative ties between dense positive ones. Moreover, an inconsistency was found in the signed Modularity: as the number of negative ties increases, the density of positive ties is neglected more. These results shed light on the community structure of signed networks. PMID:26395815
Sparse feature selection for classification and prediction of metastasis in endometrial cancer.
Ahsen, Mehmet Eren; Boren, Todd P; Singh, Nitin K; Misganaw, Burook; Mutch, David G; Moore, Kathleen N; Backes, Floor J; McCourt, Carolyn K; Lea, Jayanthi S; Miller, David S; White, Michael A; Vidyasagar, Mathukumalli
2017-03-27
Metastasis via pelvic and/or para-aortic lymph nodes is a major risk factor for endometrial cancer. Lymph-node resection ameliorates risk but is associated with significant co-morbidities. Incidence in patients with stage I disease is 4-22% but no mechanism exists to accurately predict it. Therefore, national guidelines for primary staging surgery include pelvic and para-aortic lymph node dissection for all patients whose tumor exceeds 2cm in diameter. We sought to identify a robust molecular signature that can accurately classify risk of lymph node metastasis in endometrial cancer patients. 86 tumors matched for age and race, and evenly distributed between lymph node-positive and lymph node-negative cases, were selected as a training cohort. Genomic micro-RNA expression was profiled for each sample to serve as the predictive feature matrix. An independent set of 28 tumor samples was collected and similarly characterized to serve as a test cohort. A feature selection algorithm was designed for applications where the number of samples is far smaller than the number of measured features per sample. A predictive miRNA expression signature was developed using this algorithm, which was then used to predict the metastatic status of the independent test cohort. A weighted classifier, using 18 micro-RNAs, achieved 100% accuracy on the training cohort. When applied to the testing cohort, the classifier correctly predicted 90% of node-positive cases, and 80% of node-negative cases (FDR = 6.25%). Results indicate that the evaluation of the quantitative sparse-feature classifier proposed here in clinical trials may lead to significant improvement in the prediction of lymphatic metastases in endometrial cancer patients.
Alshamlan, Hala M; Badr, Ghada H; Alohali, Yousef A
2015-06-01
Naturally inspired evolutionary algorithms prove effective when used for solving feature selection and classification problems. Artificial Bee Colony (ABC) is a relatively new swarm intelligence method. In this paper, we propose a new hybrid gene selection method, the Genetic Bee Colony (GBC) algorithm. The proposed algorithm combines the use of a Genetic Algorithm (GA) with the Artificial Bee Colony (ABC) algorithm, with the goal of integrating the advantages of both. The proposed algorithm is applied to microarray gene expression profiles in order to select the most predictive and informative genes for cancer classification. In order to test the accuracy of the proposed algorithm, extensive experiments were conducted. Three binary microarray datasets are used: colon, leukemia, and lung. In addition, three multi-class microarray datasets are used: SRBCT, lymphoma, and leukemia. Results of the GBC algorithm are compared with our recently proposed technique, mRMR combined with the Artificial Bee Colony algorithm (mRMR-ABC). We also compared the combination of mRMR with GA (mRMR-GA) and with Particle Swarm Optimization (mRMR-PSO). In addition, we compared the GBC algorithm with other related algorithms that have recently been published in the literature, using all benchmark datasets. The GBC algorithm shows superior performance, as it achieved the highest classification accuracy along with the lowest average number of selected genes. This demonstrates that the GBC algorithm is a promising approach for solving the gene selection problem in both binary and multi-class cancer classification. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Feigenbaum, Eyal; Hiszpanski, Anna M.
2017-07-01
A phase accumulation tracking (PAT) algorithm is proposed and demonstrated for the retrieval of the effective index of fishnet metamaterials (FMMs) in order to avoid the multi-branch uncertainty problem. This algorithm tracks the phase and amplitude of the dominant propagation mode across the FMM slab. The suggested PAT algorithm applies to resonant guided wave networks having only one mode that carries the light between the two slab ends, where the FMM is one example of this metamaterials sub-class. The effective index is a net effect of positive and negative accumulated phase in the alternating FMM metal and dielectric layers, with a negative effective index occurring when negative phase accumulation dominates.
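As a back-of-the-envelope illustration of the idea that the effective index is the net of signed phase accumulation, the toy computation below sums plane-wave phase contributions across alternating layers and divides by the free-space phase over the total thickness. The real PAT retrieval tracks the dominant guided mode's complex field through the fishnet structure; the layer indices and thicknesses here are invented.

```python
# Toy illustration of how a net effective index can follow from positive and
# negative phase accumulation across alternating layers. Real FMM retrieval
# tracks the dominant guided mode, not simple plane waves; the layer mode
# indices and thicknesses below are made-up numbers.
import numpy as np

wavelength = 1.55e-6                        # metres (assumed)
k0 = 2 * np.pi / wavelength
layers = [                                  # (mode index, thickness in metres)
    (+2.0, 50e-9),                          # dielectric layer: positive phase
    (-1.2, 30e-9),                          # metal layer: negative phase
] * 4

phase = sum(n * k0 * d for n, d in layers)  # accumulated phase across the slab
total_thickness = sum(d for _, d in layers)
n_eff = phase / (k0 * total_thickness)
print("net effective index: %.3f" % n_eff)  # negative if negative phase dominates
```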
An affine projection algorithm using grouping selection of input vectors
NASA Astrophysics Data System (ADS)
Shin, JaeWook; Kong, NamWoong; Park, PooGyeon
2011-10-01
This paper presents an affine projection algorithm (APA) using grouping selection of input vectors. To improve the performance of the conventional APA, the proposed algorithm adjusts the number of input vectors using two procedures: a grouping procedure and a selection procedure. In the grouping procedure, input vectors that carry overlapping information for the update are grouped using the normalized inner product. Then, in the selection procedure, a few input vectors that have enough information for the coefficient update are selected using the steady-state mean square error (MSE). Finally, the filter coefficients are updated using the selected input vectors. The experimental results show that the proposed algorithm has smaller steady-state estimation errors compared with the existing algorithms.
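The grouping/selection idea can be caricatured in a few lines of NumPy: run an affine projection update, but first drop candidate regressors whose normalized inner product with an already kept regressor is large (overlapping information). The 0.9 threshold, step size and regularization below are illustrative, not the rules derived in the paper.

```python
# Minimal affine projection adaptive filter with a naive input-vector selection
# step based on normalized inner products. All thresholds are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, M, K, mu, delta = 2000, 16, 8, 0.5, 1e-3      # samples, taps, projection order, step, reg.
w_true = rng.standard_normal(M)
x = rng.standard_normal(N)                       # white input for simplicity
d = np.convolve(x, w_true)[:N] + 0.01 * rng.standard_normal(N)

w = np.zeros(M)
for n in range(M + K, N):
    # candidate regressors: the K most recent input vectors (rows of U)
    U = np.array([x[n - k - M + 1:n - k + 1][::-1] for k in range(K)])
    # selection: drop regressors nearly parallel to one already kept
    kept_idx = [0]
    for k in range(1, K):
        sims = [abs(U[k] @ U[j]) / (np.linalg.norm(U[k]) * np.linalg.norm(U[j]) + 1e-12)
                for j in kept_idx]
        if max(sims) < 0.9:
            kept_idx.append(k)
    A = U[kept_idx]                              # selected regressor matrix
    dk = np.array([d[n - k] for k in kept_idx])  # matching desired samples
    e = dk - A @ w                               # a-priori error vector
    w = w + mu * A.T @ np.linalg.solve(A @ A.T + delta * np.eye(len(kept_idx)), e)

print("coefficient error norm: %.4f" % np.linalg.norm(w - w_true))
```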
A study of metaheuristic algorithms for high dimensional feature selection on microarray data
NASA Astrophysics Data System (ADS)
Dankolo, Muhammad Nasiru; Radzi, Nor Haizan Mohamed; Sallehuddin, Roselina; Mustaffa, Noorfa Haszlinna
2017-11-01
Microarray systems enable experts to examine gene profiles at the molecular level using machine learning algorithms, increasing the potential for classification and diagnosis of many diseases at the gene expression level. However, numerous difficulties may affect the efficiency of machine learning algorithms, including the vast number of gene features in the original data. Many of these features may be unrelated to the intended analysis. Therefore, feature selection is necessary during data pre-processing. Many feature selection algorithms have been developed and applied to microarray data, including metaheuristic optimization algorithms. This paper discusses the application of metaheuristic algorithms for feature selection in microarray datasets. This study reveals that the algorithms have yielded interesting results with limited resources, thereby saving the computational expense of machine learning algorithms.
MotieGhader, Habib; Gharaghani, Sajjad; Masoudi-Sobhanzadeh, Yosef; Masoudi-Nejad, Ali
2017-01-01
Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been addressed using meta-heuristic algorithms such as GA, PSO, ACO and others. In this work, two novel hybrid meta-heuristic algorithms, Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on the Genetic Algorithm and Learning Automata, are proposed for QSAR feature selection. The SGALA algorithm uses the advantages of the Genetic Algorithm and Learning Automata sequentially, while the MGALA algorithm uses them simultaneously. We applied the proposed algorithms to select the minimum possible number of features from three different datasets and observed that the MGALA and SGALA algorithms had the best outcomes, both individually and on average, compared to other feature selection algorithms. Through comparison of our proposed algorithms, we deduced that the rate of convergence to the optimal result in the MGALA and SGALA algorithms was better than that of the GA, ACO, PSO and LA algorithms. Finally, the results of the GA, ACO, PSO, LA, SGALA, and MGALA algorithms were used as the input of an LS-SVR model, and the results showed that the LS-SVR model had more predictive ability with the input from the SGALA and MGALA algorithms than with the input from all other mentioned algorithms. Therefore, the results corroborate that not only is the predictive efficiency of the proposed algorithms better, but their rate of convergence is also superior to that of all other mentioned algorithms. PMID:28979308
Nankali, Saber; Torshabi, Ahmad Esmaili; Miandoab, Payam Samadi; Baghizadeh, Amin
2016-01-08
In external-beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation-based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two "Genetic" and "Ranker" searching procedures. The performance of these algorithms has been evaluated using four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F-test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. But for liver tumors, a correlation-based feature selection algorithm, in combination with a genetic search algorithm, proved to yield best performance accuracy for selecting optimum markers.
Platt, R D; Griggs, R A
1993-08-01
In four experiments with 760 subjects, the present study examined Cosmides' Darwinian algorithm theory of reasoning: specifically, its explanation of facilitation on the Wason selection task. The first experiment replicated Cosmides' finding of facilitation for social contract versions of the selection task, using both her multiple-problem format and a single-problem format. Experiment 2 examined performance on Cosmides' three main social contract problems while manipulating the perspective of the subject and the presence and absence of cost-benefit information. The presence of cost-benefit information improved performance in two of the three problems while the perspective manipulation had no effect. In Experiment 3, the cost-benefit effect was replicated; and performance on one of the three problems was enhanced by the presence of explicit negatives on the NOT-P and NOT-Q cards. Experiment 4 examined the role of the deontic term "must" in the facilitation observed for two of the social contract problems. The presence of "must" led to a significant improvement in performance. The results of these experiments are strongly supportive of social contract theory in that cost-benefit information is necessary for substantial facilitation to be observed in Cosmides' problems. These findings also suggest the presence of other cues that can help guide subjects to a deontic social contract interpretation when the social contract nature of the problem is not clear.
Modelling Evolutionary Algorithms with Stochastic Differential Equations.
Heredia, Jorge Pérez
2017-11-20
There has been renewed interest in modelling the behaviour of evolutionary algorithms (EAs) by more traditional mathematical objects, such as ordinary differential equations or Markov chains. The advantage is that the analysis becomes greatly facilitated due to the existence of well established methods. However, this typically comes at the cost of disregarding information about the process. Here, we introduce the use of stochastic differential equations (SDEs) for the study of EAs. SDEs can produce simple analytical results for the dynamics of stochastic processes, unlike Markov chains which can produce rigorous but unwieldy expressions about the dynamics. On the other hand, unlike ordinary differential equations (ODEs), they do not discard information about the stochasticity of the process. We show that these are especially suitable for the analysis of fixed budget scenarios and present analogues of the additive and multiplicative drift theorems from runtime analysis. In addition, we derive a new more general multiplicative drift theorem that also covers non-elitist EAs. This theorem simultaneously allows for positive and negative results, providing information on the algorithm's progress even when the problem cannot be optimised efficiently. Finally, we provide results for some well-known heuristics namely Random Walk (RW), Random Local Search (RLS), the (1+1) EA, the Metropolis Algorithm (MA), and the Strong Selection Weak Mutation (SSWM) algorithm.
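To give a flavour of how a drift-style model tracks an EA's dynamics, the sketch below simulates the (1+1) EA on OneMax and compares its mean fitness trajectory with the deterministic drift recursion built from the standard single-bit-improvement probability. This is only the ODE-level caricature of the approach (no noise term), and all constants are illustrative.

```python
# (1+1) EA on OneMax versus a deterministic drift prediction of the form
# f(t+1) = f(t) + (n - f(t)) * (1/n) * (1 - 1/n)^(n-1), a standard
# single-bit-improvement approximation used here only for comparison.
import numpy as np

rng = np.random.default_rng(3)
n, T = 100, 2000

def one_plus_one_ea(n, T):
    x = rng.integers(0, 2, n)
    fitness = [x.sum()]
    for _ in range(T):
        flips = rng.random(n) < 1.0 / n          # standard bit mutation, rate 1/n
        y = np.where(flips, 1 - x, x)
        if y.sum() >= x.sum():                   # elitist acceptance
            x = y
        fitness.append(x.sum())
    return np.array(fitness)

runs = np.array([one_plus_one_ea(n, T) for _ in range(30)])
empirical = runs.mean(axis=0)

# Drift recursion: expected gain per step ~ (#zero bits) * (1/n) * (1-1/n)^(n-1)
f = np.empty(T + 1)
f[0] = n / 2
p = (1.0 / n) * (1 - 1.0 / n) ** (n - 1)
for t in range(T):
    f[t + 1] = f[t] + (n - f[t]) * p

print("empirical mean fitness at T:", empirical[-1])
print("drift-equation prediction  :", round(f[-1], 2))
```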
A novel approach for medical research on lymphomas
Conte, Cécile; Palmaro, Aurore; Grosclaude, Pascale; Daubisse-Marliac, Laetitia; Despas, Fabien; Lapeyre-Mestre, Maryse
2018-01-01
The use of claims databases to study lymphomas in real-life conditions will be a crucial issue in the future. It is therefore essential to develop validated algorithms for the identification of lymphomas in these databases. The aim of this study was to assess the validity of diagnosis codes in the French health insurance database for identifying incident cases of lymphomas, using the results of a regional cancer registry as the gold standard. Between 2010 and 2013, incident lymphomas were identified in hospital data through two selection algorithms. The results of the identification process and the characteristics of incident lymphoma cases were compared with data from the Tarn Cancer Registry. Each algorithm's performance was assessed by estimating sensitivity, positive predictive value, specificity (SPE), and negative predictive value. During the period, the registry recorded 476 incident cases of lymphomas, of which 52 were Hodgkin lymphomas and 424 non-Hodgkin lymphomas. For the corresponding area and period, algorithm 1 provided a number of incident cases close to that of the registry, whereas algorithm 2 overestimated the number of incident cases by approximately 30%. Both algorithms were highly specific (SPE = 99.9%) but moderately sensitive. The comparative analysis illustrates that similar distributions and characteristics are observed in both sources. Given these findings, the use of claims databases can be considered a pertinent and powerful tool for conducting medico-economic or pharmacoepidemiological studies of lymphomas. PMID:29480830
A hybrid approach for efficient anomaly detection using metaheuristic methods
Ghanem, Tamer F.; Elkilani, Wail S.; Abdul-kader, Hatem M.
2014-01-01
Network intrusion detection based on anomaly detection techniques has a significant role in protecting networks and systems against harmful activities. Different metaheuristic techniques have been used for anomaly detector generation. Yet, the reported literature has not studied the use of the multi-start metaheuristic method for detector generation. This paper proposes a hybrid approach for anomaly detection in large scale datasets using detectors generated with a multi-start metaheuristic method and genetic algorithms. The proposed approach takes some inspiration from negative selection-based detector generation. The evaluation of this approach is performed using the NSL-KDD dataset, a modified version of the widely used KDD CUP 99 dataset. The results show its effectiveness in generating a suitable number of detectors with an accuracy of 96.1% compared to other competing machine learning algorithms. PMID:26199752
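For readers unfamiliar with negative selection-based detector generation, a minimal real-valued sketch is shown below: random candidate detectors are discarded if they cover normal ("self") samples, and a test point is flagged anomalous if any surviving detector covers it. This is the generic scheme only, not the paper's multi-start/GA variant, and the radii, counts and toy data are assumptions.

```python
# Minimal real-valued negative-selection sketch: keep only detectors that do
# not cover the self (normal) samples; flag a point as anomalous if any kept
# detector covers it. All radii, counts and data here are illustrative.
import numpy as np

rng = np.random.default_rng(4)
self_samples = rng.normal(0.5, 0.08, size=(300, 2)).clip(0, 1)   # toy "normal" data
self_radius, det_radius, n_detectors = 0.10, 0.12, 400

detectors = []
while len(detectors) < n_detectors:
    c = rng.random(2)                                    # candidate detector centre
    if np.min(np.linalg.norm(self_samples - c, axis=1)) > self_radius:
        detectors.append(c)                              # keep: it does not cover self
detectors = np.array(detectors)

def is_anomalous(x):
    return bool(np.any(np.linalg.norm(detectors - x, axis=1) < det_radius))

print(is_anomalous(np.array([0.5, 0.5])))   # inside self region -> likely False
print(is_anomalous(np.array([0.95, 0.1])))  # far from self      -> likely True
```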
Jeyasingh, Suganthi; Veluchamy, Malathi
2017-05-01
Early diagnosis of breast cancer is essential to save patients' lives. Usually, medical datasets include a large variety of data that can lead to confusion during diagnosis. The Knowledge Discovery in Databases (KDD) process helps to improve efficiency; it requires elimination of inappropriate and repeated data from the dataset before final diagnosis. This can be done using any of the feature selection algorithms available in data mining, and feature selection is considered a vital step in increasing classification accuracy. This paper proposes a Modified Bat Algorithm (MBA) for feature selection to eliminate irrelevant features from an original dataset. The Bat algorithm was modified using simple random sampling to select random instances from the dataset, and ranking was performed with the global best features to recognize the predominant features available in the dataset. The selected features are used to train a Random Forest (RF) classification algorithm. The MBA feature selection algorithm enhanced the classification accuracy of RF in identifying the occurrence of breast cancer. The Wisconsin Diagnostic Breast Cancer (WDBC) dataset was used for the performance analysis of the proposed MBA feature selection algorithm. The proposed algorithm achieved better performance in terms of the Kappa statistic, Matthews Correlation Coefficient, Precision, F-measure, Recall, Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE).
Machine Learning to Improve the Effectiveness of ANRS in Predicting HIV Drug Resistance.
Singh, Yashik
2017-10-01
Human immunodeficiency virus infection and acquired immune deficiency syndrome (HIV/AIDS) is one of the major burdens of disease in developing countries, and the standard-of-care treatment includes prescribing antiretroviral drugs. However, antiretroviral drug resistance is inevitable due to selective pressure associated with the high mutation rate of HIV. Determining antiretroviral resistance can be done by phenotypic laboratory tests or by computer-based interpretation algorithms. Computer-based algorithms have been shown to have many advantages over laboratory tests. The ANRS (Agence Nationale de Recherches sur le SIDA) is regarded as a gold standard in interpreting HIV drug resistance using mutations in genomes. The aim of this study was to improve the prediction of the ANRS gold standard in predicting HIV drug resistance. Genome sequences and HIV drug resistance measures were obtained from the Stanford HIV database (http://hivdb.stanford.edu/). Feature selection was used to determine the most important mutations associated with resistance prediction. These mutations were added to the ANRS rules, and the difference in prediction ability was measured. This study uncovered important mutations that were not associated with the original ANRS rules. On average, the ANRS algorithm was improved by 79% ± 6.6%. The positive predictive value improved by 28%, and the negative predictive value improved by 10%. The study shows that there is a significant improvement in the prediction ability of the ANRS gold standard.
NASA Astrophysics Data System (ADS)
Lim, Hongki; Dewaraja, Yuni K.; Fessler, Jeffrey A.
2018-02-01
Most existing PET image reconstruction methods impose a nonnegativity constraint in the image domain that is natural physically, but can lead to biased reconstructions. This bias is particularly problematic for Y-90 PET because of the low probability positron production and high random coincidence fraction. This paper investigates a new PET reconstruction formulation that enforces nonnegativity of the projections instead of the voxel values. This formulation allows some negative voxel values, thereby potentially reducing bias. Unlike the previously reported NEG-ML approach that modifies the Poisson log-likelihood to allow negative values, the new formulation retains the classical Poisson statistical model. To relax the non-negativity constraint embedded in the standard methods for PET reconstruction, we used an alternating direction method of multipliers (ADMM). Because choice of ADMM parameters can greatly influence convergence rate, we applied an automatic parameter selection method to improve the convergence speed. We investigated the methods using lung to liver slices of XCAT phantom. We simulated low true coincidence count-rates with high random fractions corresponding to the typical values from patient imaging in Y-90 microsphere radioembolization. We compared our new methods with standard reconstruction algorithms and NEG-ML and a regularized version thereof. Both our new method and NEG-ML allow more accurate quantification in all volumes of interest while yielding lower noise than the standard method. The performance of NEG-ML can degrade when its user-defined parameter is tuned poorly, while the proposed algorithm is robust to any count level without requiring parameter tuning.
Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis
NASA Astrophysics Data System (ADS)
Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan
2017-10-01
This research paper proposes a hybrid of ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms as a feature selection method for selecting relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper also discusses the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. The study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using parametric statistical significance tests. The evaluation process has statistically shown that the ACO-KNN algorithm is significantly improved compared to the baseline algorithms. In addition, the experimental results have shown that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a quality, optimal feature subset that can represent the actual data in customer review data.
The artificial-free technique along the objective direction for the simplex algorithm
NASA Astrophysics Data System (ADS)
Boonperm, Aua-aree; Sinapiromsaran, Krung
2014-03-01
The simplex algorithm is a popular algorithm for solving linear programming problems. If the origin satisfies all constraints, then the simplex algorithm can be started directly; otherwise, artificial variables must be introduced to start it. If we can start the simplex algorithm without using artificial variables, the simplex iterations will require less time. In this paper, we present an artificial-free technique for the simplex algorithm that maps the problem into the objective plane and splits the constraints into three groups. In the objective plane, one of the variables with a nonzero coefficient in the objective function is fixed in terms of another variable. The constraints can then be split into three groups: the positive-coefficient group, the negative-coefficient group and the zero-coefficient group. Along the objective direction, some constraints from the positive-coefficient group will form the optimal solution. If the positive-coefficient group is nonempty, the algorithm starts by relaxing the constraints from the negative-coefficient group and the zero-coefficient group. We guarantee that the feasible region obtained from the positive-coefficient group is nonempty. The transformed problem is solved using the simplex algorithm. The constraints from the negative-coefficient group and the zero-coefficient group are then added back to the solved problem, and the dual simplex method is used to determine the new optimal solution. An example shows the effectiveness of our algorithm.
de Paula, Lauro C. M.; Soares, Anderson S.; de Lima, Telma W.; Delbem, Alexandre C. B.; Coelho, Clarimar J.; Filho, Arlindo R. G.
2014-01-01
Several variable selection algorithms in multivariate calibration can be accelerated using Graphics Processing Units (GPU). Among these algorithms, the Firefly Algorithm (FA) is a recently proposed metaheuristic that may be used for variable selection. This paper presents a GPU-based FA (FA-MLR) with a multiobjective formulation for variable selection in multivariate calibration problems and compares it with some traditional sequential algorithms in the literature. The advantage of the proposed implementation is demonstrated in an example involving a relatively large number of variables. The results showed that the FA-MLR, in comparison with the traditional algorithms, is a more suitable choice and a relevant contribution to the variable selection problem. Additionally, the results also demonstrated that the FA-MLR performed on a GPU can be five times faster than its sequential implementation. PMID:25493625
Xie, Jianwen; Douglas, Pamela K; Wu, Ying Nian; Brody, Arthur L; Anderson, Ariana E
2017-04-15
Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet other mathematical constraints provide alternate biologically-plausible frameworks for generating brain networks. Non-negative matrix factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms (L1 Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking, where the total observed activity in a single voxel originates from a restricted number of possible brain networks. The assumptions of independence, positivity, and sparsity to encode task-related brain networks are compared; the resulting brain networks within scan for different constraints are used as basis functions to encode observed functional activity. These encodings are then decoded using machine learning, by using the time series weights to predict within scan whether a subject is viewing a video, listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects. The sparse coding algorithm of L1 Regularized Learning outperformed 4 variations of ICA (p<0.001) for predicting the task being performed within each scan using artifact-cleaned components. The NMF algorithms, which suppressed negative BOLD signal, had the poorest accuracy compared to the ICA and sparse coding algorithms. Holding constant the effect of the extraction algorithm, encodings using sparser spatial networks (containing more zero-valued voxels) had higher classification accuracy (p<0.001). Lower classification accuracy occurred when the extracted spatial maps contained more CSF regions (p<0.001). The success of sparse coding algorithms suggests that algorithms which enforce sparsity, discourage multitasking, and promote local specialization may capture better the underlying source processes than those which allow inexhaustible local processes such as ICA. Negative BOLD signal may capture task-related activations. Copyright © 2017 Elsevier B.V. All rights reserved.
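The contrast between the independence, non-negativity and sparsity constraints can be illustrated with off-the-shelf decompositions on synthetic data, as in the sketch below (FastICA, NMF and DictionaryLearning from scikit-learn). It is only meant to show how the constraints shape the recovered spatial maps; the data, component count and penalties have nothing to do with the fMRI analysis above.

```python
# Side-by-side toy decompositions under different constraints: independence
# (ICA), non-negativity (NMF) and sparsity (dictionary learning). Synthetic
# data; component counts and penalties are arbitrary.
import numpy as np
from sklearn.decomposition import FastICA, NMF, DictionaryLearning

rng = np.random.default_rng(5)
n_samples, n_voxels, k = 200, 400, 5
sources = rng.laplace(size=(k, n_voxels))            # spatially sparse "networks"
mixing = rng.standard_normal((n_samples, k))
X = mixing @ sources + 0.1 * rng.standard_normal((n_samples, n_voxels))

ica_maps = FastICA(n_components=k, random_state=0).fit(X).components_
nmf_maps = NMF(n_components=k, init="nndsvda", max_iter=500,
               random_state=0).fit(np.abs(X)).components_   # NMF needs X >= 0
dl_maps = DictionaryLearning(n_components=k, alpha=1.0, max_iter=200,
                             random_state=0).fit(X).components_

def near_zero_fraction(maps, tol=1e-3):
    return float(np.mean(np.abs(maps) < tol))

for name, maps in [("ICA", ica_maps), ("NMF", nmf_maps), ("sparse DL", dl_maps)]:
    print(f"{name:9s} fraction of near-zero map values: {near_zero_fraction(maps):.2f}")
```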
Selecting materialized views using random algorithm
NASA Astrophysics Data System (ADS)
Zhou, Lijuan; Hao, Zhongxiao; Liu, Chi
2007-04-01
A data warehouse is a repository of information collected from multiple, possibly heterogeneous, autonomous distributed databases. The information stored in the data warehouse is in the form of views referred to as materialized views. The selection of materialized views is one of the most important decisions in designing a data warehouse, since materialized views are stored to efficiently implement on-line analytical processing queries. The first issue for the user to consider is query response time. In this paper, we therefore develop algorithms to select a set of views to materialize in a data warehouse in order to minimize the total view maintenance cost under the constraint of a given query response time; we call this the query-cost view-selection problem. First, the cost graph and cost model of the query-cost view-selection problem are presented. Second, methods for selecting materialized views using randomized algorithms are presented. A genetic algorithm is applied to the materialized view selection problem, but as the genetic process evolves, legal solutions become harder to produce, so many candidate solutions are eliminated and the time needed to generate solutions grows. Therefore, an improved algorithm, combining simulated annealing with the genetic algorithm, is presented to solve the query-cost view-selection problem. Finally, simulation experiments are used to test the correctness and efficiency of our algorithms. The experiments show that the given methods can provide near-optimal solutions in limited time and work well in practical cases. Randomized algorithms will become invaluable tools for data warehouse evolution.
FluBreaks: early epidemic detection from Google flu trends.
Pervaiz, Fahad; Pervaiz, Mansoor; Abdur Rehman, Nabeel; Saif, Umar
2012-10-04
The Google Flu Trends service was launched in 2008 to track changes in the volume of online search queries related to flu-like symptoms. Over the last few years, the trend data produced by this service has shown a consistent relationship with the actual number of flu reports collected by the US Centers for Disease Control and Prevention (CDC), often identifying increases in flu cases weeks in advance of CDC records. However, contrary to popular belief, Google Flu Trends is not an early epidemic detection system. Instead, it is designed as a baseline indicator of the trend, or changes, in the number of disease cases. To evaluate whether these trends can be used as a basis for an early warning system for epidemics. We present the first detailed algorithmic analysis of how Google Flu Trends can be used as a basis for building a fully automated system for early warning of epidemics in advance of methods used by the CDC. Based on our work, we present a novel early epidemic detection system, called FluBreaks (dritte.org/flubreaks), based on Google Flu Trends data. We compared the accuracy and practicality of three types of algorithms: normal distribution algorithms, Poisson distribution algorithms, and negative binomial distribution algorithms. We explored the relative merits of these methods, and related our findings to changes in Internet penetration and population size for the regions in Google Flu Trends providing data. Across our performance metrics of percentage true-positives (RTP), percentage false-positives (RFP), percentage overlap (OT), and percentage early alarms (EA), Poisson- and negative binomial-based algorithms performed better in all except RFP. Poisson-based algorithms had average values of 99%, 28%, 71%, and 76% for RTP, RFP, OT, and EA, respectively, whereas negative binomial-based algorithms had average values of 97.8%, 17.8%, 60%, and 55% for RTP, RFP, OT, and EA, respectively. Moreover, the EA was also affected by the region's population size. Regions with larger populations (regions 4 and 6) had higher values of EA than region 10 (which had the smallest population) for negative binomial- and Poisson-based algorithms. The difference was 12.5% and 13.5% on average in negative binomial- and Poisson-based algorithms, respectively. We present the first detailed comparative analysis of popular early epidemic detection algorithms on Google Flu Trends data. We note that realizing this opportunity requires moving beyond the cumulative sum and historical limits method-based normal distribution approaches, traditionally employed by the CDC, to negative binomial- and Poisson-based algorithms to deal with potentially noisy search query data from regions with varying population and Internet penetrations. Based on our work, we have developed FluBreaks, an early warning system for flu epidemics using Google Flu Trends.
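A stripped-down version of the Poisson versus negative binomial alarm logic is sketched below: fit each distribution to a baseline window of counts and flag weeks whose counts exceed a high quantile. The baseline length, alarm percentile and synthetic counts are arbitrary choices, and the paper's actual algorithms include additional mechanics beyond this thresholding step.

```python
# Threshold-based outbreak flagging with Poisson and negative binomial models.
# Baseline window, alarm percentile and the synthetic counts are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
baseline = rng.poisson(40, size=52)          # a year of weekly "query volume"
current_weeks = np.array([42, 55, 61, 90, 130])

mu = baseline.mean()
var = baseline.var(ddof=1)

# Poisson alarm threshold: 99th percentile of Poisson(mu)
poisson_thr = stats.poisson.ppf(0.99, mu)

# Negative binomial: method-of-moments fit (requires var > mu, i.e. overdispersion)
if var > mu:
    p = mu / var
    r = mu * p / (1 - p)
    negbin_thr = stats.nbinom.ppf(0.99, r, p)
else:
    negbin_thr = poisson_thr                 # fall back when not overdispersed

for week, c in enumerate(current_weeks, 1):
    print(f"week {week}: count={c:3d}  Poisson alarm={c > poisson_thr}  "
          f"NegBin alarm={c > negbin_thr}")
```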
Chen, Qiang; Chen, Yunhao; Jiang, Weiguo
2016-07-30
In the field of multiple-feature Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features, and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple-feature OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO, and apply the proposed algorithm to an object-based hybrid Multivariate Alteration Detection model. Two experimental cases on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence and effectively avoid the problem of premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO achieves higher overall accuracy (84.17% and 83.59%) and Kappa coefficient (0.6771 and 0.6314) than the other algorithms. Moreover, the sensitivity analysis results show that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm do affect the algorithm. The comparison experiment results reveal that RMV is more suitable than other functions as the fitness function of the GPSO-based feature selection algorithm.
Affine Projection Algorithm with Improved Data-Selective Method Using the Condition Number
NASA Astrophysics Data System (ADS)
Ban, Sung Jun; Lee, Chang Woo; Kim, Sang Woo
Recently, a data-selective method has been proposed to achieve low misalignment in affine projection algorithm (APA) by keeping the condition number of an input data matrix small. We present an improved method, and a complexity reduction algorithm for the APA with the data-selective method. Experimental results show that the proposed algorithm has lower misalignment and a lower condition number for an input data matrix than both the conventional APA and the APA with the previous data-selective method.
A selective-update affine projection algorithm with selective input vectors
NASA Astrophysics Data System (ADS)
Kong, NamWoong; Shin, JaeWook; Park, PooGyeon
2011-10-01
This paper proposes an affine projection algorithm (APA) with selective input vectors, based on the concept of selective update, in order to reduce estimation errors and computation. The algorithm consists of two procedures: input-vector selection and state decision. The input-vector-selection procedure determines the number of input vectors by checking, via the mean square error (MSE), whether the input vectors have enough information for the update. The state-decision procedure determines the current state of the adaptive filter using a state-decision criterion. While the adaptive filter is in the transient state, the algorithm updates the filter coefficients with the selected input vectors. On the other hand, as soon as the adaptive filter reaches the steady state, the update procedure is not performed. Through these two procedures, the proposed algorithm achieves small steady-state estimation errors, low computational complexity and low update complexity for colored input signals.
Novel search algorithms for a mid-infrared spectral library of cotton contaminants.
Loudermilk, J Brian; Himmelsbach, David S; Barton, Franklin E; de Haseth, James A
2008-06-01
During harvest, a variety of plant based contaminants are collected along with cotton lint. The USDA previously created a mid-infrared, attenuated total reflection (ATR), Fourier transform infrared (FT-IR) spectral library of cotton contaminants for contaminant identification as the contaminants have negative impacts on yarn quality. This library has shown impressive identification rates for extremely similar cellulose based contaminants in cases where the library was representative of the samples searched. When spectra of contaminant samples from crops grown in different geographic locations, seasons, and conditions and measured with a different spectrometer and accessories were searched, identification rates for standard search algorithms decreased significantly. Six standard algorithms were examined: dot product, correlation, sum of absolute values of differences, sum of the square root of the absolute values of differences, sum of absolute values of differences of derivatives, and sum of squared differences of derivatives. Four categories of contaminants derived from cotton plants were considered: leaf, stem, seed coat, and hull. Experiments revealed that the performance of the standard search algorithms depended upon the category of sample being searched and that different algorithms provided complementary information about sample identity. These results indicated that choosing a single standard algorithm to search the library was not possible. Three voting scheme algorithms based on result frequency, result rank, category frequency, or a combination of these factors for the results returned by the standard algorithms were developed and tested for their capability to overcome the unpredictability of the standard algorithms' performances. The group voting scheme search was based on the number of spectra from each category of samples represented in the library returned in the top ten results of the standard algorithms. This group algorithm was able to identify correctly as many test spectra as the best standard algorithm without relying on human choice to select a standard algorithm to perform the searches.
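The voting idea (let several similarity metrics each nominate their top matches, then tally the library categories they return) can be sketched as follows with synthetic spectra. Only four of the six metrics are included and the top-k size is arbitrary, so this is an illustration of the scheme rather than the algorithm evaluated in the paper.

```python
# Toy category-voting library search across several spectral similarity
# metrics. Spectra are synthetic; metric set and top-k are illustrative.
import numpy as np

rng = np.random.default_rng(7)
n_lib, n_points = 40, 200
library = rng.random((n_lib, n_points))
categories = np.array(["leaf", "stem", "seed coat", "hull"])[rng.integers(0, 4, n_lib)]
query = library[3] + 0.05 * rng.standard_normal(n_points)     # noisy copy of entry 3

def dot_product(a, b):    return -float(a @ b)                 # negate: smaller = better
def correlation(a, b):    return -float(np.corrcoef(a, b)[0, 1])
def abs_diff(a, b):       return float(np.abs(a - b).sum())
def deriv_abs_diff(a, b): return float(np.abs(np.diff(a) - np.diff(b)).sum())

metrics, k = [dot_product, correlation, abs_diff, deriv_abs_diff], 10
votes = {}
for metric in metrics:
    scores = np.array([metric(query, s) for s in library])
    for idx in np.argsort(scores)[:k]:                         # top-k for this metric
        votes[categories[idx]] = votes.get(categories[idx], 0) + 1

print("true category:", categories[3])
print("voted category:", max(votes, key=votes.get), votes)
```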
Archana, Siddaiah; Nongkrynh, B; Anand, K; Pandav, C S
2015-09-21
High prevalence of reproductive morbidities is seen among adolescents in India. Health workers play an important role in providing health services in the community, including adolescent reproductive health services. A study was done to assess the feasibility of training female health workers (FHWs) in the classification and management of selected adolescent girls' reproductive health problems according to modified WHO algorithms. The study was conducted between January and September 2011 in Northern India. Thirteen FHWs were trained in adolescent girls' reproductive health as per the WHO Adolescent Job-Aid booklet. A pre- and post-test assessment of the knowledge of the FHWs was carried out. All FHWs were given five modified WHO algorithms to classify and manage common reproductive morbidities among adolescent girls. All the FHWs applied the algorithms to at least ten adolescent girls at their respective sub-centres. Simultaneously, a medical doctor independently applied the same algorithms to all girls. Classification of the condition was followed by the relevant management and advice provided in the algorithm. A focus group discussion with the FHWs was carried out to receive their feedback. After training, the median score of the FHWs increased from 19.2 to 25.2 (p = 0.0071). Out of 144 girls examined by the FHWs, 108 were classified as true positives and 30 as true negatives, and agreement as measured by kappa was 0.7 (0.5-0.9). Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were 94.3% (88.2-97.4), 78.9% (63.6-88.9), 92.5% (86.0-96.2), and 83.3% (68.1-92.1), respectively. A consistent and significant difference between pre- and post-training knowledge scores of the FHWs was observed, and hence it was possible to use the modified Job-Aid algorithms with ease. A limitation of this study was that the number of FHWs trained was small, and issues such as time management during routine work, timing of training, and overhead cost of training were not taken into account. Training was successful in increasing the knowledge of the FHWs about adolescent girls' reproductive health issues. The FHWs were able to satisfactorily classify the common adolescent girls' problems using the modified WHO algorithms.
Al-Rajab, Murad; Lu, Joan; Xu, Qiang
2017-07-01
This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.
Yu, Qiang; Wei, Dingbang; Huo, Hongwei
2018-06-18
Given a set of t n-length DNA sequences, q satisfying 0 < q ≤ 1, and l and d satisfying 0 ≤ d < l < n, the quorum planted motif search (qPMS) finds l-length strings that occur in at least qt input sequences with up to d mismatches and is mainly used to locate transcription factor binding sites in DNA sequences. Existing qPMS algorithms have been able to efficiently process small standard datasets (e.g., t = 20 and n = 600), but they are too time consuming to process large DNA datasets, such as ChIP-seq datasets that contain thousands of sequences or more. We analyze the effects of t and q on the time performance of qPMS algorithms and find that a large t or a small q causes a longer computation time. Based on this information, we improve the time performance of existing qPMS algorithms by selecting a sample sequence set D' with a small t and a large q from the large input dataset D and then executing qPMS algorithms on D'. A sample sequence selection algorithm named SamSelect is proposed. The experimental results on both simulated and real data show (1) that SamSelect can select D' efficiently and (2) that the qPMS algorithms executed on D' can find implanted or real motifs in a significantly shorter time than when executed on D. We improve the ability of existing qPMS algorithms to process large DNA datasets from the perspective of selecting high-quality sample sequence sets so that the qPMS algorithms can find motifs in a short time in the selected sample sequence set D', rather than take an unfeasibly long time to search the original sequence set D. Our motif discovery method is an approximate algorithm.
2017-07-07
RESEARCH ARTICLE: Self-reported HIV-positive status but subsequent HIV-negative test result using rapid diagnostic testing algorithms among seven sub...America. Abstract: HIV rapid diagnostic tests (RDTs) combined in an algorithm are the current standard for HIV diagnosis...in many sub-Saharan African countries, and extensive laboratory testing has confirmed HIV RDTs have excellent sensitivity and specificity. However
Fast object detection algorithm based on HOG and CNN
NASA Astrophysics Data System (ADS)
Lu, Tongwei; Wang, Dandan; Zhang, Yanduo
2018-04-01
In the field of computer vision, object classification and object detection are widely used in many applications. Traditional object detection has two main problems: the sliding-window region selection strategy has high time complexity and produces redundant windows, and the hand-crafted features are not sufficiently robust. To solve these problems, a Region Proposal Network (RPN) is used to select candidate regions instead of the selective search algorithm. Compared with traditional algorithms and selective search, the RPN has higher efficiency and accuracy. We combine HOG features and a convolutional neural network (CNN) to extract features, and use an SVM for classification. For TorontoNet, our algorithm's mAP is 1.6 percentage points higher; for OxfordNet, it is 1.3 percentage points higher.
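A minimal sketch of the HOG-plus-linear-SVM classification stage of such a pipeline, using scikit-image and scikit-learn; the RPN and CNN stages are omitted and the training windows below are random placeholders:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(gray_image):
    """Histogram of oriented gradients for a fixed-size grayscale window."""
    return hog(gray_image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

# Placeholder data: 64x64 positive (object) and negative (background) windows.
rng = np.random.default_rng(0)
pos = rng.random((20, 64, 64))
neg = rng.random((20, 64, 64))
X = np.array([hog_descriptor(w) for w in np.concatenate([pos, neg])])
y = np.array([1] * len(pos) + [0] * len(neg))

clf = LinearSVC(C=1.0).fit(X, y)      # linear SVM on HOG descriptors
score = clf.decision_function(X[:1])  # signed score: > 0 means "object"
print(score)
```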
Introducing Bayesian thinking to high-throughput screening for false-negative rate estimation.
Wei, Xin; Gao, Lin; Zhang, Xiaolei; Qian, Hong; Rowan, Karen; Mark, David; Peng, Zhengwei; Huang, Kuo-Sen
2013-10-01
High-throughput screening (HTS) has been widely used to identify active compounds (hits) that bind to biological targets. Because of cost concerns, the comprehensive screening of millions of compounds is typically conducted without replication. Real hits that fail to exhibit measurable activity in the primary screen due to random experimental errors will be lost as false-negatives. Conceivably, the projected false-negative rate is a parameter that reflects screening quality. Furthermore, it can be used to guide the selection of optimal numbers of compounds for hit confirmation. Therefore, a method that predicts false-negative rates from the primary screening data is extremely valuable. In this article, we describe the implementation of a pilot screen on a representative fraction (1%) of the screening library in order to obtain information about assay variability as well as a preliminary hit activity distribution profile. Using this training data set, we then developed an algorithm based on Bayesian logic and Monte Carlo simulation to estimate the number of true active compounds and potential missed hits from the full library screen. We have applied this strategy to five screening projects. The results demonstrate that this method produces useful predictions on the numbers of false negatives.
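A toy Monte Carlo sketch of the underlying false-negative idea (not the authors' Bayesian algorithm): assume an activity distribution for true hits and a replicate noise level estimated from the pilot screen, simulate single-replicate reads, and count how many true hits fall below the hit threshold:

```python
import numpy as np

rng = np.random.default_rng(42)

n_hits = 1000          # assumed number of true active compounds
true_activity = rng.normal(60.0, 15.0, n_hits)  # assumed % inhibition of true hits
assay_sd = 12.0        # replicate noise estimated from the pilot screen
threshold = 50.0       # primary-screen hit cutoff (% inhibition)
n_sim = 2000           # Monte Carlo repetitions

fn_rates = []
for _ in range(n_sim):
    measured = true_activity + rng.normal(0.0, assay_sd, n_hits)  # unreplicated read
    fn_rates.append(np.mean(measured < threshold))                # missed true hits

print(f"estimated false-negative rate: {np.mean(fn_rates):.3f} "
      f"(95% interval {np.quantile(fn_rates, [0.025, 0.975])})")
```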
Novel and efficient tag SNPs selection algorithms.
Chen, Wen-Pei; Hung, Che-Lun; Tsai, Suh-Jen Jane; Lin, Yaw-Ling
2014-01-01
SNPs are the most abundant form of genetic variation among species; association studies between complex diseases and SNPs or haplotypes have received great attention. However, these studies are restricted by the cost of genotyping all SNPs; thus, it is necessary to find smaller subsets, or tag SNPs, that represent the rest of the SNPs. In fact, the existing tag SNP selection algorithms are notoriously time-consuming. An efficient algorithm for tag SNP selection was presented and applied to analyze the HapMap YRI data. The experimental results show that the proposed algorithm achieves better performance than the existing tag SNP selection algorithms; in most cases, it is at least ten times faster than the existing methods. In many cases, when the redundant ratio of the block is high, the proposed algorithm can even be thousands of times faster than previously known methods. Tools and web services for haplotype block analysis, integrated with the Hadoop MapReduce framework, have also been developed using the proposed algorithm as the computation kernel.
Chen, Qiang; Chen, Yunhao; Jiang, Weiguo
2016-01-01
In the field of multiple-feature Object-Based Change Detection (OBCD) for very-high-resolution remotely sensed images, image objects have abundant features, and feature selection affects the precision and efficiency of OBCD. Through object-based image analysis, this paper proposes a Genetic Particle Swarm Optimization (GPSO)-based feature selection algorithm to solve the optimization problem of feature selection in multiple-feature OBCD. We select the Ratio of Mean to Variance (RMV) as the fitness function of GPSO and apply the proposed algorithm to the object-based hybrid multivariate alternative detection model. Two experimental cases on Worldview-2/3 images confirm that GPSO can significantly improve the speed of convergence and effectively avoid premature convergence, relative to other feature selection algorithms. According to the accuracy evaluation of OBCD, GPSO achieves higher overall accuracy (84.17% and 83.59%) and Kappa coefficients (0.6771 and 0.6314) than the other algorithms. Moreover, the sensitivity analysis shows that the proposed algorithm is not easily influenced by the initial parameters, but the number of features to be selected and the size of the particle swarm do affect the algorithm. The comparison experiments reveal that RMV is more suitable than other functions as the fitness function of the GPSO-based feature selection algorithm. PMID:27483285
Online selective kernel-based temporal difference learning.
Chen, Xingguo; Gao, Yang; Wang, Ruili
2013-12-01
In this paper, an online selective kernel-based temporal difference (OSKTD) learning algorithm is proposed to deal with large-scale and/or continuous reinforcement learning problems. OSKTD includes two online procedures: online sparsification and parameter updating for the selective kernel-based value function. A new sparsification method (i.e., a kernel-distance-based online sparsification method) is proposed based on selective ensemble learning, which is computationally less complex than other sparsification methods. With the proposed sparsification method, the sparsified dictionary of samples is constructed online by checking whether a sample needs to be added to the sparsified dictionary. In addition, based on local validity, a selective kernel-based value function is proposed to select the best samples from the sample dictionary for the selective kernel-based value function approximator. The parameters of the selective kernel-based value function are iteratively updated using the temporal difference (TD) learning algorithm combined with the gradient descent technique. The complexity of the online sparsification procedure in the OSKTD algorithm is O(n). In addition, two typical experiments (Maze and Mountain Car) are used to compare with both traditional and up-to-date O(n) algorithms (GTD, GTD2, and TDC using the kernel-based value function), and the results demonstrate the effectiveness of our proposed algorithm. In the Maze problem, OSKTD converges to an optimal policy and converges faster than both traditional and up-to-date algorithms. In the Mountain Car problem, OSKTD converges, requires less computation time than other sparsification methods, reaches a better local optimum than the traditional algorithms, and converges much faster than the up-to-date algorithms. In addition, OSKTD reaches a competitive ultimate optimum compared with the up-to-date algorithms.
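A minimal sketch of a kernel-distance-based online sparsification rule of the kind described, assuming a Gaussian kernel and a hand-chosen novelty threshold; the TD parameter-update step is omitted:

```python
import numpy as np

def gaussian_kernel(x, c, sigma=1.0):
    return np.exp(-np.sum((x - c) ** 2) / (2.0 * sigma ** 2))

def kernel_distance(x, c, sigma=1.0):
    # d^2(x, c) = k(x,x) - 2 k(x,c) + k(c,c) = 2 - 2 k(x,c) for a Gaussian kernel
    return 2.0 - 2.0 * gaussian_kernel(x, c, sigma)

def sparsify_online(samples, mu=0.5, sigma=1.0):
    """Add a sample to the dictionary only if it is far (in kernel distance)
    from every sample already stored."""
    dictionary = []
    for x in samples:
        if not dictionary or min(kernel_distance(x, c, sigma) for c in dictionary) > mu:
            dictionary.append(x)
    return dictionary

rng = np.random.default_rng(1)
stream = rng.normal(size=(500, 2))   # stand-in for states visited by the agent
D = sparsify_online(stream, mu=0.5)
print(f"kept {len(D)} of {len(stream)} samples")
```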
Data Reduction Algorithm Using Nonnegative Matrix Factorization with Nonlinear Constraints
NASA Astrophysics Data System (ADS)
Sembiring, Pasukat
2017-12-01
Processing of data with very large dimensions has been a hot topic in recent decades. Various techniques have been proposed to extract the desired information or structure. Non-Negative Matrix Factorization (NMF), which operates on non-negative data, has become one of the popular methods for dimensionality reduction. The main strength of this method is non-negativity: an object is modeled as a combination of basic non-negative parts, which provides a physical interpretation of the object's construction. NMF is a dimension reduction method that has been used widely for numerous applications including computer vision, text mining, pattern recognition, and bioinformatics. The mathematical formulation of NMF is not a convex optimization problem, and various types of algorithms have been proposed to solve it. The Alternating Nonnegative Least Squares (ANLS) framework is a block coordinate descent approach that has been proven theoretically reliable and empirically efficient. This paper proposes a new algorithm to solve the NMF problem based on the ANLS framework. This algorithm inherits the convergence property of the ANLS framework for nonlinear-constrained NMF formulations.
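For reference, a minimal ANLS-style NMF loop (alternating nonnegative least-squares solves via scipy.optimize.nnls); the nonlinear constraints proposed in the paper are not included:

```python
import numpy as np
from scipy.optimize import nnls

def anls_nmf(V, rank, n_iter=50, seed=0):
    """Approximate V (m x n, nonnegative) as W @ H by alternating NNLS solves."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(n_iter):
        # Fix W, solve each column of H:  min_h ||W h - V[:, j]||, h >= 0
        H = np.column_stack([nnls(W, V[:, j])[0] for j in range(n)])
        # Fix H, solve each row of W:     min_w ||H.T w - V[i, :]||, w >= 0
        W = np.vstack([nnls(H.T, V[i, :])[0] for i in range(m)])
    return W, H

V = np.abs(np.random.default_rng(2).random((30, 20)))
W, H = anls_nmf(V, rank=5)
print("reconstruction error:", np.linalg.norm(V - W @ H))
```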
Muyoyeta, Monde; Moyo, Maureen; Kasese, Nkatya; Ndhlovu, Mapopa; Milimo, Deborah; Mwanza, Winfridah; Kapata, Nathan; Schaap, Albertus; Godfrey Faussett, Peter; Ayles, Helen
2015-01-01
The current cost of Xpert MTB/RIF (Xpert) consumables is such that algorithms are needed to select which patients to prioritise for testing with Xpert. To evaluate two algorithms for prioritisation of Xpert in primary health care settings in a high TB and HIV burden setting. Consecutive presumptive TB patients with a cough of any duration were offered either an Xpert or a fluorescence microscopy (FM) test depending on their CXR score or HIV status. In one facility, sputa from patients with an abnormal CXR were tested with Xpert and those with a normal CXR were tested with FM ("CXR algorithm"). CXR was scored automatically using a Computer Aided Diagnosis (CAD) program. In the other facility, patients who were HIV positive were tested using Xpert and those who were HIV negative were tested with FM ("HIV algorithm"). Of 9482 individuals pre-screened with CXR, Xpert detected TB in 2090/6568 (31.8%) with an abnormal CXR, and FM was AFB positive in 8/2455 (0.3%) with a normal CXR. Of 4444 pre-screened with HIV, Xpert detected TB in 508/2265 (22.4%) HIV-positive individuals and FM was AFB positive in 212/1920 (11.0%) HIV-negative individuals. The notification rate of new bacteriologically confirmed TB increased from 366 to 620/100,000/yr and from 145 to 261/100,000/yr at the CXR and HIV algorithm sites, respectively. The median time to starting TB treatment at the CXR site compared to the HIV algorithm site was 1 day (IQR 1-3) versus 3 days (IQR 2-5) (p<0.0001). Use of Xpert in a resource-limited setting at primary care level in conjunction with pre-screening tests reduced the number of Xpert tests performed. The routine use of Xpert resulted in additional cases of confirmed TB patients starting treatment. However, there was no increase in the absolute number of patients starting TB treatment. Same-day diagnosis and treatment commencement was achieved for both bacteriologically confirmed and empirically diagnosed patients where Xpert was used in conjunction with CXR.
Leader personality and crew effectiveness - A full-mission simulation experiment
NASA Technical Reports Server (NTRS)
Chidester, Thomas R.; Foushee, H. Clayton
1989-01-01
A full-mission simulation research study was completed to assess the impact of individual personality on crew performance. Using a selection algorithm described by Chidester (1987), captains were classified as fitting one of three profiles along a battery of personality assessment scales. The performances of 23 crews led by captains fitting each profile were contrasted over a one and one-half day simulated trip. Crews led by captains fitting a positive Instrumental-Expressive profile (high achievement motivation and interpersonal skill) were consistently effective and made fewer errors. Crews led by captains fitting a Negative Expressive profile (below average achievement motivation, negative expressive style, such as complaining) were consistently less effective and made more errors. Crews led by captains fitting a Negative Instrumental profile (high levels of competitiveness, Verbal Aggressiveness, and Impatience and Irritability) were less effective on the first day but equal to the best on the second day. These results underscore the importance of stable personality variables as predictors of team coordination and performance.
Low complexity adaptive equalizers for underwater acoustic communications
NASA Astrophysics Data System (ADS)
Soflaei, Masoumeh; Azmi, Paeiz
2014-08-01
Interference due to scattering from the surface and reflection from the bottom is one of the most important problems for reliable communication in shallow-water channels. One of the best ways suggested to solve this problem is to use adaptive equalizers. Convergence rate and misadjustment error of adaptive algorithms play important roles in adaptive equalizer performance. In this paper, the affine projection algorithm (APA), selective regressor APA (SR-APA), the family of selective partial update (SPU) algorithms, the family of set-membership (SM) algorithms and the selective partial update selective regressor APA (SPU-SR-APA) are compared with conventional algorithms such as the least mean square (LMS) in underwater acoustic communications. We apply experimental data from the Strait of Hormuz to demonstrate the efficiency of the proposed methods over a shallow-water channel. We observe that the steady-state mean square error (MSE) values of the SR-APA, SPU-APA, SPU-normalized least mean square (SPU-NLMS), SPU-SR-APA, SM-APA and SM-NLMS algorithms decrease in comparison with the LMS algorithm. These algorithms also have better convergence rates than the LMS-type algorithm.
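For context, a minimal NLMS adaptive equalizer sketch, i.e. the LMS-family baseline the comparison is made against; the channel and training symbols below are synthetic placeholders:

```python
import numpy as np

def nlms_equalizer(received, desired, num_taps=8, mu=0.5, eps=1e-6):
    """Normalized LMS: w <- w + mu * e * x / (||x||^2 + eps)."""
    w = np.zeros(num_taps)
    out = np.zeros(len(received))
    for n in range(num_taps - 1, len(received)):
        x = received[n - num_taps + 1:n + 1][::-1]  # current and past received samples
        out[n] = w @ x
        e = desired[n] - out[n]                     # error against training symbol
        w += mu * e * x / (x @ x + eps)
    return w, out

rng = np.random.default_rng(3)
symbols = rng.choice([-1.0, 1.0], size=2000)                   # BPSK training sequence
channel = np.array([1.0, 0.4, -0.2])                           # toy multipath channel
received = np.convolve(symbols, channel, mode='full')[:len(symbols)]
received += 0.05 * rng.normal(size=len(symbols))               # additive noise
w, y = nlms_equalizer(received, symbols)
print("final taps:", np.round(w, 3))
```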
Evaluation of Sienna Cancer Diagnostics hTERT Antibody on 500 Consecutive Urinary Tract Specimens.
Allison, Derek B; Sharma, Rajni; Cowan, Morgan L; VandenBussche, Christopher J
2018-06-06
Telomerase activity can be detected in up to 90% of urothelial carcinomas (UC). Telomerase activity can also be detected in urinary tract cytology (UTC) specimens and indicates an increased risk of UC. We evaluated the performance of a commercially available antibody that putatively binds the telomerase reverse transcriptase (hTERT) subunit on 500 UTC specimens. Unstained Cytospin™ preparations were created from residual urine specimens and were stained using the anti-hTERT antibody (SCD-A7). Two algorithms were developed for concatenating the hTERT result and cytologic diagnosis: a "no indeterminates algorithm," in which a negative cytology and positive hTERT result are considered positive, and a "high-specificity algorithm," in which a negative cytology and positive hTERT result are considered indeterminate (and thus negative for comparison to the gold standard). The "no indeterminates algorithm" and "high-specificity algorithm" yielded a sensitivity of 60.6 and 52.1%, a specificity of 70.4 and 90.7%, a positive predictive value of 39.1 and 63.8%, and a negative predictive value of 85.0 and 85.8%, respectively. A positive hTERT result may identify a subset of patients with an increased risk of high-grade UC (HGUC) who may otherwise not be closely followed, while a negative hTERT immunocytochemistry result is associated with a reduction in risk for HGUC. © 2018 The Author(s) Published by S. Karger AG, Basel.
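A small helper showing how the reported sensitivity, specificity, PPV and NPV follow from 2x2 confusion-matrix counts (the counts below are placeholders, not the study's data):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts for illustration only.
print(diagnostic_metrics(tp=86, fp=49, tn=284, fn=56))
```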
Afzal, Zubair; Pons, Ewoud; Kang, Ning; Sturkenboom, Miriam C J M; Schuemie, Martijn J; Kors, Jan A
2014-11-29
In order to extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. We created a Dutch clinical corpus containing four types of anonymized clinical documents: entries from general practitioners, specialists' letters, radiology reports, and discharge letters. Using a Dutch list of medical terms extracted from the Unified Medical Language System, we identified medical terms in the corpus with exact matching. The identified terms were annotated for negation, temporality, and experiencer properties. To adapt the ConText algorithm, we translated English trigger terms to Dutch and added several general and document specific enhancements, such as negation rules for general practitioners' entries and a regular expression based temporality module. The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. The ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
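A deliberately simplified, ConText-style negation check for illustration only, with a handful of assumed Dutch and English triggers and a fixed forward scope; the real ContextD rules are considerably richer:

```python
import re

NEGATION_TRIGGERS = ["geen", "niet", "zonder", "no", "not", "without"]  # illustrative only
SCOPE = 5  # number of tokens a trigger is allowed to negate

def is_negated(text, concept):
    """True if `concept` appears within SCOPE tokens after a negation trigger."""
    tokens = re.findall(r"\w+", text.lower())
    concept_tokens = concept.lower().split()
    for i, tok in enumerate(tokens):
        if tok in NEGATION_TRIGGERS:
            window = tokens[i + 1:i + 1 + SCOPE]
            # crude containment check for (multi-word) concepts
            if " ".join(concept_tokens) in " ".join(window):
                return True
    return False

print(is_negated("Patient heeft geen koorts of hoesten.", "koorts"))   # True
print(is_negated("Patient heeft koorts sinds gisteren.", "koorts"))    # False
```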
Cawley, Gavin C; Talbot, Nicola L C
2006-10-01
Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification. The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar, however the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense. A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.
Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor
2010-08-01
Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, standard swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray datasets for brain, leukemia, lung, prostate, and other cancers. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods. Copyright © 2010 Elsevier Ltd. All rights reserved.
Multi-task feature selection in microarray data by binary integer programming.
Lan, Liang; Vucetic, Slobodan
2013-12-20
A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.
Alshamlan, Hala; Badr, Ghada; Alohali, Yousef
2015-01-01
An artificial bee colony (ABC) is a relatively recent swarm intelligence optimization approach. In this paper, we propose the first attempt at applying the ABC algorithm in analyzing a microarray gene expression profile. In addition, we propose an innovative feature selection algorithm, minimum redundancy maximum relevance (mRMR), and combine it with an ABC algorithm, mRMR-ABC, to select informative genes from microarray profiles. The new approach is based on a support vector machine (SVM) algorithm to measure the classification accuracy for selected genes. We evaluate the performance of the proposed mRMR-ABC algorithm by conducting extensive experiments on six binary and multiclass gene expression microarray datasets. Furthermore, we compare our proposed mRMR-ABC algorithm with previously known techniques. We reimplemented two of these techniques for the sake of a fair comparison using the same parameters. These two techniques are mRMR when combined with a genetic algorithm (mRMR-GA) and mRMR when combined with a particle swarm optimization algorithm (mRMR-PSO). The experimental results prove that the proposed mRMR-ABC algorithm achieves accurate classification performance using a small number of predictive genes when tested on both binary and multiclass datasets and compared to previously suggested methods. This shows that mRMR-ABC is a promising approach for solving gene selection and cancer classification problems. PMID:25961028
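For illustration, a minimal greedy mRMR-style ranking (mutual information for relevance, absolute correlation as a redundancy proxy); this is not the paper's mRMR-ABC implementation, and the expression matrix below is synthetic:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def greedy_mrmr(X, y, k=10):
    """Pick k features maximizing relevance minus mean redundancy."""
    relevance = mutual_info_classif(X, y, random_state=0)
    corr = np.abs(np.corrcoef(X, rowvar=False))   # feature-feature redundancy proxy
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = [relevance[j] - corr[j, selected].mean() for j in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))                    # stand-in expression matrix
y = (X[:, 3] + X[:, 17] > 0).astype(int)          # labels driven by two "genes"
print(greedy_mrmr(X, y, k=5))
```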
Luomaranta, Anna; Bützow, Ralf; Pauna, Arja-Riitta; Leminen, Arto; Loukovaara, Mikko
2015-01-01
To compare two treatment strategies in women undergoing surgery for endometrial carcinoma. Retrospective cohort study. Tertiary care center. 1166 patients. Uterine biopsy/curettage was obtained in 1140 women, of whom 229 also had pelvic magnetic resonance imaging (MRI). We compared two strategies: (i) routine pelvic lymphadenectomy and (ii) selective pelvic lymphadenectomy for women with high-risk carcinomas as determined from preoperative histology and MRI. High-risk carcinomas included grade 1-2 endometrioid carcinomas with ≥50% myometrial invasion, grade 3 endometrioid carcinomas, and nonendometrioid carcinomas. Others were considered low-risk carcinomas. Diagnostic indices, treatment algorithms. Of the women who underwent lymphadenectomy, positive pelvic nodes were found in 2.3% of low-risk carcinomas and 18.3% of high-risk carcinomas. The combination of preoperative histology and MRI detected high-risk carcinomas with a sensitivity of 85.7%, a specificity of 75.0%, a positive predictive value of 74.4%, and a negative predictive value of 86.1%. Area under curve was 0.804. In the routine lymphadenectomy algorithm, 54.1% of lymphadenectomies were performed for low-risk carcinomas. In the selective lymphadenectomy algorithm, 14.3% of women with high-risk carcinomas did not receive lymphadenectomy. Missed positive pelvic nodes were estimated to occur in 2.1% of patients in the selective strategy. Similarly, the estimated risk for isolated para-aortic metastasis was 2.1%, regardless of treatment strategy. The combination of preoperative histology and MRI is moderately sensitive and specific in detecting high-risk endometrial carcinomas. The clinical utility of the method is hampered by the relatively high proportion of high-risk cases that remain unrecognized preoperatively. © 2014 Nordic Federation of Societies of Obstetrics and Gynecology.
Threshold automatic selection hybrid phase unwrapping algorithm for digital holographic microscopy
NASA Astrophysics Data System (ADS)
Zhou, Meiling; Min, Junwei; Yao, Baoli; Yu, Xianghua; Lei, Ming; Yan, Shaohui; Yang, Yanlong; Dan, Dan
2015-01-01
The conventional quality-guided (QG) phase unwrapping algorithm is difficult to apply to digital holographic microscopy because of its long execution time. In this paper, we present a threshold automatic selection hybrid phase unwrapping algorithm that combines the existing QG algorithm with the flood-fill (FF) algorithm to solve this problem. The original wrapped phase map is divided into high- and low-quality sub-maps by automatically selecting a threshold, and the FF and QG unwrapping algorithms are then used to unwrap the phase in each level, respectively. The feasibility of the proposed method is proved by experimental results, and the execution speed is shown to be much faster than that of the original QG unwrapping algorithm.
Dobson-Belaire, Wendy; Goodfield, Jason; Borrelli, Richard; Liu, Fei Fei; Khan, Zeba M
2018-01-01
Using diagnosis code-based algorithms is the primary method of identifying patient cohorts for retrospective studies; nevertheless, many databases lack reliable diagnosis code information. To develop precise algorithms based on medication claims/prescriber visits (MCs/PVs) to identify psoriasis (PsO) patients and psoriatic patients with arthritic conditions (PsO-AC), a proxy for psoriatic arthritis, in Canadian databases lacking diagnosis codes. Algorithms were developed using medications with narrow indication profiles in combination with prescriber specialty to define PsO and PsO-AC. For a 3-year study period from July 1, 2009, algorithms were validated using the PharMetrics Plus database, which contains both adjudicated medication claims and diagnosis codes. Positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity of the developed algorithms were assessed using diagnosis code as the reference standard. Chosen algorithms were then applied to Canadian drug databases to profile the algorithm-identified PsO and PsO-AC cohorts. In the selected database, 183,328 patients were identified for validation. The highest PPVs for PsO (85%) and PsO-AC (65%) occurred when a predictive algorithm of two or more MCs/PVs was compared with the reference standard of one or more diagnosis codes. NPV and specificity were high (99%-100%), whereas sensitivity was low (≤30%). Reducing the number of MCs/PVs or increasing diagnosis claims decreased the algorithms' PPVs. We have developed an MC/PV-based algorithm to identify PsO patients with a high degree of accuracy, but accuracy for PsO-AC requires further investigation. Such methods allow researchers to conduct retrospective studies in databases in which diagnosis codes are absent. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Developing a dengue forecast model using machine learning: A case study in China.
Guo, Pi; Liu, Tao; Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun
2017-10-01
In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011-2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics.
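A minimal sketch of fitting an SVR forecaster with cross-validated hyper-parameters, in the spirit of the approach described; the covariates below are synthetic placeholders for the search index and climate factors:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
n_weeks = 200
X = np.column_stack([
    rng.normal(25, 4, n_weeks),      # mean temperature (placeholder)
    rng.normal(75, 8, n_weeks),      # relative humidity (placeholder)
    rng.gamma(2.0, 20.0, n_weeks),   # rainfall (placeholder)
    rng.normal(0, 1, n_weeks),       # search index (placeholder)
])
y = 5 + 0.8 * X[:, 3] + 0.1 * X[:, 0] + rng.normal(0, 1, n_weeks)  # weekly case proxy

model = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
grid = GridSearchCV(model, {"svr__C": [1, 10, 100], "svr__epsilon": [0.1, 0.5]},
                    cv=TimeSeriesSplit(n_splits=5),
                    scoring="neg_root_mean_squared_error")
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)   # RMSE of the selected SVR
```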
NASA Astrophysics Data System (ADS)
Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na
2016-10-01
Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data-sets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.
Lin, Kuan-Cheng; Hsieh, Yi-Hsiu
2015-10-01
The classification and analysis of data is an important issue in today's research. Selecting a suitable set of features makes it possible to classify an enormous quantity of data quickly and efficiently. Feature selection is generally viewed as a problem of feature subset selection, such as combination optimization problems. Evolutionary algorithms using random search methods have proven highly effective in obtaining solutions to problems of optimization in a diversity of applications. In this study, we developed a hybrid evolutionary algorithm based on endocrine-based particle swarm optimization (EPSO) and artificial bee colony (ABC) algorithms in conjunction with a support vector machine (SVM) for the selection of optimal feature subsets for the classification of datasets. The results of experiments using specific UCI medical datasets demonstrate that the accuracy of the proposed hybrid evolutionary algorithm is superior to that of basic PSO, EPSO and ABC algorithms, with regard to classification accuracy using subsets with a reduced number of features.
NASA Astrophysics Data System (ADS)
Lohvithee, Manasavee; Biguri, Ander; Soleimani, Manuchehr
2017-12-01
There are a number of powerful total variation (TV) regularization methods that have great promise in limited data cone-beam CT reconstruction with an enhancement of image quality. These promising TV methods require careful selection of the image reconstruction parameters, for which there are no well-established criteria. This paper presents a comprehensive evaluation of parameter selection in a number of major TV-based reconstruction algorithms. An appropriate way of selecting the values for each individual parameter has been suggested. Finally, a new adaptive-weighted projection-controlled steepest descent (AwPCSD) algorithm is presented, which implements the edge-preserving function for CBCT reconstruction with limited data. The proposed algorithm shows significant robustness compared to three other existing algorithms: ASD-POCS, AwASD-POCS and PCSD. The proposed AwPCSD algorithm is able to preserve the edges of the reconstructed images better with fewer sensitive parameters to tune.
Kenttä, Tuomas; Porthan, Kimmo; Tikkanen, Jani T; Väänänen, Heikki; Oikarinen, Lasse; Viitasalo, Matti; Karanko, Hannu; Laaksonen, Maarit; Huikuri, Heikki V
2015-07-01
Early repolarization (ER) is defined as an elevation of the QRS-ST junction in at least two inferior or lateral leads of the standard 12-lead electrocardiogram (ECG). Our purpose was to create an algorithm for the automated detection and classification of ER. A total of 6,047 electrocardiograms were manually graded for ER by two experienced readers. The automated detection of ER was based on quantification of the characteristic slurring or notching in ER-positive leads. The ER detection algorithm was tested and its results were compared with manual grading, which served as the reference. Readers graded 183 ECGs (3.0%) as ER positive, of which the algorithm detected 176 recordings, resulting in sensitivity of 96.2%. Of the 5,864 ER-negative recordings, the algorithm classified 5,281 as negative, resulting in 90.1% specificity. Positive and negative predictive values for the algorithm were 23.2% and 99.9%, respectively, and its accuracy was 90.2%. Inferior ER was correctly detected in 84.6% and lateral ER in 98.6% of the cases. As the automatic algorithm has high sensitivity, it could be used as a prescreening tool for ER; only the electrocardiograms graded positive by the algorithm would be reviewed manually. This would reduce the need for manual labor by 90%. © 2014 Wiley Periodicals, Inc.
Ma, Li; Fan, Suohai
2017-03-14
The random forests algorithm is a type of classifier with prominent universality, a wide application range, and robustness for avoiding overfitting. But there are still some drawbacks to random forests. Therefore, to improve the performance of random forests, this paper seeks to improve imbalanced data processing, feature selection and parameter optimization. We propose the CURE-SMOTE algorithm for the imbalanced data classification problem. Experiments on imbalanced UCI data reveal that the combination of Clustering Using Representatives (CURE) enhances the original synthetic minority oversampling technique (SMOTE) algorithms effectively compared with the classification results on the original data using random sampling, Borderline-SMOTE1, safe-level SMOTE, C-SMOTE, and k-means-SMOTE. Additionally, the hybrid RF (random forests) algorithm has been proposed for feature selection and parameter optimization, which uses the minimum out of bag (OOB) data error as its objective function. Simulation results on binary and higher-dimensional data indicate that the proposed hybrid RF algorithms, hybrid genetic-random forests algorithm, hybrid particle swarm-random forests algorithm and hybrid fish swarm-random forests algorithm can achieve the minimum OOB error and show the best generalization ability. The training set produced from the proposed CURE-SMOTE algorithm is closer to the original data distribution because it contains minimal noise. Thus, better classification results are produced from this feasible and effective algorithm. Moreover, the hybrid algorithm's F-value, G-mean, AUC and OOB scores demonstrate that they surpass the performance of the original RF algorithm. Hence, this hybrid algorithm provides a new way to perform feature selection and parameter optimization.
A novel artificial immune clonal selection classification and rule mining with swarm learning model
NASA Astrophysics Data System (ADS)
Al-Sheshtawi, Khaled A.; Abdul-Kader, Hatem M.; Elsisi, Ashraf B.
2013-06-01
Metaheuristic optimisation algorithms have become a popular choice for solving complex problems. By integrating the Artificial Immune clonal selection algorithm (CSA) and the particle swarm optimisation (PSO) algorithm, a novel hybrid Clonal Selection Classification and Rule Mining with Swarm Learning Algorithm (CS2) is proposed. The main goal of the approach is to exploit and explore the parallel computation merit of clonal selection and the speed and self-organisation merits of particle swarm by sharing information between the clonal selection population and the particle swarm. Hence, we employed the advantages of PSO to improve the mutation mechanism of the artificial immune CSA and to mine classification rules within datasets. Consequently, our proposed algorithm requires less training time and fewer memory cells in comparison to other AIS algorithms. In this paper, classification rule mining has been modelled as a multiobjective optimisation problem with predictive accuracy. The multiobjective approach is intended to allow the PSO algorithm to return an approximation to the accuracy and comprehensibility border, containing solutions that are spread across the border. We compared the classification accuracy of our proposed algorithm CS2 with five commonly used CSAs, namely AIRS1, AIRS2, AIRS-Parallel, CLONALG, and CSCA, using eight benchmark datasets. We also compared the classification accuracy of CS2 with five other methods, namely Naïve Bayes, SVM, MLP, CART, and RFB. The results show that the proposed algorithm is comparable to the 10 studied algorithms. As a result, the hybridisation of CSA and PSO can combine their respective merits, compensate for each other's defects, and improve both search quality and speed.
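A minimal CLONALG-style clonal selection loop for function maximization, for illustration only; the paper's CS2 additionally shares information with a particle swarm and mines classification rules:

```python
import numpy as np

def clonal_selection(fitness, dim=2, pop_size=20, n_clones=5, beta=1.0,
                     n_replace=4, generations=100, seed=0):
    """Maximize `fitness` with clonal expansion and rank-dependent hypermutation."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    for _ in range(generations):
        aff = np.array([fitness(x) for x in pop])
        order = np.argsort(aff)[::-1]                 # best antibodies first
        clones = []
        for rank, idx in enumerate(order):
            rate = beta * (rank + 1) / pop_size       # better rank -> smaller mutation step
            for _ in range(n_clones):
                clones.append(pop[idx] + rng.normal(0, rate, dim))
        clones = np.array(clones)
        clone_aff = np.array([fitness(x) for x in clones])
        best = clones[np.argsort(clone_aff)[::-1][:pop_size - n_replace]]
        fresh = rng.uniform(-5, 5, size=(n_replace, dim))  # receptor editing / diversity
        pop = np.vstack([best, fresh])
    aff = np.array([fitness(x) for x in pop])
    return pop[np.argmax(aff)]

sphere = lambda x: -np.sum(x ** 2)                    # maximum at the origin
print(clonal_selection(sphere))
```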
Liu, Yecai; Posey, Drew L; Cetron, Martin S; Painter, John A
2015-03-17
Before 2007, immigrants and refugees bound for the United States were screened for tuberculosis (TB) by a smear-based algorithm that could not diagnose smear-negative/culture-positive TB. In 2007, the Centers for Disease Control and Prevention implemented a culture-based algorithm. To evaluate the effect of the culture-based algorithm on preventing the importation of TB to the United States by immigrants and refugees from foreign countries. Population-based, cross-sectional study. Panel physician sites for overseas medical examination. Immigrants and refugees with TB. Comparison of the increase of smear-negative/culture-positive TB cases diagnosed overseas among immigrants and refugees by the culture-based algorithm with the decline of reported cases among foreign-born persons within 1 year after arrival in the United States from 2007 to 2012. Of the 3 212 421 arrivals of immigrants and refugees from 2007 to 2012, a total of 1 650 961 (51.4%) were screened by the smear-based algorithm and 1 561 460 (48.6%) were screened by the culture-based algorithm. Among the 4032 TB cases diagnosed by the culture-based algorithm, 2195 (54.4%) were smear-negative/culture-positive. Before implementation (2002 to 2006), the annual number of reported cases among foreign-born persons within 1 year after arrival was relatively constant (range, 1424 to 1626 cases; mean, 1504 cases) but decreased from 1511 to 940 cases during implementation (2007 to 2012). During the same period, the annual number of smear-negative/culture-positive TB cases diagnosed overseas among immigrants and refugees bound for the United States by the culture-based algorithm increased from 4 to 629. This analysis did not control for the decline in new arrivals of nonimmigrant visitors to the United States and the decrease of incidence of TB in their countries of origin. Implementation of the culture-based algorithm may have substantially reduced the incidence of TB among newly arrived, foreign-born persons in the United States. None.
Development and Validation of an Algorithm to Identify Planned Readmissions From Claims Data.
Horwitz, Leora I; Grady, Jacqueline N; Cohen, Dorothy B; Lin, Zhenqiu; Volpe, Mark; Ngo, Chi K; Masica, Andrew L; Long, Theodore; Wang, Jessica; Keenan, Megan; Montague, Julia; Suter, Lisa G; Ross, Joseph S; Drye, Elizabeth E; Krumholz, Harlan M; Bernheim, Susannah M
2015-10-01
It is desirable not to include planned readmissions in readmission measures because they represent deliberate, scheduled care. To develop an algorithm to identify planned readmissions, describe its performance characteristics, and identify improvements. Consensus-driven algorithm development and chart review validation study at 7 acute-care hospitals in 2 health systems. For development, all discharges qualifying for the publicly reported hospital-wide readmission measure. For validation, all qualifying same-hospital readmissions that were characterized by the algorithm as planned, and a random sampling of same-hospital readmissions that were characterized as unplanned. We calculated weighted sensitivity and specificity, and positive and negative predictive values of the algorithm (version 2.1), compared to gold standard chart review. In consultation with 27 experts, we developed an algorithm that characterizes 7.8% of readmissions as planned. For validation we reviewed 634 readmissions. The weighted sensitivity of the algorithm was 45.1% overall, 50.9% in large teaching centers and 40.2% in smaller community hospitals. The weighted specificity was 95.9%, positive predictive value was 51.6%, and negative predictive value was 94.7%. We identified 4 minor changes to improve algorithm performance. The revised algorithm had a weighted sensitivity 49.8% (57.1% at large hospitals), weighted specificity 96.5%, positive predictive value 58.7%, and negative predictive value 94.5%. Positive predictive value was poor for the 2 most common potentially planned procedures: diagnostic cardiac catheterization (25%) and procedures involving cardiac devices (33%). An administrative claims-based algorithm to identify planned readmissions is feasible and can facilitate public reporting of primarily unplanned readmissions. © 2015 Society of Hospital Medicine.
A Comparative Study of Optimization Algorithms for Engineering Synthesis.
1983-03-01
The ADS program demonstrates the flexibility a design engineer would have in selecting an optimization algorithm best suited to solve a particular problem. ...algorithm to suit a particular problem. The ADS library of design optimization algorithms was developed by Vanderplaats in response to the first...
Energy design for protein-protein interactions
Ravikant, D. V. S.; Elber, Ron
2011-01-01
Proteins bind to other proteins efficiently and specifically to carry on many cell functions such as signaling, activation, transport, enzymatic reactions, and more. To determine the geometry and strength of binding of a protein pair, an energy function is required. An algorithm to design an optimal energy function, based on empirical data of protein complexes, is proposed and applied. Emphasis is made on negative design in which incorrect geometries are presented to the algorithm that learns to avoid them. For the docking problem the search for plausible geometries can be performed exhaustively. The possible geometries of the complex are generated on a grid with the help of a fast Fourier transform algorithm. A novel formulation of negative design makes it possible to investigate iteratively hundreds of millions of negative examples while monotonically improving the quality of the potential. Experimental structures for 640 protein complexes are used to generate positive and negative examples for learning parameters. The algorithm designed in this work finds the correct binding structure as the lowest energy minimum in 318 cases of the 640 examples. Further benchmarks on independent sets confirm the significant capacity of the scoring function to recognize correct modes of interactions. PMID:21842951
NASA Astrophysics Data System (ADS)
Agjee, Na'eem Hoosen; Ismail, Riyad; Mutanga, Onisimo
2016-10-01
Water hyacinth plants (Eichhornia crassipes) are threatening freshwater ecosystems throughout Africa. The Neochetina spp. weevils are seen as an effective solution that can combat the proliferation of the invasive alien plant. We aimed to determine if multitemporal hyperspectral data could be utilized to detect the efficacy of the biocontrol agent. The random forest (RF) algorithm was used to classify variable infestation levels for 6 weeks using: (1) all the hyperspectral bands, (2) bands selected by the recursive feature elimination (RFE) algorithm, and (3) bands selected by the Boruta algorithm. Results showed that the RF model using all the bands successfully produced low-classification errors (12.50% to 32.29%) for all 6 weeks. However, the RF model using Boruta selected bands produced lower classification errors (8.33% to 15.62%) than the RF model using all the bands or bands selected by the RFE algorithm (11.25% to 21.25%) for all 6 weeks, highlighting the utility of Boruta as an all relevant band selection algorithm. All relevant bands selected by Boruta included: 352, 754, 770, 771, 775, 781, 782, 783, 786, and 789 nm. It was concluded that RF coupled with Boruta band-selection algorithm can be utilized to undertake multitemporal monitoring of variable infestation levels on water hyacinth plants.
Sale, Mark; Sherer, Eric A
2015-01-01
The current algorithm for selecting a population pharmacokinetic/pharmacodynamic model is based on the well-established forward addition/backward elimination method. A central strength of this approach is the opportunity for a modeller to continuously examine the data and postulate new hypotheses to explain observed biases. This algorithm has served the modelling community well, but the model selection process has essentially remained unchanged for the last 30 years. During this time, more robust approaches to model selection have been made feasible by new technology and dramatic increases in computation speed. We review these methods, with emphasis on genetic algorithm approaches and discuss the role these methods may play in population pharmacokinetic/pharmacodynamic model selection. PMID:23772792
Developing operation algorithms for vision subsystems in autonomous mobile robots
NASA Astrophysics Data System (ADS)
Shikhman, M. V.; Shidlovskiy, S. V.
2018-05-01
The paper analyzes algorithms for selecting keypoints on the image for the subsequent automatic detection of people and obstacles. The algorithm is based on the histogram of oriented gradients and the support vector method. The combination of these methods allows successful selection of dynamic and static objects. The algorithm can be applied in various autonomous mobile robots.
Solving TSP problem with improved genetic algorithm
NASA Astrophysics Data System (ADS)
Fu, Chunhua; Zhang, Lijun; Wang, Xiaojing; Qiao, Liying
2018-05-01
The TSP is a typical NP-hard problem. The optimization of the vehicle routing problem (VRP) and of city pipeline routing can be formulated as a TSP; solving the TSP efficiently is therefore very important. The genetic algorithm (GA) is one of the ideal methods for solving it, but the standard genetic algorithm has some limitations. Improving the selection operator of the genetic algorithm and introducing an elite retention strategy ensures the quality of the selection operation. In the mutation operation, adaptive selection of the mutation scheme improves the quality of the search results and of the variation. After chromosome evolution, a one-way reverse-evolution operation is added, which gives the offspring more opportunities to inherit high-quality parental genes and improves the algorithm's ability to search for the optimal solution.
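To make the operators concrete, a compact GA-for-TSP sketch with tournament selection, elite retention, order crossover and an inversion-style (segment-reversal) mutation; all parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def tour_length(tour, dist):
    return sum(dist[tour[i], tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def order_crossover(p1, p2):
    """OX: copy a slice from p1, fill the remaining cities in p2's order."""
    n = len(p1)
    a, b = sorted(rng.choice(n, 2, replace=False))
    child = [-1] * n
    child[a:b] = p1[a:b]
    fill = [c for c in p2 if c not in child]
    for i in range(n):
        if child[i] == -1:
            child[i] = fill.pop(0)
    return child

def inversion_mutation(tour):
    a, b = sorted(rng.choice(len(tour), 2, replace=False))
    tour[a:b] = tour[a:b][::-1]          # reverse a segment of the tour
    return tour

def ga_tsp(dist, pop_size=60, generations=300, p_mut=0.3):
    n = dist.shape[0]
    pop = [list(rng.permutation(n)) for _ in range(pop_size)]
    for _ in range(generations):
        fit = [tour_length(t, dist) for t in pop]
        elite = pop[int(np.argmin(fit))]               # elite retention
        new_pop = [elite[:]]
        while len(new_pop) < pop_size:
            i, j = rng.choice(pop_size, 2, replace=False)   # tournament of size 2
            p1 = pop[i] if fit[i] < fit[j] else pop[j]
            i, j = rng.choice(pop_size, 2, replace=False)
            p2 = pop[i] if fit[i] < fit[j] else pop[j]
            child = order_crossover(p1, p2)
            if rng.random() < p_mut:
                child = inversion_mutation(child)
            new_pop.append(child)
        pop = new_pop
    fit = [tour_length(t, dist) for t in pop]
    return pop[int(np.argmin(fit))], min(fit)

cities = rng.random((15, 2))
dist = np.linalg.norm(cities[:, None, :] - cities[None, :, :], axis=-1)
print(ga_tsp(dist))
```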
A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.
Baur, Brittany; Bozdag, Serdar
2016-01-01
DNA methylation is an important epigenetic event that affects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
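A minimal sequential forward selection loop with a K-nearest-neighbors scorer, in the spirit of the approach described; the probe matrix and expression labels below are synthetic placeholders:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def sequential_forward_selection(X, y, max_features=5, cv=5):
    """Greedily add the probe that most improves cross-validated KNN accuracy."""
    selected, best_scores = [], []
    remaining = list(range(X.shape[1]))
    for _ in range(max_features):
        scores = [(cross_val_score(KNeighborsClassifier(n_neighbors=5),
                                   X[:, selected + [j]], y, cv=cv).mean(), j)
                  for j in remaining]
        score, j = max(scores)
        selected.append(j)
        remaining.remove(j)
        best_scores.append(score)
    return selected, best_scores

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))             # methylation of 30 probes for one gene
y = (X[:, 4] - X[:, 12] > 0).astype(int)   # expressed / not expressed (synthetic)
print(sequential_forward_selection(X, y))
```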
Jiang, Junfeng; Wang, Shaohua; Liu, Tiegen; Liu, Kun; Yin, Jinde; Meng, Xiange; Zhang, Yimo; Wang, Shuang; Qin, Zunqi; Wu, Fan; Li, Dingjie
2012-07-30
A demodulation algorithm based on absolute phase recovery of a selected monochromatic frequency is proposed for an optical fiber Fabry-Perot pressure sensing system. The algorithm uses the Fourier transform to obtain the relative phase, and the intercept of the unwrapped phase-frequency linear fit curve to identify the interference order, which are then used to recover the absolute phase. A simplified mathematical model of the polarized low-coherence interference fringes was established to illustrate the principle of the proposed algorithm. Phase unwrapping and the selection of the monochromatic frequency are discussed in detail. A pressure measurement experiment was carried out to verify the effectiveness of the proposed algorithm. Results showed that the demodulation precision of our algorithm reaches 0.15 kPa, a 13-fold improvement compared with the phase-slope-based algorithm.
NASA Astrophysics Data System (ADS)
Mahalakshmi; Murugesan, R.
2018-04-01
This paper addresses the minimization of the total Greenhouse Gas (GHG) cost in an Automated Storage and Retrieval System (AS/RS). A mathematical model is constructed based on the tax cost, penalty cost and discount cost of GHG emissions of the AS/RS. A two-stage algorithm, namely the positive selection based clonal selection principle (PSBCSP), is used to find the optimal solution of the constructed model. In the first stage, the positive selection principle is used to reduce the search space of the optimal solution by fixing a threshold value. In the second stage, the clonal selection principle is used to generate the best solutions. The obtained results are compared with those of other existing algorithms in the literature, which shows that the proposed algorithm yields better results.
Mallick, Himel; Tiwari, Hemant K.
2016-01-01
Count data are increasingly ubiquitous in genetic association studies, where it is possible to observe excess zero counts as compared to what is expected based on standard assumptions. For instance, in rheumatology, data are usually collected in multiple joints within a person or multiple sub-regions of a joint, and it is not uncommon that the phenotypes contain enormous number of zeroes due to the presence of excessive zero counts in majority of patients. Most existing statistical methods assume that the count phenotypes follow one of these four distributions with appropriate dispersion-handling mechanisms: Poisson, Zero-inflated Poisson (ZIP), Negative Binomial, and Zero-inflated Negative Binomial (ZINB). However, little is known about their implications in genetic association studies. Also, there is a relative paucity of literature on their usefulness with respect to model misspecification and variable selection. In this article, we have investigated the performance of several state-of-the-art approaches for handling zero-inflated count data along with a novel penalized regression approach with an adaptive LASSO penalty, by simulating data under a variety of disease models and linkage disequilibrium patterns. By taking into account data-adaptive weights in the estimation procedure, the proposed method provides greater flexibility in multi-SNP modeling of zero-inflated count phenotypes. A fast coordinate descent algorithm nested within an EM (expectation-maximization) algorithm is implemented for estimating the model parameters and conducting variable selection simultaneously. Results show that the proposed method has optimal performance in the presence of multicollinearity, as measured by both prediction accuracy and empirical power, which is especially apparent as the sample size increases. Moreover, the Type I error rates become more or less uncontrollable for the competing methods when a model is misspecified, a phenomenon routinely encountered in practice. PMID:27066062
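A small EM sketch for a zero-inflated Poisson model without covariates, just to make the ZIP mixture structure concrete; the paper's method additionally handles regression covariates, the adaptive LASSO penalty and coordinate-descent variable selection:

```python
import numpy as np

def fit_zip_em(y, n_iter=200, tol=1e-8):
    """EM for ZIP: P(Y=0) = pi + (1-pi) e^{-lam};  P(Y=k) = (1-pi) Pois(k; lam), k > 0."""
    y = np.asarray(y, dtype=float)
    pi, lam = 0.5, max(y.mean(), 1e-3)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero
        z = np.where(y == 0, pi / (pi + (1 - pi) * np.exp(-lam)), 0.0)
        # M-step
        pi_new = z.mean()
        lam_new = ((1 - z) * y).sum() / (1 - z).sum()
        if abs(pi_new - pi) + abs(lam_new - lam) < tol:
            pi, lam = pi_new, lam_new
            break
        pi, lam = pi_new, lam_new
    return pi, lam

rng = np.random.default_rng(0)
n = 5000
structural = rng.random(n) < 0.3                  # true zero-inflation probability 0.3
counts = np.where(structural, 0, rng.poisson(2.0, n))
print(fit_zip_em(counts))                         # should be close to (0.3, 2.0)
```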
NASA Astrophysics Data System (ADS)
Tan, Maxine; Leader, Joseph K.; Liu, Hong; Zheng, Bin
2015-03-01
We recently investigated a new mammographic image feature based risk factor to predict near-term breast cancer risk after a woman has a negative mammographic screening. We hypothesized that unlike the conventional epidemiology-based long-term (or lifetime) risk factors, the mammographic image feature based risk factor value will increase as the time lag between the negative and positive mammography screening decreases. The purpose of this study is to test this hypothesis. From a large and diverse full-field digital mammography (FFDM) image database with 1278 cases, we collected all available sequential FFDM examinations for each case, including the "current" and the 1 to 3 most recent "prior" examinations. All "prior" examinations were interpreted negative, and "current" ones were either malignant or recalled negative/benign. We computed 92 global mammographic texture and density based features, and included three clinical risk factors (woman's age, family history and subjective breast density BIRADS ratings). On this initial feature set, we applied a fast and accurate Sequential Forward Floating Selection (SFFS) feature selection algorithm to reduce feature dimensionality. The features computed from the two mammographic views were used to train two artificial neural network (ANN) classifiers separately. The classification scores of the two ANNs were then merged with a sequential ANN. The results show that the maximum adjusted odds ratios were 5.59, 7.98, and 15.77 for using the 3rd, 2nd, and 1st "prior" FFDM examinations, respectively, which demonstrates a higher association of mammographic image feature change and an increasing risk trend of developing breast cancer in the near-term after a negative screening.
V2.1.4 L2AS Detailed Release Description September 27, 2001
Atmospheric Science Data Center
2013-03-14
... 27, 2001 Algorithm Changes Change method of selecting radiance pixels to use in aerosol retrieval over ... het. surface retrieval algorithm over areas of 100% dark water. Modify algorithm for selecting a default aerosol model to use in ...
NASA Astrophysics Data System (ADS)
de Siqueira e Oliveira, Fernanda S.; Giana, Hector E.; Silveira, Landulfo, Jr.
2012-03-01
A method based on Raman spectroscopy has been proposed for the identification of different microorganisms involved in bacterial urinary tract infections. Spectra were collected from different bacterial colonies (Gram negative: E. coli, K. pneumoniae, P. mirabilis, P. aeruginosa, E. cloacae and Gram positive: S. aureus and Enterococcus sp.), grown in culture medium (Agar), using a Raman spectrometer with a fiber Raman probe (830 nm). Colonies were scraped from the Agar surface and placed on aluminum foil for Raman measurements. After pre-processing, spectra were submitted to a Principal Component Analysis and Mahalanobis distance (PCA/MD) discrimination algorithm. It was found that the mean Raman spectra of different bacterial species show similar bands, with S. aureus well characterized by strong bands related to carotenoids. PCA/MD could discriminate Gram positive bacteria with sensitivity and specificity of 100% and Gram negative bacteria with good sensitivity and high specificity.
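A minimal sketch of the PCA-plus-Mahalanobis-distance discrimination step on synthetic "spectra". The number of retained components, the pooled covariance, and the toy class structure are assumptions for illustration, not the published pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# toy "spectra": two classes, 100 wavenumber bins each (illustrative stand-in for Raman data)
class_a = rng.normal(0.0, 1.0, size=(30, 100)) + np.linspace(0, 1, 100)
class_b = rng.normal(0.0, 1.0, size=(30, 100)) + np.linspace(1, 0, 100)
X = np.vstack([class_a, class_b])
y = np.array([0] * 30 + [1] * 30)

# project spectra onto a few principal components
scores = PCA(n_components=5).fit_transform(X)

# Mahalanobis distance to each class centroid using a pooled covariance of the scores
cov_inv = np.linalg.inv(np.cov(scores, rowvar=False))
centroids = np.array([scores[y == c].mean(axis=0) for c in (0, 1)])

def mahalanobis_label(s):
    d = [np.sqrt((s - m) @ cov_inv @ (s - m)) for m in centroids]
    return int(np.argmin(d))

pred = np.array([mahalanobis_label(s) for s in scores])
print("training accuracy:", (pred == y).mean())
```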
Adaptive object tracking via both positive and negative models matching
NASA Astrophysics Data System (ADS)
Li, Shaomei; Gao, Chao; Wang, Yawen
2015-03-01
To mitigate the tracking drift which often occurs in adaptive tracking, an algorithm based on the fusion of tracking and detection is proposed in this paper. Firstly, object tracking is posed as a binary classification problem and is modeled by partial least squares (PLS) analysis. Secondly, the object is tracked frame by frame via particle filtering. Thirdly, tracking reliability is validated by matching against both positive and negative models. Finally, the object is relocated based on SIFT feature matching and voting when drift occurs. The object appearance model is updated at the same time. The algorithm can not only sense tracking drift but also relocate the object whenever needed. Experimental results demonstrate that this algorithm outperforms state-of-the-art algorithms on many challenging sequences.
Optimal Parameter Design of Coarse Alignment for Fiber Optic Gyro Inertial Navigation System.
Lu, Baofeng; Wang, Qiuying; Yu, Chunmei; Gao, Wei
2015-06-25
Two different coarse alignment algorithms for Fiber Optic Gyro (FOG) Inertial Navigation System (INS) based on the inertial reference frame are discussed in this paper. Both are based on gravity vector integration; therefore, the performance of these algorithms is determined by the integration time. In previous works, the integration time has been selected by experience. In order to give a criterion for the selection process and make the selection of the integration time more accurate, an optimal parameter design of these algorithms for FOG INS is performed in this paper. The design process is accomplished based on the analysis of the error characteristics of these two coarse alignment algorithms. Moreover, this analysis and optimal parameter design allow us to make an adequate selection of the most accurate algorithm for FOG INS according to the actual operational conditions. The analysis and simulation results show that the parameter provided by this work is the optimal value, and indicate that under different operational conditions different coarse alignment algorithms should be adopted for FOG INS in order to achieve better performance. Lastly, the experimental results validate the effectiveness of the proposed algorithm.
Log-linear model based behavior selection method for artificial fish swarm algorithm.
Huang, Zhehuang; Chen, Yidong
2015-01-01
Artificial fish swarm algorithm (AFSA) is a population based optimization technique inspired by the social behavior of fishes. In the past several years, AFSA has been successfully applied in many research and application areas. The behavior of the fishes has a crucial impact on the performance of AFSA, such as its global exploration ability and convergence speed. How to construct and select the behaviors of the fishes is therefore an important task. To solve these problems, an improved artificial fish swarm algorithm based on a log-linear model is proposed and implemented in this paper. There are three main contributions. Firstly, we propose a new behavior selection algorithm based on a log-linear model which can enhance the decision-making ability of behavior selection. Secondly, an adaptive movement behavior based on adaptive weights is presented, which can dynamically adjust according to the diversity of the fishes. Finally, some new behaviors are defined and introduced into the artificial fish swarm algorithm for the first time to improve its global optimization capability. The experiments on high-dimensional function optimization showed that the improved algorithm has more powerful global exploration ability and a reasonable convergence speed compared with the standard artificial fish swarm algorithm.
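The log-linear behavior selection can be read as a softmax over weighted behavior features: each candidate behavior receives a score exp(w·f) that is normalized into a selection probability. A small sketch under that reading; the behavior features and weights are made up for illustration and are not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(2)

# candidate behaviors of an artificial fish and their feature vectors
# (e.g., expected fitness gain, local crowding, distance to best neighbor) -- illustrative
behaviors = ["prey", "swarm", "follow", "random"]
features = {b: rng.normal(size=3) for b in behaviors}
weights = np.array([1.0, -0.5, 0.8])     # log-linear model parameters (assumed)

def select_behavior(features, weights, rng):
    names = list(features)
    scores = np.array([weights @ features[b] for b in names])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                 # softmax turns scores into selection probabilities
    return rng.choice(names, p=probs), dict(zip(names, probs))

choice, probs = select_behavior(features, weights, rng)
print("selected:", choice, "probabilities:", {k: round(float(v), 3) for k, v in probs.items()})
```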
Electroweak Sudakov Corrections to New Physics Searches at the LHC
NASA Astrophysics Data System (ADS)
Chiesa, Mauro; Montagna, Guido; Barzè, Luca; Moretti, Mauro; Nicrosini, Oreste; Piccinini, Fulvio; Tramontano, Francesco
2013-09-01
We compute the one-loop electroweak Sudakov corrections to the production process Z(νν¯)+n jets, with n=1, 2, 3, in pp collisions at the LHC. It represents the main irreducible background to new physics searches at the energy frontier. The results are obtained at the leading and next-to-leading logarithmic accuracy by implementing the general algorithm of Denner and Pozzorini in the event generator for multiparton processes alpgen. For the standard selection cuts used by the ATLAS and CMS Collaborations, we show that the Sudakov corrections to the relevant observables can grow up to -40% at √s = 14 TeV. We also include the contribution due to undetected real radiation of massive gauge bosons, to show to what extent the partial cancellation with the large negative virtual corrections takes place in realistic event selections.
NASA Astrophysics Data System (ADS)
Salamatova, T.; Zhukov, V.
2017-02-01
The paper presents the application of the artificial immune systems apparatus as a heuristic method of network intrusion detection for the algorithmic provision of intrusion detection systems. A coevolutionary immune algorithm with clonal selection was elaborated. Empirical results evaluating the effectiveness of the algorithm were obtained by testing it on different datasets. To quantify its efficiency, the algorithm was compared with analogous methods. The fundamental rules underlying the solutions generated by this algorithm are described in the article.
Asymmetric bagging and feature selection for activities prediction of drug molecules.
Li, Guo-Zheng; Meng, Hao-Hua; Lu, Wen-Cong; Yang, Jack Y; Yang, Mary Qu
2008-05-28
Activities of drug molecules can be predicted by QSAR (quantitative structure activity relationship) models, which overcome the high cost and long cycle of the traditional experimental method. Given that the number of drug molecules with positive activity is much smaller than the number of negatives, it is important to predict molecular activities while taking this unbalanced situation into account. Here, asymmetric bagging and feature selection are introduced into the problem, and asymmetric bagging of support vector machines (asBagging) is proposed for predicting drug activities in order to treat the unbalanced problem. At the same time, the features extracted from the structures of drug molecules affect the prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experimental results on a data set of molecular activities show that asBagging improves the AUC and sensitivity values of molecular activities, and that PRIFEAB with feature selection further helps to improve the prediction ability. Asymmetric bagging can help to improve the prediction accuracy of activities of drug molecules, which can be furthermore improved by performing feature selection to select relevant features from the drug molecule data sets.
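A minimal sketch of the asymmetric bagging idea with SVM base learners: every bag keeps all of the (rare) positives and a bootstrap sample of negatives of the same size, and the bag predictions are averaged. The synthetic data and bag count are illustrative; this is not the published asBagging/PRIFEAB code.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n_pos, n_neg, p = 40, 400, 10
X = np.vstack([rng.normal(0.7, 1, (n_pos, p)), rng.normal(0.0, 1, (n_neg, p))])
y = np.array([1] * n_pos + [0] * n_neg)

def asymmetric_bagging(X, y, n_bags=25):
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    models = []
    for _ in range(n_bags):
        # keep every positive, bootstrap an equal-sized subset of the negatives
        bag_neg = rng.choice(neg_idx, size=len(pos_idx), replace=True)
        idx = np.concatenate([pos_idx, bag_neg])
        models.append(SVC(kernel="rbf", probability=True).fit(X[idx], y[idx]))
    return models

models = asymmetric_bagging(X, y)
scores = np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)  # average the bags
print("in-sample AUC:", round(roc_auc_score(y, scores), 3))
```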
A hybrid intelligent algorithm for portfolio selection problem with fuzzy returns
NASA Astrophysics Data System (ADS)
Li, Xiang; Zhang, Yang; Wong, Hau-San; Qin, Zhongfeng
2009-11-01
Portfolio selection theory with fuzzy returns has been well developed and widely applied. Within the framework of credibility theory, several fuzzy portfolio selection models have been proposed, such as the mean-variance model, entropy optimization model, chance constrained programming model and so on. In order to solve these nonlinear optimization models, a hybrid intelligent algorithm is designed by integrating simulated annealing, neural networks and fuzzy simulation techniques, where the neural network is used to approximate the expected value and variance of the fuzzy returns and the fuzzy simulation is used to generate the training data for the neural network. Since these models have usually been solved by genetic algorithms, some comparisons between the hybrid intelligent algorithm and the genetic algorithm are given in terms of numerical examples, which imply that the hybrid intelligent algorithm is robust and more effective. In particular, it reduces the running time significantly for large size problems.
NASA Astrophysics Data System (ADS)
Shams Esfand Abadi, Mohammad; AbbasZadeh Arani, Seyed Ali Asghar
2011-12-01
This paper extends the recently introduced variable step-size (VSS) approach to the family of adaptive filter algorithms. This method uses prior knowledge of the channel impulse response statistics. Accordingly, an optimal step-size vector is obtained by minimizing the mean-square deviation (MSD). The presented algorithms are the VSS affine projection algorithm (VSS-APA), the VSS selective partial update NLMS (VSS-SPU-NLMS), the VSS-SPU-APA, and the VSS selective regressor APA (VSS-SR-APA). In the VSS-SPU adaptive algorithms the filter coefficients are partially updated, which reduces the computational complexity. In the VSS-SR-APA, the optimal selection of input regressors is performed during the adaptation. The presented algorithms feature good convergence speed, low steady-state mean square error (MSE), and low computational complexity. We demonstrate the good performance of the proposed algorithms through several simulations in a system identification scenario.
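A compact sketch of an NLMS adaptive filter in a system-identification scenario, with a simple error-driven variable step size standing in for the MSD-optimal step-size vector derived in the paper (which requires channel statistics not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 5000, 16                          # signal length, number of filter taps
h_true = rng.normal(size=M)              # unknown system impulse response
x = rng.normal(size=N)                   # input signal
d = np.convolve(x, h_true)[:N] + 0.01 * rng.normal(size=N)   # desired output + noise

w = np.zeros(M)
mu = 0.5                                 # step size, adapted by a crude heuristic below
eps = 1e-6
for n in range(M, N):
    u = x[n - M + 1:n + 1][::-1]         # regressor [x[n], x[n-1], ..., x[n-M+1]]
    e = d[n] - w @ u                     # a priori error
    w += mu * e * u / (u @ u + eps)      # NLMS coefficient update
    mu = 0.95 * mu + 0.05 * min(1.0, e * e)   # error-driven variable step size (illustrative)

misalignment = 10 * np.log10(np.sum((w - h_true) ** 2) / np.sum(h_true ** 2))
print(f"final misalignment: {misalignment:.1f} dB")
```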
Andersson, M; Kolodziej, B; Andersson, R E
2017-10-01
The role of imaging in the diagnosis of appendicitis is controversial. This prospective interventional study and nested randomized trial analysed the impact of implementing a risk stratification algorithm based on the Appendicitis Inflammatory Response (AIR) score, and compared routine imaging with selective imaging after clinical reassessment. Patients presenting with suspicion of appendicitis between September 2009 and January 2012 from age 10 years were included at 21 emergency surgical centres and from age 5 years at three university paediatric centres. Registration of clinical characteristics, treatments and outcomes started during the baseline period. The AIR score-based algorithm was implemented during the intervention period. Intermediate-risk patients were randomized to routine imaging or selective imaging after clinical reassessment. The baseline period included 1152 patients, and the intervention period 2639, of whom 1068 intermediate-risk patients were randomized. In low-risk patients, use of the AIR score-based algorithm resulted in less imaging (19·2 versus 34·5 per cent; P < 0·001), fewer admissions (29·5 versus 42·8 per cent; P < 0·001), and fewer negative explorations (1·6 versus 3·2 per cent; P = 0·030) and operations for non-perforated appendicitis (6·8 versus 9·7 per cent; P = 0·034). Intermediate-risk patients randomized to the imaging and observation groups had the same proportion of negative appendicectomies (6·4 versus 6·7 per cent respectively; P = 0·884), number of admissions, number of perforations and length of hospital stay, but routine imaging was associated with an increased proportion of patients treated for appendicitis (53·4 versus 46·3 per cent; P = 0·020). AIR score-based risk classification can safely reduce the use of diagnostic imaging and hospital admissions in patients with suspicion of appendicitis. Registration number: NCT00971438 ( http://www.clinicaltrials.gov). © 2017 BJS Society Ltd Published by John Wiley & Sons Ltd.
Zhang, Xiaoheng; Wang, Lirui; Cao, Yao; Wang, Pin; Zhang, Cheng; Yang, Liuyang; Li, Yongming; Zhang, Yanling; Cheng, Oumei
2018-02-01
Diagnosis of Parkinson's disease (PD) based on speech data has been proved to be an effective approach in recent years. However, current research focuses only on feature extraction and classifier design, and does not consider instance selection. Previous research by the authors showed that instance selection can lead to an improvement in classification accuracy. However, no attention has been paid to the relationship between speech samples and features until now. Therefore, a new diagnosis algorithm for PD is proposed in this paper that simultaneously selects speech samples and features based on a relevant feature weighting algorithm and a multiple kernel method, so as to find their synergy effects, thereby improving classification accuracy. Experimental results showed that the proposed algorithm obtained an apparent improvement in classification accuracy. It achieved a mean classification accuracy of 82.5%, which was 30.5% higher than the related algorithm. Besides, the proposed algorithm detected the synergy effects of speech samples and features, which is valuable for speech marker extraction.
Majorization Minimization by Coordinate Descent for Concave Penalized Generalized Linear Models
Jiang, Dingfeng; Huang, Jian
2013-01-01
Recent studies have demonstrated theoretical attractiveness of a class of concave penalties in variable selection, including the smoothly clipped absolute deviation and minimax concave penalties. The computation of the concave penalized solutions in high-dimensional models, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm for computing the concave penalized solutions in generalized linear models. In contrast to the existing algorithms that use local quadratic or local linear approximation to the penalty function, the MMCD seeks to majorize the negative log-likelihood by a quadratic loss, but does not use any approximation to the penalty. This strategy makes it possible to avoid the computation of a scaling factor in each update of the solutions, which improves the efficiency of coordinate descent. Under certain regularity conditions, we establish theoretical convergence property of the MMCD. We implement this algorithm for a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD works sufficiently fast for the penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size. PMID:25309048
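A compact sketch of the MMCD idea for MCP-penalized logistic regression: with standardized columns, the logistic negative log-likelihood is majorized coordinate-wise by a quadratic with fixed curvature 1/4, so each update is a closed-form MCP thresholding step with no per-iteration scaling factor. The tuning values, stopping rule and simulated data are simplified assumptions relative to the paper.

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def mcp_threshold(z, lam, gamma, v):
    # minimizer of (v/2)*(b - z)^2 + MCP(b; lam, gamma); requires gamma > 1/v
    if abs(z) > gamma * lam:
        return z
    return soft(v * z, lam) / (v - 1.0 / gamma)

def mmcd_logistic(X, y, lam=0.05, gamma=8.0, n_iter=200):
    """MM-by-coordinate-descent sketch: the logistic loss is majorized by a quadratic
    with fixed curvature 1/4 per standardized coordinate; the MCP penalty is exact."""
    n, p = X.shape
    X = (X - X.mean(0)) / X.std(0)        # standardized columns => coordinate curvature <= 1/4
    beta, b0 = np.zeros(p), 0.0
    v = 0.25
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        b0 += (y - prob).mean() / v       # unpenalized intercept update
        for j in range(p):
            prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
            z = beta[j] + (X[:, j] @ (y - prob)) / (n * v)
            beta[j] = mcp_threshold(z, lam, gamma, v)
    return b0, beta

rng = np.random.default_rng(5)
n, p = 300, 30
X = rng.normal(size=(n, p))
true = np.zeros(p); true[:4] = [1.5, -1.2, 1.0, 0.8]
y = (rng.random(n) < 1 / (1 + np.exp(-(X @ true)))).astype(float)
b0, beta = mmcd_logistic(X, y)
print("nonzero coefficients:", np.flatnonzero(beta != 0))
```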
Ganière, Vincent; Domenichini, Giulia; Niculescu, Viviana; Cassagneau, Romain; Defaye, Pascal; Burri, Haran
2013-03-01
The prerequisite for cardiac resynchronization therapy (CRT) is ventricular capture, which may be verified by analysis of the surface electrocardiogram (ECG). Few algorithms exist to diagnose loss of ventricular capture. Electrocardiograms from 126 CRT patients were analysed during biventricular (BV), right ventricular (RV), and left ventricular (LV) pacing. An algorithm evaluating QRS narrowing in the limb leads and increasing negativity in lead I to diagnose changes in ventricular capture was devised, prospectively validated, and compared with two existing algorithms. Performance of the algorithm according to ventricular lead position was also assessed. Our algorithm had an accuracy of 88% to correctly identify the changes in ventricular capture (either loss or gain of RV or LV capture). The algorithm had a sensitivity of 94% and a specificity of 96% with an accuracy of 96% for identifying loss of LV capture (the most clinically relevant change), and compared favourably with the existing algorithms. Performance of the algorithms was not significantly affected by RV or LV lead position. A simple two-step algorithm evaluating QRS width in the limb leads and changes in negativity in lead I can accurately diagnose the lead responsible for intermittent loss of ventricular capture in CRT. This simple tool may be of particular use outside the setting of specialized device clinics.
Software platform for managing the classification of error- related potentials of observers
NASA Astrophysics Data System (ADS)
Asvestas, P.; Ventouras, E.-C.; Kostopoulos, S.; Sidiropoulos, K.; Korfiatis, V.; Korda, A.; Uzunolglu, A.; Karanasiou, I.; Kalatzis, I.; Matsopoulos, G.
2015-09-01
Human learning is partly based on observation. Electroencephalographic recordings of subjects who perform acts (actors) or observe actors (observers), contain a negative waveform in the Evoked Potentials (EPs) of the actors that commit errors and of observers who observe the error-committing actors. This waveform is called the Error-Related Negativity (ERN). Its detection has applications in the context of Brain-Computer Interfaces. The present work describes a software system developed for managing EPs of observers, with the aim of classifying them into observations of either correct or incorrect actions. It consists of an integrated platform for the storage, management, processing and classification of EPs recorded during error-observation experiments. The system was developed using C# and the following development tools and frameworks: MySQL, .NET Framework, Entity Framework and Emgu CV, for interfacing with the machine learning library of OpenCV. Up to six features can be computed per EP recording per electrode. The user can select among various feature selection algorithms and then proceed to train one of three types of classifiers: Artificial Neural Networks, Support Vector Machines, k-nearest neighbour. Next the classifier can be used for classifying any EP curve that has been inputted to the database.
Parametrization and calibration of a quasi-analytical algorithm for tropical eutrophic waters
NASA Astrophysics Data System (ADS)
Watanabe, Fernanda; Mishra, Deepak R.; Astuti, Ike; Rodrigues, Thanan; Alcântara, Enner; Imai, Nilton N.; Barbosa, Cláudio
2016-11-01
The quasi-analytical algorithm (QAA) was designed to derive the inherent optical properties (IOPs) of water bodies from above-surface remote sensing reflectance (Rrs). Several variants of the QAA have been developed for environments with different bio-optical characteristics. However, most variants of the QAA suffer from moderate to high negative IOP predictions when applied to tropical eutrophic waters. This research is aimed at parametrizing a QAA for tropical eutrophic water dominated by cyanobacteria. The alterations proposed in the algorithm yielded accurate absorption coefficients and chlorophyll-a (Chl-a) concentrations. The main changes were the selection of wavelengths representative of the optically relevant constituents (ORCs) and the calibration of values directly associated with the pigment and detritus plus colored dissolved organic material (CDM) absorption coefficients. The re-parametrized QAA eliminated the retrieval of negative values, commonly identified in other variants of the QAA. The calibrated model generated a normalized root mean square error (NRMSE) of 21.88% and a mean absolute percentage error (MAPE) of 28.27% for at(λ), where the largest errors were found at 412 nm and 620 nm. The estimated NRMSE for aCDM(λ) was 18.86% with a MAPE of 31.17%. A NRMSE of 22.94% and a MAPE of 60.08% were obtained for aφ(λ). Estimated aφ(665) and aφ(709) were used to predict Chl-a concentration. aφ(665) derived from the QAA for Barra Bonita Hydroelectric Reservoir (QAA_BBHR) was able to predict Chl-a accurately, with a NRMSE of 11.3% and MAPE of 38.5%. The performance of the Chl-a model was comparable to some of the most widely used empirical algorithms such as the 2-band, 3-band, and normalized difference chlorophyll index (NDCI) models. The new QAA was parametrized based on the band configuration of the MEdium Resolution Imaging Spectrometer (MERIS), Sentinel-2A and 3A and can be readily scaled up for spatio-temporal monitoring of IOPs in tropical waters.
Elyasigomari, V; Lee, D A; Screen, H R C; Shaheed, M H
2017-03-01
For each cancer type, only a few genes are informative. Due to the so-called 'curse of dimensionality' problem, the gene selection task remains a challenge. To overcome this problem, we propose a two-stage gene selection method called MRMR-COA-HS. In the first stage, the minimum redundancy and maximum relevance (MRMR) feature selection is used to select a subset of relevant genes. The selected genes are then fed into a wrapper setup that combines a new algorithm, COA-HS, using the support vector machine as a classifier. The method was applied to four microarray datasets, and the performance was assessed by the leave one out cross-validation method. Comparative performance assessment of the proposed method with other evolutionary algorithms suggested that the proposed algorithm significantly outperforms other methods in selecting fewer genes while maintaining the highest classification accuracy. The functions of the selected genes were further investigated, and it was confirmed that the selected genes are biologically relevant to each cancer type. Copyright © 2017. Published by Elsevier Inc.
A framework for implementing biodiversity offsets: selecting sites and determining scale
Kiesecker, Joseph M.; Copeland, Holly; Pocewicz, Amy; Nibbelink, Nate; McKenney, Bruce; Dahlke, John; Holloran, Matthew J.; Stroud, Dan
2009-01-01
Biodiversity offsets provide a mechanism for maintaining or enhancing environmental values in situations where development is sought despite detrimental environmental impacts. They seek to ensure that unavoidable negative environmental impacts of development are balanced by environmental gains, with the overall aim of achieving a net neutral or positive outcome. Once the decision has been made to offset, multiple issues arise regarding how to do so in practice. A key concern is site selection. In light of the general aim to locate offsets close to the affected sites to ensure that benefits accrue in the same area, what is the appropriate spatial scale for identifying potential offset sites (e.g., local, ecoregional)? We use the Marxan site-selection algorithm to address conceptual and methodological challenges associated with identifying a set of potential offset sites and determining an appropriate spatial scale for them. To demonstrate this process, we examined the design of offsets for impacts from development on the Jonah natural gas field in Wyoming.
Multiobjective immune algorithm with nondominated neighbor-based selection.
Gong, Maoguo; Jiao, Licheng; Du, Haifeng; Bo, Liefeng
2008-01-01
Nondominated Neighbor Immune Algorithm (NNIA) is proposed for multiobjective optimization by using a novel nondominated neighbor-based selection technique, an immune inspired operator, two heuristic search operators, and elitism. The unique selection technique of NNIA only selects minority isolated nondominated individuals in the population. The selected individuals are then cloned proportionally to their crowding-distance values before heuristic search. By using the nondominated neighbor-based selection and proportional cloning, NNIA pays more attention to the less-crowded regions of the current trade-off front. We compare NNIA with NSGA-II, SPEA2, PESA-II, and MISA in solving five DTLZ problems, five ZDT problems, and three low-dimensional problems. The statistical analysis based on three performance metrics, including the coverage of two sets, the convergence metric, and the spacing, shows that the unique selection method is effective, and NNIA is an effective algorithm for solving multiobjective optimization problems. The empirical study on NNIA's scalability with respect to the number of objectives shows that the new algorithm scales well along the number of objectives.
Mao, Yong; Zhou, Xiao-Bo; Pi, Dao-Ying; Sun, You-Xian; Wong, Stephen T C
2005-10-01
In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables and small number of samples, as well as its non-linearity. It is difficult to obtain satisfying results by using conventional linear statistical methods. Recursive feature elimination based on support vector machines (SVM RFE) is an effective algorithm for gene selection and cancer classification, which are integrated into a consistent framework. In this paper, we propose a new method to select the parameters of the aforementioned algorithm implemented with Gaussian kernel SVMs, using a genetic algorithm to search for a pair of optimal parameters, as a better alternative to the common practice of simply selecting the apparently best parameters. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative hereditary breast cancer and acute leukaemia datasets. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
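For reference, a minimal linear SVM-RFE loop that ranks features by the squared weights and eliminates the weakest each round; the Gaussian-kernel variant and the genetic-algorithm parameter search discussed above are not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=200, n_informative=10, random_state=0)

def svm_rfe(X, y, n_keep=10, drop_per_step=20):
    """Linear SVM-RFE: rank features by w_j^2, eliminate the weakest each round."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        clf = SVC(kernel="linear", C=1.0).fit(X[:, remaining], y)
        w2 = (clf.coef_ ** 2).ravel()              # ranking criterion per remaining feature
        order = np.argsort(w2)                      # weakest features first
        n_drop = min(drop_per_step, len(remaining) - n_keep)
        remaining = [remaining[i] for i in sorted(order[n_drop:])]
    return remaining

print("selected features:", svm_rfe(X, y))
```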
The Texas Medication Algorithm Project (TMAP) schizophrenia algorithms.
Miller, A L; Chiles, J A; Chiles, J K; Crismon, M L; Rush, A J; Shon, S P
1999-10-01
In the Texas Medication Algorithm Project (TMAP), detailed guidelines for medication management of schizophrenia and related disorders, bipolar disorders, and major depressive disorders have been developed and implemented. This article describes the algorithms developed for medication treatment of schizophrenia and related disorders. The guidelines recommend a sequence of medications and discuss dosing, duration, and switch-over tactics. They also specify response criteria at each stage of the algorithm for both positive and negative symptoms. The rationale and evidence for each aspect of the algorithms are presented.
Status Report on the First Round of the Development of the Advanced Encryption Standard
Nechvatal, James; Barker, Elaine; Dodson, Donna; Dworkin, Morris; Foti, James; Roback, Edward
1999-01-01
In 1997, the National Institute of Standards and Technology (NIST) initiated a process to select a symmetric-key encryption algorithm to be used to protect sensitive (unclassified) Federal information in furtherance of NIST’s statutory responsibilities. In 1998, NIST announced the acceptance of 15 candidate algorithms and requested the assistance of the cryptographic research community in analyzing the candidates. This analysis included an initial examination of the security and efficiency characteristics for each algorithm. NIST has reviewed the results of this research and selected five algorithms (MARS, RC6™, Rijndael, Serpent and Twofish) as finalists. The research results and rationale for the selection of the finalists are documented in this report. The five finalists will be the subject of further study before the selection of one or more of these algorithms for inclusion in the Advanced Encryption Standard.
NASA Technical Reports Server (NTRS)
Peck, Charles C.; Dhawan, Atam P.; Meyer, Claudia M.
1991-01-01
A genetic algorithm is used to select the inputs to a neural network function approximator. In the application considered, modeling critical parameters of the space shuttle main engine (SSME), the functional relationship between measured parameters is unknown and complex. Furthermore, the number of possible input parameters is quite large. Many approaches have been used for input selection, but they are either subjective or do not consider the complex multivariate relationships between parameters. Due to the optimization and space searching capabilities of genetic algorithms they were employed to systematize the input selection process. The results suggest that the genetic algorithm can generate parameter lists of high quality without the explicit use of problem domain knowledge. Suggestions for improving the performance of the input selection process are also provided.
Sarafijanović, Slavisa; Le Boudec, Jean-Yves
2005-09-01
In mobile ad hoc networks, nodes act both as terminals and information relays, and they participate in a common routing protocol, such as dynamic source routing (DSR). The network is vulnerable to routing misbehavior, due to faulty or malicious nodes. Misbehavior detection systems aim at removing this vulnerability. In this paper, we investigate the use of an artificial immune system (AIS) to detect node misbehavior in a mobile ad hoc network using DSR. The system is inspired by the natural immune system (IS) of vertebrates. Our goal is to build a system that, like its natural counterpart, automatically learns, and detects new misbehavior. We describe our solution for the classification task of the AIS; it employs negative selection and clonal selection, the algorithms for learning and adaptation used by the natural IS. We define how we map the natural IS concepts such as self, antigen, and antibody to a mobile ad hoc network and give the resulting algorithm for classifying nodes as misbehaving. We implemented the system in the network simulator Glomosim; we present detection results and discuss how the system parameters affect the performance of primary and secondary response. Further steps will extend the design by using an analogy to the innate system, danger signal, and memory cells.
Fleet, Jamie L; Dixon, Stephanie N; Shariff, Salimah Z; Quinn, Robert R; Nash, Danielle M; Harel, Ziv; Garg, Amit X
2013-04-05
Large, population-based administrative healthcare databases can be used to identify patients with chronic kidney disease (CKD) when serum creatinine laboratory results are unavailable. We examined the validity of algorithms that used combined hospital encounter and physician claims database codes for the detection of CKD in Ontario, Canada. We accrued 123,499 patients over the age of 65 from 2007 to 2010. All patients had a baseline serum creatinine value to estimate glomerular filtration rate (eGFR). We developed an algorithm of physician claims and hospital encounter codes to search administrative databases for the presence of CKD. We determined the sensitivity, specificity, positive and negative predictive values of this algorithm to detect our primary threshold of CKD, an eGFR <45 mL/min per 1.73 m² (15.4% of patients). We also assessed serum creatinine and eGFR values in patients with and without CKD codes (algorithm positive and negative, respectively). Our algorithm required evidence of at least one of eleven CKD codes and 7.7% of patients were algorithm positive. The sensitivity was 32.7% [95% confidence interval: (95% CI): 32.0 to 33.3%]. Sensitivity was lower in women compared to men (25.7 vs. 43.7%; p <0.001) and in the oldest age category (over 80 vs. 66 to 80; 28.4 vs. 37.6 %; p < 0.001). All specificities were over 94%. The positive and negative predictive values were 65.4% (95% CI: 64.4 to 66.3%) and 88.8% (95% CI: 88.6 to 89.0%), respectively. In algorithm positive patients, the median [interquartile range (IQR)] baseline serum creatinine value was 135 μmol/L (106 to 179 μmol/L) compared to 82 μmol/L (69 to 98 μmol/L) for algorithm negative patients. Corresponding eGFR values were 38 mL/min per 1.73 m² (26 to 51 mL/min per 1.73 m²) vs. 69 mL/min per 1.73 m² (56 to 82 mL/min per 1.73 m²), respectively. Patients with CKD as identified by our database algorithm had distinctly higher baseline serum creatinine values and lower eGFR values than those without such codes. However, because of limited sensitivity, the prevalence of CKD was underestimated.
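The validation metrics quoted above follow directly from the 2x2 table of algorithm status against the eGFR reference standard. A small helper; the cell counts below are illustrative values chosen to be roughly consistent with the reported percentages, not the study's actual table.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 validation table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# illustrative counts only, roughly matching the reported cohort size and percentages
print(diagnostic_metrics(tp=6200, fp=3300, fn=12800, tn=101199))
```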
Selection method of terrain matching area for TERCOM algorithm
NASA Astrophysics Data System (ADS)
Zhang, Qieqie; Zhao, Long
2017-10-01
The performance of terrain aided navigation is closely related to the selection of the terrain matching area. Different matching algorithms have different adaptability to terrain. This paper mainly studies the adaptability of the TERCOM algorithm to terrain, analyzes the relation between terrain features and terrain characteristic parameters by qualitative and quantitative methods, and then investigates the relation between matching probability and terrain characteristic parameters by the Monte Carlo method. After that, we propose a selection method of the terrain matching area for the TERCOM algorithm, and verify the correctness of the method with real terrain data by simulation experiments. Experimental results show that the matching area obtained by the method in this paper offers good navigation performance and that the matching probability of the TERCOM algorithm is greater than 90%.
NASA Astrophysics Data System (ADS)
Ariyanto, P.; Syuhada; Rosid, S.; Anggono, T.; Januarti, Y.
2018-03-01
In this study, we applied receiver function analysis to determine the crustal thickness, the Vp/Vs ratio and the S wave velocity in the southern part of Central Java. We selected teleseismic data with magnitudes greater than 6 (M>6) and epicentral distances of 30°-90°, recorded at 3 broadband stations: UGM, YOGI, and WOJI, which are part of the Indonesia-Geophone Network (IA-GE). Inversions were performed using the nonlinear Neighborhood Algorithm (NA). We observed Ps phase conversions on the receiver functions corresponding to a Moho depth of around 36-39 km. We also observed strong negative phase arrivals at around 10-12 s, which might be associated with the Indo-Australian slab subducting underneath the stations. The inversion results show the presence of a low velocity zone with a high Vp/Vs ratio (>1.78) in the middle crust around the study area, which could be related to the Merapi-Lawu Anomaly (MLA).
Kim, Taegu; Hong, Jungsik; Kang, Pilsung
2017-01-01
Accurate box office forecasting models are developed by considering competition and word-of-mouth (WOM) effects in addition to screening-related information. Nationality, genre, ratings, and distributors of motion pictures running concurrently with the target motion picture are used to describe the competition, whereas the numbers of informative, positive, and negative mentions posted on social network services (SNS) are used to gauge the atmosphere spread by WOM. Among these candidate variables, only significant variables are selected by genetic algorithm (GA), based on which machine learning algorithms are trained to build forecasting models. The forecasts are combined to improve forecasting performance. Experimental results on the Korean film market show that the forecasting accuracy in early screening periods can be significantly improved by considering competition. In addition, WOM has a stronger influence on total box office forecasting. Considering both competition and WOM improves forecasting performance to a larger extent than when only one of them is considered.
Cheung, Vincent C. K.; Devarajan, Karthik; Severini, Giacomo; Turolla, Andrea; Bonato, Paolo
2017-01-01
The non-negative matrix factorization algorithm (NMF) decomposes a data matrix into a set of non-negative basis vectors, each scaled by a coefficient. In its original formulation, the NMF assumes the data samples and dimensions to be independently distributed, making it a less-than-ideal algorithm for the analysis of time series data with temporal correlations. Here, we seek to derive an NMF that accounts for temporal dependencies in the data by explicitly incorporating a very simple temporal constraint for the coefficients into the NMF update rules. We applied the modified algorithm to 2 multi-dimensional electromyographic data sets collected from the human upper-limb to identify muscle synergies. We found that because it reduced the number of free parameters in the model, our modified NMF made it possible to use the Akaike Information Criterion to objectively identify a model order (i.e., the number of muscle synergies composing the data) that is more functionally interpretable, and closer to the numbers previously determined using ad hoc measures. PMID:26737046
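For context, a minimal sketch of the classic multiplicative-update NMF (Lee-Seung rules) that the modified algorithm builds on; the temporal constraint on the coefficients and the AIC-based choice of the number of synergies described above are not included here.

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9, seed=0):
    """Plain multiplicative-update NMF: V (m x n, non-negative) ~ W (m x k) @ H (k x n)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative update for the coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for the basis vectors
    return W, H

# toy "EMG" matrix: 8 muscles x 1000 time samples built from 3 non-negative synergies
rng = np.random.default_rng(1)
W_true = rng.random((8, 3))
H_true = np.abs(np.sin(rng.random((3, 1)) * np.linspace(0, 20, 1000)))  # smooth activations
V = W_true @ H_true + 0.01 * rng.random((8, 1000))

W, H = nmf(V, k=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", round(float(err), 4))
```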
Doss, Jayanth; Mo, Huan; Carroll, Robert J.; Crofford, Leslie J.; Denny, Joshua C.
2016-01-01
Objective The differences between seronegative and seropositive rheumatoid arthritis (RA) have not been widely reported. We performed electronic health record (EHR)-based phenome-wide association studies (PheWAS) to identify disease associations in seropositive and seronegative RA. Methods A validated algorithm identified RA subjects from the de-identified EHR. Serotypes were determined by values of rheumatoid factor (RF) and anti-cyclic citrullinated peptide antibody (ACPA). We tested EHR-derived phenotypes using PheWAS comparing seropositive RA against seronegative RA, yielding disease associations. PheWAS was also performed on RF-positive versus RF-negative subjects and ACPA-positive versus ACPA-negative subjects. Following PheWAS, select phenotypes were then manually reviewed and fibromyalgia was specifically evaluated using a validated algorithm. Results There were 2199 individuals identified with RA and either RF or ACPA testing. Of these, 1382 (63%) were seropositive. Seronegative RA was associated with “Myalgia and Myositis” (odds ratio [OR] 2.1, P=3.7×10−10) and back pain. A manual record review showed 80% of Myalgia and Myositis codes were used for fibromyalgia, and follow-up with a specific EHR algorithm for fibromyalgia confirmed that seronegative RA was associated with fibromyalgia (OR=1.8, P=4.0×10−6). Seropositive RA was associated with Chronic Airway Obstruction (OR=2.2, P=1.4×10−4) and tobacco use (OR=2.2, P=7.0×10−4). Conclusion This PheWAS in RA patients identifies a strong association between seronegativity and fibromyalgia. It also affirms relationships between seropositivity with chronic airway obstruction and seropositivity with tobacco use. These findings demonstrate the utility of the PheWAS approach to discover novel phenotype associations within different subgroups of a disease. PMID:27589350
Wang, Shuaiqun; Aorigele; Kong, Wei; Zeng, Weiming; Hong, Xiaomin
2016-01-01
Gene expression data composed of thousands of genes play an important role in classification platforms and disease diagnosis. Hence, it is vital to select a small subset of salient features over a large number of gene expression data. Lately, many researchers have devoted themselves to feature selection using diverse computational intelligence methods. However, in the process of selecting informative genes, many computational methods face difficulties in selecting small subsets for cancer classification due to the huge number of genes (high dimension) compared to the small number of samples, as well as noisy and irrelevant genes. In this paper, we propose a new hybrid algorithm, HICATS, incorporating the imperialist competition algorithm (ICA), which performs global search, and tabu search (TS), which conducts fine-tuned search. In order to verify the performance of the proposed algorithm HICATS, we have tested it on 10 well-known benchmark gene expression classification datasets with dimensions varying from 2308 to 12600. The performance of our proposed method proved to be superior to other related works, including the conventional version of the binary optimization algorithm, in terms of classification accuracy and the number of selected genes.
Xu, Hang; Su, Shi; Tang, Wuji; Wei, Meng; Wang, Tao; Wang, Dongjin; Ge, Weihong
2015-09-01
A large number of warfarin pharmacogenetic algorithms have been published. Our research aimed to evaluate the performance of selected pharmacogenetic algorithms in patients undergoing heart valve replacement or heart valvuloplasty during the initial and stable phases of anticoagulation treatment. 10 pharmacogenetic algorithms were selected by searching PubMed. We compared the performance of the selected algorithms in a cohort of 193 patients during the initial and stable phases of anticoagulation therapy. The predicted dose was compared to the therapeutic dose using the percentage of predicted doses that fall within 20% of the actual dose (percentage within 20%) and the mean absolute error (MAE). The average warfarin dose was 3.05 ± 1.23 mg/day for the initial treatment phase and 3.45 ± 1.18 mg/day for the stable phase. The percentages of predicted doses within 20% of the therapeutic dose were 44.0 ± 8.8% and 44.6 ± 9.7% for the initial and stable phases, respectively. The MAEs of the selected algorithms were 0.85 ± 0.18 mg/day and 0.93 ± 0.19 mg/day, respectively. All algorithms had better performance in the ideal group than in the low dose and high dose groups. The only exception was the Wadelius et al. algorithm, which had better performance in the high dose group. The algorithms had similar performance except for the Wadelius et al. and Miao et al. algorithms, which had poor accuracy in our study cohort. The Gage et al. algorithm had better performance in both the initial and stable phases of treatment. Algorithms had relatively higher accuracy in the >50 years group of patients in the stable phase. Copyright © 2015 Elsevier Ltd. All rights reserved.
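The two accuracy measures used above, the percentage of predictions falling within 20% of the therapeutic dose and the mean absolute error, are straightforward to compute; a short sketch with simulated doses (the numbers are illustrative, not study data).

```python
import numpy as np

def percentage_within_20(predicted, actual):
    """Share of patients whose predicted dose falls within 20% of the therapeutic dose."""
    return 100.0 * np.mean(np.abs(predicted - actual) <= 0.2 * actual)

def mean_absolute_error(predicted, actual):
    return float(np.mean(np.abs(predicted - actual)))

rng = np.random.default_rng(6)
actual = np.clip(rng.normal(3.2, 1.2, size=193), 0.5, None)     # mg/day, simulated
predicted = actual + rng.normal(0.0, 1.0, size=193)             # an imperfect algorithm

print(f"percentage within 20%: {percentage_within_20(predicted, actual):.1f}%")
print(f"MAE: {mean_absolute_error(predicted, actual):.2f} mg/day")
```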
Enhancing Breast Cancer Recurrence Algorithms Through Selective Use of Medical Record Data.
Kroenke, Candyce H; Chubak, Jessica; Johnson, Lisa; Castillo, Adrienne; Weltzien, Erin; Caan, Bette J
2016-03-01
The utility of data-based algorithms in research has been questioned because of errors in identification of cancer recurrences. We adapted previously published breast cancer recurrence algorithms, selectively using medical record (MR) data to improve classification. We evaluated second breast cancer event (SBCE) and recurrence-specific algorithms previously published by Chubak and colleagues in 1535 women from the Life After Cancer Epidemiology (LACE) and 225 women from the Women's Health Initiative cohorts and compared classification statistics to published values. We also sought to improve classification with minimal MR examination. We selected pairs of algorithms, one with high sensitivity/high positive predictive value (PPV) and another with high specificity/high PPV, and used MR information to resolve discrepancies between algorithms, properly classifying events based on review; we called this "triangulation." Finally, in LACE, we compared associations between breast cancer survival risk factors and recurrence using MR data, single Chubak algorithms, and triangulation. The SBCE algorithms performed well in identifying SBCE and recurrences. Recurrence-specific algorithms performed more poorly than published, except for the high-specificity/high-PPV algorithm, which performed well. The triangulation method (sensitivity = 81.3%, specificity = 99.7%, PPV = 98.1%, NPV = 96.5%) improved recurrence classification over the two single algorithms (sensitivity = 57.1%, specificity = 95.5%, PPV = 71.3%, NPV = 91.9%; and sensitivity = 74.6%, specificity = 97.3%, PPV = 84.7%, NPV = 95.1%), with 10.6% MR review. Triangulation performed well in survival risk factor analyses vs analyses using MR-identified recurrences. Use of multiple recurrence algorithms in administrative data, in combination with selective examination of MR data, may improve recurrence data quality and reduce research costs. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
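The "triangulation" step can be sketched as simple decision logic: run the high-sensitivity and the high-specificity algorithms, accept concordant calls, and route only discordant cases to medical record review. The function names and the review oracle below are hypothetical placeholders, not the published implementation.

```python
def triangulate(high_sens_flags, high_spec_flags, review_record):
    """Combine two recurrence algorithms; send only discordant cases to MR review.

    high_sens_flags / high_spec_flags: dicts {patient_id: bool} from the two algorithms.
    review_record: callable(patient_id) -> bool, the (costly) medical-record gold call.
    """
    final, reviewed = {}, []
    for pid in high_sens_flags:
        a, b = high_sens_flags[pid], high_spec_flags[pid]
        if a == b:
            final[pid] = a                    # concordant: accept without chart review
        else:
            final[pid] = review_record(pid)   # discordant: resolve by medical record
            reviewed.append(pid)
    return final, reviewed

# illustrative use: half of the discordant patients would be sent to review
hs = {1: True, 2: True, 3: False, 4: True, 5: False, 6: False}
sp = {1: True, 2: False, 3: False, 4: False, 5: True, 6: False}
final, reviewed = triangulate(hs, sp, review_record=lambda pid: pid % 2 == 0)
print(final, "reviewed:", reviewed)
```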
Shen, Chongfei; Liu, Hongtao; Xie, Xb; Luk, Keith Dk; Hu, Yong
2007-01-01
An adaptive noise canceller (ANC) has been used to improve the signal-to-noise ratio (SNR) of somatosensory evoked potentials (SEPs). In order to efficiently apply the ANC in a hardware system, a fixed-point ANC allows fast, cost-efficient construction and low power consumption in an FPGA design. However, it is still questionable whether the SNR improvement achieved by the fixed-point algorithm is as good as that achieved by the floating-point algorithm. This study compares the outputs of floating-point and fixed-point ANCs when applied to SEP signals. The selection of the step-size parameter (μ) was found to differ between the fixed-point and floating-point algorithms. In this simulation study, the outputs of the fixed-point ANC showed higher distortion from the real SEP signals than those of the floating-point ANC. However, the difference decreased with increasing μ. With an optimal selection of μ, the fixed-point ANC can achieve results as good as those of the floating-point algorithm.
Selective epidemic vaccination under the performant routing algorithms
NASA Astrophysics Data System (ADS)
Bamaarouf, O.; Alweimine, A. Ould Baba; Rachadi, A.; EZ-Zahraouy, H.
2018-04-01
Despite the extensive research on traffic dynamics and epidemic spreading, the effect of routing algorithm strategies on traffic-driven epidemic spreading has not received adequate attention. It is well known that more performant routing algorithm strategies are used to overcome the congestion problem. However, our main result shows, unexpectedly, that these algorithms favor virus spreading more than when the shortest-path-based algorithm is used. In this work, we studied virus spreading in a complex network using the efficient-path and global dynamic routing algorithms, compared with the shortest-path strategy. Some previous studies have tried to modify the routing rules to limit virus spreading, but at the expense of reducing traffic transport efficiency. This work proposes a solution to overcome this drawback by using a selective vaccination procedure instead of the random vaccination often used in the literature. We found that selective vaccination succeeded in eradicating the virus better than a pure random intervention under the performant routing algorithm strategies.
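A small sketch of the selective vaccination idea on a scale-free network: immunize the highest-betweenness nodes, which relay the most routed traffic, instead of random nodes, and compare the size of the largest remaining susceptible cluster. It uses networkx as an assumed dependency and does not simulate the paper's traffic-driven epidemic dynamics.

```python
import random
import networkx as nx

G = nx.barabasi_albert_graph(n=500, m=3, seed=0)
budget = 25                                         # number of nodes we can vaccinate

# selective strategy: target nodes that relay the most shortest-path traffic
betweenness = nx.betweenness_centrality(G)
selective = set(sorted(betweenness, key=betweenness.get, reverse=True)[:budget])

# baseline: random vaccination with the same budget
random.seed(0)
random_set = set(random.sample(list(G.nodes()), budget))

def largest_susceptible_cluster(G, vaccinated):
    H = G.copy()
    H.remove_nodes_from(vaccinated)                 # vaccinated nodes cannot transmit
    return max(len(c) for c in nx.connected_components(H))

print("largest susceptible cluster, selective:", largest_susceptible_cluster(G, selective))
print("largest susceptible cluster, random:   ", largest_susceptible_cluster(G, random_set))
```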
Tian, Xin; Xin, Mingyuan; Luo, Jian; Liu, Mingyao; Jiang, Zhenran
2017-02-01
The selection of relevant genes for breast cancer metastasis is critical for the treatment and prognosis of cancer patients. Although much effort has been devoted to gene selection procedures using different statistical analysis methods or computational techniques, the interpretation of the variables in the resulting survival models has been limited so far. This article proposes a new Random Forest (RF)-based algorithm to identify important variables highly related to breast cancer metastasis, which is based on the importance scores of two variable selection algorithms: the mean decrease Gini (MDG) criterion of Random Forest and the GeneRank algorithm with protein-protein interaction (PPI) information. The new gene selection algorithm is called PPIRF. The improved prediction accuracy fully illustrates the reliability and high interpretability of the gene list selected by the PPIRF approach.
Resource efficient data compression algorithms for demanding, WSN based biomedical applications.
Antonopoulos, Christos P; Voros, Nikolaos S
2016-02-01
During the last few years, medical research areas of critical importance, such as epilepsy monitoring and study, have increasingly utilized wireless sensor network (WSN) technologies in order to achieve better understanding and significant breakthroughs. However, the limited memory and communication bandwidth offered by WSN platforms comprise a significant shortcoming for such demanding application scenarios. Although data compression can mitigate such deficiencies, there is a lack of objective and comprehensive evaluation of the relative approaches, and even more so of specialized approaches targeting specific demanding applications. The research work presented in this paper focuses on implementing and offering an in-depth experimental study regarding prominent, already existing as well as novel proposed compression algorithms. All algorithms have been implemented in a common Matlab framework. A major contribution of this paper, which differentiates it from similar research efforts, is the employment of real-world electroencephalography (EEG) and electrocardiography (ECG) datasets comprising the two most demanding epilepsy modalities. Emphasis is put on WSN applications, thus the respective metrics focus on compression rate and execution latency for the selected datasets. The evaluation results reveal significant performance and behavioral characteristics of the algorithms related to their complexity and the relative negative effect on compression latency as opposed to the increased compression rate. The proposed schemes offer a considerable advantage in achieving the optimum tradeoff between compression rate and latency. Specifically, the proposed algorithm managed to combine a highly competitive level of compression while ensuring minimum latency, thus exhibiting real-time capabilities. Additionally, one of the proposed schemes is compared against state-of-the-art general-purpose compression algorithms, also exhibiting considerable advantages as far as the compression rate is concerned. Copyright © 2015 Elsevier Inc. All rights reserved.
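A minimal benchmark of the kind described, measuring compression ratio and compression latency of general-purpose codecs on a synthetic ECG-like signal; zlib and lzma are stock Python modules, and the paper's specialized WSN-oriented algorithms are not reproduced.

```python
import time, zlib, lzma
import numpy as np

# synthetic ECG-like 16-bit signal (illustrative stand-in for the real datasets)
t = np.arange(0, 60, 1 / 256)                               # 60 s sampled at 256 Hz
sig = 1000 * np.sin(2 * np.pi * 1.2 * t) + 50 * np.random.default_rng(0).normal(size=t.size)
raw = sig.astype(np.int16).tobytes()

def benchmark(name, compress, decompress):
    t0 = time.perf_counter(); blob = compress(raw); t1 = time.perf_counter()
    assert decompress(blob) == raw                           # lossless round trip
    print(f"{name}: ratio={len(raw) / len(blob):.2f}, compress latency={1e3 * (t1 - t0):.1f} ms")

benchmark("zlib", lambda b: zlib.compress(b, 6), zlib.decompress)
benchmark("lzma", lambda b: lzma.compress(b), lzma.decompress)
```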
Data communications in a parallel active messaging interface of a parallel computer
Davis, Kristan D.; Faraj, Daniel A.
2014-07-22
Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
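A minimal sketch of the selection mechanism the claim describes: algorithms registered against disjoint message-size ranges, with the algorithm picked by the size of the outgoing message. The class and the algorithm names ("eager", "rendezvous", "rget") are illustrative assumptions, not PAMI API calls.

```python
# Illustrative sketch of the claimed mechanism: each algorithm is associated
# with a separate message-size range, and selection is done by message size.
from bisect import bisect_right

class AlgorithmTable:
    def __init__(self):
        self._bounds, self._algos = [], []

    def register(self, max_size, algo):
        """Associate `algo` with messages up to and including `max_size` bytes (register in increasing order)."""
        self._bounds.append(max_size)
        self._algos.append(algo)

    def select(self, message_size):
        """Pick the algorithm whose range contains `message_size`."""
        i = bisect_right(self._bounds, message_size - 1)
        if i == len(self._algos):
            raise ValueError("no algorithm registered for this message size")
        return self._algos[i]

table = AlgorithmTable()
table.register(512, "eager")          # tiny messages: copy inline
table.register(65536, "rendezvous")   # medium messages: handshake first
table.register(2**31, "rget")         # large messages: remote get
print(table.select(4096))             # -> "rendezvous"
```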
Data communications in a parallel active messaging interface of a parallel computer
Davis, Kristan D; Faraj, Daniel A
2013-07-09
Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
featsel: A framework for benchmarking of feature selection algorithms and cost functions
NASA Astrophysics Data System (ADS)
Reis, Marcelo S.; Estrela, Gustavo; Ferreira, Carlos Eduardo; Barrera, Junior
In this paper, we introduce featsel, a framework for benchmarking of feature selection algorithms and cost functions. This framework allows the user to deal with the search space as a Boolean lattice and has its core coded in C++ for computational efficiency purposes. Moreover, featsel includes Perl scripts to add new algorithms and/or cost functions, generate random instances, plot graphs and organize results into tables. Besides, this framework already comes with dozens of algorithms and cost functions for benchmarking experiments. We also provide illustrative examples, in which featsel outperforms the popular Weka workbench in feature selection procedures on data sets from the UCI Machine Learning Repository.
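For orientation, a minimal sketch of the kind of search featsel supports: an exhaustive walk over the Boolean lattice of feature subsets with a pluggable cost function (here an assumed least-squares residual cost). featsel's actual C++ core and its heuristic algorithms are far more elaborate; this brute-force version is practical only for small feature counts.

```python
# Minimal sketch (not featsel's C++ core): exhaustive walk over the Boolean
# lattice of feature subsets, scoring each subset with a pluggable cost function.
from itertools import combinations
import numpy as np

def mse_cost(X, y, subset):
    """Example cost: residual error of a least-squares fit on the subset."""
    if not subset:
        return float(np.var(y))
    Xs = X[:, list(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(np.mean((y - Xs @ beta) ** 2))

def best_subset(X, y, cost=mse_cost):
    n = X.shape[1]
    best, best_c = (), float("inf")
    for k in range(n + 1):                       # walk the lattice level by level
        for subset in combinations(range(n), k):
            c = cost(X, y, subset)
            if c < best_c:
                best, best_c = subset, c
    return best, best_c
```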
Silva, Adão; Gameiro, Atílio
2014-01-01
We present in this work a low-complexity algorithm to solve the sum rate maximization problem in multiuser MIMO broadcast channels with downlink beamforming. Our approach decouples the user selection problem from the resource allocation problem, and its main goal is to create a set of quasi-orthogonal users. The proposed algorithm exploits physical metrics of the wireless channels that can be easily computed, in such a way that the null-space projection power can be approximated efficiently. Based on the derived metrics, we present a mathematical model that describes the dynamics of the user selection process and renders the user selection problem as an integer linear program. Numerical results show that our approach is highly efficient at forming groups of quasi-orthogonal users compared with previously proposed algorithms in the literature. Our user selection algorithm achieves a large portion (90%) of the optimum user selection sum rate for a moderate number of active users. PMID:24574928
A networked voting rule for democratic representation
NASA Astrophysics Data System (ADS)
Hernández, Alexis R.; Gracia-Lázaro, Carlos; Brigatti, Edgardo; Moreno, Yamir
2018-03-01
We introduce a general framework for exploring the problem of selecting a committee of representatives, with the aim of studying a networked voting rule based on a decentralized large-scale platform which can ensure strong accountability of the elected. The results of our simulations suggest that this algorithm-based approach is able to obtain high representativeness for relatively small committees, performing even better than a classical voting rule based on a closed list of candidates. We show that a general relation between committee size and representativeness exists in the form of an inverse square root law, and that the normalized committee size approximately scales with the inverse of the community size, allowing scalability to very large populations. These findings are not strongly influenced by the different networks used to describe the individuals' interactions, except for the presence of a few individuals with very high connectivity, which can have a marginal negative effect on the committee selection process.
Tractable Goal Selection with Oversubscribed Resources
NASA Technical Reports Server (NTRS)
Rabideau, Gregg; Chien, Steve; McLaren, David
2009-01-01
We describe an efficient, online goal selection algorithm and its use for selecting goals at runtime. Our focus is on the re-planning that must be performed in a timely manner on the embedded system where computational resources are limited. In particular, our algorithm generates near optimal solutions to problems with fully specified goal requests that oversubscribe available resources but have no temporal flexibility. By using a fast, incremental algorithm, goal selection can be postponed in a "just-in-time" fashion allowing requests to be changed or added at the last minute. This enables shorter response cycles and greater autonomy for the system under control.
Bandyopadhyay, Sanghamitra; Mitra, Ramkrishna
2009-10-15
Prediction of microRNA (miRNA) target mRNAs using machine learning approaches is an important area of research. However, most of the methods suffer from either high false positive or high false negative rates. One reason for this is the marked deficiency of negative examples, or miRNA non-target pairs. Systematic identification of non-target mRNAs is still not addressed properly, and therefore current machine learning approaches are compelled to rely on artificially generated negative examples for training. In this article, we have identified approximately 300 tissue-specific negative examples using a novel approach that involves expression profiling of both miRNAs and mRNAs, miRNA-mRNA structural interactions and seed-site conservation. The newly generated negative examples are validated with the pSILAC dataset, which elucidates the fact that the identified non-targets are indeed non-targets. These high-throughput tissue-specific negative examples and a set of experimentally verified positive examples are then used to build a system called TargetMiner, a support vector machine (SVM)-based classifier. In addition to assessing the prediction accuracy in cross-validation experiments, TargetMiner has been validated with a completely independent experimental test dataset. Our method outperforms 10 existing target prediction algorithms and provides a good balance between sensitivity and specificity that is not reflected in the existing methods. We achieve a significantly higher sensitivity and specificity of 69% and 67.8% based on a pool of 90 features, and 76.5% and 66.1% using a set of 30 selected features, on the completely independent test dataset. In order to establish the effectiveness of the systematically generated negative examples, the SVM is trained using a different set of negative data generated using the method in Yousef et al. A significantly higher false positive rate (70.6%) is observed when tested on the independent set, while all other factors are kept the same. Again, when an existing method (NBmiRTar) is executed with our proposed negative data, we observe an improvement in its performance. These results clearly establish the effectiveness of the proposed approach of selecting negative examples systematically. TargetMiner is now available as an online tool at www.isical.ac.in/~bioinfo_miu
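A hedged sketch of the training step only, with random placeholder features standing in for TargetMiner's actual feature representation: experimentally verified positives and systematically selected negatives feed an RBF-kernel SVM, which is then cross-validated.

```python
# Sketch of the training step only (feature extraction assumed already done):
# verified positive miRNA-mRNA pairs plus systematically selected negatives
# are used to fit an RBF-kernel SVM, in the spirit of a TargetMiner-style classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(300, 30))   # placeholder feature vectors (targets)
X_neg = rng.normal(-1.0, 1.0, size=(300, 30))  # tissue-specific non-targets
X = np.vstack([X_pos, X_neg])
y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_neg))]

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
scores = cross_val_score(clf, X, y, cv=5)       # cross-validated accuracy
print(scores.mean())
clf.fit(X, y)                                   # final model on all examples
```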
A modified genetic algorithm with fuzzy roulette wheel selection for job-shop scheduling problems
NASA Astrophysics Data System (ADS)
Thammano, Arit; Teekeng, Wannaporn
2015-05-01
The job-shop scheduling problem is one of the most difficult production planning problems. Since it is in the NP-hard class, a recent trend in solving the job-shop scheduling problem is shifting towards the use of heuristic and metaheuristic algorithms. This paper proposes a novel metaheuristic algorithm, which is a modification of the genetic algorithm. The proposed algorithm introduces two new concepts to the standard genetic algorithm: (1) fuzzy roulette wheel selection and (2) a mutation operation with a tabu list. The proposed algorithm has been evaluated and compared with several state-of-the-art algorithms in the literature. The experimental results on 53 JSSPs show that the proposed algorithm is very effective in solving these combinatorial optimization problems. It outperforms all the state-of-the-art algorithms on all benchmark problems in terms of both the ability to reach the optimal solution and computational time.
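For reference, a plain (non-fuzzy) fitness-proportionate roulette wheel selection routine; the paper's contribution reshapes these selection probabilities with fuzzy membership functions, which is not reproduced here. Fitness is assumed non-negative and higher-is-better, e.g. the inverse of the makespan for a JSSP.

```python
# Standard roulette wheel selection: parents are sampled with probability
# proportional to their (non-negative) fitness values.
import numpy as np

def roulette_wheel_select(fitness, n_parents, rng=None):
    """Return indices of selected parents, sampled proportionally to fitness."""
    if rng is None:
        rng = np.random.default_rng()
    f = np.asarray(fitness, dtype=float)
    total = f.sum()
    # fall back to uniform selection if every fitness value is zero
    probs = f / total if total > 0 else np.full(len(f), 1.0 / len(f))
    return rng.choice(len(f), size=n_parents, p=probs)

# toy usage: fitness could be 1/makespan for each candidate schedule
parents = roulette_wheel_select([3.0, 1.0, 6.0, 2.0], n_parents=2)
```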
History-based route selection for reactive ad hoc routing protocols
NASA Astrophysics Data System (ADS)
Medidi, Sirisha; Cappetto, Peter
2007-04-01
Ad hoc networks rely on cooperation in order to operate, but in a resource-constrained environment not all nodes behave altruistically. Selfish nodes preserve their own resources and do not forward packets that are not in their own self-interest. These nodes degrade the performance of the network, but judicious route selection can help maintain performance despite this behavior. Many route selection algorithms place importance on the shortness of a route rather than its reliability. We introduce a light-weight route selection algorithm that uses past behavior to judge the quality of a route rather than relying solely on its length. It draws information from the underlying routing layer at no extra cost and selects routes with a simple algorithm. The technique maintains this data in a small table, which imposes little memory cost. History-based route selection's minimalism suits the needs of portable wireless devices and is easy to implement. We implemented our algorithm and tested it in the ns2 environment. Our simulation results show that history-based route selection achieves higher packet delivery and better stability than its length-based counterpart.
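A minimal sketch of the history-based idea under stated assumptions: keep a small per-route table of delivery outcomes and pick the candidate route with the best observed reliability, breaking ties by hop count. The field names and the neutral prior for unseen routes are illustrative choices, not the authors' implementation.

```python
# Sketch: score routes by past delivery history, then prefer the most reliable
# (and, on ties, the shortest) route among the candidates.
from collections import defaultdict

class RouteHistory:
    def __init__(self):
        self.stats = defaultdict(lambda: [0, 0])   # route -> [delivered, attempted]

    def record(self, route, delivered):
        d = self.stats[tuple(route)]
        d[0] += int(delivered)
        d[1] += 1

    def reliability(self, route):
        d, a = self.stats[tuple(route)]
        return d / a if a else 0.5                 # unknown routes get a neutral prior

    def select(self, candidate_routes):
        # higher reliability first, then fewer hops
        return max(candidate_routes, key=lambda r: (self.reliability(r), -len(r)))

h = RouteHistory()
h.record(["A", "B", "D"], delivered=True)
h.record(["A", "C", "E", "D"], delivered=False)
best = h.select([["A", "B", "D"], ["A", "C", "E", "D"]])
```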
Online feature selection with streaming features.
Wu, Xindong; Yu, Kui; Ding, Wei; Wang, Hao; Zhu, Xingquan
2013-05-01
We propose a new online feature selection framework for applications with streaming features, where knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time, whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In this paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
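A skeleton of the two-stage idea (relevance check, then redundancy check against the features kept so far) as features stream in; absolute correlation stands in for the statistical tests used by OSFS/Fast-OSFS, and the thresholds are arbitrary.

```python
# Skeleton of online streaming feature selection: each arriving feature is kept
# only if it looks relevant to the label and non-redundant given kept features.
import numpy as np

def osfs_stream(feature_stream, y, rel_thresh=0.1, red_thresh=0.95):
    """feature_stream yields (index, column) pairs; y is the fixed label vector."""
    kept_idx, kept_cols = [], []
    for j, x in feature_stream:                       # features arrive one by one
        x = np.asarray(x, dtype=float)
        if abs(np.corrcoef(x, y)[0, 1]) < rel_thresh:
            continue                                  # not relevant to the label
        if any(abs(np.corrcoef(x, c)[0, 1]) > red_thresh for c in kept_cols):
            continue                                  # redundant given kept features
        kept_idx.append(j)
        kept_cols.append(x)
    return kept_idx
```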
Zeng, Xueqiang; Luo, Gang
2017-12-01
Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters, termed hyper-parameters, must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that, compared to a state-of-the-art automatic selection method, our method can significantly reduce search time, classification error rate, and the standard deviation of the error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
Devarajan, Karthik; Cheung, Vincent C.K.
2017-01-01
Non-negative matrix factorization (NMF) by the multiplicative updates algorithm is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into two nonnegative matrices, W and H where V ~ WH. It has been successfully applied in the analysis and interpretation of large-scale data arising in neuroscience, computational biology and natural language processing, among other areas. A distinctive feature of NMF is its nonnegativity constraints that allow only additive linear combinations of the data, thus enabling it to learn parts that have distinct physical representations in reality. In this paper, we describe an information-theoretic approach to NMF for signal-dependent noise based on the generalized inverse Gaussian model. Specifically, we propose three novel algorithms in this setting, each based on multiplicative updates and prove monotonicity of updates using the EM algorithm. In addition, we develop algorithm-specific measures to evaluate their goodness-of-fit on data. Our methods are demonstrated using experimental data from electromyography studies as well as simulated data in the extraction of muscle synergies, and compared with existing algorithms for signal-dependent noise. PMID:24684448
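For context, the classical Lee-Seung multiplicative updates for the Euclidean cost; the paper's contribution derives analogous multiplicative rules from a generalized inverse Gaussian model of signal-dependent noise, which is not reproduced here.

```python
# Classical multiplicative updates for NMF under a Euclidean cost (Lee-Seung),
# shown for reference only; the paper replaces these rules with updates derived
# for signal-dependent noise.
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative update for H
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for W
    return W, H

V = np.abs(np.random.default_rng(1).random((20, 15)))
W, H = nmf_multiplicative(V, rank=4)
print(np.linalg.norm(V - W @ H))               # reconstruction error decreases monotonically
```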
Chen, Zhiru; Hong, Wenxue
2016-02-01
To address the low prediction accuracy on positive samples and the poor overall classification caused by unbalanced microRNA (miRNA) target sample data, this paper proposes a support vector machine (SVM)-based integration of under-sampling and weighting (SVM-IUSM) algorithm, an under-sampling method built on ensemble learning. The algorithm adopts the SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming at reducing the degree of unbalanced distribution between positive and negative samples. Meanwhile, in the process of adaptive weight adjustment of the samples, the SVM-IUSM algorithm eliminates abnormal negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the prediction of the miRNA target integrated classifier is achieved by combining multiple weak classifiers through a voting mechanism. The experiments revealed that SVM-IUSM, compared with other algorithms on an unbalanced dataset collection, could not only improve the accuracy on positive targets and the overall classification effect, but also enhance the generalization ability of the miRNA target classifier.
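A sketch of the clustering-based under-sampling step in isolation, assuming the majority class is the negative one: cluster the negatives and keep one representative per cluster before fitting an SVM. The AdaBoost loop and the weight-smoothing mechanism of the full method are omitted.

```python
# Clustering-based under-sampling of the majority (negative) class, followed by
# an SVM fit on the balanced data; a simplified stand-in for one piece of the method.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def cluster_undersample(X_neg, n_keep, seed=0):
    """Keep, for each of n_keep clusters, the negative sample nearest its centre."""
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=seed).fit(X_neg)
    keep = [np.argmin(np.linalg.norm(X_neg - c, axis=1)) for c in km.cluster_centers_]
    return X_neg[np.unique(keep)]

rng = np.random.default_rng(0)
X_pos = rng.normal(1, 1, (60, 10))            # minority class (targets)
X_neg = rng.normal(-1, 1, (600, 10))          # majority class (non-targets)
X_neg_small = cluster_undersample(X_neg, n_keep=len(X_pos))
X = np.vstack([X_pos, X_neg_small])
y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_neg_small))]
SVC(kernel="rbf").fit(X, y)
```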
The admissible portfolio selection problem with transaction costs and an improved PSO algorithm
NASA Astrophysics Data System (ADS)
Chen, Wei; Zhang, Wei-Guo
2010-05-01
In this paper, we discuss the portfolio selection problem with transaction costs under the assumption that there exist admissible errors on expected returns and risks of assets. We propose a new admissible efficient portfolio selection model and design an improved particle swarm optimization (PSO) algorithm because traditional optimization algorithms fail to work efficiently for our proposed problem. Finally, we offer a numerical example to illustrate the proposed effective approaches and compare the admissible portfolio efficient frontiers under different constraints.
Nasr, Ahmed; Sullivan, Katrina J; Chan, Emily W; Wong, Coralie A; Benchimol, Eric I
2017-01-01
Objective: Incidence rates of Hirschsprung disease (HD) vary by geographical region, yet no recent population-based estimate exists for Canada. The objective of our study was to validate and use health administrative data from Ontario, Canada, to describe trends in the incidence of HD between 1991 and 2013. Study design: To identify children with HD, we tested algorithms consisting of combinations of diagnostic, procedural, and intervention codes against the reference standard of abstracted clinical charts from a tertiary pediatric hospital. The algorithm with the highest positive predictive value (PPV) that could maintain high sensitivity was applied to health administrative data from April 1, 1991 to March 31, 2014 (fiscal years 1991–2013) to determine annual incidence. Temporal trends were evaluated using Poisson regression, controlling for sex as a covariate. Results: The selected algorithm was highly sensitive (93.5%) and specific (>99.9%) with excellent predictive abilities (PPV 89.6% and negative predictive value >99.9%). Using the algorithm, a total of 679 patients diagnosed with HD were identified in Ontario between 1991 and 2013. The overall incidence during this time was 2.05 per 10,000 live births (or 1 in 4,868 live births). The incidence did not change significantly over time (odds ratio 0.998, 95% confidence interval 0.983–1.013, p = 0.80). Conclusion: Ontario health administrative data can be used to accurately identify cases of HD and describe trends in incidence. There has not been a significant change in HD incidence over time in Ontario between 1991 and 2013. PMID:29180902
Matrix Multiplication Algorithm Selection with Support Vector Machines
2015-05-01
libraries that could intelligently choose the optimal algorithm for a particular set of inputs. Users would be oblivious to the underlying algorithmic...
SFM: A novel sequence-based fusion method for disease genes identification and prioritization.
Yousef, Abdulaziz; Moghadam Charkari, Nasrollah
2015-10-21
The identification of disease genes from the human genome is of great importance for improving the diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. However, these methods mostly differ in the prior knowledge used to construct the feature vector for each instance (gene), the way of selecting negative data (non-disease genes), for which there is no investigational approach, and the classification method used to make the final decision. In this work, a novel sequence-based fusion method (SFM) is proposed to identify disease genes. Unlike existing methods, instead of using noisy and incomplete prior knowledge, the amino acid sequences of the proteins, which are universally available data, are used to represent the genes (proteins) as four different feature vectors. To select more likely negative data from candidate genes, the intersection of four negative sets generated using a distance-based approach is taken. Then, a decision tree (C4.5) is applied as a fusion method to combine the results of four independent state-of-the-art predictors based on the support vector machine (SVM) algorithm and to make the final decision. The experimental results of the proposed method have been evaluated with some standard measures. The results indicate precision, recall and F-measure of 82.6%, 85.6% and 84%, respectively. These results confirm the efficiency and validity of the proposed method. Copyright © 2015 Elsevier Ltd. All rights reserved.
Enhancing Breast Cancer Recurrence Algorithms Through Selective Use of Medical Record Data
Chubak, Jessica; Johnson, Lisa; Castillo, Adrienne; Weltzien, Erin; Caan, Bette J.
2016-01-01
Background: The utility of data-based algorithms in research has been questioned because of errors in identification of cancer recurrences. We adapted previously published breast cancer recurrence algorithms, selectively using medical record (MR) data to improve classification. Methods: We evaluated second breast cancer event (SBCE) and recurrence-specific algorithms previously published by Chubak and colleagues in 1535 women from the Life After Cancer Epidemiology (LACE) and 225 women from the Women's Health Initiative cohorts and compared classification statistics to published values. We also sought to improve classification with minimal MR examination. We selected pairs of algorithms (one with high sensitivity/high positive predictive value (PPV) and another with high specificity/high PPV), using MR information to resolve discrepancies between algorithms and properly classify events based on review; we called this "triangulation." Finally, in LACE, we compared associations between breast cancer survival risk factors and recurrence using MR data, single Chubak algorithms, and triangulation. Results: The SBCE algorithms performed well in identifying SBCE and recurrences. Recurrence-specific algorithms performed more poorly than published, except for the high-specificity/high-PPV algorithm, which performed well. The triangulation method (sensitivity = 81.3%, specificity = 99.7%, PPV = 98.1%, NPV = 96.5%) improved recurrence classification over the two single algorithms (sensitivity = 57.1%, specificity = 95.5%, PPV = 71.3%, NPV = 91.9%; and sensitivity = 74.6%, specificity = 97.3%, PPV = 84.7%, NPV = 95.1%), with 10.6% MR review. Triangulation performed well in survival risk factor analyses vs analyses using MR-identified recurrences. Conclusions: Use of multiple recurrence algorithms in administrative data, in combination with selective examination of MR data, may improve recurrence data quality and reduce research costs. PMID:26582243
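A schematic of the triangulation logic, with `algo_sensitive`, `algo_specific` and `chart_review` as placeholder callables: agreement between the two algorithms is accepted directly, and only discrepancies are sent to medical-record review.

```python
# Sketch of "triangulation": accept cases where the high-sensitivity and
# high-specificity algorithms agree, and route only disagreements to MR review.
def triangulate(cases, algo_sensitive, algo_specific, chart_review):
    """`cases` are hashable identifiers; each callable returns True/False for recurrence."""
    labels, reviewed = {}, 0
    for case in cases:
        a, b = algo_sensitive(case), algo_specific(case)
        if a == b:
            labels[case] = a                  # algorithms agree; no review needed
        else:
            labels[case] = chart_review(case) # discrepancy: resolve by MR review
            reviewed += 1
    return labels, reviewed / max(len(cases), 1)   # labels plus fraction reviewed
```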
Effective traffic features selection algorithm for cyber-attacks samples
NASA Astrophysics Data System (ADS)
Li, Yihong; Liu, Fangzheng; Du, Zhenyu
2018-05-01
To support defense against network attacks, this paper proposes an effective traffic feature selection algorithm based on k-means++ clustering to deal with the high dimensionality of traffic features extracted from cyber-attack samples. First, the algorithm divides the original feature set into an attack traffic feature set and a background traffic feature set by clustering. Then, it calculates the variation in clustering performance after removing each feature and evaluates the degree of distinctiveness of the feature vector from the result; features whose degree of distinctiveness exceeds a set threshold are considered effective. The purpose of this paper is to select the effective features from the extracted original feature set, reducing the dimensionality of the features and thus the space-time overhead of subsequent detection. The experimental results show that the proposed algorithm is feasible and has advantages over other selection algorithms.
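A rough sketch of the idea, not the authors' code: cluster the samples with k-means++ seeding, then score each feature by how much the clustering quality degrades when that feature is removed; the silhouette score and the threshold are stand-ins for the paper's own performance measure.

```python
# Sketch: rank features by the drop in clustering quality caused by removing them;
# features whose removal hurts the clustering beyond a threshold are kept.
import numpy as np
from sklearn.cluster import KMeans                # default init is k-means++
from sklearn.metrics import silhouette_score

def effective_features(X, n_clusters=2, threshold=0.01, seed=0):
    base_labels = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=seed).fit_predict(X)
    base = silhouette_score(X, base_labels)
    kept = []
    for j in range(X.shape[1]):
        X_red = np.delete(X, j, axis=1)
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit_predict(X_red)
        drop = base - silhouette_score(X_red, labels)
        if drop > threshold:                      # removing feature j hurts the clustering
            kept.append(j)
    return kept
```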
Diversified models for portfolio selection based on uncertain semivariance
NASA Astrophysics Data System (ADS)
Chen, Lin; Peng, Jin; Zhang, Bo; Rosyida, Isnaini
2017-02-01
Since financial markets are complex, future security returns are sometimes represented mainly on the basis of experts' estimations due to a lack of historical data. This paper proposes a semivariance method for diversified portfolio selection, in which the security returns are given subject to experts' estimations and depicted as uncertain variables. In the paper, three properties of the semivariance of uncertain variables are verified. Based on the concept of the semivariance of uncertain variables, two types of mean-semivariance diversified models for uncertain portfolio selection are proposed. Since the models are complex, a hybrid intelligent algorithm based on the 99-method and a genetic algorithm is designed to solve them. In this hybrid intelligent algorithm, the 99-method is applied to compute the expected value and semivariance of uncertain variables, and the genetic algorithm is employed to seek the best allocation plan for portfolio selection. Finally, several numerical examples are presented to illustrate the modelling idea and the effectiveness of the algorithm.
Asian dust aerosol: Optical effect on satellite ocean color signal and a scheme of its correction
NASA Astrophysics Data System (ADS)
Fukushima, H.; Toratani, M.
1997-07-01
The paper first exhibits the influence of the Asian dust aerosol (KOSA) on a coastal zone color scanner (CZCS) image which records erroneously low or negative satellite-derived water-leaving radiance especially in a shorter wavelength region. This suggests the presence of spectrally dependent absorption which was disregarded in the past atmospheric correction algorithms. On the basis of the analysis of the scene, a semiempirical optical model of the Asian dust aerosol that relates aerosol single scattering albedo (ωA) to the spectral ratio of aerosol optical thickness between 550 nm and 670 nm is developed. Then, as a modification to a standard CZCS atmospheric correction algorithm (NASA standard algorithm), a scheme which estimates pixel-wise aerosol optical thickness, and in turn ωA, is proposed. The assumption of constant normalized water-leaving radiance at 550 nm is adopted together with a model of aerosol scattering phase function. The scheme is combined to the standard algorithm, performing atmospheric correction just the same as the standard version with a fixed Angstrom coefficient except in the case where the presence of Asian dust aerosol is detected by the lowered satellite-derived Angstrom exponent. Some of the model parameter values are determined so that the scheme does not produce any spatial discontinuity with the standard scheme. The algorithm was tested against the Japanese Asian dust CZCS scene with parameter values of the spectral dependency of ωA, first statistically determined and second optimized for selected pixels. Analysis suggests that the parameter values depend on the assumed Angstrom coefficient for standard algorithm, at the same time defining the spatial extent of the area to apply the Asian dust scheme. The algorithm was also tested for a Saharan dust scene, showing the relevance of the scheme but with different parameter setting. Finally, the algorithm was applied to a data set of 25 CZCS scenes to produce a monthly composite of pigment concentration for April 1981. Through these analyses, the modified algorithm is considered robust in the sense that it operates most compatibly with the standard algorithm yet performs adaptively in response to the magnitude of the dust effect.
Chemical function based pharmacophore generation of endothelin-A selective receptor antagonists.
Funk, Oliver F; Kettmann, Viktor; Drimal, Jan; Langer, Thierry
2004-05-20
Both quantitative and qualitative chemical function based pharmacophore models of endothelin-A (ET(A)) selective receptor antagonists were generated by using the two algorithms HypoGen and HipHop, respectively, which are implemented in the Catalyst molecular modeling software. The input for HypoGen is a training set of 18 ET(A) antagonists exhibiting IC(50) values ranging between 0.19 nM and 67 microM. The best output hypothesis consists of five features: two hydrophobic (HY), one ring aromatic (RA), one hydrogen bond acceptor (HBA), and one negative ionizable (NI) function. The highest scoring HipHop model consists of six features: three hydrophobic (HY), one ring aromatic (RA), one hydrogen bond acceptor (HBA), and one negative ionizable (NI). It is the result of an input of three highly active, selective, and structurally diverse ET(A) antagonists. The predictive power of the quantitative model was confirmed by using a test set of 30 compounds whose activity values spread over 6 orders of magnitude. The two pharmacophores were tested according to their ability to extract known endothelin antagonists from the 3D molecular structure database of Derwent's World Drug Index; the majority of the selective ET(A)-antagonist entries were detected by the two hypotheses. Furthermore, the pharmacophores were used to screen the Maybridge database. Six compounds were chosen from the output hit lists for in vitro testing of their ability to displace endothelin-1 from its receptor. Two of these are new potential lead compounds because they are structurally novel and exhibit satisfactory activity in the binding assay.
Dibble, Elizabeth H; Swenson, David W; Cartagena, Claudia; Baird, Grayson L; Herliczek, Thaddeus W
2018-03-01
Purpose: To establish, in a large cohort, the diagnostic performance of a staged algorithm involving ultrasonography (US) followed by conditional unenhanced magnetic resonance (MR) imaging for the imaging work-up of pediatric appendicitis. Materials and Methods: A staged imaging algorithm in which US and unenhanced MR imaging were performed in pediatric patients suspected of having appendicitis was implemented at the authors' institution on January 1, 2011, with US as the initial modality followed by unenhanced MR imaging when US findings were equivocal. A search of the radiology database revealed 2180 pediatric patients who had undergone imaging for suspected appendicitis from January 1, 2011, through December 31, 2012. Of the 2180 patients, 1982 (90.9%) were evaluated according to the algorithm. The authors reviewed the electronic medical records and imaging reports for all patients. Imaging reports were reviewed and classified as positive, negative, or equivocal for appendicitis and correlated with surgical and pathology reports. Results: The frequency of appendicitis was 20.5% (407 of 1982 patients). US alone was performed in 1905 of the 1982 patients (96.1%), yielding a sensitivity of 98.7% (386 of 391 patients) and specificity of 97.1% (1470 of 1514 patients) for appendicitis. Seventy-seven patients underwent unenhanced MR imaging after equivocal US findings, yielding an overall algorithm sensitivity of 98.2% (400 of 407 patients) and specificity of 97.1% (1530 of 1575 patients). Seven of the 1982 patients (0.4%) had false-negative results with the staged algorithm. The negative predictive value of the staged algorithm was 99.5% (1530 of 1537 patients). Conclusion: A staged algorithm of US and unenhanced MR imaging for pediatric appendicitis appears to be effective. The results of this study demonstrate that this staged algorithm is 98.2% sensitive and 97.1% specific for the diagnosis of appendicitis in pediatric patients. © RSNA, 2017.
Tenório, Josceli Maria; Hummel, Anderson Diniz; Cohrs, Frederico Molina; Sdepanian, Vera Lucia; Pisa, Ivan Torres; de Fátima Marin, Heimar
2013-01-01
Background: Celiac disease (CD) is a difficult-to-diagnose condition because of its multiple clinical presentations and symptoms shared with other diseases. Gold-standard diagnostic confirmation of suspected CD is achieved by biopsying the small intestine. Objective: To develop a clinical decision-support system (CDSS) integrated with an automated classifier to recognize CD cases, by selecting from experimental models developed using artificial intelligence techniques. Methods: A web-based system was designed for constructing a retrospective database that included 178 clinical cases for training. Tests were run on 270 automated classifiers available in Weka 3.6.1 using five artificial intelligence techniques, namely decision trees, Bayesian inference, the k-nearest neighbor algorithm, support vector machines and artificial neural networks. The parameters evaluated were accuracy, sensitivity, specificity and area under the ROC curve (AUC). AUC was used as the criterion for selecting the CDSS algorithm. A testing database was constructed including 38 clinical CD cases for CDSS evaluation. The diagnoses suggested by the CDSS were compared with those made by physicians during patient consultations. Results: The most accurate method during the training phase was the averaged one-dependence estimator (AODE) algorithm (a Bayesian classifier), which showed accuracy 80.0%, sensitivity 0.78, specificity 0.80 and AUC 0.84. This classifier was integrated into the web-based decision-support system. The gold-standard validation of the CDSS achieved accuracy of 84.2% and k = 0.68 (p < 0.0001) with good agreement. The same accuracy was achieved in the comparison between the physician's diagnostic impression and the gold standard, k = 0.64 (p < 0.0001). There was moderate agreement between the physician's diagnostic impression and the CDSS, k = 0.46 (p = 0.0008). Conclusions: The study results suggest that the CDSS could be used to help in diagnosing CD, since the algorithm tested achieved excellent accuracy in differentiating possible positive from negative CD diagnoses. This study may contribute towards the development of a computer-assisted environment to support CD diagnosis. PMID:21917512
Developing a dengue forecast model using machine learning: A case study in China
Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun
2017-01-01
Background: In China, dengue remains an important public health issue, with expanded areas and increased incidence in recent years. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Methodology/Principal findings: Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011–2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, a step-down linear regression model, the gradient boosted regression tree algorithm (GBM), a negative binomial regression model (NBM), the least absolute shrinkage and selection operator (LASSO) linear regression model and a generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. Conclusion and significance: The proposed SVR model achieved superior performance in comparison with the other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics. PMID:29036169
Tenório, Josceli Maria; Hummel, Anderson Diniz; Cohrs, Frederico Molina; Sdepanian, Vera Lucia; Pisa, Ivan Torres; de Fátima Marin, Heimar
2011-11-01
Celiac disease (CD) is a difficult-to-diagnose condition because of its multiple clinical presentations and symptoms shared with other diseases. Gold-standard diagnostic confirmation of suspected CD is achieved by biopsying the small intestine. The objective was to develop a clinical decision-support system (CDSS) integrated with an automated classifier to recognize CD cases, by selecting from experimental models developed using artificial intelligence techniques. A web-based system was designed for constructing a retrospective database that included 178 clinical cases for training. Tests were run on 270 automated classifiers available in Weka 3.6.1 using five artificial intelligence techniques, namely decision trees, Bayesian inference, the k-nearest neighbor algorithm, support vector machines and artificial neural networks. The parameters evaluated were accuracy, sensitivity, specificity and area under the ROC curve (AUC). AUC was used as the criterion for selecting the CDSS algorithm. A testing database was constructed including 38 clinical CD cases for CDSS evaluation. The diagnoses suggested by the CDSS were compared with those made by physicians during patient consultations. The most accurate method during the training phase was the averaged one-dependence estimator (AODE) algorithm (a Bayesian classifier), which showed accuracy 80.0%, sensitivity 0.78, specificity 0.80 and AUC 0.84. This classifier was integrated into the web-based decision-support system. The gold-standard validation of the CDSS achieved accuracy of 84.2% and k=0.68 (p<0.0001) with good agreement. The same accuracy was achieved in the comparison between the physician's diagnostic impression and the gold standard, k=0.64 (p<0.0001). There was moderate agreement between the physician's diagnostic impression and the CDSS, k=0.46 (p=0.0008). The study results suggest that the CDSS could be used to help in diagnosing CD, since the algorithm tested achieved excellent accuracy in differentiating possible positive from negative CD diagnoses. This study may contribute towards the development of a computer-assisted environment to support CD diagnosis. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed
2017-01-01
For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network, and support vector regression, on UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out. The discussion revealed the superiority of this new powerful algorithm over the well-known genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found between the models regarding their predictive abilities. This shows that simpler and faster models were obtained without any deterioration in the quality of the calibration.
Marchetti, Antonio; Pace, Maria Vittoria; Di Lorito, Alessia; Canarecci, Sara; Felicioni, Lara; D'Antuono, Tommaso; Liberatore, Marcella; Filice, Giampaolo; Guetti, Luigi; Mucilli, Felice; Buttitta, Fiamma
2016-09-01
Anaplastic Lymphoma Kinase (ALK) gene rearrangements have been described in 3-5% of lung adenocarcinomas (ADC) and their identification is essential to select patients for treatment with ALK tyrosine kinase inhibitors. For several years, fluorescent in situ hybridization (FISH) has been considered as the only validated diagnostic assay. Currently, alternative methods are commercially available as diagnostic tests. A series of 217 ADC comprising 196 consecutive resected tumors and 21 ALK FISH-positive cases from an independent series of 702 ADC were investigated. All specimens were screened by IHC (ALK-D5F3-CDx-Ventana), FISH (Vysis ALK Break-Apart-Abbott) and RT-PCR (ALK RGQ RT-PCR-Qiagen). Results were compared and discordant cases subjected to Next Generation Sequencing. Thirty-nine of 217 samples were positive by the ALK RGQ RT-PCR assay, using a threshold cycle (Ct) cut-off ≤35.9, as recommended. Of these positive samples, 14 were negative by IHC and 12 by FISH. ALK RGQ RT-PCR/FISH discordant cases were analyzed by the NGS assay with results concordant with FISH data. In order to obtain the maximum level of agreement between FISH and ALK RGQ RT-PCR data, we introduced a new scoring algorithm based on the ΔCt value. A ΔCt cut-off level ≤3.5 was used in a pilot series. Then the algorithm was tested on a completely independent validation series. By using the new scoring algorithm and FISH as reference standard, the sensitivity and the specificity of the ALK RGQ RT-PCR(ΔCt) assay were 100% and 100%, respectively. Our results suggest that the ALK RGQ RT-PCR test could be useful in clinical practice as a complementary assay in multi-test diagnostic algorithms or even, if our data will be confirmed in independent studies, as a standalone or screening test for the selection of patients to be treated with ALK inhibitors. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Automatic motor task selection via a bandit algorithm for a brain-controlled button
NASA Astrophysics Data System (ADS)
Fruitet, Joan; Carpentier, Alexandra; Munos, Rémi; Clerc, Maureen
2013-02-01
Objective. Brain-computer interfaces (BCIs) based on sensorimotor rhythms use a variety of motor tasks, such as imagining moving the right or left hand, the feet or the tongue. Finding the tasks that yield best performance, specifically to each user, is a time-consuming preliminary phase to a BCI experiment. This study presents a new adaptive procedure to automatically select (online) the most promising motor task for an asynchronous brain-controlled button. Approach. We develop for this purpose an adaptive algorithm UCB-classif based on the stochastic bandit theory and design an EEG experiment to test our method. We compare (offline) the adaptive algorithm to a naïve selection strategy which uses uniformly distributed samples from each task. We also run the adaptive algorithm online to fully validate the approach. Main results. By not wasting time on inefficient tasks, and focusing on the most promising ones, this algorithm results in a faster task selection and a more efficient use of the BCI training session. More precisely, the offline analysis reveals that the use of this algorithm can reduce the time needed to select the most appropriate task by almost half without loss in precision, or alternatively, allow us to investigate twice the number of tasks within a similar time span. Online tests confirm that the method leads to an optimal task selection. Significance. This study is the first one to optimize the task selection phase by an adaptive procedure. By increasing the number of tasks that can be tested in a given time span, the proposed method could contribute to reducing ‘BCI illiteracy’.
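A generic UCB1 loop as a stand-in for the paper's UCB-classif procedure: each candidate motor task is an arm, and the reward is assumed to be the classification accuracy obtained from the latest calibration trials of that task.

```python
# Generic UCB1 bandit loop for picking the most promising motor task: pull the
# arm with the highest upper confidence bound, observe a reward (accuracy),
# and return the task with the best empirical mean after the budget is spent.
import math, random

def ucb_select_task(run_trial, n_tasks, budget):
    counts = [0] * n_tasks
    means = [0.0] * n_tasks
    for t in range(1, budget + 1):
        if t <= n_tasks:
            arm = t - 1                                  # try every task once first
        else:
            arm = max(range(n_tasks),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = run_trial(arm)                          # e.g. classification accuracy in [0, 1]
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    return max(range(n_tasks), key=lambda a: means[a])

# toy usage: task 2 is secretly the easiest one to classify
best = ucb_select_task(lambda a: random.random() * (0.5 + 0.2 * (a == 2)),
                       n_tasks=4, budget=200)
```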
Momeni, Saba; Pourghassem, Hossein
2014-08-01
Image fusion has recently taken a prominent role in medical image processing and is useful for diagnosing and treating many diseases. Digital subtraction angiography is one of the most widely used imaging modalities for diagnosing cerebrovascular diseases and for brain radiosurgery. This paper proposes an automatic fuzzy-based multi-temporal fusion algorithm for 2-D digital subtraction angiography images. In this algorithm, for blood vessel map extraction, the valuable frames of the brain angiography video are automatically determined to form the digital subtraction angiography images, based on a novel definition of vessel dispersion generated by the injected contrast material. Our proposed fusion scheme contains different fusion methods for high and low frequency contents, based on the coefficient characteristics of the wrapping second generation curvelet transform and a novel content selection strategy. Our proposed content selection strategy is defined based on the sample correlation of the curvelet transform coefficients. In our proposed fuzzy-based fusion scheme, the selection of curvelet coefficients is optimized by applying weighted averaging and maximum selection rules for the high frequency coefficients. For the low frequency coefficients, the maximum selection rule based on a local energy criterion is applied for better visual perception. Our proposed fusion algorithm is evaluated on a brain angiography image dataset consisting of one hundred 2-D internal carotid rotational angiography videos. The obtained results demonstrate the effectiveness and efficiency of our proposed fusion algorithm in comparison with common and basic fusion algorithms.
An Approximate Approach to Automatic Kernel Selection.
Ding, Lizhong; Liao, Shizhong
2016-02-02
Kernel selection is a fundamental problem of kernel-based learning algorithms. In this paper, we propose an approximate approach to automatic kernel selection for regression from the perspective of kernel matrix approximation. We first introduce multilevel circulant matrices into automatic kernel selection, and develop two approximate kernel selection algorithms by exploiting the computational virtues of multilevel circulant matrices. The complexity of the proposed algorithms is quasi-linear in the number of data points. Then, we prove an approximation error bound to measure the effect of the approximation in kernel matrices by multilevel circulant matrices on the hypothesis and further show that the approximate hypothesis produced with multilevel circulant matrices converges to the accurate hypothesis produced with kernel matrices. Experimental evaluations on benchmark datasets demonstrate the effectiveness of approximate kernel selection.
A Fast Algorithm of Convex Hull Vertices Selection for Online Classification.
Ding, Shuguang; Nie, Xiangli; Qiao, Hong; Zhang, Bo
2018-04-01
Reducing samples through convex hull vertices selection (CHVS) within each class is an important and effective method for online classification problems, since the classifier can be trained rapidly with the selected samples. However, the process of CHVS is NP-hard. In this paper, we propose a fast algorithm to select the convex hull vertices, based on convex hull decomposition and the property of projection. In the proposed algorithm, the quadratic minimization problem of computing the distance between a point and a convex hull is converted into a linear equation problem with low computational complexity. When the data dimension is high, an approximate, instead of exact, convex hull is allowed to be selected by setting an appropriate termination condition in order to delete more nonimportant samples. The impact of outliers is also considered, and the proposed algorithm is improved by deleting the outliers in the initial procedure. Furthermore, a dimension conversion technique via the kernel trick is used to deal with nonlinearly separable problems. An upper bound is theoretically proved for the difference between the support vector machines based on the approximate convex hull vertices selected and those based on all the training samples. Experimental results on both synthetic and real data sets show the effectiveness and validity of the proposed algorithm.
Fang, Hongqing; He, Lei; Si, Hao; Liu, Peng; Xie, Xiaolei
2014-09-01
In this paper, the back-propagation (BP) algorithm has been used to train a feed-forward neural network for human activity recognition in smart home environments, and an inter-class distance method for feature selection from observed motion-sensor events is discussed and tested. The human activity recognition performance of the neural network trained with the BP algorithm has then been evaluated and compared with other probabilistic algorithms: the Naïve Bayes (NB) classifier and the Hidden Markov Model (HMM). The results show that different feature datasets yield different activity recognition accuracy. The selection of unsuitable feature datasets increases the computational complexity and degrades the activity recognition accuracy. Furthermore, the neural network using the BP algorithm has relatively better human activity recognition performance than the NB classifier and the HMM. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.
Algorithm for identifying and separating beats from arterial pulse records
Treo, Ernesto F; Herrera, Myriam C; Valentinuzzi, Max E
2005-01-01
Background: This project was designed as an epidemiological aid-selecting tool for a small country health center, with the general objective of screening out possible coronary patients. Peripheral artery function can be non-invasively evaluated by impedance plethysmography. Changes in these vessels appear to be good predictors of future coronary behavior. Impedance plethysmography detects volume variations after simple occlusive maneuvers that may show indicative modifications in arterial/venous responses. Averaging of a series of pulses is needed and this, in turn, requires proper determination of the beginning and end of each beat. Thus, the objective here is to describe an algorithm to identify and separate out beats from a plethysmographic record. A secondary objective was to compare the output given by human operators against the algorithm. Methods: The identification algorithm detected the beat's onset and end on the basis of the maximum rising phase, the choice of possible ventricular systolic starting points considering cardiac frequency, and the adjustment of some tolerance values to optimize the behavior. Out of 800 patients in the study, 40 occlusive records (supradiastolic-subsystolic) were randomly selected without any preliminary diagnosis. The radial impedance plethysmographic pulse and a standard ECG were recorded, and the data were digitized and stored. Cardiac frequency was estimated with the power density function and, thereafter, the signal was differentiated twice, followed by binarization of the first derivative and rectification of the second derivative. The product of the two latter results led to a weighing signal from which the cycles' onsets and ends were established. Weighing and frequency filters are needed, along with the pre-establishment of their respective tolerances. Out of the 40 records, 30-second strands were randomly chosen to be analyzed by the algorithm and by two operators. Sensitivity and accuracy were calculated by means of the true/false and positive/negative criteria. Synchronization ability was measured through the coefficient of variation and the median value of correlation for each patient. These parameters were assessed by means of Friedman's ANOVA and the Kendall concordance test. Results: Sensitivity was 97% and 91% for the two operators, respectively, while accuracy was zero for both of them. The synchronism variability analysis was significant (p < 0.01) for the two statistics, showing that the algorithm produced the best result. Conclusion: The proposed algorithm showed good performance as expressed by its high sensitivity. The correlation analysis demonstrated that, from the synchronism point of view, the algorithm performed the best detection. Patients with marked arrhythmic processes are not good candidates for this kind of analysis. At most, they would be singled out by the algorithm and thereafter checked by an operator. PMID:16095532
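A rough sketch of the detection chain described above, under simplifying assumptions: differentiate twice, binarize the first derivative, rectify the second, and multiply them into a weighing signal whose above-threshold peaks (spaced by a refractory period) mark candidate beat onsets. The cardiac-frequency estimation and tolerance adjustment of the full method are not reproduced.

```python
# Simplified beat-onset detector: weighing signal = binarized first derivative
# times rectified second derivative, thresholded and spaced by a refractory period.
import numpy as np

def beat_onsets(pulse, fs, min_rr_s=0.4):
    """pulse: 1-D plethysmographic signal; fs: sampling rate in Hz."""
    d1 = np.gradient(pulse)                      # first derivative
    d2 = np.gradient(d1)                         # second derivative
    weight = (d1 > 0).astype(float) * np.abs(d2) # binarized d1 x rectified d2
    thr = weight.mean() + 2 * weight.std()       # illustrative threshold
    candidates = np.flatnonzero(weight > thr)
    onsets, last = [], -np.inf
    for i in candidates:                         # enforce a minimum beat-to-beat spacing
        if (i - last) / fs >= min_rr_s:
            onsets.append(i)
            last = i
    return np.asarray(onsets)
```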
Goal Selection for Embedded Systems with Oversubscribed Resources
NASA Technical Reports Server (NTRS)
Rabideau, Gregg; Chien, Steve; McLaren, David
2010-01-01
We describe an efficient, online goal selection algorithm and its use for selecting goals at runtime. Our focus is on the re-planning that must be performed in a timely manner on the embedded system where computational resources are limited. In particular, our algorithm generates near optimal solutions to problems with fully specified goal requests that oversubscribe available resources but have no temporal flexibility. By using a fast, incremental algorithm, goal selection can be postponed in a "just-in-time" fashion allowing requests to be changed or added at the last minute. This enables shorter response cycles and greater autonomy for the system under control.
Onboard Run-Time Goal Selection for Autonomous Operations
NASA Technical Reports Server (NTRS)
Rabideau, Gregg; Chien, Steve; McLaren, David
2010-01-01
We describe an efficient, online goal selection algorithm for use onboard spacecraft and its use for selecting goals at runtime. Our focus is on the re-planning that must be performed in a timely manner on the embedded system where computational resources are limited. In particular, our algorithm generates near optimal solutions to problems with fully specified goal requests that oversubscribe available resources but have no temporal flexibility. By using a fast, incremental algorithm, goal selection can be postponed in a "just-in-time" fashion allowing requests to be changed or added at the last minute. This enables shorter response cycles and greater autonomy for the system under control.
Nidheesh, N; Abdul Nazeer, K A; Ameer, P M
2017-12-01
Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data, since it is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density-based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm with eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm used for cancer data classification, based on performance on ten cancer gene expression datasets. The proposed algorithm showed better overall performance than the others. There is a pressing need in the biomedical domain for simple, easy-to-use and more accurate machine learning tools for cancer subtype prediction. The proposed algorithm is simple, easy to use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
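A sketch of the seeding idea under assumed parameters: estimate a local density for every point, greedily pick dense points that are adequately separated, and hand them to standard K-Means as fixed initial centroids; the density estimate and the separation radius are illustrative choices, not the authors' exact procedure.

```python
# Density-based, deterministic seeding for K-Means: choose dense, mutually
# well-separated points as initial centroids instead of random points.
import numpy as np
from sklearn.cluster import KMeans

def density_separated_seeds(X, k, radius):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    density = (D < radius).sum(axis=1)            # neighbours within `radius`
    order = np.argsort(density)[::-1]             # densest points first
    seeds = [order[0]]
    for i in order[1:]:
        if len(seeds) == k:
            break
        if D[i, seeds].min() > radius:            # adequately separated from chosen seeds
            seeds.append(i)
    for i in order:                               # fallback if too few separated points
        if len(seeds) == k:
            break
        if i not in seeds:
            seeds.append(i)
    return X[seeds]

X = np.random.default_rng(0).random((200, 5))
init = density_separated_seeds(X, k=4, radius=0.4)
labels = KMeans(n_clusters=4, init=init, n_init=1).fit_predict(X)
```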
Threshold-selecting strategy for best possible ground state detection with genetic algorithms
NASA Astrophysics Data System (ADS)
Lässig, Jörg; Hoffmann, Karl Heinz
2009-04-01
Genetic algorithms are a standard heuristic for finding states of low energy in complex state spaces, such as those of physical systems like spin glasses, and in combinatorial optimization. This paper considers the problem of selecting individuals in the current population of a genetic algorithm for crossover. Many schemes have been considered in the literature as possible crossover selection strategies. We show, for a large class of quality measures, that the best possible probability distribution for selecting individuals in each generation of the algorithm's execution is a rectangular distribution over the individuals sorted by their energy values. This means uniform probabilities have to be assigned to a group of the individuals with the lowest energy in the population, while probabilities equal to zero are assigned to individuals corresponding to energy values above a fixed cutoff, which is equal to a certain rank in the energy-sorted vector of states in the current population. The considered strategy is dubbed threshold selecting. The proof applies basic arguments of Markov chains and linear optimization and makes only a few assumptions about the underlying principles, and hence applies to a large class of algorithms.
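A minimal sketch of threshold selecting as described above: sort the population by energy and draw parents uniformly from the best `cutoff` individuals, assigning zero probability to the rest.

```python
# Threshold selecting: a rectangular selection distribution over the
# energy-sorted population, zero probability beyond the cutoff rank.
import numpy as np

def threshold_select(energies, n_parents, cutoff, rng=None):
    """Return indices of parents drawn uniformly from the `cutoff` lowest-energy individuals."""
    if rng is None:
        rng = np.random.default_rng()
    order = np.argsort(energies)                 # lowest energy first
    pool = order[:cutoff]                        # only the best `cutoff` individuals
    return rng.choice(pool, size=n_parents, replace=True)

parents = threshold_select(energies=[5.2, 1.1, 3.7, 0.9, 4.4], n_parents=2, cutoff=3)
```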
Assistant for Analyzing Tropical-Rain-Mapping Radar Data
NASA Technical Reports Server (NTRS)
James, Mark
2006-01-01
A document describes an approach for a Tropical Rain Mapping Radar Data System (TDS). The TDS is composed of software and hardware elements incorporating a two-frequency spaceborne radar system for measuring tropical precipitation. The TDS would be used primarily in generating data products for scientific investigations. The most novel part of the TDS would be expert-system software to aid in the selection of algorithms for converting raw radar-return data into such primary observables as rain rate, path-integrated rain rate, and surface backscatter. The expert-system approach would address the issue that selecting algorithms for processing the data requires a significant amount of preprocessing, non-intuitive reasoning, and heuristic application, making it infeasible, in many cases, to select the proper algorithm in real time. In the TDS, tentative selections would be made to enable conversions in real time. The expert system would remove straightforwardly convertible data from further consideration, and would examine ambiguous data, performing analysis in depth to determine which algorithms to select. Conversions performed by these algorithms, presumed to be correct, would be compared with the corresponding real-time conversions. Incorrect real-time conversions would be updated using the correct conversions.
Bayesian inference of nonlinear unsteady aerodynamics from aeroelastic limit cycle oscillations
NASA Astrophysics Data System (ADS)
Sandhu, Rimple; Poirel, Dominique; Pettit, Chris; Khalil, Mohammad; Sarkar, Abhijit
2016-07-01
A Bayesian model selection and parameter estimation algorithm is applied to investigate the influence of nonlinear and unsteady aerodynamic loads on the limit cycle oscillation (LCO) of a pitching airfoil in the transitional Reynolds number regime. At small angles of attack, laminar boundary layer trailing edge separation causes negative aerodynamic damping leading to the LCO. The fluid-structure interaction of the rigid, but elastically mounted, airfoil and nonlinear unsteady aerodynamics is represented by two coupled nonlinear stochastic ordinary differential equations containing uncertain parameters and model approximation errors. Several plausible aerodynamic models with increasing complexity are proposed to describe the aeroelastic system leading to LCO. The likelihood in the posterior parameter probability density function (pdf) is available semi-analytically using the extended Kalman filter for the state estimation of the coupled nonlinear structural and unsteady aerodynamic model. The posterior parameter pdf is sampled using a parallel and adaptive Markov Chain Monte Carlo (MCMC) algorithm. The posterior probability of each model is estimated using the Chib-Jeliazkov method that directly uses the posterior MCMC samples for evidence (marginal likelihood) computation. The Bayesian algorithm is validated through a numerical study and then applied to model the nonlinear unsteady aerodynamic loads using wind-tunnel test data at various Reynolds numbers.
Automated Detection of Atrial Fibrillation Based on Time-Frequency Analysis of Seismocardiograms.
Hurnanen, Tero; Lehtonen, Eero; Tadi, Mojtaba Jafari; Kuusela, Tom; Kiviniemi, Tuomas; Saraste, Antti; Vasankari, Tuija; Airaksinen, Juhani; Koivisto, Tero; Pankaala, Mikko
2017-09-01
In this paper, a novel method to detect atrial fibrillation (AFib) from a seismocardiogram (SCG) is presented. The proposed method is based on linear classification of the spectral entropy and a heart rate variability index computed from the SCG. The performance of the developed algorithm is demonstrated on data gathered from 13 patients in a clinical setting. After motion artifact removal, in total 119 min of AFib data and 126 min of sinus rhythm data were considered for automated AFib detection. No other arrhythmias were considered in this study. The proposed algorithm requires no direct heartbeat peak detection from the SCG data, which makes it tolerant of interpersonal variations in SCG morphology and of noise. Furthermore, the proposed method relies solely on the SCG and needs no complementary electrocardiography to be functional. For the considered data, the detection method performs well even on relatively low quality SCG signals. Using a majority voting scheme that takes five randomly selected segments from a signal and classifies these segments using the proposed algorithm, we obtained an average true positive rate of [Formula: see text] and an average true negative rate of [Formula: see text] for detecting AFib in leave-one-out cross-validation. This paper facilitates adoption of microelectromechanical sensor based heart monitoring devices for arrhythmia detection.
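One of the two features used above is the spectral entropy of the SCG. The sketch below computes a generic normalized Shannon spectral entropy from a Welch periodogram; the exact windowing and band limits used in the paper may differ.

```python
import numpy as np
from scipy.signal import welch

def spectral_entropy(x, fs, nperseg=256):
    """Normalized Shannon entropy of the power spectral density of signal x."""
    f, psd = welch(x, fs=fs, nperseg=nperseg)
    p = psd / psd.sum()              # normalize the PSD to a probability distribution
    p = p[p > 0]
    h = -np.sum(p * np.log2(p))
    return h / np.log2(len(psd))     # scale to the [0, 1] range
```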
MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization
Kannan, Ramakrishnan; Ballard, Grey; Park, Haesun
2017-10-30
Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors W and H, for the given input matrix A, such that A≈WH. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solves alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of sizes spanning from a few hundred million to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.
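The alternating-updating structure can be illustrated with a serial multiplicative-update NMF, one of the algorithm families the framework supports; the MPI data distribution and communication-avoiding layout that are the paper's actual contribution are omitted here.

```python
import numpy as np

def nmf_multiplicative_update(A, k, n_iter=200, eps=1e-10, seed=0):
    """Serial multiplicative-update NMF: find non-negative W, H with A ≈ W @ H."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # alternating updates; each step keeps the factors non-negative
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ (H @ H.T) + eps)
    return W, H
```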
Newman-Janis Algorithm Revisited
NASA Astrophysics Data System (ADS)
Brauer, O.; Camargo, H. A.; Socolovsky, M.
2015-01-01
The purpose of the present article is to show that the Newman-Janis and Newman et al. algorithms, used to derive the Kerr and Kerr-Newman metrics respectively, automatically lead to the extension of the initial non-negative polar radial coordinate r to a Cartesian coordinate running from -∞ to +∞, thus introducing in a natural way the region r < 0 in the above spacetimes. Using Boyer-Lindquist and ellipsoidal coordinates, we discuss some geometrical aspects of the positive and negative regions of r, like horizons, ergosurfaces, and foliation structures.
Assessment of metal ion concentration in water with structured feature selection.
Naula, Pekka; Airola, Antti; Pihlasalo, Sari; Montoya Perez, Ileana; Salakoski, Tapio; Pahikkala, Tapio
2017-10-01
We propose a cost-effective system for the determination of metal ion concentration in water, addressing a central issue in water resources management. The system combines novel luminometric label array technology with a machine learning algorithm that selects a minimal number of array reagents (modulators) and liquid sample dilutions that enable accurate quantification. The algorithm is able to identify the optimal modulators and sample dilutions, leading to cost reductions since less manual labour and fewer resources are needed. Inferring the ion detector involves a unique type of structured feature selection problem, which we formalize in this paper. We propose a novel Cartesian greedy forward feature selection algorithm for solving the problem. The novel algorithm was evaluated in the concentration assessment of five metal ions and the performance was compared to two known feature selection approaches. The results demonstrate that the proposed system can assist in lowering the costs with minimal loss in accuracy. Copyright © 2017 Elsevier Ltd. All rights reserved.
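For orientation, a plain greedy forward feature selection loop is sketched below; the paper's Cartesian variant, which selects structured (modulator, dilution) combinations, is more specialized than this generic version, and the ridge regressor and cross-validated scoring are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def greedy_forward_selection(X, y, max_features):
    """Greedy forward selection: at each step add the feature that most improves
    the cross-validated score of a simple ridge regressor."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        scores = {j: cross_val_score(Ridge(), X[:, selected + [j]], y, cv=5).mean()
                  for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:     # stop when no remaining feature helps
            break
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected
```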
Constraint programming based biomarker optimization.
Zhou, Manli; Luo, Youxi; Sun, Guoquan; Mai, Guoqin; Zhou, Fengfeng
2015-01-01
Efficient and intuitive characterization of biological big data is becoming a major challenge for modern bio-OMIC based scientists. Interactive visualization and exploration of big data is proven to be one of the successful solutions. Most of the existing feature selection algorithms do not allow interactive inputs from users in the process of optimizing feature selection. This study addresses this question by fixing a few user-input features in the finally selected feature subset and formulating these user-input features as constraints for a programming model. The proposed algorithm, fsCoP (feature selection based on constrained programming), performs similarly to or much better than the existing feature selection algorithms, even with the constraints from both the literature and the existing algorithms. An fsCoP biomarker may be intriguing for further wet lab validation, since it satisfies both the classification optimization function and the biomedical knowledge. fsCoP may also be used for the interactive exploration of bio-OMIC big data by interactively adding user-defined constraints for modeling.
Iterative algorithms for a non-linear inverse problem in atmospheric lidar
NASA Astrophysics Data System (ADS)
Denevi, Giulia; Garbarino, Sara; Sorrentino, Alberto
2017-08-01
We consider the inverse problem of retrieving aerosol extinction coefficients from Raman lidar measurements. In this problem the unknown and the data are related through the exponential of a linear operator, the unknown is non-negative and the data follow the Poisson distribution. Standard methods work on the log-transformed data and solve the resulting linear inverse problem, but neglect to take into account the noise statistics. In this study we show that proper modelling of the noise distribution can improve substantially the quality of the reconstructed extinction profiles. To achieve this goal, we consider the non-linear inverse problem with non-negativity constraint, and propose two iterative algorithms derived using the Karush-Kuhn-Tucker conditions. We validate the algorithms with synthetic and experimental data. As expected, the proposed algorithms outperform standard methods in terms of sensitivity to noise and reliability of the estimated profile.
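For intuition, a plain projected-gradient iteration for the Poisson negative log-likelihood under an assumed forward model y_i ~ Poisson(c_i exp(-(Ax)_i)) with x >= 0 is sketched below; the paper's two KKT-derived algorithms differ in their update rules.

```python
import numpy as np

def projected_gradient_poisson(y, A, c, n_iter=500, step=1e-3):
    """Minimize the Poisson negative log-likelihood for the assumed forward model
    y_i ~ Poisson(c_i * exp(-(A x)_i)) subject to x >= 0, by projected gradient descent."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        mu = c * np.exp(-A @ x)                # model mean
        grad = A.T @ (y - mu)                  # gradient of the negative log-likelihood
        x = np.maximum(x - step * grad, 0.0)   # gradient step + non-negativity projection
    return x
```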
Minimum Sample Size Requirements for Mokken Scale Analysis
ERIC Educational Resources Information Center
Straat, J. Hendrik; van der Ark, L. Andries; Sijtsma, Klaas
2014-01-01
An automated item selection procedure in Mokken scale analysis partitions a set of items into one or more Mokken scales, if the data allow. Two algorithms are available that pursue the same goal of selecting Mokken scales of maximum length: Mokken's original automated item selection procedure (AISP) and a genetic algorithm (GA). Minimum…
Harmony Search as a Powerful Tool for Feature Selection in QSPR Study of the Drugs Lipophilicity.
Bahadori, Behnoosh; Atabati, Morteza
2017-01-01
Aims & Scope: Lipophilicity represents one of the most studied and most frequently used fundamental physicochemical properties. In the present work, the harmony search (HS) algorithm is applied to feature selection in quantitative structure-property relationship (QSPR) modeling to predict the lipophilicity of neutral, acidic, basic and amphoteric drugs determined by UHPLC. Harmony search is a music-based metaheuristic optimization algorithm, inspired by the observation that the aim of music is to search for a perfect state of harmony. Semi-empirical quantum-chemical calculations at the AM1 level were used to find the optimum 3D geometry of the studied molecules, and 1497 descriptors were calculated with the Dragon software. The descriptors selected by the harmony search algorithm (9 descriptors) were used for model development with multiple linear regression (MLR). The root mean square error (RMSE) with and without leave-one-out cross validation (LOOCV) was 0.417 and 0.302, respectively. The results were compared with those obtained from the genetic algorithm and simulated annealing feature selection methods; harmony search gave better results, showing that HS is a helpful tool for feature selection with fine performance.
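A basic continuous harmony search loop is sketched below to illustrate the memory-consideration, pitch-adjustment and random-selection steps; applying it to descriptor selection would additionally require a binary encoding of the descriptor subset, which is not shown.

```python
import numpy as np

def harmony_search(objective, dim, bounds, hms=20, hmcr=0.9, par=0.3,
                   bandwidth=0.05, n_iter=2000, seed=0):
    """Basic continuous harmony search minimizing `objective` over a box."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    memory = rng.uniform(lo, hi, size=(hms, dim))      # harmony memory
    scores = np.array([objective(h) for h in memory])
    for _ in range(n_iter):
        new = np.empty(dim)
        for d in range(dim):
            if rng.random() < hmcr:                    # memory consideration
                new[d] = memory[rng.integers(hms), d]
                if rng.random() < par:                 # pitch adjustment
                    new[d] += bandwidth * (hi - lo) * rng.uniform(-1, 1)
            else:                                      # random selection
                new[d] = rng.uniform(lo, hi)
        new = np.clip(new, lo, hi)
        s = objective(new)
        worst = int(np.argmax(scores))
        if s < scores[worst]:                          # replace the worst harmony
            memory[worst], scores[worst] = new, s
    best = int(np.argmin(scores))
    return memory[best], scores[best]

# usage: x, fx = harmony_search(lambda v: np.sum(v**2), dim=5, bounds=(-10, 10))
```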
Attia, Khalid A M; Nassar, Mohammed W I; El-Zeiny, Mohamed B; Serag, Ahmed
2017-01-05
For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network and support vector regression, applied to UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out. The discussion revealed the superiority of this new powerful algorithm over the genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found between the models regarding their predictabilities. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration. Copyright © 2016 Elsevier B.V. All rights reserved.
Secin, Fernando P; Bianco, Fernando J; Cronin, Angel; Eastham, James A; Scardino, Peter T; Guillonneau, Bertrand; Vickers, Andrew J
2009-02-01
A publication on behalf of the European Society of Urological Oncology questioned the need for removing the seminal vesicles during radical prostatectomy in patients with prostate specific antigen less than 10 ng/ml, except when the biopsy Gleason score is greater than 6 or there are greater than 50% positive biopsy cores. We applied the European Society of Urological Oncology algorithm to an independent data set to determine its predictive value. Data on 1,406 men who underwent radical prostatectomy and seminal vesicle removal between 1998 and 2004 were analyzed. Patients with and without seminal vesicle invasion were classified as positive or negative according to the European Society of Urological Oncology algorithm. Of the 90 cases with seminal vesicle invasion (6.4%), 81 were positive, for 90% sensitivity, while 656 of 1,316 without seminal vesicle invasion were negative, for 50% specificity. The negative predictive value was 98.6%. In decision analytic terms, if the loss in health when seminal vesicles are invaded and not completely removed is considered at least 75 times greater than when removing them unnecessarily, the algorithm proposed by the European Society of Urological Oncology should not be used. Whether to use the European Society of Urological Oncology algorithm depends not only on its accuracy, but also on the relative clinical consequences of false-positive and false-negative results. Our threshold of 75 is an intermediate value that is difficult to interpret, given uncertainties about the benefit of seminal vesicle sparing and the harm associated with untreated seminal vesicle invasion. We recommend more formal decision analysis to determine the clinical value of the European Society of Urological Oncology algorithm.
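The decision rule, as summarized in this abstract, reduces to a simple predicate; the sketch below applies it to a patient list and computes the negative predictive value. The field names are hypothetical.

```python
def esuo_positive(psa, gleason, pct_positive_cores):
    """Rule as summarized above: a patient is 'positive' (seminal vesicles should be
    removed) if PSA >= 10 ng/ml, Gleason score > 6, or > 50% positive biopsy cores."""
    return psa >= 10 or gleason > 6 or pct_positive_cores > 50

def negative_predictive_value(patients):
    """patients: list of (psa, gleason, pct_positive_cores, svi) tuples, where svi is
    True if seminal vesicle invasion was found at surgery (hypothetical format)."""
    negatives = [svi for psa, gl, pct, svi in patients
                 if not esuo_positive(psa, gl, pct)]
    return 1.0 - sum(negatives) / len(negatives) if negatives else float("nan")
```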
Retrievals of ice cloud microphysical properties of deep convective systems using radar measurements
NASA Astrophysics Data System (ADS)
Tian, Jingjing; Dong, Xiquan; Xi, Baike; Wang, Jingyu; Homeyer, Cameron R.; McFarquhar, Greg M.; Fan, Jiwen
2016-09-01
This study presents newly developed algorithms for retrieving ice cloud microphysical properties (ice water content (IWC) and median mass diameter (Dm)) for the stratiform rain and thick anvil regions of deep convective systems (DCSs) using Next Generation Radar (NEXRAD) reflectivity and empirical relationships from aircraft in situ measurements. A typical DCS case (20 May 2011) during the Midlatitude Continental Convective Clouds Experiment (MC3E) is selected as an example to demonstrate the 4-D retrievals. The vertical distributions of retrieved IWC are compared with previous studies and cloud-resolving model simulations. The statistics from six selected cases during MC3E show that the aircraft in situ derived IWC and Dm are 0.47 ± 0.29 g m-3 and 2.02 ± 1.3 mm, while the mean values of retrievals have a positive bias of 0.19 g m-3 (40%) and negative bias of 0.41 mm (20%), respectively. To evaluate the new retrieval algorithms, IWC and Dm are retrieved for other DCSs observed during the Bow Echo and Mesoscale Convective Vortex Experiment (BAMEX) using NEXRAD reflectivity and compared with aircraft in situ measurements. During BAMEX, a total of 63, 1 min collocated aircraft and radar samples are available for comparisons, and the averages of radar retrieved and aircraft in situ measured IWC values are 1.52 g m-3 and 1.25 g m-3 with a correlation of 0.55, and their averaged Dm values are 2.08 and 1.77 mm. In general, the new retrieval algorithms are suitable for continental DCSs during BAMEX, especially within stratiform rain and thick anvil regions.
Wang, ShaoPeng; Zhang, Yu-Hang; Huang, GuoHua; Chen, Lei; Cai, Yu-Dong
2017-01-01
Myristoylation is an important hydrophobic post-translational modification that is covalently bound to the amino group of Gly residues on the N-terminus of proteins. The many diverse functions of myristoylation on proteins, such as membrane targeting, signal pathway regulation and apoptosis, are largely due to the lipid modification, whereas abnormal or irregular myristoylation on proteins can lead to several pathological changes in the cell. To better understand the function of myristoylated sites and to correctly identify them in protein sequences, this study conducted a novel computational investigation on identifying myristoylation sites in protein sequences. A training dataset with 196 positive and 84 negative peptide segments was obtained. Four types of features derived from the peptide segments following the myristoylation sites were used to specify myristoylated and non-myristoylated sites. Then, feature selection methods including maximum relevance and minimum redundancy (mRMR), incremental feature selection (IFS), and a machine learning algorithm (the extreme learning machine method) were adopted to extract optimal features for identifying myristoylation sites in protein sequences, thereby building an optimal prediction model. As a result, 41 key features were extracted and used to build an optimal prediction model. The effectiveness of the optimal prediction model was further validated by its performance on a test dataset. Furthermore, detailed analyses were also performed on the extracted 41 features to gain insight into the mechanism of myristoylation modification. This study provides a new computational method for identifying myristoylation sites in protein sequences. We believe that it can be a useful tool to predict myristoylation sites from protein sequences.
Ling, Julia; Templeton, Jeremy Alan
2015-08-04
Reynolds Averaged Navier Stokes (RANS) models are widely used in industry to predict fluid flows, despite their acknowledged deficiencies. Not only do RANS models often produce inaccurate flow predictions, but there are very limited diagnostics available to assess RANS accuracy for a given flow configuration. If experimental or higher fidelity simulation results are not available for RANS validation, there is no reliable method to evaluate RANS accuracy. This paper explores the potential of utilizing machine learning algorithms to identify regions of high RANS uncertainty. Three different machine learning algorithms were evaluated: support vector machines, Adaboost decision trees, and random forests. The algorithms were trained on a database of canonical flow configurations for which validated direct numerical simulation or large eddy simulation results were available, and were used to classify RANS results on a point-by-point basis as having either high or low uncertainty, based on the breakdown of specific RANS modeling assumptions. Classifiers were developed for three different basic RANS eddy viscosity model assumptions: the isotropy of the eddy viscosity, the linearity of the Boussinesq hypothesis, and the non-negativity of the eddy viscosity. It is shown that these classifiers are able to generalize to flows substantially different from those on which they were trained. As a result, feature selection techniques, model evaluation, and extrapolation detection are discussed in the context of turbulence modeling applications.
NASA Astrophysics Data System (ADS)
Foronda, Augusto; Ohta, Chikara; Tamaki, Hisashi
Dirty paper coding (DPC) is a strategy to achieve the capacity region of multiple input multiple output (MIMO) downlink channels, and a DPC scheduler is throughput optimal if users are selected according to their queue states and current rates. However, DPC is difficult to implement in practical systems. One solution, the zero-forcing beamforming (ZFBF) strategy, has been proposed to achieve the same asymptotic sum rate capacity as DPC with an exhaustive search over the entire user set. Some suboptimal user group selection schedulers with reduced complexity based on the ZFBF strategy (ZFBF-SUS) and the proportional fair (PF) scheduling algorithm (PF-ZFBF) have also been proposed to enhance the throughput and fairness among the users, respectively. However, they are not throughput optimal, and fairness and throughput decrease if user queue lengths differ due to different user channel qualities. Therefore, we propose two different scheduling algorithms: a throughput optimal scheduling algorithm (ZFBF-TO) and a reduced complexity scheduling algorithm (ZFBF-RC). Both are based on the ZFBF strategy and, at every time slot, the scheduling algorithms have to select some users based on user channel quality, user queue length and orthogonality among users. Moreover, the proposed algorithms have to produce the rate allocation and power allocation for the selected users based on a modified water filling method. We analyze the schedulers' complexity, and numerical results show that ZFBF-RC provides throughput and fairness improvements compared to the ZFBF-SUS and PF-ZFBF scheduling algorithms.
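The rate and power allocation step relies on a water-filling method; a sketch of standard water-filling by bisection on the water level is given below (the paper's modified variant, which also accounts for queue lengths, is not reproduced).

```python
import numpy as np

def water_filling(gains, total_power, tol=1e-9):
    """Classic water-filling: allocate total_power across channels with gains g_i,
    maximizing sum log(1 + g_i * p_i) subject to sum(p_i) = total_power, p_i >= 0."""
    g = np.asarray(gains, dtype=float)
    lo, hi = 0.0, total_power + 1.0 / g.min()     # bracket for the water level
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)                      # candidate water level
        p = np.maximum(mu - 1.0 / g, 0.0)
        if p.sum() > total_power:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - 1.0 / g, 0.0)

# usage: powers = water_filling([2.0, 1.0, 0.3], total_power=5.0)
```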
NASA Astrophysics Data System (ADS)
de Siqueira e Oliveira, Fernanda SantAna; Giana, Hector Enrique; Silveira, Landulfo
2012-10-01
A method, based on Raman spectroscopy, for identification of different microorganisms involved in bacterial urinary tract infections has been proposed. Spectra were collected from different bacterial colonies (Gram-negative: Escherichia coli, Klebsiella pneumoniae, Proteus mirabilis, Pseudomonas aeruginosa and Enterobacter cloacae, and Gram-positive: Staphylococcus aureus and Enterococcus spp.), grown on culture medium (agar), using a Raman spectrometer with a fiber Raman probe (830 nm). Colonies were scraped from the agar surface and placed on an aluminum foil for Raman measurements. After preprocessing, spectra were submitted to a principal component analysis and Mahalanobis distance (PCA/MD) discrimination algorithm. We found that the mean Raman spectra of different bacterial species show similar bands, and S. aureus was well characterized by strong bands related to carotenoids. PCA/MD could discriminate Gram-positive bacteria with sensitivity and specificity of 100% and Gram-negative bacteria with sensitivity ranging from 58 to 88% and specificity ranging from 87% to 99%.
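A compact sketch of the PCA/Mahalanobis-distance discrimination step is given below, assuming the spectra have already been preprocessed into a feature matrix; the number of retained components and the covariance regularization are illustrative choices, not the study's settings.

```python
import numpy as np

def pca_mahalanobis_classifier(X_train, y_train, n_components=5):
    """Project spectra onto principal components, then classify a new spectrum by its
    Mahalanobis distance to each class in PC space (smallest distance wins)."""
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)   # PCA via SVD
    P = Vt[:n_components].T                                   # projection matrix
    scores = (X - mean) @ P
    stats = {}
    for c in np.unique(y):
        S = scores[y == c]
        cov = np.cov(S, rowvar=False) + 1e-6 * np.eye(n_components)  # regularized
        stats[c] = (S.mean(axis=0), np.linalg.inv(cov))
    def predict(x):
        s = (np.asarray(x, dtype=float) - mean) @ P
        d2 = {c: float((s - m) @ icov @ (s - m)) for c, (m, icov) in stats.items()}
        return min(d2, key=d2.get)
    return predict
```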
On polynomial preconditioning for indefinite Hermitian matrices
NASA Technical Reports Server (NTRS)
Freund, Roland W.
1989-01-01
The minimal residual method is studied combined with polynomial preconditioning for solving large linear systems (Ax = b) with indefinite Hermitian coefficient matrices (A). The standard approach for choosing the polynomial preconditioners leads to preconditioned systems which are positive definite. Here, a different strategy is studied which leaves the preconditioned coefficient matrix indefinite. More precisely, the polynomial preconditioner is designed to cluster the positive, resp. negative, eigenvalues of A around 1, resp. around some negative constant. In particular, it is shown that such indefinite polynomial preconditioners can be obtained as the optimal solutions of a certain two-parameter family of Chebyshev approximation problems. Some basic results are established for these approximation problems and a Remez-type algorithm is sketched for their numerical solution. The problem of selecting the parameters such that the resulting indefinite polynomial preconditioner speeds up the convergence of the minimal residual method optimally is also addressed. An approach is proposed based on the concept of asymptotic convergence factors. Finally, some numerical examples of indefinite polynomial preconditioners are given.
A new processing scheme for ultra-high resolution direct infusion mass spectrometry data
NASA Astrophysics Data System (ADS)
Zielinski, Arthur T.; Kourtchev, Ivan; Bortolini, Claudio; Fuller, Stephen J.; Giorio, Chiara; Popoola, Olalekan A. M.; Bogialli, Sara; Tapparo, Andrea; Jones, Roderic L.; Kalberer, Markus
2018-04-01
High resolution, high accuracy mass spectrometry is widely used to characterise environmental or biological samples with highly complex composition enabling the identification of chemical composition of often unknown compounds. Despite instrumental advancements, the accurate molecular assignment of compounds acquired in high resolution mass spectra remains time consuming and requires automated algorithms, especially for samples covering a wide mass range and large numbers of compounds. A new processing scheme is introduced implementing filtering methods based on element assignment, instrumental error, and blank subtraction. Optional post-processing incorporates common ion selection across replicate measurements and shoulder ion removal. The scheme allows both positive and negative direct infusion electrospray ionisation (ESI) and atmospheric pressure photoionisation (APPI) acquisition with the same programs. An example application to atmospheric organic aerosol samples using an Orbitrap mass spectrometer is reported for both ionisation techniques resulting in final spectra with 0.8% and 8.4% of the peaks retained from the raw spectra for APPI positive and ESI negative acquisition, respectively.
Automated frame selection process for high-resolution microendoscopy
NASA Astrophysics Data System (ADS)
Ishijima, Ayumu; Schwarz, Richard A.; Shin, Dongsuk; Mondrik, Sharon; Vigneswaran, Nadarajah; Gillenwater, Ann M.; Anandasabapathy, Sharmila; Richards-Kortum, Rebecca
2015-04-01
We developed an automated frame selection algorithm for high-resolution microendoscopy video sequences. The algorithm rapidly selects a representative frame with minimal motion artifact from a short video sequence, enabling fully automated image analysis at the point-of-care. The algorithm was evaluated by quantitative comparison of diagnostically relevant image features and diagnostic classification results obtained using automated frame selection versus manual frame selection. A data set consisting of video sequences collected in vivo from 100 oral sites and 167 esophageal sites was used in the analysis. The area under the receiver operating characteristic curve was 0.78 (automated selection) versus 0.82 (manual selection) for oral sites, and 0.93 (automated selection) versus 0.92 (manual selection) for esophageal sites. The implementation of fully automated high-resolution microendoscopy at the point-of-care has the potential to reduce the number of biopsies needed for accurate diagnosis of precancer and cancer in low-resource settings where there may be limited infrastructure and personnel for standard histologic analysis.
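A minimal version of frame selection can be written as picking the frame that maximizes an image-quality score; the gradient-energy sharpness proxy used below is an assumption standing in for the published motion/quality criteria.

```python
import numpy as np

def select_representative_frame(frames):
    """Pick the frame with the highest gradient energy (a simple sharpness proxy;
    the published algorithm applies its own motion and quality criteria)."""
    def gradient_energy(img):
        gy, gx = np.gradient(img.astype(float))
        return float(np.mean(gx ** 2 + gy ** 2))
    scores = [gradient_energy(f) for f in frames]
    return int(np.argmax(scores)), scores

# usage: idx, scores = select_representative_frame(video)   # video: list of 2-D arrays
```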
Computational Selection of Transcriptomics Experiments Improves Guilt-by-Association Analyses
Bhat, Prajwal; Yang, Haixuan; Bögre, László; Devoto, Alessandra; Paccanaro, Alberto
2012-01-01
The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. However, the use of such large collections could hamper GBA functional analysis for genes whose expression is condition specific. In these cases a smaller set of condition related experiments should instead be used, but identifying such functionally relevant experiments from large collections based on literature knowledge alone is an impractical task. We begin this paper by analyzing, both from a mathematical and a biological point of view, why only condition specific experiments should be used in GBA functional analysis. We are able to show that this phenomenon is independent of the functional categorization scheme and of the organisms being analyzed. We then present a semi-supervised algorithm that can select functionally relevant experiments from large collections of transcriptomics experiments. Our algorithm is able to select experiments relevant to a given GO term, MIPS FunCat term or even KEGG pathways. We extensively test our algorithm on large dataset collections for yeast and Arabidopsis. We demonstrate that: using the selected experiments there is a statistically significant improvement in correlation between genes in the functional category of interest; the selected experiments improve GBA-based gene function prediction; the effectiveness of the selected experiments increases with annotation specificity; our algorithm can be successfully applied to GBA-based pathway reconstruction. Importantly, the set of experiments selected by the algorithm reflects the existing literature knowledge about the experiments. [A MATLAB implementation of the algorithm and all the data used in this paper can be downloaded from the paper website: http://www.paccanarolab.org/papers/CorrGene/]. PMID:22879875
Bim-mediated apoptosis is not necessary for thymic negative selection to ubiquitous self-antigens.
Hu, Qian; Sader, Alyssa; Parkman, Julia C; Baldwin, Troy A
2009-12-15
T cell education in the thymus is critical for establishing a functional, yet self-tolerant, T cell repertoire. Negative selection is a key process in enforcing self-tolerance. There are many questions that surround the mechanism of negative selection, but it is currently held that apoptosis initiated by Bim and/or Nur77 is critical for negative selection. Recent studies, however, have questioned the necessity of Bim in maintaining both central and peripheral T cell tolerance. To reconcile these apparently contradictory findings, we examined the role of Bim in negative selection in the well-characterized, physiological HY(cd4) mouse model. We found that while Bim expression was required for CD4(+)CD8(+) double-positive thymocyte apoptosis, it was not required for negative selection. Furthermore, Bim deficiency did not alter the frequency or affinity of male reactive cells that escape negative selection in an oligoclonal repertoire. Collectively, these studies indicate that negative selection occurs efficiently in the absence of apoptosis and suggest that the current paradigm of negative selection requiring apoptosis be revisited.
Jastrzebski, Marek; Kukla, Piotr; Fijorek, Kamil; Czarnecka, Danuta
2014-08-01
An accurate and universal method for diagnosis of biventricular (BiV) capture using a standard 12-lead electrocardiogram (ECG) would be useful for assessment of cardiac resynchronization therapy (CRT) patients. Our objective was to develop and validate such an ECG method for BiV capture diagnosis that would be independent of pacing lead positions, a major confounder that significantly influences the morphologies of paced QRS complexes. On the basis of an evaluation of 789 ECGs of 443 patients with heart failure and various right ventricular (RV) and left ventricular (LV) lead positions, the following algorithm was constructed and validated. BiV capture was diagnosed if the QRS in lead I was predominantly negative and either the V1 QRS was predominantly positive or the V6 QRS was of negative onset and predominantly negative (step 1), or if the QRS complex duration was <160 ms (step 2). All other ECGs were classified as loss of LV capture. The algorithm showed good accuracy (93%), sensitivity (97%), and specificity (90%) for detection of loss of LV capture. The performance of the algorithm did not differ among apical, midseptal, and outflow tract RV lead positions and various LV lead positions. LV capture leaves diagnostic hallmarks in the fused BiV QRS related to different vectors of depolarization and more rapid depolarization of the ventricles. An accurate two-step ECG algorithm for BiV capture diagnosis was developed and validated. This algorithm is universally applicable to all CRT patients, regardless of the positions of the pacing leads. ©2014 Wiley Periodicals, Inc.
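The two-step rule translates directly into a small predicate, sketched below with boolean inputs describing the paced QRS morphology in leads I, V1 and V6.

```python
def biv_capture(lead_i_negative, v1_positive, v6_negative_onset_and_negative, qrs_ms):
    """Two-step rule from the abstract: BiV capture if (step 1) lead I is predominantly
    negative AND (V1 is predominantly positive OR V6 has a negative onset and is
    predominantly negative), or (step 2) the QRS duration is < 160 ms; otherwise the
    ECG is classified as loss of LV capture."""
    step1 = lead_i_negative and (v1_positive or v6_negative_onset_and_negative)
    step2 = qrs_ms < 160
    return "BiV capture" if (step1 or step2) else "loss of LV capture"
```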
Comparison of Two Sepsis Recognition Methods in a Pediatric Emergency Department
Balamuth, Fran; Alpern, Elizabeth R.; Grundmeier, Robert W.; Chilutti, Marianne; Weiss, Scott L.; Fitzgerald, Julie C.; Hayes, Katie; Bilker, Warren; Lautenbach, Ebbing
2015-01-01
Objectives: To compare the effectiveness of physician judgment and an electronic algorithmic alert to identify pediatric patients with severe sepsis/septic shock in a pediatric emergency department (ED). Methods: This was an observational cohort study of patients older than 56 days with fever or hypothermia. All patients were evaluated for potential sepsis in real time by the ED clinical team. An electronic algorithmic alert was retrospectively applied to identify patients with potential sepsis independent of physician judgment. The primary outcome was the proportion of patients correctly identified with severe sepsis/septic shock defined by consensus criteria. Test characteristics were determined and receiver operating characteristic (ROC) curves were compared. Results: Of 19,524 eligible patient visits, 88 patients developed consensus-confirmed severe sepsis or septic shock. Physician judgment identified 159, and the algorithmic alert identified 3,301 patients with potential sepsis. Physician judgment had sensitivity of 72.7% (95% CI = 72.1% to 73.4%) and specificity of 99.5% (95% CI = 99.4% to 99.6%); the algorithmic alert had sensitivity of 92.1% (95% CI = 91.7% to 92.4%) and specificity of 83.4% (95% CI = 82.9% to 83.9%) for severe sepsis/septic shock. There was no significant difference in the area under the ROC curve for physician judgment (0.86, 95% CI = 0.81 to 0.91) or the algorithm (0.88, 95% CI = 0.85 to 0.91; p = 0.54). A combination method using either positive physician judgment or an algorithmic alert improved sensitivity to 96.6% and specificity to 83.3%. A sequential approach, in which positive identification by the algorithmic alert was then confirmed by physician judgment, achieved 68.2% sensitivity and 99.6% specificity. Positive and negative predictive values for physician judgment vs. algorithmic alert were 40.3% vs. 2.5% and 99.88% vs. 99.96%, respectively. Conclusions: The electronic algorithmic alert was more sensitive but less specific than physician judgment for recognition of pediatric severe sepsis and septic shock. These findings can help to guide institutions in selecting pediatric sepsis recognition methods based on institutional needs and priorities. PMID:26474032
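The test characteristics reported above follow from the standard 2x2 confusion-matrix definitions, sketched below for reference (the counts passed in would come from the study's classification tables).

```python
def test_characteristics(tp, fp, fn, tn):
    """Standard 2x2 test characteristics used to compare the two recognition methods."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```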
Variable screening via quantile partial correlation
Ma, Shujie; Tsai, Chih-Ling
2016-01-01
In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. Our proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that make a contribution to the conditional quantiles but are marginally uncorrelated or weakly correlated with the response. Theoretical results show that the proposed algorithm can yield the sure screening set. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we proposed using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented. PMID:28943683
Xie, Rui; Wan, Xianrong; Hong, Sheng; Yi, Jianxin
2017-06-14
The performance of a passive radar network can be greatly improved by an optimal radar network structure. Generally, radar network structure optimization consists of two aspects, namely the placement of receivers in suitable places and the selection of appropriate illuminators. The present study investigates issues concerning the joint optimization of receiver placement and illuminator selection for a passive radar network. Firstly, the required radar cross section (RCS) for target detection is chosen as the performance metric, and the joint optimization model boils down to the partition p-center problem (PPCP). The PPCP is then solved by a proposed bisection algorithm. The key of the bisection algorithm lies in solving the partition set covering problem (PSCP), which is handled by a hybrid algorithm developed by coupling convex optimization with a greedy dropping algorithm. In the end, the performance of the proposed algorithm is validated via numerical simulations.
Faraj, Daniel A
2013-07-16
Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and bit masks; receiving in an origin endpoint of the PAMI a collective instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint; constructing a bit mask for the received collective instruction; selecting, from among the associated algorithms and bit masks, a data communications algorithm in dependence upon the constructed bit mask; and executing the collective instruction, transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
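A highly simplified, hypothetical sketch of mask-based algorithm selection is shown below; the real PAMI endpoints, contexts and collective machinery are far richer than a flat feature mask, and the flag names are invented for illustration.

```python
# Hypothetical feature flags describing a collective instruction (illustrative only).
CONTIGUOUS   = 1 << 0
SAME_TYPE    = 1 << 1
SMALL_MSG    = 1 << 2
POWER_OF_TWO = 1 << 3

# Each algorithm is associated with the bit mask of features it requires.
ALGORITHMS = [
    (CONTIGUOUS | SAME_TYPE | SMALL_MSG, "binomial_tree_broadcast"),
    (CONTIGUOUS | SAME_TYPE,             "pipelined_ring_broadcast"),
    (0,                                  "generic_pt2pt_broadcast"),  # fallback
]

def select_algorithm(instruction_mask):
    """Return the first algorithm whose required mask is satisfied by the instruction."""
    for required, name in ALGORITHMS:
        if instruction_mask & required == required:
            return name
    raise RuntimeError("no matching algorithm")

# usage: select_algorithm(CONTIGUOUS | SAME_TYPE | SMALL_MSG | POWER_OF_TWO)
```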
Optimisation algorithms for ECG data compression.
Haugland, D; Heber, J G; Husøy, J H
1997-07-01
The use of exact optimisation algorithms for compressing digital electrocardiograms (ECGs) is demonstrated. As opposed to traditional time-domain methods, which use heuristics to select a small subset of representative signal samples, the problem of selecting the subset is formulated in rigorous mathematical terms. This approach makes it possible to derive algorithms guaranteeing the smallest possible reconstruction error when a bounded selection of signal samples is interpolated. The proposed model resembles well-known network models and is solved by a cubic dynamic programming algorithm. When applied to standard test problems, the algorithm produces a compressed representation for which the distortion is about one-half of that obtained by traditional time-domain compression techniques at reasonable compression ratios. This illustrates that, in terms of the accuracy of decoded signals, existing time-domain heuristics for ECG compression may be far from what is theoretically achievable. The paper is an attempt to bridge this gap.
Medical image classification based on multi-scale non-negative sparse coding.
Zhang, Ruijie; Shen, Jian; Wei, Fushan; Li, Xiong; Sangaiah, Arun Kumar
2017-11-01
With the rapid development of modern medical imaging technology, medical image classification has become more and more important in medical diagnosis and clinical practice. Conventional medical image classification algorithms usually neglect the semantic gap problem between low-level features and high-level image semantic, which will largely degrade the classification performance. To solve this problem, we propose a multi-scale non-negative sparse coding based medical image classification algorithm. Firstly, Medical images are decomposed into multiple scale layers, thus diverse visual details can be extracted from different scale layers. Secondly, for each scale layer, the non-negative sparse coding model with fisher discriminative analysis is constructed to obtain the discriminative sparse representation of medical images. Then, the obtained multi-scale non-negative sparse coding features are combined to form a multi-scale feature histogram as the final representation for a medical image. Finally, SVM classifier is combined to conduct medical image classification. The experimental results demonstrate that our proposed algorithm can effectively utilize multi-scale and contextual spatial information of medical images, reduce the semantic gap in a large degree and improve medical image classification performance. Copyright © 2017 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Moreau, Nancy
2008-01-01
This article discusses the impact of patents for computer algorithms in course management systems. Referring to historical documents and court cases, the positive and negative aspects of software patents are presented. The key argument is the accessibility to algorithms comprising a course management software program such as Blackboard. The…
A Genetic Algorithm Based on Sexual Selection for the Multidimensional 0/1 Knapsack Problems
NASA Astrophysics Data System (ADS)
Varnamkhasti, Mohammad Jalali; Lee, Lai Soon
In this study, a new technique is presented for choosing mate chromosomes during sexual selection in a genetic algorithm. The population is divided into groups of males and females. During the sexual selection, the female chromosome is selected by tournament selection, while the male chromosome is selected based on its Hamming distance from the selected female chromosome, its fitness value, or its active genes. Computational experiments are conducted on the proposed technique and the results are compared with some selection mechanisms commonly used for solving multidimensional 0/1 knapsack problems published in the literature.
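The Hamming-distance variant of the mate-selection rule can be sketched as below; selecting the most distant male is assumed here as the diversity-promoting choice, while the fitness-based and active-gene-based variants would simply swap the scoring function.

```python
import numpy as np

def select_male_by_hamming(female, males):
    """Pick the male chromosome with the largest Hamming distance from the selected
    female chromosome (one of the three criteria mentioned above)."""
    distances = [int(np.sum(np.asarray(female) != np.asarray(m))) for m in males]
    return males[int(np.argmax(distances))]

# usage with binary chromosomes:
# mate = select_male_by_hamming(np.array([1, 0, 1, 1]), [np.array([1, 0, 1, 0]), np.array([0, 1, 0, 0])])
```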
Determinants of Recent HIV Infection Among Seattle-Area Men Who Have Sex with Men
Jenkins, Richard A.; Carey, James W.; Hutcheson, Rebecca; Thomas, Katherine K.; Stall, Ronald D.; White, Edward; Allen, Iris; Mejia, Roberto; Golden, Matthew R.
2009-01-01
Objectives. We sought to identify HIV-infection risk factors related to partner selection and sexual behaviors with those partners among men who have sex with men (MSM) in King County, Washington. Methods. Participants were recruited from HIV testing sites in the Seattle area. Recent HIV infection status was determined by the Serologic Testing Algorithm for Recent HIV Seroconversion (STARHS) or a self-reported previous HIV-negative test. Data on behaviors with 3 male partners were collected via computer-based self-interviews. Generalized estimating equation models identified partnership factors associated with recent infection. Results. We analyzed data from 32 HIV-positive MSM (58 partners) and 110 HIV-negative MSM (213 partners). In multivariate analysis, recent HIV infection was associated with meeting partners at bathhouses or sex clubs, bars or dance clubs, or online; methamphetamine use during unprotected anal intercourse; and unprotected anal intercourse, except with HIV-negative primary partners. Conclusions. There is a need to improve efforts to promote condom use with casual partners, regardless of their partner's HIV status. New strategies to control methamphetamine use in MSM and to reduce risk behaviors related to meeting partners at high-risk venues are needed. PMID:18445808
A Band Selection Method for High Precision Registration of Hyperspectral Image
NASA Astrophysics Data System (ADS)
Yang, H.; Li, X.
2018-04-01
During the registration of hyperspectral images and high spatial resolution images, the large number of bands in a hyperspectral image makes it difficult to select bands with good registration performance, and poorly matching bands can reduce matching speed and accuracy. To solve this problem, an algorithm based on Cramér-Rao lower bound theory is proposed in this paper to select good matching bands. The algorithm applies the Cramér-Rao lower bound theory to the study of registration accuracy and selects good matching bands by CRLB parameters. Experiments show that the proposed algorithm can choose good matching bands and provide better data for the registration of hyperspectral images and high spatial resolution images.
Efficient selection of tagging single-nucleotide polymorphisms in multiple populations.
Howie, Bryan N; Carlson, Christopher S; Rieder, Mark J; Nickerson, Deborah A
2006-08-01
Common genetic polymorphism may explain a portion of the heritable risk for common diseases, so considerable effort has been devoted to finding and typing common single-nucleotide polymorphisms (SNPs) in the human genome. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), suggesting that only a subset of all SNPs (known as tagging SNPs, or tagSNPs) need to be genotyped for disease association studies. Based on the genetic differences that exist among human populations, most tagSNP sets are defined in a single population and applied only in populations that are closely related. To improve the efficiency of multi-population analyses, we have developed an algorithm called MultiPop-TagSelect that finds a near-minimal union of population-specific tagSNP sets across an arbitrary number of populations. We present this approach as an extension of LD-select, a tagSNP selection method that uses a greedy algorithm to group SNPs into bins based on their pairwise association patterns, although the MultiPop-TagSelect algorithm could be used with any SNP tagging approach that allows choices between nearly equivalent SNPs. We evaluate the algorithm by considering tagSNP selection in candidate-gene resequencing data and lower density whole-chromosome data. Our analysis reveals that an exhaustive search is often intractable, while the developed algorithm can quickly and reliably find near-optimal solutions even for difficult tagSNP selection problems. Using populations of African, Asian, and European ancestry, we also show that an optimal multi-population set of tagSNPs can be substantially smaller (up to 44%) than a typical set obtained through independent or sequential selection.
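The greedy binning step that underlies LD-select can be sketched for a single population as below, taking a matrix of pairwise r^2 values; MultiPop-TagSelect's additional step of minimizing the union of tagSNP sets across populations is not shown.

```python
import numpy as np

def greedy_ld_bins(r2, threshold=0.8):
    """Greedy LD-select-style binning: repeatedly pick the SNP that exceeds the r^2
    threshold with the largest number of unbinned SNPs; that SNP tags its bin."""
    n = r2.shape[0]
    unbinned = set(range(n))
    bins = []                                  # list of (tag_snp, bin_members)
    while unbinned:
        best, best_cov = None, None
        for i in unbinned:
            cov = {j for j in unbinned if j != i and r2[i, j] >= threshold}
            if best is None or len(cov) > len(best_cov):
                best, best_cov = i, cov
        bins.append((best, best_cov | {best}))
        unbinned -= best_cov | {best}
    return bins
```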
Chastek, Benjamin J; Oleen-Burkey, Merrikay; Lopez-Bresnahan, Maria V
2010-01-01
Relapse is a common measure of disease activity in relapsing-remitting multiple sclerosis (MS). The objective of this study was to test the content validity of an operational algorithm for detecting relapse in claims data. A claims-based relapse detection algorithm was tested by comparing its detection rate over a 1-year period with relapses identified based on medical chart review. According to the algorithm, MS patients in a US healthcare claims database who had either (1) a primary claim for MS during hospitalization or (2) a corticosteroid claim following an MS-related outpatient visit were designated as having a relapse. Patient charts were examined for explicit indication of relapse or care suggestive of relapse. Positive and negative predictive values were calculated. Medical charts were reviewed for 300 MS patients, half of whom had a relapse according to the algorithm. The claims-based criteria correctly classified 67.3% of patients with relapses (positive predictive value) and 70.0% of patients without relapses (negative predictive value; kappa = 0.373, p < 0.001). Alternative algorithms did not improve on the predictive value of the operational algorithm. Limitations of the algorithm include its lack of differentiation between relapsing-remitting MS and other types, and the fact that it does not incorporate measures of function and disability. The claims-based algorithm appeared to successfully detect moderate-to-severe MS relapse. This validated definition can be applied to future claims-based MS studies.
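The operational definition translates into a simple predicate over a patient's claims, sketched below; the field names, claim categories and the absence of an explicit time window are simplifying assumptions, since the exact diagnosis codes and look-back windows are specified in the study.

```python
def claims_relapse(claims):
    """Apply the operational definition above to one patient's list of claims.

    Each claim is a dict with hypothetical keys: 'type' ('inpatient', 'outpatient' or
    'pharmacy'), 'primary_dx_ms' (bool), 'ms_related' (bool), 'corticosteroid' (bool)
    and 'date' (any comparable value)."""
    hospitalized_for_ms = any(c["type"] == "inpatient" and c["primary_dx_ms"]
                              for c in claims)
    ms_visit_dates = sorted(c["date"] for c in claims
                            if c["type"] == "outpatient" and c["ms_related"])
    steroid_after_visit = any(c["type"] == "pharmacy" and c["corticosteroid"]
                              and any(d <= c["date"] for d in ms_visit_dates)
                              for c in claims)
    return hospitalized_for_ms or steroid_after_visit
```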
She, Ji; Wang, Fei; Zhou, Jianjiang
2016-01-01
Radar networks are proven to have numerous advantages over traditional monostatic and bistatic radar. With recent developments, radar networks have become an attractive platform due to their low probability of intercept (LPI) performance for target tracking. In this paper, a joint sensor selection and power allocation algorithm for multiple-target tracking in a radar network based on LPI is proposed. It is found that this algorithm can minimize the total transmitted power of a radar network on the basis of a predetermined mutual information (MI) threshold between the target impulse response and the reflected signal. The MI is required by the radar network system to estimate target parameters, and it can be calculated predictively with the estimation of target state. The optimization problem of sensor selection and power allocation, which contains two variables, is non-convex and it can be solved by separating power allocation problem from sensor selection problem. To be specific, the optimization problem of power allocation can be solved by using the bisection method for each sensor selection scheme. Also, the optimization problem of sensor selection can be solved by a lower complexity algorithm based on the allocated powers. According to the simulation results, it can be found that the proposed algorithm can effectively reduce the total transmitted power of a radar network, which can be conducive to improving LPI performance. PMID:28009819
Feature selection and back-projection algorithms for nonline-of-sight laser-gated viewing
NASA Astrophysics Data System (ADS)
Laurenzis, Martin; Velten, Andreas
2014-11-01
We discuss new approaches to analyze laser-gated viewing data for nonline-of-sight vision with a frame-to-frame back-projection as well as feature selection algorithms. Whereas first back-projection approaches use time transients for each pixel, our method can calculate the projection of the imaging data onto the voxel space for each frame. Further, different data analysis algorithms and their sequential application were studied with the aim of identifying and selecting signals from different target positions. A slight modification of commonly used filters leads to a powerful selection of local maximum values. It is demonstrated that the choice of the filter has an impact on the selectivity, i.e., multiple-target detection, as well as on the localization precision.
Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang Xiaojia; Mao Qirong; Zhan Yongzhao
There are many emotion features. If all of these features are employed to recognize emotions, redundant features may exist; furthermore, the recognition results are unsatisfactory and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features using the contribution analysis algorithm of the NN. Cluster analysis is applied to assess the effectiveness of the selected features, and the time of feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the feature extraction time.
NASA Astrophysics Data System (ADS)
Zhou, Yali; Zhang, Qizhi; Yin, Yixin
2015-05-01
In this paper, active control of impulsive noise with a symmetric α-stable (SαS) distribution is studied. A general step-size-normalized filtered-x Least Mean Square (FxLMS) algorithm is developed based on an analysis of existing algorithms, and the Gaussian distribution function is used to normalize the step size. Compared with existing algorithms, the proposed algorithm requires neither parameter selection and threshold estimation nor cost function selection and complex gradient computation. Computer simulations suggest that the proposed algorithm is effective for attenuating SαS impulsive noise, and the algorithm has also been implemented in an experimental active noise control (ANC) system. Experimental results show that the proposed scheme performs well for SαS impulsive noise attenuation.
Stassi, D; Dutta, S; Ma, H; Soderman, A; Pazzani, D; Gros, E; Okerlund, D; Schmidt, T G
2016-01-01
Reconstructing a low-motion cardiac phase is expected to improve coronary artery visualization in coronary computed tomography angiography (CCTA) exams. This study developed an automated algorithm for selecting the optimal cardiac phase for CCTA reconstruction. The algorithm uses prospectively gated, single-beat, multiphase data made possible by wide cone-beam imaging. The proposed algorithm differs from previous approaches because the optimal phase is identified based on vessel image quality (IQ) directly, compared to previous approaches that included motion estimation and interphase processing. Because there is no processing of interphase information, the algorithm can be applied to any sampling of image phases, making it suited for prospectively gated studies where only a subset of phases are available. An automated algorithm was developed to select the optimal phase based on quantitative IQ metrics. For each reconstructed slice at each reconstructed phase, an image quality metric was calculated based on measures of circularity and edge strength of through-plane vessels. The image quality metric was aggregated across slices, while a metric of vessel-location consistency was used to ignore slices that did not contain through-plane vessels. The algorithm performance was evaluated using two observer studies. Fourteen single-beat cardiac CT exams (Revolution CT, GE Healthcare, Chalfont St. Giles, UK) reconstructed at 2% intervals were evaluated for best systolic (1), diastolic (6), or systolic and diastolic phases (7) by three readers and the algorithm. Pairwise inter-reader and reader-algorithm agreement was evaluated using the mean absolute difference (MAD) and concordance correlation coefficient (CCC) between the reader and algorithm-selected phases. A reader-consensus best phase was determined and compared to the algorithm selected phase. In cases where the algorithm and consensus best phases differed by more than 2%, IQ was scored by three readers using a five point Likert scale. There was no statistically significant difference between inter-reader and reader-algorithm agreement for either MAD or CCC metrics (p > 0.1). The algorithm phase was within 2% of the consensus phase in 15/21 of cases. The average absolute difference between consensus and algorithm best phases was 2.29% ± 2.47%, with a maximum difference of 8%. Average image quality scores for the algorithm chosen best phase were 4.01 ± 0.65 overall, 3.33 ± 1.27 for right coronary artery (RCA), 4.50 ± 0.35 for left anterior descending (LAD) artery, and 4.50 ± 0.35 for left circumflex artery (LCX). Average image quality scores for the consensus best phase were 4.11 ± 0.54 overall, 3.44 ± 1.03 for RCA, 4.39 ± 0.39 for LAD, and 4.50 ± 0.18 for LCX. There was no statistically significant difference (p > 0.1) between the image quality scores of the algorithm phase and the consensus phase. The proposed algorithm was statistically equivalent to a reader in selecting an optimal cardiac phase for CCTA exams. When reader and algorithm phases differed by >2%, image quality as rated by blinded readers was statistically equivalent. By detecting the optimal phase for CCTA reconstruction, the proposed algorithm is expected to improve coronary artery visualization in CCTA exams.
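As a rough illustration of the selection logic described above (aggregate a per-slice vessel image-quality metric, skip slices without consistent through-plane vessels, pick the phase with the best aggregate), here is a hedged Python sketch; the metric inputs and the consistency cutoff are placeholders, not the published implementation.

```python
def select_best_phase(iq_scores, consistency, min_consistency=0.5):
    """iq_scores[phase][slice] = vessel image-quality metric,
    consistency[phase][slice] = vessel-location consistency in [0, 1].
    Returns the phase (e.g. '45%') with the highest mean IQ over usable slices."""
    def aggregate(phase):
        usable = [s for s, c in consistency[phase].items() if c >= min_consistency]
        if not usable:
            return float("-inf")  # no consistent through-plane vessels at this phase
        return sum(iq_scores[phase][s] for s in usable) / len(usable)
    return max(iq_scores, key=aggregate)
```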
Targeted Feature Detection for Data-Dependent Shotgun Proteomics.
Weisser, Hendrik; Choudhary, Jyoti S
2017-08-04
Label-free quantification of shotgun LC-MS/MS data is the prevailing approach in quantitative proteomics but remains computationally nontrivial. The central data analysis step is the detection of peptide-specific signal patterns, called features. Peptide quantification is facilitated by associating signal intensities in features with peptide sequences derived from MS2 spectra; however, missing values due to imperfect feature detection are a common problem. A feature detection approach that directly targets identified peptides (minimizing missing values) but also offers robustness against false-positive features (by assigning meaningful confidence scores) would thus be highly desirable. We developed a new feature detection algorithm within the OpenMS software framework, leveraging ideas and algorithms from the OpenSWATH toolset for DIA/SRM data analysis. Our software, FeatureFinderIdentification ("FFId"), implements a targeted approach to feature detection based on information from identified peptides. This information is encoded in an MS1 assay library, based on which ion chromatogram extraction and detection of feature candidates are carried out. Significantly, when analyzing data from experiments comprising multiple samples, our approach distinguishes between "internal" and "external" (inferred) peptide identifications (IDs) for each sample. On the basis of internal IDs, two sets of positive (true) and negative (decoy) feature candidates are defined. A support vector machine (SVM) classifier is then trained to discriminate between the sets and is subsequently applied to the "uncertain" feature candidates from external IDs, facilitating selection and confidence scoring of the best feature candidate for each peptide. This approach also enables our algorithm to estimate the false discovery rate (FDR) of the feature selection step. We validated FFId based on a public benchmark data set, comprising a yeast cell lysate spiked with protein standards that provide a known ground-truth. The algorithm reached almost complete (>99%) quantification coverage for the full set of peptides identified at 1% FDR (PSM level). Compared with other software solutions for label-free quantification, this is an outstanding result, which was achieved at competitive quantification accuracy and reproducibility across replicates. The FDR for the feature selection was estimated at a low 1.5% on average per sample (3% for features inferred from external peptide IDs). The FFId software is open-source and freely available as part of OpenMS ( www.openms.org ).
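The candidate-scoring step described above (train an SVM on positive and decoy feature candidates from internal IDs, then score the uncertain candidates from external IDs) can be sketched with scikit-learn as follows. This is an illustration of the idea under assumed input matrices, not the OpenMS FeatureFinderIdentification implementation.

```python
import numpy as np
from sklearn.svm import SVC

def score_external_candidates(X_pos, X_decoy, X_uncertain):
    """Train an SVM on internal-ID candidates (positives vs. decoys) and return
    a confidence score in [0, 1] for each uncertain, externally inferred candidate."""
    X = np.vstack([X_pos, X_decoy])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_decoy))])
    clf = SVC(kernel="rbf", probability=True).fit(X, y)
    return clf.predict_proba(X_uncertain)[:, 1]  # column 1 = "true feature" class
```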
Zapata, Luis; Pich, Oriol; Serrano, Luis; Kondrashov, Fyodor A; Ossowski, Stephan; Schaefer, Martin H
2018-05-31
Natural selection shapes cancer genomes. Previous studies used signatures of positive selection to identify genes driving malignant transformation. However, the contribution of negative selection against somatic mutations that affect essential tumor functions or specific domains remains a controversial topic. Here, we analyze 7546 individual exomes from 26 tumor types from TCGA data to explore the portion of the cancer exome under negative selection. Although we find most of the genes neutrally evolving in a pan-cancer framework, we identify essential cancer genes and immune-exposed protein regions under significant negative selection. Moreover, our simulations suggest that the amount of negative selection is underestimated. We therefore choose an empirical approach to identify genes, functions, and protein regions under negative selection. We find that expression and mutation status of negatively selected genes is indicative of patient survival. Processes that are most strongly conserved are those that play fundamental cellular roles such as protein synthesis, glucose metabolism, and molecular transport. Intriguingly, we observe strong signals of selection in the immunopeptidome and proteins controlling peptide exposition, highlighting the importance of immune surveillance evasion. Additionally, tumor type-specific immune activity correlates with the strength of negative selection on human epitopes. In summary, our results show that negative selection is a hallmark of cell essentiality and immune response in cancer. The functional domains identified could be exploited therapeutically, ultimately allowing for the development of novel cancer treatments.
Logsdon, Benjamin A.; Mezey, Jason
2010-01-01
Cellular gene expression measurements contain regulatory information that can be used to discover novel network relationships. Here, we present a new algorithm for network reconstruction powered by the adaptive lasso, a theoretically and empirically well-behaved method for selecting the regulatory features of a network. Any algorithms designed for network discovery that make use of directed probabilistic graphs require perturbations, produced by either experiments or naturally occurring genetic variation, to successfully infer unique regulatory relationships from gene expression data. Our approach makes use of appropriately selected cis-expression Quantitative Trait Loci (cis-eQTL), which provide a sufficient set of independent perturbations for maximum network resolution. We compare the performance of our network reconstruction algorithm to four other approaches: the PC-algorithm, QTLnet, the QDG algorithm, and the NEO algorithm, all of which have been used to reconstruct directed networks among phenotypes leveraging QTL. We show that the adaptive lasso can outperform these algorithms for networks of ten genes and ten cis-eQTL, and is competitive with the QDG algorithm for networks with thirty genes and thirty cis-eQTL, with rich topologies and hundreds of samples. Using this novel approach, we identify unique sets of directed relationships in Saccharomyces cerevisiae when analyzing genome-wide gene expression data for an intercross between a wild strain and a lab strain. We recover novel putative network relationships between a tyrosine biosynthesis gene (TYR1), and genes involved in endocytosis (RCY1), the spindle checkpoint (BUB2), sulfonate catabolism (JLP1), and cell-cell communication (PRM7). Our algorithm provides a synthesis of feature selection methods and graphical model theory that has the potential to reveal new directed regulatory relationships from the analysis of population level genetic and gene expression data. PMID:21152011
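The adaptive lasso at the heart of this approach is commonly approximated by a two-step recipe: fit an initial estimate, turn it into per-coefficient weights, rescale the predictors, and run an ordinary lasso. The sketch below shows that generic recipe with scikit-learn; it is not the authors' network-reconstruction code and omits the cis-eQTL and graph machinery.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def adaptive_lasso(X, y, alpha=0.1, gamma=1.0, eps=1e-6):
    """Two-step adaptive lasso: weights w_j = 1/|beta_init_j|^gamma, then an
    ordinary lasso on the rescaled design X_j / w_j (coefficients rescaled back)."""
    beta_init = Ridge(alpha=1.0).fit(X, y).coef_
    w = 1.0 / (np.abs(beta_init) ** gamma + eps)
    X_scaled = X / w            # broadcasting over columns
    lasso = Lasso(alpha=alpha).fit(X_scaled, y)
    return lasso.coef_ / w      # back-transform to the original scale
```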
Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3
NASA Technical Reports Server (NTRS)
Lin, Shu
1998-01-01
Decoding algorithms based on the trellis representation of a code (block or convolutional) drastically reduce decoding complexity. The best known and most commonly used trellis-based decoding algorithm is the Viterbi algorithm. It is a maximum likelihood decoding algorithm. Convolutional codes with the Viterbi decoding have been widely used for error control in digital communications over the last two decades. This chapter is concerned with the application of the Viterbi decoding algorithm to linear block codes. First, the Viterbi algorithm is presented. Then, optimum sectionalization of a trellis to minimize the computational complexity of a Viterbi decoder is discussed and an algorithm is presented. Some design issues for IC (integrated circuit) implementation of a Viterbi decoder are considered and discussed. Finally, a new decoding algorithm based on the principle of compare-select-add is presented. This new algorithm can be applied to both block and convolutional codes and is more efficient than the conventional Viterbi algorithm based on the add-compare-select principle. This algorithm is particularly efficient for rate 1/n antipodal convolutional codes and their high-rate punctured codes. It reduces computational complexity by one-third compared with the Viterbi algorithm.
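As a concrete reference point for the Viterbi decoding discussed above, here is a minimal sketch of the algorithm over a generic trellis in Python; the branch-metric callback, state set, and section count are placeholders, and the optimum sectionalization and compare-select-add variants described in the chapter are not shown.

```python
def viterbi(branch_metric, states, n_sections, start=0):
    """Minimal Viterbi: branch_metric(section, prev_state, state) returns the
    metric of that branch (use -inf for branches absent from the trellis).
    Returns (best_metric, best_state_sequence)."""
    neg_inf = float("-inf")
    metric = {s: (0.0 if s == start else neg_inf) for s in states}
    paths = {s: [s] for s in states}
    for k in range(n_sections):
        new_metric, new_paths = {}, {}
        for s in states:
            # add (branch metric) - compare - select over predecessor states
            best_prev = max(states, key=lambda p: metric[p] + branch_metric(k, p, s))
            new_metric[s] = metric[best_prev] + branch_metric(k, best_prev, s)
            new_paths[s] = paths[best_prev] + [s]
        metric, paths = new_metric, new_paths
    best_end = max(states, key=lambda s: metric[s])
    return metric[best_end], paths[best_end]
```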
Muyoyeta, Monde; Moyo, Maureen; Kasese, Nkatya; Ndhlovu, Mapopa; Milimo, Deborah; Mwanza, Winfridah; Kapata, Nathan; Schaap, Albertus; Godfrey Faussett, Peter; Ayles, Helen
2015-01-01
Background The current cost of Xpert MTB RIF (Xpert) consumables is such that algorithms are needed to select which patients to prioritise for testing with Xpert. Objective To evaluate two algorithms for prioritisation of Xpert in primary health care settings in a high TB and HIV burden setting. Method Consecutive, presumptive TB patients with a cough of any duration were offered either an Xpert or a fluorescence microscopy (FM) test depending on their CXR score or HIV status. In one facility, sputa from patients with an abnormal CXR were tested with Xpert and those with a normal CXR were tested with FM (“CXR algorithm”). CXR was scored automatically using a Computer Aided Diagnosis (CAD) program. In the other facility, patients who were HIV positive were tested using Xpert and those who were HIV negative were tested with FM (“HIV algorithm”). Results Of 9482 individuals pre-screened with CXR, Xpert detected TB in 2090/6568 (31.8%) with an abnormal CXR, and FM was AFB positive in 8/2455 (0.3%) with a normal CXR. Of 4444 pre-screened with HIV, Xpert detected TB in 508/2265 (22.4%) HIV-positive individuals, and FM was AFB positive in 212/1920 (11.0%) HIV-negative individuals. The notification rate of new bacteriologically confirmed TB increased from 366 to 620/100,000/yr and from 145 to 261/100,000/yr at the CXR and HIV algorithm sites, respectively. The median time to starting TB treatment was 1 day (IQR 1-3) at the CXR site and 3 days (IQR 2-5) at the HIV algorithm site (p<0.0001). Conclusion Use of Xpert in a resource-limited setting at primary care level in conjunction with pre-screening tests reduced the number of Xpert tests performed. The routine use of Xpert resulted in additional cases of confirmed TB patients starting treatment. However, there was no increase in absolute numbers of patients starting TB treatment. Same-day diagnosis and treatment commencement was achieved for both bacteriologically confirmed and empirically diagnosed patients where Xpert was used in conjunction with CXR. PMID:26030301
NASA Astrophysics Data System (ADS)
Clarkin, T. J.; Kasprzyk, J. R.; Raseman, W. J.; Herman, J. D.
2015-12-01
This study contributes a diagnostic assessment of multiobjective evolutionary algorithm (MOEA) search on a set of water resources problem formulations with different configurations of constraints. Unlike constraints in classical optimization modeling, constraints within MOEA simulation-optimization represent limits on acceptable performance that delineate whether solutions within the search problem are feasible. Constraints are relevant because of the emergent pressures on water resources systems: increasing public awareness of their sustainability, coupled with regulatory pressures on water management agencies. In this study, we test several state-of-the-art MOEAs that utilize restricted tournament selection for constraint handling on varying configurations of water resources planning problems. For example, a problem that has no constraints on performance levels will be compared with a problem with several severe constraints, and a problem with constraints that have less severe values on the constraint thresholds. One such problem, Lower Rio Grande Valley (LRGV) portfolio planning, has been solved with a suite of constraints that ensure high reliability, low cost variability, and acceptable performance in a single year severe drought. But to date, it is unclear whether or not the constraints are negatively affecting MOEAs' ability to solve the problem effectively. Two categories of results are explored. The first category uses control maps of algorithm performance to determine if the algorithm's performance is sensitive to user-defined parameters. The second category uses run-time performance metrics to determine the time required for the algorithm to reach sufficient levels of convergence and diversity on the solution sets. Our work exploring the effect of constraints will better enable practitioners to define MOEA problem formulations for real-world systems, especially when stakeholders are concerned with achieving fixed levels of performance according to one or more metrics.
Hou, Tingjun; Xu, Xiaojie
2002-12-01
In this study, the relationships between the brain-blood concentration ratio of 96 structurally diverse compounds with a large number of structurally derived descriptors were investigated. The linear models were based on molecular descriptors that can be calculated for any compound simply from a knowledge of its molecular structure. The linear correlation coefficients of the models were optimized by genetic algorithms (GAs), and the descriptors used in the linear models were automatically selected from 27 structurally derived descriptors. The GA optimizations resulted in a group of linear models with three or four molecular descriptors with good statistical significance. The change of descriptor use as the evolution proceeds demonstrates that the octane/water partition coefficient and the partial negative solvent-accessible surface area multiplied by the negative charge are crucial to brain-blood barrier permeability. Moreover, we found that the predictions using multiple QSPR models from GA optimization gave quite good results in spite of the diversity of structures, which was better than the predictions using the best single model. The predictions for the two external sets with 37 diverse compounds using multiple QSPR models indicate that the best linear models with four descriptors are sufficiently effective for predictive use. Considering the ease of computation of the descriptors, the linear models may be used as general utilities to screen the blood-brain barrier partitioning of drugs in a high-throughput fashion.
Multi-period project portfolio selection under risk considerations and stochastic income
NASA Astrophysics Data System (ADS)
Tofighian, Ali Asghar; Moezzi, Hamid; Khakzar Barfuei, Morteza; Shafiee, Mahmood
2018-02-01
This paper deals with the multi-period project portfolio selection problem. In this problem, the available budget is invested in the best portfolio of projects in each period such that the net profit is maximized. We also consider more realistic assumptions to cover a wider range of applications than those reported in previous studies. A novel mathematical model is presented to solve the problem, considering risks, stochastic incomes, and the possibility of investing extra budget in each time period. Due to the complexity of the problem, an effective meta-heuristic method hybridized with a local search procedure is presented. The algorithm is based on the genetic algorithm (GA), a prominent method for this type of problem. The GA is enhanced by a new solution representation and well-selected operators, and is hybridized with a local search mechanism to obtain better solutions in a shorter time. The performance of the proposed algorithm is then compared with well-known algorithms, such as the basic genetic algorithm (GA), particle swarm optimization (PSO), and the electromagnetism-like algorithm (EM-like), by means of several prominent indicators. The computational results show the superiority of the proposed algorithm in terms of accuracy, robustness and computation time. Finally, the proposed algorithm is combined with PSO to improve the computing time considerably.
Comparison of Multispot EIA with Western blot for confirmatory serodiagnosis of HIV.
Torian, Lucia V; Forgione, Lisa A; Punsalang, Amado E; Pirillo, Robert E; Oleszko, William R
2011-12-01
Recent improvements in the sensitivity of immunoassays (IA) used for HIV screening, coupled with increasing recognition of the importance of rapid point-of-care testing, have led to proposals to adjust the algorithm for serodiagnosis of HIV so that screening and confirmation can be performed using a dual or triple IA sequence that does not require Western blotting for confirmation. One IA that has been proposed as a second or confirmatory test is the Bio-Rad Multispot(®) Rapid HIV-1/HIV-2 Test. This test would have the added advantage of differentiating between HIV-1 and HIV-2 antibodies. To compare the sensitivity and type-specificity of an algorithm combining a 3rd generation enzyme immunoassay (EIA) followed by a confirmatory Multispot with the conventional algorithm that combines a 3rd generation EIA (Bio-Rad GS HIV-1/HIV-2 Plus O EIA) followed by confirmatory Western blot (Bio-Rad GS HIV-1 WB). 8760 serum specimens submitted for HIV testing to the New York City Public Health Laboratory between May 22, 2007, and April 30, 2010, tested repeatedly positive on 3rd generation HIV-1-2+O EIA screening and received parallel confirmatory testing by WB and Multispot (MS). 8678/8760 (99.1%) specimens tested WB-positive; 82 (0.9%) tested WB-negative or indeterminate (IND). 8690/8760 specimens (99.2%) tested MS-positive, of which 14 (17.1%) had been classified as negative or IND by WB. Among the HIV-1 WB-positive specimens, MS classified 26 (0.29%) as HIV-2. Among the HIV-1 WB negative and IND, MS detected 12 HIV-2. MS detected an additional 14 HIV-1 infections among WB negative or IND specimens, differentiated 26 HIV-1 WB positives as HIV-2, and detected 12 additional HIV-2 infections among WB negative/IND. A dual 3rd generation EIA algorithm incorporating MS had equivalent HIV-1 sensitivity to the 3rd generation EIA-WB algorithm and had the added advantage of detecting 12 HIV-2 specimens that were not HIV-1 WB cross-reactors. In this series an algorithm using EIA followed by MS would have resulted in the expedited referral of 38 specimens for HIV-2 testing and 40 specimens for nucleic acid confirmation. Further testing using a combined gold standard of nucleic acid detection and WB is needed to calculate specificity and validate the substitution of MS for WB in the diagnostic algorithm used by a large public health laboratory. Copyright © 2011. Published by Elsevier B.V.
USE OF POPULATION VIABILITY ANALYSIS AND RESERVE SELECTION ALGORITHMS IN REGIONAL CONSERVATION PLANS
Current reserve selection algorithms have difficulty evaluating connectivity and other factors necessary to conserve wide-ranging species in developing landscapes. Conversely, population viability analyses may incorporate detailed demographic data but often lack sufficient spa...
Entropy-Based Search Algorithm for Experimental Design
NASA Astrophysics Data System (ADS)
Malakar, N. K.; Knuth, K. H.
2011-03-01
The scientific method relies on the iterated processes of inference and inquiry. The inference phase consists of selecting the most probable models based on the available data; whereas the inquiry phase consists of using what is known about the models to select the most relevant experiment. Optimizing inquiry involves searching the parameterized space of experiments to select the experiment that promises, on average, to be maximally informative. In the case where it is important to learn about each of the model parameters, the relevance of an experiment is quantified by Shannon entropy of the distribution of experimental outcomes predicted by a probable set of models. If the set of potential experiments is described by many parameters, we must search this high-dimensional entropy space. Brute force search methods will be slow and computationally expensive. We present an entropy-based search algorithm, called nested entropy sampling, to select the most informative experiment for efficient experimental design. This algorithm is inspired by Skilling's nested sampling algorithm used in inference and borrows the concept of a rising threshold while a set of experiment samples are maintained. We demonstrate that this algorithm not only selects highly relevant experiments, but also is more efficient than brute force search. Such entropic search techniques promise to greatly benefit autonomous experimental design.
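The selection rule at the core of the approach above (score each candidate experiment by the Shannon entropy of the outcome distribution predicted by a set of probable models and pick the highest) can be sketched directly. The brute-force loop below is only an illustration; the nested entropy sampling search that replaces it in the paper is not shown.

```python
import math
from collections import Counter

def outcome_entropy(experiment, models, predict):
    """Shannon entropy (in nats) of the outcomes that the sampled models predict
    for this experiment; predict(model, experiment) returns a discrete outcome."""
    counts = Counter(predict(m, experiment) for m in models)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def most_informative(experiments, models, predict):
    """Brute-force version of the selection rule: pick the experiment whose
    predicted-outcome distribution has maximum entropy."""
    return max(experiments, key=lambda e: outcome_entropy(e, models, predict))
```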
NASA Astrophysics Data System (ADS)
Zhou, Peng; Zhang, Xi; Sun, Weifeng; Dai, Yongshou; Wan, Yong
2018-01-01
An algorithm based on time-frequency analysis is proposed to select an imaging time window for the inverse synthetic aperture radar imaging of ships. An appropriate range bin is selected to perform the time-frequency analysis after radial motion compensation. The selected range bin is the one with the maximum mean amplitude among the range bins whose echoes are confirmed to be contributed by a dominant scatterer. The criterion for judging whether the echoes of a range bin are contributed by a dominant scatterer is key to the proposed algorithm and is therefore described in detail. When the first range bin that satisfies the judgment criterion is found, a sequence composed of the frequencies with the largest amplitude at each moment in the time-frequency spectrum of this range bin is employed to calculate the length and the center moment of the optimal imaging time window. Experiments performed with simulation data and real data show the effectiveness of the proposed algorithm, and comparisons between the proposed algorithm and the image contrast-based algorithm (ICBA) are provided. The proposed algorithm achieves image contrast similar to, and entropy lower than, that obtained with the ICBA.
Stochastic subset selection for learning with kernel machines.
Rhinelander, Jason; Liu, Xiaoping P
2012-06-01
Kernel machines have gained much popularity in applications of machine learning. Support vector machines (SVMs) are a subset of kernel machines and generalize well for classification, regression, and anomaly detection tasks. The training procedure for traditional SVMs involves solving a quadratic programming (QP) problem. The QP problem scales super linearly in computational effort with the number of training samples and is often used for the offline batch processing of data. Kernel machines operate by retaining a subset of observed data during training. The data vectors contained within this subset are referred to as support vectors (SVs). The work presented in this paper introduces a subset selection method for the use of kernel machines in online, changing environments. Our algorithm works by using a stochastic indexing technique when selecting a subset of SVs when computing the kernel expansion. The work described here is novel because it separates the selection of kernel basis functions from the training algorithm used. The subset selection algorithm presented here can be used in conjunction with any online training technique. It is important for online kernel machines to be computationally efficient due to the real-time requirements of online environments. Our algorithm is an important contribution because it scales linearly with the number of training samples and is compatible with current training techniques. Our algorithm outperforms standard techniques in terms of computational efficiency and provides increased recognition accuracy in our experiments. We provide results from experiments using both simulated and real-world data sets to verify our algorithm.
Behets, F. M.; Miller, W. C.; Cohen, M. S.
2001-01-01
The syndromic treatment of gonococcal and chlamydial infections in women seeking primary care in clinics where resources are scarce, as recommended by WHO and implemented in many developing countries, necessitates a balance to be struck between overtreatment and undertreatment. The present paper identifies factors that are relevant to the selection of specific strategies for syndromic treatment in the above circumstances. Among them are the general aspects of decision-making and caveats concerning the rational decision-making approach. The positive and negative implications are outlined of providing or withholding treatment following a specific algorithm with a given accuracy to detect infection, i.e. sensitivity, specificity and predictive values. Other decision-making considerations that are identified are related to implementation and include the stability of risk factors with regard to time, space and the implementer, acceptability by stakeholders, and environmental constraints. There is a need to consider empirically developed treatment algorithms as a basis for policy discourse, to be evaluated together with the evidence, alternatives and arguments by the stakeholders. PMID:11731816
Liu, Hongfang; Maxwell, Kara N.; Pathak, Jyotishman; Zhang, Rui
2018-01-01
Precision medicine is at the forefront of biomedical research. Cancer registries provide rich perspectives and electronic health records (EHRs) are commonly utilized to gather additional clinical data elements needed for translational research. However, manual annotation is resource‐intense and not readily scalable. Informatics‐based phenotyping presents an ideal solution, but perspectives obtained can be impacted by both data source and algorithm selection. We derived breast cancer (BC) receptor status phenotypes from structured and unstructured EHR data using rule‐based algorithms, including natural language processing (NLP). Overall, the use of NLP increased BC receptor status coverage by 39.2% from 69.1% with structured medication information alone. Using all available EHR data, estrogen receptor‐positive BC cases were ascertained with high precision (P = 0.976) and recall (R = 0.987) compared with gold standard chart‐reviewed patients. However, status negation (R = 0.591) decreased 40.2% when relying on structured medications alone. Using multiple EHR data types (and a thorough understanding of the perspectives they offer) is necessary to derive robust EHR‐based precision medicine phenotypes. PMID:29084368
Closed Loop Guidance Trade Study for Space Launch System Block-1B Vehicle
NASA Technical Reports Server (NTRS)
Von der Porten, Paul; Ahmad, Naeem; Hawkins, Matt
2018-01-01
NASA is currently building the Space Launch System (SLS) Block-1 launch vehicle for the Exploration Mission 1 (EM-1) test flight. The design of the next evolution of SLS, Block-1B, is well underway. The Block-1B vehicle is more capable overall than Block-1; however, the relatively low thrust-to-weight ratio of the Exploration Upper Stage (EUS) presents a challenge to the Powered Explicit Guidance (PEG) algorithm used by Block-1. To handle the long burn durations (on the order of 1000 seconds) of EUS missions, two algorithms were examined. An alternative algorithm, OPGUID, was introduced, while modifications were made to PEG. A trade study was conducted to select the guidance algorithm for future SLS vehicles. The chosen algorithm needs to support a wide variety of mission operations: ascent burns to LEO, apogee raise burns, trans-lunar injection burns, hyperbolic Earth departure burns, and contingency disposal burns using the Reaction Control System (RCS). Additionally, the algorithm must be able to respond to a single engine failure scenario. Each algorithm was scored based on pre-selected criteria, including insertion accuracy, algorithmic complexity and robustness, extensibility for potential future missions, and flight heritage. Monte Carlo analysis was used to select the final algorithm. This paper covers the design criteria, approach, and results of this trade study, showing impacts and considerations when adapting launch vehicle guidance algorithms to a broader breadth of in-space operations.
Accommodation of practical constraints by a linear programming jet select. [for Space Shuttle
NASA Technical Reports Server (NTRS)
Bergmann, E.; Weiler, P.
1983-01-01
An experimental spacecraft control system will be incorporated into the Space Shuttle flight software and exercised during a forthcoming mission to evaluate its performance and handling qualities. The control system incorporates a 'phase space' control law to generate rate change requests and a linear programming jet select to compute jet firings. Posed as a linear programming problem, jet selection must represent the rate change request as a linear combination of jet acceleration vectors where the coefficients are the jet firing times, while minimizing the fuel expended in satisfying that request. This problem is solved in real time using a revised Simplex algorithm. In order to implement the jet selection algorithm in the Shuttle flight control computer, it was modified to accommodate certain practical features of the Shuttle such as limited computer throughput, lengthy firing times, and a large number of control jets. To the authors' knowledge, this is the first such application of linear programming. It was made possible by careful consideration of the jet selection problem in terms of the properties of linear programming and the Simplex algorithm. These modifications to the jet select algorithm may be useful for the design of reaction-controlled spacecraft.
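As posed above, jet selection is a small linear program: represent the rate-change request as a nonnegative combination of jet acceleration vectors while minimizing fuel. A minimal sketch with SciPy's linear-programming solver (illustrative only; the flight implementation uses a revised Simplex modified for Shuttle-specific constraints such as limited throughput and lengthy firing times):

```python
import numpy as np
from scipy.optimize import linprog

def select_jets(jet_accels, rate_change, fuel_rate):
    """jet_accels: (3, n_jets) matrix of per-jet angular acceleration vectors,
    rate_change: desired 3-vector, fuel_rate: per-jet fuel consumption per second.
    Returns firing times t >= 0 minimizing fuel with jet_accels @ t = rate_change."""
    A = np.asarray(jet_accels, dtype=float)
    res = linprog(c=fuel_rate, A_eq=A, b_eq=rate_change,
                  bounds=[(0, None)] * A.shape[1], method="highs")
    return res.x if res.success else None
```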
Bamber, A I; Fitzsimmons, K; Cunniffe, J G; Beasor, C C; Mackintosh, C A; Hobbs, G
2012-01-01
The laboratory diagnosis of Clostridium difficile infection (CDI) needs to be accurate and timely to ensure optimal patient management, infection control and reliable surveillance. Three methods are evaluated using 810 consecutive stool samples against toxigenic culture: CDT TOX A/B Premier enzyme immunoassay (EIA) kit (Meridian Bioscience, Europe), Premier EIA for C. difficile glutamate dehydrogenase (GDH) (Meridian Bioscience, Europe) and the Illumigene kit (Meridian Bioscience, Europe), both individually and within combined testing algorithms. The study revealed that the CDT TOX A/B Premier EIA gave rise to false-positive and false-negative results and demonstrated poor sensitivity (56.47%), compared to Premier EIA for C. difficile GDH (97.65%), suggesting this GDH EIA can be a useful negative screening method. Results for the Illumigene assay alone showed sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) of 91.57%, 98.07%, 99.03% and 84.44%, respectively. A two-stage algorithm using Premier EIA for C. difficile GDH/Illumigene assay yielded superior results compared with other testing algorithms (91.57%, 98.07%, 99.03% and 84.44%, respectively), mirroring the Illumigene performance. However, Illumigene is approximately half the cost of current polymerase chain reaction (PCR) methods, has a rapid turnaround time and requires no specialised skill base, making it an attractive alternative to assays such as the Xpert C. difficile assay (Cepheid, Sunnyvale, CA). A three-stage algorithm offered no improvement and would hamper workflow.
An Efficient Next Hop Selection Algorithm for Multi-Hop Body Area Networks
Ayatollahitafti, Vahid; Ngadi, Md Asri; Mohamad Sharif, Johan bin; Abdullahi, Mohammed
2016-01-01
Body Area Networks (BANs) consist of various sensors which gather patient’s vital signs and deliver them to doctors. One of the most significant challenges faced, is the design of an energy-efficient next hop selection algorithm to satisfy Quality of Service (QoS) requirements for different healthcare applications. In this paper, a novel efficient next hop selection algorithm is proposed in multi-hop BANs. This algorithm uses the minimum hop count and a link cost function jointly in each node to choose the best next hop node. The link cost function includes the residual energy, free buffer size, and the link reliability of the neighboring nodes, which is used to balance the energy consumption and to satisfy QoS requirements in terms of end to end delay and reliability. Extensive simulation experiments were performed to evaluate the efficiency of the proposed algorithm using the NS-2 simulator. Simulation results show that our proposed algorithm provides significant improvement in terms of energy consumption, number of packets forwarded, end to end delay and packet delivery ratio compared to the existing routing protocol. PMID:26771586
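The next-hop rule described above combines minimum hop count with a link cost built from residual energy, free buffer size, and link reliability. The sketch below is one hedged reading of that rule; the field names, normalization, and equal default weighting are assumptions for illustration, not the published cost function.

```python
def next_hop(neighbors, w_energy=1.0, w_buffer=1.0, w_reliability=1.0):
    """neighbors: list of dicts with keys 'id', 'hops_to_sink', 'residual_energy',
    'free_buffer', 'link_reliability' (the last three normalized to [0, 1]).
    Prefer minimum hop count; break ties by the largest link 'benefit'."""
    min_hops = min(n["hops_to_sink"] for n in neighbors)
    candidates = [n for n in neighbors if n["hops_to_sink"] == min_hops]
    def benefit(n):
        return (w_energy * n["residual_energy"]
                + w_buffer * n["free_buffer"]
                + w_reliability * n["link_reliability"])
    return max(candidates, key=benefit)["id"]
```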
Community-aware task allocation for social networked multiagent systems.
Wang, Wanyuan; Jiang, Yichuan
2014-09-01
In this paper, we propose a novel community-aware task allocation model for social networked multiagent systems (SN-MASs), where each agent's cooperation domain is constrained to its community and each agent can negotiate only with its intracommunity member agents. Under such community-aware scenarios, we prove that it remains NP-hard to maximize the overall system profit. To solve this problem effectively, we present a heuristic algorithm that is composed of three phases: 1) task selection: select the desirable task to be allocated preferentially; 2) allocation to community: allocate the selected task to communities based on a significant-task-first heuristic; and 3) allocation to agent: negotiate resources for the selected task based on a nonoverlap agent-first and breadth-first resource negotiation mechanism. Through theoretical analyses and experiments, the advantages of our presented heuristic algorithm and community-aware task allocation model are validated. 1) Our presented heuristic algorithm performs very close to the benchmark exponential brute-force optimal algorithm and the network flow-based greedy algorithm in terms of overall system profit in small-scale applications. Moreover, in large-scale applications, the presented heuristic algorithm achieves approximately the same overall system profit, but significantly reduces the computational load compared with the greedy algorithm. 2) Our presented community-aware task allocation model reduces the system communication cost compared with the previous global-aware task allocation model and greatly improves the overall system profit compared with the previous local neighbor-aware task allocation model.
2013-01-01
intelligently selecting waveform parameters using adaptive algorithms. The adaptive algorithms optimize the waveform parameters based on (1) the EM...the environment. Subject terms: cognitive radar, adaptive sensing, spectrum sensing, multi-objective optimization, genetic algorithms, machine... [Table-of-contents figure captions omitted: detection and classification block diagram; genetic algorithm block diagram.]
Research on sparse feature matching of improved RANSAC algorithm
NASA Astrophysics Data System (ADS)
Kong, Xiangsi; Zhao, Xian
2018-04-01
In this paper, a sparse feature matching method based on a modified RANSAC algorithm is proposed to improve precision and speed. First, the feature points of the images are extracted using the SIFT algorithm. Then, the image pair is matched roughly by generating SIFT feature descriptors. Finally, the precision of image matching is optimized by the modified RANSAC algorithm. The RANSAC algorithm is improved in three respects: instead of the homography matrix, the fundamental matrix generated by the 8-point algorithm is used as the model; the sample is selected by a random block selection method, which ensures uniform distribution and accuracy; and a sequential probability ratio test (SPRT) is added to standard RANSAC, which cuts down the overall running time of the algorithm. The experimental results show that this method not only achieves higher matching accuracy but also greatly reduces the computation and improves the matching speed.
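For reference, the baseline pipeline that the paper modifies (SIFT keypoints, rough descriptor matching, then RANSAC on a fundamental-matrix model to reject outliers) can be written with OpenCV as below. This uses OpenCV's stock RANSAC rather than the paper's block-based sampling or SPRT test, and the Lowe-ratio threshold is an assumed value.

```python
import cv2
import numpy as np

def match_with_ransac(img1, img2, ratio=0.75):
    """SIFT matching followed by fundamental-matrix RANSAC to keep inlier matches."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]          # Lowe's ratio test
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
    inliers = [m for m, keep in zip(good, mask.ravel()) if keep]
    return F, inliers
```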
NASA Astrophysics Data System (ADS)
Laurenzis, Martin; Velten, Andreas
2014-10-01
In the present paper, we discuss new approaches to analyze laser-gated viewing data for non-line-of-sight vision with a novel frame-to-frame back-projection as well as feature selection algorithms. While first back-projection approaches use time transients for each pixel, our new method can calculate the projection of the imaging data onto the obscured voxel space for each frame. Further, four different data analysis algorithms were studied with the aim of identifying and selecting signals from different target positions. A slight modification of commonly used filters leads to a powerful selection of local maximum values. It is demonstrated that the choice of the filter has an impact on the selectivity, i.e., multiple-target detection, as well as on the localization precision.
Chen, Zhaoxue; Yu, Haizhong; Chen, Hao
2013-12-01
To solve the problem that traditional K-means clustering selects its initial cluster centers randomly, we propose a new K-means segmentation algorithm based on robustly selecting the 'peaks' standing for white matter, gray matter and cerebrospinal fluid in the multi-peak gray-level histogram of an MRI brain image. The new algorithm takes the gray values of the selected histogram 'peaks' as the initial K-means cluster centers and can segment the MRI brain image into the three tissue classes more effectively, accurately and stably. Extensive experiments show that the proposed algorithm overcomes many shortcomings of the traditional K-means clustering method, such as low efficiency, poor accuracy, poor robustness and long running time. The histogram 'peak' selection idea of the proposed segmentation method is more generally applicable.
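A minimal sketch of the initialization idea described above: locate the prominent peaks of the gray-level histogram (standing for the three brain tissues) and pass their gray values to K-means as initial cluster centers. The peak picking here is deliberately simple (local maxima of a lightly smoothed histogram) and is not the paper's robust peak-selection procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_from_histogram_peaks(image, k=3, smooth=5):
    """Segment a grayscale image into k classes, seeding K-means with the gray
    values of the k tallest local maxima of the (smoothed) histogram."""
    gray = image.ravel().astype(float)
    hist, edges = np.histogram(gray, bins=256)
    hist = np.convolve(hist, np.ones(smooth) / smooth, mode="same")  # light smoothing
    is_peak = (hist[1:-1] > hist[:-2]) & (hist[1:-1] > hist[2:])
    peak_bins = np.where(is_peak)[0] + 1
    top = peak_bins[np.argsort(hist[peak_bins])[-k:]]                # k tallest peaks
    centers = ((edges[top] + edges[top + 1]) / 2).reshape(-1, 1)
    km = KMeans(n_clusters=k, init=centers, n_init=1).fit(gray.reshape(-1, 1))
    return km.labels_.reshape(image.shape)
```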
Torshabi, Ahmad Esmaili; Nankali, Saber
2016-01-01
In external beam radiotherapy, one of the most common and reliable methods for patient geometrical setup and/or predicting the tumor location is the use of external markers. In this study, the main challenge is increasing the accuracy of patient setup by investigating external marker locations. Since the location of each external marker may yield a different patient setup accuracy, it is important to assess different external marker locations using appropriate selection algorithms. To do this, two commercially available algorithms, (a) canonical correlation analysis (CCA) and (b) principal component analysis (PCA), were proposed as input selection algorithms. They work on the basis of the maximum correlation coefficient and minimum variance between given datasets. The proposed input selection algorithms work in combination with an adaptive neuro‐fuzzy inference system (ANFIS) as a correlation model to give patient positioning information as output, and they provide the input file of the ANFIS correlation model accurately. The required dataset for this study was prepared by means of a NURBS‐based 4D XCAT anthropomorphic phantom that can model the shape and structure of complex organs in the human body along with motion information of dynamic organs. Moreover, a database of four real patients undergoing radiation therapy for lung cancer was utilized in this study for validation of the proposed strategy. The final results demonstrate that the input selection algorithms can reasonably select specific external markers from those areas of the thorax region where the root mean square error (RMSE) of the ANFIS model has its minimum values. It is also found that the selected marker locations lie in those areas where the surface point motion has a large amplitude and a high correlation. PACS number(s): 87.55.km, 87.55.N PMID:27929479
Hash Bit Selection for Nearest Neighbor Search.
Xianglong Liu; Junfeng He; Shih-Fu Chang
2017-11-01
To overcome the barrier of storage and computation when dealing with gigantic-scale data sets, compact hashing has been studied extensively to approximate the nearest neighbor search. Despite the recent advances, critical design issues remain open in how to select the right features, hashing algorithms, and/or parameter settings. In this paper, we address these by posing an optimal hash bit selection problem, in which an optimal subset of hash bits are selected from a pool of candidate bits generated by different features, algorithms, or parameters. Inspired by the optimization criteria used in existing hashing algorithms, we adopt bit reliability and bit complementarity as the selection criteria, which can be carefully tailored for hashing performance in different tasks. Then, the bit selection solution is discovered by finding the best tradeoff between search accuracy and time using a modified dynamic programming method. To further reduce the computational complexity, we employ the pairwise relationship among hash bits to approximate the high-order independence property, and formulate it as an efficient quadratic programming method that is theoretically equivalent to the normalized dominant set problem in a vertex- and edge-weighted graph. Extensive large-scale experiments have been conducted under several important application scenarios of hash techniques, where our bit selection framework can achieve superior performance over both the naive selection methods and the state-of-the-art hashing algorithms, with significant relative accuracy gains ranging from 10% to 50%.
Automatic parameter selection for feature-based multi-sensor image registration
NASA Astrophysics Data System (ADS)
DelMarco, Stephen; Tom, Victor; Webb, Helen; Chao, Alan
2006-05-01
Accurate image registration is critical for applications such as precision targeting, geo-location, change-detection, surveillance, and remote sensing. However, the increasing volume of image data is exceeding the current capacity of human analysts to perform manual registration. This image data glut necessitates the development of automated approaches to image registration, including algorithm parameter value selection. Proper parameter value selection is crucial to the success of registration techniques. The appropriate algorithm parameters can be highly scene and sensor dependent. Therefore, robust algorithm parameter value selection approaches are a critical component of an end-to-end image registration algorithm. In previous work, we developed a general framework for multisensor image registration which includes feature-based registration approaches. In this work we examine the problem of automated parameter selection. We apply the automated parameter selection approach of Yitzhaky and Peli to select parameters for feature-based registration of multisensor image data. The approach consists of generating multiple feature-detected images by sweeping over parameter combinations and using these images to generate estimated ground truth. The feature-detected images are compared to the estimated ground truth images to generate ROC points associated with each parameter combination. We develop a strategy for selecting the optimal parameter set by choosing the parameter combination corresponding to the optimal ROC point. We present numerical results showing the effectiveness of the approach using registration of collected SAR data to reference EO data.
NASA Astrophysics Data System (ADS)
Khehra, Baljit Singh; Pharwaha, Amar Partap Singh
2017-04-01
Ductal carcinoma in situ (DCIS) is one type of breast cancer. Clusters of microcalcifications (MCCs) are symptoms of DCIS that are recognized by mammography. Selection of a robust feature vector is the process of selecting an optimal subset of features from a large number of available features in a given problem domain, after feature extraction and before any classification scheme. Feature selection reduces the feature space, which improves classifier performance and decreases the computational burden that using many features imposes on the classifier. Selecting an optimal subset of features from a large number of available features in a given problem domain is a difficult search problem: for n features, the total number of possible subsets is 2^n, so the optimal feature subset selection problem is NP-hard. In this paper, an attempt is made to find the optimal subset of MCC features from all possible subsets of features using a genetic algorithm (GA), particle swarm optimization (PSO) and biogeography-based optimization (BBO). For simulation, a total of 380 benign and malignant MCC samples have been selected from mammogram images of the DDSM database. A total of 50 features extracted from benign and malignant MCC samples are used in this study. In these algorithms, the fitness function is the correct classification rate of the classifier. A support vector machine is used as the classifier. From the experimental results, it is observed that the PSO-based and BBO-based algorithms select an optimal subset of features for classifying MCCs as benign or malignant better than the GA-based algorithm.
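To make the wrapper-style search concrete, the sketch below implements a small genetic algorithm over binary feature masks with SVM cross-validation accuracy as the fitness, mirroring the GA variant described above in spirit only; the population size, rates, generations, and operators are illustrative defaults, not the paper's settings, and the PSO and BBO variants are not shown.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def ga_feature_select(X, y, pop=20, gens=30, p_mut=0.05, rng=np.random.default_rng(0)):
    """Genetic algorithm over binary feature masks; fitness = mean 5-fold CV
    accuracy of an SVM on the selected feature subset."""
    n = X.shape[1]
    def fitness(mask):
        if not mask.any():
            return 0.0
        return cross_val_score(SVC(), X[:, mask], y, cv=5).mean()
    population = rng.random((pop, n)) < 0.5
    for _ in range(gens):
        scores = np.array([fitness(ind) for ind in population])
        order = np.argsort(scores)[::-1]
        parents = population[order[: pop // 2]]           # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                      # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n) < p_mut                # bit-flip mutation
            children.append(child)
        population = np.vstack([parents, children])
    best = max(population, key=fitness)
    return np.where(best)[0]                              # indices of selected features
```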
Feature Selection and Effective Classifiers.
ERIC Educational Resources Information Center
Deogun, Jitender S.; Choubey, Suresh K.; Raghavan, Vijay V.; Sever, Hayri
1998-01-01
Develops and analyzes four algorithms for feature selection in the context of rough set methodology. Experimental results confirm the expected relationship between the time complexity of these algorithms and the classification accuracy of the resulting upper classifiers. When compared, results of upper classifiers perform better than lower…
Suner, Aslı; Karakülah, Gökhan; Dicle, Oğuz
2014-01-01
Statistical hypothesis testing is an essential component of biological and medical studies for making inferences and estimations from the collected data; however, the misuse of statistical tests is widespread. To prevent possible errors in selecting a suitable statistical test, users can currently consult available test selection algorithms developed for various purposes. However, the lack of an algorithm presenting the most common statistical tests used in biomedical research in a single flowchart causes several problems, such as shifting users among algorithms, poor decision support in test selection, and dissatisfaction among potential users. Herein, we present a unified flowchart that covers the most commonly used statistical tests in the biomedical domain, to provide decision support to non-statistician users when choosing the appropriate statistical test for their hypothesis. We also discuss some of the findings made while integrating the flowcharts into a single, more comprehensive decision algorithm.
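To illustrate what one branch of such a flowchart looks like in executable form, here is a deliberately tiny decision fragment for comparing two groups on a continuous outcome; the cutoffs and the tests covered are common rules of thumb, not the authors' full algorithm.

```python
def choose_two_group_test(paired, normal, n_per_group):
    """Tiny decision fragment: two groups, continuous outcome."""
    if paired:
        return "paired t-test" if normal else "Wilcoxon signed-rank test"
    if normal or n_per_group >= 30:   # rely on the CLT for larger samples (rule of thumb)
        return "independent-samples t-test"
    return "Mann-Whitney U test"
```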
Ad Hoc Access Gateway Selection Algorithm
NASA Astrophysics Data System (ADS)
Jie, Liu
With the continuous development of mobile communication technology, Ad Hoc access networks have become a hot research topic. Ad Hoc access network nodes can be used to expand the capacity and multi-hop communication range of a mobile communication system, serve traffic in adjacent areas, and improve edge data rates. For mobile nodes in an Ad Hoc network to reach the Internet, communication with peer nodes on the Internet must pass through a gateway. Therefore, the key issues for Ad Hoc access networks are gateway discovery, gateway selection in the multi-gateway case, and handover between different gateways. Considering both the mobile node and the gateway, this paper proposes an improved gateway selection algorithm based on the average number of hops, the average access time, and the stability of routes. The algorithm mainly aims to improve the access time of Ad Hoc nodes and the continuity of communication during handover between gateways, and can thereby improve the quality of communication across the network.
Dai, Qiong; Cheng, Jun-Hu; Sun, Da-Wen; Zeng, Xin-An
2015-01-01
There is increased interest in the application of hyperspectral imaging (HSI) for assessing food quality, safety, and authenticity. HSI provides an abundance of spatial and spectral information from foods by combining spectroscopy and imaging, resulting in hundreds of contiguous wavebands for each spatial position of a food sample, a situation also known as the curse of dimensionality. It is desirable to employ feature selection algorithms to decrease the computational burden and increase the prediction accuracy, which is especially relevant in the development of online applications. Recently, a variety of feature selection algorithms have been proposed that can be categorized into three groups based on their searching strategy, namely complete search, heuristic search and random search. This review introduces the fundamentals of each algorithm, illustrates its applications in hyperspectral data analysis in the food field, and discusses the advantages and disadvantages of these algorithms. It is hoped that this review will provide a guideline for feature selection and data processing in the future development of hyperspectral imaging techniques for foods.
Efficient least angle regression for identification of linear-in-the-parameters models
Beach, Thomas H.; Rezgui, Yacine
2017-01-01
Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods in that it is neither too greedy nor too slow. It is closely related to L1-norm optimization, which has the advantage of low prediction variance: part of the model bias is sacrificed in order to enhance generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models, with the purpose of accelerating the model selection process. The entire algorithm works in a completely recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes, so direct matrix inversions are avoided. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency compared with the original approach, in which the well-known efficient Cholesky decomposition is used to solve least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140
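For orientation, the snippet below runs standard least angle regression for subset selection using scikit-learn's Lars on synthetic data; it shows the kind of model-term selection being accelerated, not the paper's recursive, inversion-free implementation.

    # Minimal least angle regression sketch on synthetic data (standard LARS,
    # not the recursive variant proposed in the paper).
    import numpy as np
    from sklearn.linear_model import Lars

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))
    true_coef = np.zeros(30)
    true_coef[[2, 7, 11]] = [1.5, -2.0, 0.8]          # only three relevant terms
    y = X @ true_coef + 0.1 * rng.normal(size=200)

    model = Lars(n_nonzero_coefs=3).fit(X, y)         # stop after 3 selected terms
    selected = np.flatnonzero(model.coef_)
    print("selected model terms:", selected)          # expected: [2, 7, 11]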
Optimal input selection for neural machine interfaces predicting multiple non-explicit outputs.
Krepkovich, Eileen T; Perreault, Eric J
2008-01-01
This study implemented a novel algorithm that optimally selects inputs for neural machine interface (NMI) devices intended to control multiple outputs and evaluated its performance on systems lacking explicit output. NMIs often incorporate signals from multiple physiological sources and provide predictions for multidimensional control, leading to multiple-input multiple-output systems. Further, NMIs often are used with subjects who have motor disabilities and thus lack explicit motor outputs. Our algorithm was tested on simulated multiple-input multiple-output systems and on electromyogram and kinematic data collected from healthy subjects performing arm reaches. Effects of output noise in simulated systems indicated that the algorithm could be useful for systems with poor estimates of the output states, as is true for systems lacking explicit motor output. To test efficacy on physiological data, selection was performed using inputs from one subject and outputs from a different subject. Selection was effective for these cases, again indicating that this algorithm will be useful for predictions where there is no motor output, as often is the case for disabled subjects. Further, prediction results generalized for different movement types not used for estimation. These results demonstrate the efficacy of this algorithm for the development of neural machine interfaces.
Žuvela, Petar; Liu, J Jay; Macur, Katarzyna; Bączek, Tomasz
2015-10-06
In this work, performance of five nature-inspired optimization algorithms, genetic algorithm (GA), particle swarm optimization (PSO), artificial bee colony (ABC), firefly algorithm (FA), and flower pollination algorithm (FPA), was compared in molecular descriptor selection for development of quantitative structure-retention relationship (QSRR) models for 83 peptides that originate from eight model proteins. The matrix with 423 descriptors was used as input, and QSRR models based on selected descriptors were built using partial least squares (PLS), whereas root mean square error of prediction (RMSEP) was used as a fitness function for their selection. Three performance criteria, prediction accuracy, computational cost, and the number of selected descriptors, were used to evaluate the developed QSRR models. The results show that all five variable selection methods outperform interval PLS (iPLS), sparse PLS (sPLS), and the full PLS model, whereas GA is superior because of its lowest computational cost and higher accuracy (RMSEP of 5.534%) with a smaller number of variables (nine descriptors). The GA-QSRR model was validated initially through Y-randomization. In addition, it was successfully validated with an external testing set out of 102 peptides originating from Bacillus subtilis proteomes (RMSEP of 22.030%). Its applicability domain was defined, from which it was evident that the developed GA-QSRR exhibited strong robustness. All the sources of the model's error were identified, thus allowing for further application of the developed methodology in proteomics.
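A schematic version of such a GA-driven descriptor selection loop might look as follows: binary chromosomes mark selected descriptors and the cross-validated RMSEP of a PLS model serves as the fitness. The data, population size and genetic-operator settings are invented for illustration and do not reproduce the study's configuration.

    # Schematic GA descriptor selection with a PLS/RMSEP fitness; all data and
    # GA settings here are illustrative stand-ins.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(1)
    X = rng.normal(size=(83, 40))                     # 83 "peptides", 40 mock descriptors
    y = X[:, :5] @ rng.normal(size=5) + 0.2 * rng.normal(size=83)

    def rmsep(mask):
        if mask.sum() < 2:
            return np.inf
        pls = PLSRegression(n_components=min(3, int(mask.sum())))
        pred = cross_val_predict(pls, X[:, mask], y, cv=5).ravel()
        return float(np.sqrt(np.mean((y - pred) ** 2)))

    pop = rng.random((20, X.shape[1])) < 0.2          # initial population of descriptor masks
    for _ in range(30):                               # generations
        fit = np.array([rmsep(m) for m in pop])
        parents = pop[np.argsort(fit)[:10]]           # truncation selection of fittest masks
        children = parents.copy()
        cut = rng.integers(1, X.shape[1], size=10)
        for i in range(0, 10, 2):                     # one-point crossover
            children[i, cut[i]:], children[i + 1, cut[i]:] = (
                parents[i + 1, cut[i]:].copy(), parents[i, cut[i]:].copy())
        children ^= rng.random(children.shape) < 0.02  # bit-flip mutation
        pop = np.vstack([parents, children])

    best = pop[np.argmin([rmsep(m) for m in pop])]
    print("selected descriptors:", np.flatnonzero(best))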
Selected-node stochastic simulation algorithm
NASA Astrophysics Data System (ADS)
Duso, Lorenzo; Zechner, Christoph
2018-04-01
Stochastic simulations of biochemical networks are of vital importance for understanding complex dynamics in cells and tissues. However, existing methods to perform such simulations are associated with computational difficulties and addressing those remains a daunting challenge to the present. Here we introduce the selected-node stochastic simulation algorithm (snSSA), which allows us to exclusively simulate an arbitrary, selected subset of molecular species of a possibly large and complex reaction network. The algorithm is based on an analytical elimination of chemical species, thereby avoiding explicit simulation of the associated chemical events. These species are instead described continuously in terms of statistical moments derived from a stochastic filtering equation, resulting in a substantial speedup when compared to Gillespie's stochastic simulation algorithm (SSA). Moreover, we show that statistics obtained via snSSA profit from a variance reduction, which can significantly lower the number of Monte Carlo samples needed to achieve a certain performance. We demonstrate the algorithm using several biological case studies for which the simulation time could be reduced by orders of magnitude.
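For context, a minimal Gillespie direct-method SSA for a simple birth-death process is sketched below; this is the kind of explicit event-by-event simulation that snSSA partially replaces with moment equations, which are not reproduced here.

    # Minimal Gillespie direct-method SSA for a birth-death process.
    import numpy as np

    def ssa_birth_death(k_birth=1.0, k_death=0.1, x0=0, t_end=50.0, seed=0):
        rng = np.random.default_rng(seed)
        t, x, times, states = 0.0, x0, [0.0], [x0]
        while t < t_end:
            a = np.array([k_birth, k_death * x])      # reaction propensities
            a0 = a.sum()
            if a0 == 0:
                break
            t += rng.exponential(1.0 / a0)            # waiting time to the next reaction
            r = rng.choice(2, p=a / a0)               # which reaction fires
            x += 1 if r == 0 else -1
            times.append(t)
            states.append(x)
        return np.array(times), np.array(states)

    times, states = ssa_birth_death()
    print("final copy number:", states[-1])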
Efficient Constant-Time Complexity Algorithm for Stochastic Simulation of Large Reaction Networks.
Thanh, Vo Hong; Zunino, Roberto; Priami, Corrado
2017-01-01
Exact stochastic simulation is an indispensable tool for the quantitative study of biochemical reaction networks. The simulation realizes the time evolution of the model by randomly choosing a reaction to fire and updating the system state with a probability proportional to the reaction propensity. Two computationally expensive tasks in simulating large biochemical networks are the selection of the next reaction firing and the update of reaction propensities due to state changes. We present in this work a new exact algorithm that optimizes both of these simulation bottlenecks. Our algorithm employs composition-rejection on the propensity bounds of reactions to select the next reaction firing. The selection of next reaction firings is independent of the number of reactions, while the update of propensities is skipped and performed only when necessary. It therefore provides favorable scaling of the computational complexity in simulating large reaction networks. We benchmark our new algorithm against the state-of-the-art algorithms available in the literature to demonstrate its applicability and efficiency.
Ebtehaj, Isa; Bonakdari, Hossein
2014-01-01
The existence of sediments in wastewater greatly affects the performance of sewer and wastewater transmission systems. Increased sedimentation in wastewater collection systems causes problems such as reduced transmission capacity and early combined sewer overflow. This article reviews the performance of the genetic algorithm (GA) and the imperialist competitive algorithm (ICA) in minimizing the target function (the mean square error of observed and predicted Froude number). To study the impact of bed-load transport parameters, six different models based on four non-dimensional groups are presented. Moreover, the roulette wheel selection method is used to select the parents. The ICA, with a root mean square error (RMSE) of 0.007 and a mean absolute percentage error (MAPE) of 3.5%, shows better results than the GA (RMSE = 0.007, MAPE = 5.6%) for the selected model, and the ICA outperforms the GA in all six models. The results of these two algorithms were also compared with a multi-layer perceptron and existing equations.
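The roulette-wheel (fitness-proportional) parent selection mentioned above can be sketched as follows; converting the error-based objective into a "higher is better" fitness via its reciprocal is an assumption made only for this example.

    # Roulette-wheel (fitness-proportional) parent selection sketch.
    import numpy as np

    def roulette_wheel(fitness, n_parents, rng=np.random.default_rng(0)):
        fitness = np.asarray(fitness, dtype=float)
        probs = fitness / fitness.sum()               # selection probability per individual
        cum = np.cumsum(probs)
        spins = rng.random(n_parents)
        return np.searchsorted(cum, spins)            # index of the selected parent per spin

    errors = np.array([0.007, 0.012, 0.009, 0.020])   # e.g. RMSE of each individual
    fitness = 1.0 / errors                            # lower error -> higher fitness (assumption)
    print(roulette_wheel(fitness, n_parents=2))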
PCA-LBG-based algorithms for VQ codebook generation
NASA Astrophysics Data System (ADS)
Tsai, Jinn-Tsong; Yang, Po-Yuan
2015-04-01
Vector quantisation (VQ) codebooks are generated by combining principal component analysis (PCA) algorithms with Linde-Buzo-Gray (LBG) algorithms. All training vectors are grouped according to the projected values of the principal components. The PCA-LBG-based algorithms include (1) PCA-LBG-Median, which selects the median vector of each group, (2) PCA-LBG-Centroid, which adopts the centroid vector of each group, and (3) PCA-LBG-Random, which randomly selects a vector from each group. The LBG algorithm then refines the codebook starting from the initial codebook vectors supplied by the PCA step. The PCA performs an orthogonal transformation to convert a set of potentially correlated variables into a set of variables that are not linearly correlated. Because the orthogonal transformation efficiently distinguishes test image vectors, the proposed PCA-LBG-based algorithms are expected to outperform conventional algorithms in designing VQ codebooks. The experimental results confirm that the proposed PCA-LBG-based algorithms indeed obtain better results than existing methods reported in the literature.
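A rough sketch of the PCA-LBG-Median idea is given below: training vectors are grouped by their projection onto the first principal component, the median vector of each group forms the initial codebook, and an LBG-style refinement follows. K-means stands in for LBG here, and the quantile-based group boundaries are an assumption.

    # PCA-based initial codebook (median of each projection group), refined by
    # k-means as a stand-in for LBG; grouping by quantiles is an assumption.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(1000, 16))             # training vectors (e.g. 4x4 image blocks)
    codebook_size = 8

    proj = PCA(n_components=1).fit_transform(vectors).ravel()
    edges = np.quantile(proj, np.linspace(0, 1, codebook_size + 1))
    groups = np.clip(np.searchsorted(edges, proj, side="right") - 1, 0, codebook_size - 1)
    init_codebook = np.array([np.median(vectors[groups == g], axis=0)
                              for g in range(codebook_size)])

    # LBG-style refinement of the PCA-derived initial codebook
    codebook = KMeans(n_clusters=codebook_size, init=init_codebook,
                      n_init=1).fit(vectors).cluster_centers_
    print(codebook.shape)                             # (8, 16)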
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stassi, D.; Ma, H.; Schmidt, T. G., E-mail: taly.gilat-schmidt@marquette.edu
Purpose: Reconstructing a low-motion cardiac phase is expected to improve coronary artery visualization in coronary computed tomography angiography (CCTA) exams. This study developed an automated algorithm for selecting the optimal cardiac phase for CCTA reconstruction. The algorithm uses prospectively gated, single-beat, multiphase data made possible by wide cone-beam imaging. The proposed algorithm differs from previous approaches because the optimal phase is identified based on vessel image quality (IQ) directly, compared to previous approaches that included motion estimation and interphase processing. Because there is no processing of interphase information, the algorithm can be applied to any sampling of image phases, making it suited for prospectively gated studies where only a subset of phases is available. Methods: An automated algorithm was developed to select the optimal phase based on quantitative IQ metrics. For each reconstructed slice at each reconstructed phase, an image quality metric was calculated based on measures of circularity and edge strength of through-plane vessels. The image quality metric was aggregated across slices, while a metric of vessel-location consistency was used to ignore slices that did not contain through-plane vessels. The algorithm performance was evaluated using two observer studies. Fourteen single-beat cardiac CT exams (Revolution CT, GE Healthcare, Chalfont St. Giles, UK) reconstructed at 2% intervals were evaluated for best systolic (1), diastolic (6), or systolic and diastolic phases (7) by three readers and the algorithm. Pairwise inter-reader and reader-algorithm agreement was evaluated using the mean absolute difference (MAD) and concordance correlation coefficient (CCC) between the reader- and algorithm-selected phases. A reader-consensus best phase was determined and compared to the algorithm-selected phase. In cases where the algorithm and consensus best phases differed by more than 2%, IQ was scored by three readers using a five-point Likert scale. Results: There was no statistically significant difference between inter-reader and reader-algorithm agreement for either MAD or CCC metrics (p > 0.1). The algorithm phase was within 2% of the consensus phase in 15 of 21 cases. The average absolute difference between consensus and algorithm best phases was 2.29% ± 2.47%, with a maximum difference of 8%. Average image quality scores for the algorithm-chosen best phase were 4.01 ± 0.65 overall, 3.33 ± 1.27 for the right coronary artery (RCA), 4.50 ± 0.35 for the left anterior descending (LAD) artery, and 4.50 ± 0.35 for the left circumflex artery (LCX). Average image quality scores for the consensus best phase were 4.11 ± 0.54 overall, 3.44 ± 1.03 for RCA, 4.39 ± 0.39 for LAD, and 4.50 ± 0.18 for LCX. There was no statistically significant difference (p > 0.1) between the image quality scores of the algorithm phase and the consensus phase. Conclusions: The proposed algorithm was statistically equivalent to a reader in selecting an optimal cardiac phase for CCTA exams. When reader and algorithm phases differed by >2%, image quality as rated by blinded readers was statistically equivalent. By detecting the optimal phase for CCTA reconstruction, the proposed algorithm is expected to improve coronary artery visualization in CCTA exams.
O'Boyle, Noel M; Palmer, David S; Nigsch, Florian; Mitchell, John Bo
2008-10-29
We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024-1029). We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581-590) of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset. Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6 degrees C and R2 of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, epsilon of 0.21) and an RMSE of 45.1 degrees C and R2 of 0.54. This model outperforms a kNN model (RMSE of 48.3 degrees C, R2 of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5 degrees C, R2 of 0.55). However it is much less prone to bias at the extremes of the range of melting points as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors.
Richter, S. S.; Beekmann, S. E.; Croco, J. L.; Diekema, D. J.; Koontz, F. P.; Pfaller, M. A.; Doern, G. V.
2002-01-01
An algorithm was implemented in the clinical microbiology laboratory to assess the clinical significance of organisms that are often considered contaminants (coagulase-negative staphylococci, aerobic and anaerobic diphtheroids, Micrococcus spp., Bacillus spp., and viridans group streptococci) when isolated from blood cultures. From 25 August 1999 through 30 April 2000, 12,374 blood cultures were submitted to the University of Iowa Clinical Microbiology Laboratory. Potential contaminants were recovered from 495 of 1,040 positive blood cultures. If one or more additional blood cultures were obtained within ±48 h and all were negative, the isolate was considered a contaminant. Antimicrobial susceptibility testing (AST) of these probable contaminants was not performed unless requested. If no additional blood cultures were submitted or there were additional positive blood cultures (within ±48 h), a pathology resident gathered patient clinical information and made a judgment regarding the isolate's significance. To evaluate the accuracy of these algorithm-based assignments, a nurse epidemiologist performed a retrospective chart review in approximately 60% of the cases. Agreement between the findings of the retrospective chart review and the automatic classification of isolates with additional negative blood cultures as probable contaminants occurred for 85.8% of 225 isolates. In response to physician requests, AST had been performed on 15 of the 32 isolates with additional negative cultures that were considered significant by retrospective chart review. Agreement of the pathology resident's assignment with the retrospective chart review occurred for 74.6% of 71 isolates. The laboratory-based algorithm provided an acceptably accurate means of assessing the clinical significance of potential contaminants recovered from blood cultures. PMID:12089259
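The core decision rule can be restated in code roughly as below; the organism list comes from the abstract, while the record fields and the "needs clinical review" label are illustrative assumptions.

    # Simplified restatement of the laboratory rule; field names are illustrative.
    from datetime import datetime, timedelta

    COMMON_CONTAMINANTS = {
        "coagulase-negative staphylococci", "diphtheroids",
        "Micrococcus spp.", "Bacillus spp.", "viridans group streptococci",
    }

    def classify_isolate(organism, draw_time, other_cultures):
        """other_cultures: list of (draw_time, positive: bool) for the same patient."""
        if organism not in COMMON_CONTAMINANTS:
            return "significant"                      # only usual contaminants are screened
        window = [c for c in other_cultures
                  if abs(c[0] - draw_time) <= timedelta(hours=48)]
        if window and not any(positive for _, positive in window):
            return "probable contaminant"             # additional cultures within 48 h, all negative
        return "needs clinical review"                # no extra cultures, or another positive culture

    print(classify_isolate(
        "Bacillus spp.", datetime(2000, 1, 1, 8, 0),
        [(datetime(2000, 1, 1, 20, 0), False), (datetime(2000, 1, 2, 6, 0), False)]))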
Manzanares-Laya, S; Burón, A; Murta-Nascimento, C; Servitja, S; Castells, X; Macià, F
2014-01-01
Hospital cancer registries and hospital databases are valuable and efficient sources of information for research into cancer recurrence. The aim of this study was to develop and validate algorithms for the detection of breast cancer recurrence. A retrospective observational study was conducted on breast cancer cases diagnosed between 2003 and 2009 from the cancer registry of a tertiary university hospital. Several probable cancer recurrence algorithms were obtained by linking the hospital databases and constructing several operational definitions, each with its corresponding sensitivity, specificity, positive predictive value and negative predictive value. A total of 1,523 patients were diagnosed with breast cancer between 2003 and 2009. A request for bone scintigraphy more than 6 months after the first oncological treatment showed the highest sensitivity (53.8%) and negative predictive value (93.8%), and a pathology test more than 6 months after diagnosis showed the highest specificity (93.8%) and negative predictive value (92.6%). Combining different definitions increased the specificity and the positive predictive value, but decreased the sensitivity. Several diagnostic algorithms were obtained, and the different definitions could be useful depending on the interest and resources of the researcher. A higher positive predictive value could be useful for a quick estimation of the number of cases, and a higher negative predictive value for a more exact estimation if more resources are available. It is a versatile tool, adaptable to other types of tumors as well as to the needs of the researcher. Copyright © 2014 SECA. Published by Elsevier Espana. All rights reserved.
Carvalho, Gustavo A; Minnett, Peter J; Fleming, Lora E; Banzon, Viva F; Baringer, Warner
2010-06-01
In a continuing effort to develop suitable methods for the surveillance of Harmful Algal Blooms (HABs) of Karenia brevis using satellite radiometers, a new multi-algorithm method was developed to explore whether improvements in the remote sensing detection of the Florida Red Tide was possible. A Hybrid Scheme was introduced that sequentially applies the optimized versions of two pre-existing satellite-based algorithms: an Empirical Approach (using water-leaving radiance as a function of chlorophyll concentration) and a Bio-optical Technique (using particulate backscatter along with chlorophyll concentration). The long-term evaluation of the new multi-algorithm method was performed using a multi-year MODIS dataset (2002 to 2006; during the boreal Summer-Fall periods - July to December) along the Central West Florida Shelf between 25.75°N and 28.25°N. Algorithm validation was done with in situ measurements of the abundances of K. brevis; cell counts ≥1.5×10(4) cells l(-1) defined a detectable HAB. Encouraging statistical results were derived when either or both algorithms correctly flagged known samples. The majority of the valid match-ups were correctly identified (~80% of both HABs and non-blooming conditions) and few false negatives or false positives were produced (~20% of each). Additionally, most of the HAB-positive identifications in the satellite data were indeed HAB samples (positive predictive value: ~70%) and those classified as HAB-negative were almost all non-bloom cases (negative predictive value: ~86%). These results demonstrate an excellent detection capability, on average ~10% more accurate than the individual algorithms used separately. Thus, the new Hybrid Scheme could become a powerful tool for environmental monitoring of K. brevis blooms, with valuable consequences including leading to the more rapid and efficient use of ships to make in situ measurements of HABs.
An outlet breaching algorithm for the treatment of closed depressions in a raster DEM
NASA Astrophysics Data System (ADS)
Martz, Lawrence W.; Garbrecht, Jurgen
1999-08-01
Automated drainage analysis of raster DEMs typically begins with the simulated filling of all closed depressions and the imposition of a drainage pattern on the resulting flat areas. The elimination of closed depressions by filling implicitly assumes that all depressions are caused by elevation underestimation. This assumption is difficult to support, as depressions can be produced by overestimation as well as by underestimation of DEM values. This paper presents a new algorithm that is applied in conjunction with conventional depression filling to provide a more realistic treatment of those depressions that are likely due to overestimation errors. The algorithm lowers the elevation of selected cells on the edge of closed depressions to simulate breaching of the depression outlets. Application of this breaching algorithm prior to depression filling can substantially reduce the number and size of depressions that need to be filled, especially in low-relief terrain. Removing or reducing the size of a depression by breaching implicitly assumes that the depression is due to a spurious flow blockage caused by elevation overestimation. Removing a depression by filling, on the other hand, implicitly assumes that the depression is a direct artifact of elevation underestimation. Although the breaching algorithm cannot distinguish between overestimation and underestimation errors in a DEM, a constraining parameter for breaching length can be used to restrict breaching to closed depressions caused by narrow blockages along well-defined drainage courses. These are considered the depressions most likely to have arisen from overestimation errors. Applying the constrained breaching algorithm prior to a conventional depression-filling algorithm allows both positive and negative elevation adjustments to be used to remove depressions. The breaching algorithm was incorporated into the DEM pre-processing operations of the TOPAZ software system. The effect of the algorithm is illustrated by the application of TOPAZ to a DEM of a low-relief landscape. The use of the breaching algorithm during DEM pre-processing substantially reduced the number of cells that needed to be subsequently raised in elevation to remove depressions. The number and kind of depression cells that were eliminated by the breaching algorithm suggested that the algorithm effectively targeted those topographic situations for which it was intended. A detailed inspection of a portion of the DEM that was processed using the breaching algorithm in conjunction with depression-filling also suggested that the effects of the algorithm were as intended. The breaching algorithm provides an empirically satisfactory and robust approach to treating closed depressions in a raster DEM. It recognises that depressions in certain topographic settings are as likely to be due to elevation overestimation as to elevation underestimation errors. The algorithm allows a more realistic treatment of depressions in these situations than conventional methods that rely solely on depression-filling.
A networked voting rule for democratic representation
Brigatti, Edgardo; Moreno, Yamir
2018-01-01
We introduce a general framework for exploring the problem of selecting a committee of representatives, with the aim of studying a networked voting rule based on a decentralized large-scale platform that can assure strong accountability of the elected. The results of our simulations suggest that this algorithm-based approach is able to obtain high representativeness for relatively small committees, performing even better than a classical voting rule based on a closed list of candidates. We show that a general relation between committee size and representativeness exists in the form of an inverse square root law, and that the normalized committee size approximately scales with the inverse of the community size, allowing scalability to very large populations. These findings are not strongly influenced by the different networks used to describe the individuals' interactions, except for the presence of a few individuals with very high connectivity, which can have a marginal negative effect in the committee selection process. PMID:29657817
Structure-activity relationships between sterols and their thermal stability in oil matrix.
Hu, Yinzhou; Xu, Junli; Huang, Weisu; Zhao, Yajing; Li, Maiquan; Wang, Mengmeng; Zheng, Lufei; Lu, Baiyi
2018-08-30
Structure-activity relationships between 20 sterols and their thermal stabilities were studied in a model oil system. All sterol degradations were found to follow first-order kinetics, with coefficients of determination (R2) higher than 0.9444. The number of double bonds in the sterol structure was negatively correlated with the thermal stability of the sterol, whereas the length of the branch chain was positively correlated with it. A quantitative structure-activity relationship (QSAR) model to predict the thermal stability of sterols was developed using partial least squares regression (PLSR) combined with a genetic algorithm (GA). A regression model was built with an R2 of 0.806. Almost all sterol degradation constants could be predicted accurately, with a cross-validation R2 of 0.680. Four important variables were selected in the optimal QSAR model, and the selected variables were observed to be related to information indices, RDF descriptors, and 3D-MoRSE descriptors. Copyright © 2018 Elsevier Ltd. All rights reserved.
Continued research on selected parameters to minimize community annoyance from airplane noise
NASA Technical Reports Server (NTRS)
Frair, L.
1981-01-01
Results from continued research on selected parameters to minimize community annoyance from airport noise are reported. First, a review of the initial work on this problem is presented. Then the research focus is expanded by considering multiobjective optimization approaches for this problem. A multiobjective optimization algorithm review from the open literature is presented. This is followed by the multiobjective mathematical formulation for the problem of interest. A discussion of the appropriate solution algorithm for the multiobjective formulation is conducted. Alternate formulations and associated solution algorithms are discussed and evaluated for this airport noise problem. Selected solution algorithms that have been implemented are then used to produce computational results for example airports. These computations involved finding the optimal operating scenario for a moderate size airport and a series of sensitivity analyses for a smaller example airport.
De Beer, Maarten; Lynen, Fréderic; Chen, Kai; Ferguson, Paul; Hanna-Brown, Melissa; Sandra, Pat
2010-03-01
Stationary-phase optimized selectivity liquid chromatography (SOS-LC) is a tool in reversed-phase LC (RP-LC) to optimize the selectivity for a given separation by combining stationary phases in a multisegment column. The presently (commercially) available SOS-LC optimization procedure and algorithm are only applicable to isocratic analyses. Step gradient SOS-LC has been developed, but this is still not very elegant for the analysis of complex mixtures composed of components covering a broad hydrophobicity range. A linear gradient prediction algorithm has been developed allowing one to apply SOS-LC as a generic RP-LC optimization method. The algorithm allows operation in isocratic, stepwise, and linear gradient run modes. The features of SOS-LC in the linear gradient mode are demonstrated by means of a mixture of 13 steroids, whereby baseline separation is predicted and experimentally demonstrated.
Simulation of biochemical reactions with time-dependent rates by the rejection-based algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thanh, Vo Hong, E-mail: vo@cosbi.eu; Priami, Corrado, E-mail: priami@cosbi.eu; Department of Mathematics, University of Trento, Trento
We address the problem of simulating biochemical reaction networks with time-dependent rates and propose a new algorithm based on our rejection-based stochastic simulation algorithm (RSSA) [Thanh et al., J. Chem. Phys. 141(13), 134116 (2014)]. The selection of next reaction firings by our time-dependent RSSA (tRSSA) is computationally efficient. Furthermore, the generated trajectory is exact by exploiting the rejection-based mechanism. We benchmark tRSSA on different biological systems with varying forms of reaction rates to demonstrate its applicability and efficiency. We reveal that for nontrivial cases, the selection of reaction firings in existing algorithms introduces approximations because the integration of reaction rates is very computationally demanding and simplifying assumptions are introduced. The selection of the next reaction firing by our approach is easier while preserving exactness.
A bootstrap based Neyman-Pearson test for identifying variable importance.
Ditzler, Gregory; Polikar, Robi; Rosen, Gail
2015-04-01
Selection of the most informative features, leading to a small loss on future data, is arguably one of the most important steps in classification, data analysis and model selection. Several feature selection (FS) algorithms are available; however, due to the noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining whether a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criteria used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.
Husain, N; Blais, P; Kramer, J; Kowalkowski, M; Richardson, P; El-Serag, H B; Kanwal, F
2014-10-01
In practice, nonalcoholic fatty liver disease (NAFLD) is diagnosed based on elevated liver enzymes and confirmatory liver biopsy or abdominal imaging. Neither method is feasible in identifying individuals with NAFLD in a large-scale healthcare system. To develop and validate an algorithm to identify patients with NAFLD using automated data. Using the Veterans Administration Corporate Data Warehouse, we identified patients who had persistent ALT elevation (≥2 values ≥40 IU/mL ≥6 months apart) and did not have evidence of hepatitis B, hepatitis C or excessive alcohol use. We conducted a structured chart review of 450 patients classified as NAFLD and 150 patients who were classified as non-NAFLD by the database algorithm, and subsequently refined the database algorithm. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) for the initial database definition of NAFLD were 78.4% (95% CI: 70.0-86.8%), 74.5% (95% CI: 68.1-80.9%), 64.1% (95% CI: 56.4-71.7%) and 85.6% (95% CI: 79.4-91.8%), respectively. Reclassifying patients as having NAFLD if they had two elevated ALTs that were at least 6 months apart but within 2 years of each other, increased the specificity and PPV of the algorithm to 92.4% (95% CI: 88.8-96.0%) and 80.8% (95% CI: 72.5-89.0%), respectively. However, the sensitivity and NPV decreased to 55.0% (95% CI: 46.1-63.9%) and 78.0% (95% CI: 72.1-83.8%), respectively. Predictive algorithms using automated data can be used to identify patients with NAFLD, determine prevalence of NAFLD at the system-wide level, and may help select a target population for future clinical studies in veterans with NAFLD. © 2014 John Wiley & Sons Ltd.
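For reference, the validation metrics quoted above follow directly from a 2x2 table of algorithm classification versus chart review; the helper below computes them (the counts in the example are made up, not the study's data).

    # Diagnostic metrics from a 2x2 confusion matrix; example counts are invented.
    def diagnostic_metrics(tp, fp, fn, tn):
        return {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "PPV": tp / (tp + fp),
            "NPV": tn / (tn + fn),
        }

    print(diagnostic_metrics(tp=80, fp=20, fn=25, tn=95))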
A dynamic data source selection system for smartwatch platform.
Nemati, Ebrahim; Sideris, Konstantinos; Kalantarian, Haik; Sarrafzadeh, Majid
2016-08-01
A novel data source selection algorithm is proposed for ambulatory activity tracking of elderly people. The algorithm introduces the concept of dynamic switching between the data collection modules (a smartwatch and a smartphone) to improve accuracy and battery life using contextual information. We show that by making offloading decisions as a function of activity, the proposed algorithm extends battery life by 7 hours and improves accuracy by 5% compared to the baseline of previous work.
Social Media: Menagerie of Metrics
2010-01-27
intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA...Cloning - 22 Animals were cloned to date; genetic algorithms can help prediction (e.g. "elitism" - attempts to ensure selection by including performers...28, 2010 Evolutionary Algorithm...
Algorithms for selecting informative marker panels for population assignment.
Rosenberg, Noah A
2005-11-01
Given a set of potential source populations, genotypes of an individual of unknown origin at a collection of markers can be used to predict the correct source population of the individual. For improved efficiency, informative markers can be chosen from a larger set of markers to maximize the accuracy of this prediction. However, selecting the loci that are individually most informative does not necessarily produce the optimal panel. Here, using genotypes from eight species--carp, cat, chicken, dog, fly, grayling, human, and maize--this univariate accumulation procedure is compared to new multivariate "greedy" and "maximin" algorithms for choosing marker panels. The procedures generally suggest similar panels, although the greedy method often recommends inclusion of loci that are not chosen by the other algorithms. In seven of the eight species, when applied to five or more markers, all methods achieve at least 94% assignment accuracy on simulated individuals, with one species--dog--producing this level of accuracy with only three markers, and the eighth species--human--requiring approximately 13-16 markers. The new algorithms produce substantial improvements over use of randomly selected markers; where differences among the methods are noticeable, the greedy algorithm leads to slightly higher probabilities of correct assignment. Although none of the approaches necessarily chooses the panel with optimal performance, the algorithms all likely select panels with performance near enough to the maximum that they all are suitable for practical use.
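A schematic greedy panel-building loop in the spirit of the multivariate approach described above is sketched below: at each step the marker whose inclusion most improves cross-validated assignment accuracy is added. A naive Bayes classifier on synthetic genotype counts stands in for the likelihood-based assignment rule, so this illustrates only the search strategy.

    # Greedy forward construction of a marker panel; data and classifier are
    # illustrative stand-ins for real genotypes and population assignment.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_pops, n_markers = 4, 30
    pop_labels = np.repeat(np.arange(n_pops), 50)
    allele_freqs = rng.random((n_pops, n_markers))
    genotypes = rng.binomial(2, allele_freqs[pop_labels])    # 0/1/2 allele counts per marker

    def panel_accuracy(panel):
        return cross_val_score(GaussianNB(), genotypes[:, panel], pop_labels, cv=5).mean()

    panel, remaining = [], list(range(n_markers))
    while len(panel) < 5:                                    # target panel size
        best = max(remaining, key=lambda m: panel_accuracy(panel + [m]))
        panel.append(best)
        remaining.remove(best)

    print("greedy panel:", panel, "accuracy:", round(panel_accuracy(panel), 3))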
Leger, Stefan; Zwanenburg, Alex; Pilz, Karoline; Lohaus, Fabian; Linge, Annett; Zöphel, Klaus; Kotzerke, Jörg; Schreiber, Andreas; Tinhofer, Inge; Budach, Volker; Sak, Ali; Stuschke, Martin; Balermpas, Panagiotis; Rödel, Claus; Ganswindt, Ute; Belka, Claus; Pigorsch, Steffi; Combs, Stephanie E; Mönnich, David; Zips, Daniel; Krause, Mechthild; Baumann, Michael; Troost, Esther G C; Löck, Steffen; Richter, Christian
2017-10-16
Radiomics applies machine learning algorithms to quantitative imaging data to characterise the tumour phenotype and predict clinical outcome. For the development of radiomics risk models, a variety of different algorithms is available and it is not clear which one gives optimal results. Therefore, we assessed the performance of 11 machine learning algorithms combined with 12 feature selection methods by the concordance index (C-Index), to predict loco-regional tumour control (LRC) and overall survival for patients with head and neck squamous cell carcinoma. The considered algorithms are able to deal with continuous time-to-event survival data. Feature selection and model building were performed on a multicentre cohort (213 patients) and validated using an independent cohort (80 patients). We found several combinations of machine learning algorithms and feature selection methods which achieve similar results, e.g. C-Index = 0.71 and BT-COX: C-Index = 0.70 in combination with Spearman feature selection. Using the best performing models, patients were stratified into groups of low and high risk of recurrence. Significant differences in LRC were obtained between both groups on the validation cohort. Based on the presented analysis, we identified a subset of algorithms which should be considered in future radiomics studies to develop stable and clinically relevant predictive models for time-to-event endpoints.
Kufa, Tendesayi; Kharsany, Ayesha BM; Cawood, Cherie; Khanyile, David; Lewis, Lara; Grobler, Anneke; Chipeta, Zawadi; Bere, Alfred; Glenshaw, Mary; Puren, Adrian
2017-01-01
Introduction: We describe the overall accuracy and performance of a serial rapid HIV testing algorithm used in community-based HIV testing in the context of a population-based household survey conducted in two sub-districts of uMgungundlovu district, KwaZulu-Natal, South Africa, against reference fourth-generation HIV-1/2 antibody and p24 antigen combination immunoassays. We discuss implications of the findings for rapid HIV testing programmes. Methods: Cross-sectional design. Following enrolment into the survey, questionnaires were administered to eligible and consenting participants in order to obtain demographic and HIV-related data. Peripheral blood samples were collected for HIV-related testing. Participants were offered community-based HIV testing in the home by trained field workers using a serial algorithm with two rapid diagnostic tests (RDTs) in series. In the laboratory, reference HIV testing was conducted using two fourth-generation immunoassays, with all positives in the confirmatory test considered true positives. Accuracy, sensitivity, specificity, positive predictive value, negative predictive value and false-positive and false-negative rates were determined. Results: Of 10,236 individuals enrolled in the survey, 3740 were tested in the home (median age 24 years (interquartile range 19–31 years), 42.1% male, and HIV positivity on the RDT algorithm 8.0%). Of those tested, 3729 (99.7%) had a definitive RDT result as well as a laboratory immunoassay result. The overall accuracy of the RDT when compared to the fourth-generation immunoassays was 98.8% (95% confidence interval (CI) 98.5–99.2). The sensitivity, specificity, positive predictive value and negative predictive value were 91.1% (95% CI 87.5–93.7), 99.9% (95% CI 99.8–100), 99.3% (95% CI 97.4–99.8) and 99.1% (95% CI 98.8–99.4) respectively. The false-positive and false-negative rates were 0.06% (95% CI 0.01–0.24) and 8.9% (95% CI 6.3–12.53). Compared to true positives, false negatives were more likely to be recently infected on the limiting antigen avidity assay and to report antiretroviral therapy (ART) use. Conclusions: The overall accuracy of the RDT algorithm was high. However, there were few false positives, and the sensitivity was lower than expected, with high false negatives, despite the implementation of quality assurance measures. False negatives were associated with recent (early) infection and ART exposure. The RDT algorithm was able to correctly identify the majority of HIV infections in community-based HIV testing. Messaging on the potential for false positives and false negatives should be included in these programmes. PMID:28872274
Recursive Branching Simulated Annealing Algorithm
NASA Technical Reports Server (NTRS)
Bolcar, Matthew; Smith, J. Scott; Aronstein, David
2012-01-01
This innovation is a variation of a simulated-annealing optimization algorithm that uses a recursive-branching structure to parallelize the search of a parameter space for the globally optimal solution to an objective. The algorithm has been demonstrated to be more effective at searching a parameter space than traditional simulated-annealing methods for a particular problem of interest, and it can readily be applied to a wide variety of optimization problems, including those with a parameter space having both discrete-value parameters (combinatorial) and continuous-variable parameters. It can take the place of a conventional simulated-annealing, Monte-Carlo, or random-walk algorithm. In a conventional simulated-annealing (SA) algorithm, a starting configuration is randomly selected within the parameter space. The algorithm randomly selects another configuration from the parameter space and evaluates the objective function for that configuration. If the objective function value is better than the previous value, the new configuration is adopted as the new point of interest in the parameter space. If the objective function value is worse than the previous value, the new configuration may be adopted, with a probability determined by a temperature parameter, used in analogy to annealing in metals. As the optimization continues, the region of the parameter space from which new configurations can be selected shrinks, and in conjunction with lowering the annealing temperature (and thus lowering the probability for adopting configurations in parameter space with worse objective functions), the algorithm can converge on the globally optimal configuration. The Recursive Branching Simulated Annealing (RBSA) algorithm shares some features with the SA algorithm, notably including the basic principles that a starting configuration is randomly selected from within the parameter space, the algorithm tests other configurations with the goal of finding the globally optimal solution, and the region from which new configurations can be selected shrinks as the search continues. The key difference between these algorithms is that in the SA algorithm, a single path, or trajectory, is taken in parameter space, from the starting point to the globally optimal solution, while in the RBSA algorithm, many trajectories are taken; by exploring multiple regions of the parameter space simultaneously, the algorithm has been shown to converge on the globally optimal solution about an order of magnitude faster than when using conventional algorithms. Novel features of the RBSA algorithm include: 1. More efficient searching of the parameter space due to the branching structure, in which multiple random configurations are generated and multiple promising regions of the parameter space are explored; 2. The implementation of a trust region for each parameter in the parameter space, which provides a natural way of enforcing upper- and lower-bound constraints on the parameters; and 3. The optional use of a constrained gradient-search optimization, performed on the continuous variables around each branch's configuration in parameter space to improve search efficiency by allowing for fast fine-tuning of the continuous variables within the trust region at that configuration point.
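For comparison, a compact conventional simulated-annealing loop of the kind the RBSA innovation builds on (single trajectory, shrinking step size, temperature-controlled acceptance) is sketched below; the recursive branching and trust-region features described above are not shown.

    # Conventional single-trajectory simulated annealing sketch.
    import math, random

    def simulated_annealing(objective, x0, step=1.0, t0=1.0, cooling=0.995, iters=5000):
        random.seed(0)
        x, fx = x0, objective(x0)
        best_x, best_f, temp = x, fx, t0
        for _ in range(iters):
            cand = [xi + random.uniform(-step, step) for xi in x]   # random neighbor
            fc = objective(cand)
            # accept better moves always, worse moves with Boltzmann probability
            if fc < fx or random.random() < math.exp(-(fc - fx) / temp):
                x, fx = cand, fc
                if fc < best_f:
                    best_x, best_f = cand, fc
            temp *= cooling                                         # lower the temperature
            step *= 0.9995                                          # shrink the search region
        return best_x, best_f

    sphere = lambda v: sum(vi * vi for vi in v)
    print(simulated_annealing(sphere, x0=[3.0, -2.0]))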
Niazi, Muhammad Khalid Khan; Abas, Fazly Salleh; Senaras, Caglar; Pennell, Michael; Sahiner, Berkman; Chen, Weijie; Opfer, John; Hasserjian, Robert; Louissaint, Abner; Shana'ah, Arwa; Lozanski, Gerard; Gurcan, Metin N
2018-01-01
Automatic and accurate detection of positive and negative nuclei from images of immunostained tissue biopsies is critical to the success of digital pathology. The evaluation of most nuclei detection algorithms relies on manually generated ground truth prepared by pathologists, which is unfortunately time-consuming and suffers from inter-pathologist variability. In this work, we developed a digital immunohistochemistry (IHC) phantom that can be used for evaluating computer algorithms for enumeration of IHC-positive cells. Our phantom development consists of two main steps: 1) extraction of individual nuclei as well as nuclei clumps of both positive and negative nuclei from real whole-slide images (WSIs), and 2) systematic placement of the extracted nuclei clumps on an image canvas. The resulting images are visually similar to the original tissue images. We created a set of 42 images with different concentrations of positive and negative nuclei. These images were evaluated by four board-certified pathologists in the task of estimating the ratio of positive to total number of nuclei. The resulting concordance correlation coefficients (CCC) between the pathologists and the true ratio range from 0.86 to 0.95 (point estimates). The same ratio was also computed by an automated computer algorithm, which yielded a CCC value of 0.99. Reading the phantom data with known ground truth, the human readers show substantial variability and lower average performance than the computer algorithm in terms of CCC. This shows the limitation of using a human reader panel to establish a reference standard for the evaluation of computer algorithms, thereby highlighting the usefulness of the phantom developed in this work. Using our phantom images, we further developed a function that can approximate the true ratio from the area of the positive and negative nuclei, hence avoiding the need to detect individual nuclei. The predicted ratios of 10 held-out images using the function (trained on 32 images) are within ±2.68% of the true ratio. Moreover, we also report the evaluation of a computerized image analysis method on the synthetic tissue dataset.
Colombo, Roberto; Sterpi, Irma; Mazzone, Alessandra; Delconte, Carmen; Pisano, Fabrizio
2012-05-01
In robot-assisted neurorehabilitation, matching the task difficulty level to the patient's needs and abilities, both initially and as the relearning process progresses, can enhance the effectiveness of training and improve patients' motivation and outcome. This study presents a Progressive Task Regulation algorithm implemented in a robot for upper limb rehabilitation. It evaluates the patient's performance during training through the computation of robot-measured parameters, and automatically changes the features of the reaching movements, adapting the difficulty level of the motor task to the patient's abilities. In particular, it can select different types of assistance (time-triggered, activity-triggered, and negative assistance) and implement varied therapy practice to promote generalization processes. The algorithm was tuned by assessing the performance data obtained in 22 chronic stroke patients who underwent robotic rehabilitation, in which the difficulty level of the task was manually adjusted by the therapist. Thus, we could verify the patient's recovery strategies and implement task transition rules to match both the patient's and therapist's behavior. In addition, the algorithm was tested in a sample of five chronic stroke patients. The findings show good agreement with the therapist decisions so indicating that it could be useful for the implementation of training protocols allowing individualized and gradual treatment of upper limb disabilities in patients after stroke. The application of this algorithm during robot-assisted therapy should allow an easier management of the different motor tasks administered during training, thereby facilitating the therapist's activity in the treatment of different pathologic conditions of the neuromuscular system.
[New methodological advances: algorithm proposal for management of Clostridium difficile infection].
González-Abad, María José; Alonso-Sanz, Mercedes
2015-06-01
Clostridium difficile infection (CDI) is considered the most common cause of health care-associated diarrhea and is also an etiologic agent of community diarrhea. The aim of this study was to assess the potential benefit of a test that simultaneously detects glutamate dehydrogenase (GDH) antigen and C. difficile toxin A/B, followed by detection of the C. difficile toxin B (tcdB) gene by PCR as a confirmatory assay on discrepant samples, and to propose a more efficient algorithm. From June 2012 to January 2013, at Hospital Infantil Universitario Niño Jesús, Madrid, stool samples were studied for the simultaneous detection of GDH and toxin A/B, and also for the detection of toxin A/B alone. When results for GDH and toxin A/B were discordant, a single sample per patient was selected for detection of the C. difficile toxin B (tcdB) gene. A total of 116 samples (52 patients) were tested. Four were positive and 75 negative for toxigenic C. difficile (toxin A/B, alone or combined with GDH). C. difficile was detected in the remaining 37 samples, but toxin A/B was not, regardless of the method used, except in one case. Twenty of the 37 specimens were further tested for the C. difficile toxin B (tcdB) gene and 7 were positive. The simultaneous detection of GDH and toxin A/B combined with PCR recovered undiagnosed cases of CDI. In accordance with our data, we propose a two-step algorithm: GDH detection, followed by PCR in GDH-positive samples. This algorithm could provide a superior cost-benefit ratio in our population.
Doss, Jayanth; Mo, Huan; Carroll, Robert J; Crofford, Leslie J; Denny, Joshua C
2017-02-01
The differences between seronegative and seropositive rheumatoid arthritis (RA) have not been widely reported. We performed electronic health record (EHR)-based phenome-wide association studies (PheWAS) to identify disease associations in seropositive and seronegative RA. A validated algorithm identified RA subjects from the de-identified version of the Vanderbilt University Medical Center EHR. Serotypes were determined by rheumatoid factor (RF) and anti-cyclic citrullinated peptide antibody (ACPA) values. We tested EHR-derived phenotypes using PheWAS comparing seropositive RA and seronegative RA, yielding disease associations. PheWAS was also performed in RF-positive versus RF-negative subjects and ACPA-positive versus ACPA-negative subjects. Following PheWAS, select phenotypes were then manually reviewed, and fibromyalgia was specifically evaluated using a validated algorithm. A total of 2,199 RA individuals with either RF or ACPA testing were identified. Of these, 1,382 patients (63%) were classified as seropositive. Seronegative RA was associated with myalgia and myositis (odds ratio [OR] 2.1, P = 3.7 × 10^-10) and back pain. A manual review of the health record showed that among subjects coded for Myalgia and Myositis, ∼80% had fibromyalgia. Follow-up with a specific EHR algorithm for fibromyalgia confirmed that seronegative RA was associated with fibromyalgia (OR 1.8, P = 4.0 × 10^-6). Seropositive RA was associated with chronic airway obstruction (OR 2.2, P = 1.4 × 10^-4) and tobacco use (OR 2.2, P = 7.0 × 10^-4). This PheWAS of RA patients identifies a strong association between seronegativity and fibromyalgia. It also affirms relationships between seropositivity and chronic airway obstruction and between seropositivity and tobacco use. These findings demonstrate the utility of the PheWAS approach to discover novel phenotype associations within different subgroups of a disease. © 2016, American College of Rheumatology.
Clustering analysis of moving target signatures
NASA Astrophysics Data System (ADS)
Martone, Anthony; Ranney, Kenneth; Innocenti, Roberto
2010-04-01
Previously, we developed a moving target indication (MTI) processing approach to detect and track slow-moving targets inside buildings, which successfully detected moving targets (MTs) from data collected by a low-frequency, ultra-wideband radar. Our MTI algorithms include change detection, automatic target detection (ATD), clustering, and tracking. The MTI algorithms can be implemented in a real-time or near-real-time system; however, a person-in-the-loop is needed to select input parameters for the clustering algorithm. Specifically, the number of clusters to input into the cluster algorithm is unknown and requires manual selection. A critical need exists to automate all aspects of the MTI processing formulation. In this paper, we investigate two techniques that automatically determine the number of clusters: the adaptive knee-point (KP) algorithm and the recursive pixel finding (RPF) algorithm. The KP algorithm is based on a well-known heuristic approach for determining the number of clusters. The RPF algorithm is analogous to the image processing, pixel labeling procedure. Both algorithms are used to analyze the false alarm and detection rates of three operational scenarios of personnel walking inside wood and cinderblock buildings.
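A generic knee-point heuristic for choosing the number of clusters, in the spirit of the KP approach mentioned above, is sketched below; the paper's adaptive variant and the RPF algorithm are not reproduced. The "knee" is taken as the k whose normalized within-cluster error lies farthest from the chord joining the endpoints of the error curve.

    # Generic knee-point selection of the number of clusters (elbow heuristic).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)   # stand-in for detected target points
    ks = np.arange(1, 11)
    inertia = np.array([KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
                        for k in ks])

    # normalize both axes, then take the k farthest from the chord joining the endpoints
    k_n = (ks - ks[0]) / (ks[-1] - ks[0])
    i_n = (inertia - inertia[-1]) / (inertia[0] - inertia[-1])
    d = np.abs(k_n + i_n - 1) / np.sqrt(2)        # distance to the line x + y = 1
    print("estimated number of clusters:", ks[np.argmax(d)])      # expected: about 4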
Classification of a large microarray data set: Algorithm comparison and analysis of drug signatures
Natsoulis, Georges; El Ghaoui, Laurent; Lanckriet, Gert R.G.; Tolley, Alexander M.; Leroy, Fabrice; Dunlea, Shane; Eynon, Barrett P.; Pearson, Cecelia I.; Tugendreich, Stuart; Jarnagin, Kurt
2005-01-01
A large gene expression database has been produced that characterizes the gene expression and physiological effects of hundreds of approved and withdrawn drugs, toxicants, and biochemical standards in various organs of live rats. In order to derive useful biological knowledge from this large database, a variety of supervised classification algorithms were compared using a 597-microarray subset of the data. Our studies show that several types of linear classifiers based on Support Vector Machines (SVMs) and Logistic Regression can be used to derive readily interpretable drug signatures with high classification performance. Both methods can be tuned to produce classifiers of drug treatments in the form of short, weighted gene lists which upon analysis reveal that some of the signature genes have a positive contribution (act as “rewards” for the class-of-interest) while others have a negative contribution (act as “penalties”) to the classification decision. The combination of reward and penalty genes enhances performance by keeping the number of false positive treatments low. The results of these algorithms are combined with feature selection techniques that further reduce the length of the drug signatures, an important step towards the development of useful diagnostic biomarkers and low-cost assays. Multiple signatures with no genes in common can be generated for the same classification end-point. Comparison of these gene lists identifies biological processes characteristic of a given class. PMID:15867433
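A toy illustration of how a sparse linear classifier yields a short, signed signature of the kind described above is given below: positive weights act as "rewards" and negative weights as "penalties" for the class of interest. The data are random stand-ins, not the expression database.

    # L1-regularized logistic regression producing a short, signed "signature".
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(597, 200))                  # arrays x genes (mock expression data)
    w_true = np.zeros(200)
    w_true[[3, 17, 42]] = [2.0, -1.5, 1.0]
    y = (X @ w_true + 0.5 * rng.normal(size=597) > 0).astype(int)

    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    weights = clf.coef_.ravel()
    signature = np.flatnonzero(weights)              # the short weighted gene list
    for g in signature:
        role = "reward" if weights[g] > 0 else "penalty"
        print(f"gene {g}: weight {weights[g]:+.2f} ({role})")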
Instances selection algorithm by ensemble margin
NASA Astrophysics Data System (ADS)
Saidi, Meryem; Bechar, Mohammed El Amine; Settouti, Nesma; Chikh, Mohamed Amine
2018-05-01
The main limitation of data mining algorithms is their inability to deal with the huge amount of available data in a reasonable processing time. One solution for producing fast and accurate results is instance and feature selection. This process eliminates noisy or redundant data in order to reduce the storage and computational cost without performance degradation. In this paper, a new instance selection approach called the Ensemble Margin Instance Selection (EMIS) algorithm is proposed. This approach is based on the ensemble margin. To evaluate our approach, we have conducted several experiments on different real-world classification problems from the UCI Machine Learning repository. Pixel-based image segmentation is a field where the storage requirement and computational cost of the applied model become high. To address these limitations, we conduct a study based on the application of EMIS and other instance selection techniques to the segmentation and automatic recognition of white blood cells (WBC; nucleus and cytoplasm) in cytological images.
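The following sketch shows the ensemble-margin computation at the heart of such a scheme and one way of thresholding it (the random forest, the -0.2 cut-off for discarding consistently misvoted samples, and the iris data are illustrative choices, not the EMIS rule itself):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

votes = np.stack([tree.predict(X) for tree in forest.estimators_])      # (trees, samples)
n_classes = len(np.unique(y))
counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)]) # (classes, samples)

true_votes = counts[y, np.arange(len(y))]
counts_wo_true = counts.copy()
counts_wo_true[y, np.arange(len(y))] = -1
margin = (true_votes - counts_wo_true.max(axis=0)) / len(forest.estimators_)

keep = margin > -0.2            # drop samples the ensemble consistently misvotes (likely noise)
X_sel, y_sel = X[keep], y[keep]
print(len(X), "->", len(X_sel), "instances kept")
```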
DEEPEN: A negation detection system for clinical text incorporating dependency relation into NegEx
Mehrabi, Saeed; Krishnan, Anand; Sohn, Sunghwan; Roch, Alexandra M; Schmidt, Heidi; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, C. Max; Liu, Hongfang; Palakal, Mathew
2018-01-01
In Electronic Health Records (EHRs), much of the valuable information regarding patients’ conditions is embedded in free text format. Natural language processing (NLP) techniques have been developed to extract clinical information from free text. One challenge faced in clinical NLP is that the meaning of clinical entities is heavily affected by modifiers such as negation. A negation detection algorithm, NegEx, applies a simple approach that has been shown to be powerful in clinical NLP. However, because it fails to consider the contextual relationship between words within a sentence, NegEx fails to correctly capture the negation status of concepts in complex sentences. Incorrect negation assignment could cause inaccurate diagnosis of patients’ conditions or contaminated study cohorts. We developed a negation algorithm called DEEPEN to decrease NegEx’s false positives by taking into account the dependency relationship between negation words and concepts within a sentence using the Stanford dependency parser. The system was developed and tested using EHR data from Indiana University (IU), and it was further evaluated on a Mayo Clinic dataset to assess its generalizability. The evaluation results demonstrate that DEEPEN, which incorporates dependency parsing into NegEx, can reduce the number of incorrect negation assignments for patients with positive findings, and therefore improve the identification of patients with the target clinical findings in EHRs. PMID:25791500
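The dependency idea can be illustrated in a few lines: instead of a fixed token window, check whether a negation modifier actually governs the concept in the parse tree. The sketch below uses spaCy rather than the Stanford parser and a far simpler rule than DEEPEN's, so the model name and the subtree rule are assumptions for illustration only:

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes this spaCy model is installed

def concept_negated(sentence: str, concept: str) -> bool:
    doc = nlp(sentence)
    concept_tokens = [t for t in doc if t.text.lower() in concept.lower().split()]
    for tok in doc:
        if tok.dep_ == "neg":                      # explicit negation modifier, e.g. "not"
            scope = {tok.head} | set(tok.head.subtree)
            if any(ct in scope for ct in concept_tokens):
                return True
    return False

print(concept_negated("The scan does not show a fracture.", "fracture"))
print(concept_negated("The scan shows a small fracture.", "fracture"))
```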
NASA Astrophysics Data System (ADS)
Xiang, Min; Qu, Qinqin; Chen, Cheng; Tian, Li; Zeng, Lingkang
2017-11-01
To improve the reliability of communication service in the smart distribution grid (SDG), an access selection algorithm based on dynamic network status and different service types was proposed for heterogeneous wireless networks. The network performance index values were obtained in real time by a multimode terminal, and the variation trend of the index values was analyzed by a growth matrix. The index weights were calculated by the entropy-weight method and then modified by rough set theory to obtain the final weights. Grey relational analysis was then used to rank the candidate networks, and the optimal communication network was selected. Simulation results show that the proposed algorithm can implement dynamic access selection in the heterogeneous wireless networks of SDG effectively and reduce the network blocking probability.
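A compact numerical sketch of the entropy-weight and grey-relational-analysis steps described above follows (the growth-matrix trend analysis and the rough-set correction are omitted, and the candidate-network matrix is hypothetical):

```python
import numpy as np

def entropy_weights(X):
    """X: (networks x criteria), scaled to (0, 1], larger is better."""
    p = np.clip(X / X.sum(axis=0, keepdims=True), 1e-12, None)
    e = -(p * np.log(p)).sum(axis=0) / np.log(X.shape[0])   # entropy per criterion
    d = 1.0 - e
    return d / d.sum()

def grey_relational_grades(X, weights, rho=0.5):
    ideal = X.max(axis=0)                                    # best value per criterion
    delta = np.abs(X - ideal)
    xi = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
    return (xi * weights).sum(axis=1)

# rows: candidate networks; columns: normalised indices (e.g. bandwidth, 1/delay, 1/loss)
X = np.array([[0.8, 0.6, 0.7],
              [0.5, 0.9, 0.6],
              [0.7, 0.7, 0.9]])
w = entropy_weights(X)
print("weights:", np.round(w, 3), "selected network:", int(np.argmax(grey_relational_grades(X, w))))
```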
USDA-ARS's Scientific Manuscript database
Hyperspectral scattering is a promising technique for rapid and noninvasive measurement of multiple quality attributes of apple fruit. A hierarchical evolutionary algorithm (HEA) approach, in combination with subspace decomposition and partial least squares (PLS) regression, was proposed to select o...
Cross-validation pitfalls when selecting and assessing regression and classification models.
Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon
2014-03-29
We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
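The two loops the authors describe can be sketched with off-the-shelf components: an inner repeated grid-search tunes parameters, while an outer repeated V-fold loop assesses the whole tuning procedure. The estimator, grid and synthetic data below are placeholders rather than the paper's QSAR setup:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RepeatedKFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)

inner_cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)   # repeated grid-search CV
outer_cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=2)   # repeated nested CV

tuner = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0, 100.0]},
                     cv=inner_cv, scoring="neg_mean_squared_error")
scores = cross_val_score(tuner, X, y, cv=outer_cv, scoring="neg_mean_squared_error")
print("nested-CV RMSE: %.2f +/- %.2f" % (np.sqrt(-scores).mean(), np.sqrt(-scores).std()))
```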
Liston, Adrian; Hardy, Kristine; Pittelkow, Yvonne; Wilson, Susan R; Makaroff, Lydia E; Fahrer, Aude M; Goodnow, Christopher C
2007-01-01
Background: T cells in the thymus undergo opposing positive and negative selection processes so that the only T cells entering circulation are those bearing a T cell receptor (TCR) with a low affinity for self. The mechanism differentiating negative from positive selection is poorly understood, despite the fact that inherited defects in negative selection underlie organ-specific autoimmune disease in AIRE-deficient people and the non-obese diabetic (NOD) mouse strain. Results: Here we use homogeneous populations of T cells undergoing either positive or negative selection in vivo together with genome-wide transcription profiling on microarrays to identify the gene expression differences underlying negative selection to an Aire-dependent organ-specific antigen, including the upregulation of a genomic cluster in the cytogenetic band 2F. Analysis of defective negative selection in the autoimmune-prone NOD strain demonstrates a global impairment in the induction of the negative selection response gene set, but little difference in positive selection response genes. Combining expression differences with genetic linkage data, we identify differentially expressed candidate genes, including Bim, Bnip3, Smox, Pdrg1, Id1, Pdcd1, Ly6c, Pdia3, Trim30 and Trim12. Conclusion: The data provide a molecular map of the negative selection response in vivo and, by analysis of deviations from this pathway in the autoimmune susceptible NOD strain, suggest that susceptibility arises from small expression differences in genes acting at multiple points in the pathway between the TCR and cell death. PMID:17239257
Zarei, Kobra; Atabati, Morteza; Ahmadi, Monire
2017-05-04
The bee algorithm (BA) is an optimization algorithm, inspired by the natural foraging behaviour of honey bees, that searches for an optimal solution and can be applied to feature selection. In this paper, shuffling cross-validation-BA (CV-BA) was applied to select the best descriptors to describe the retention factor (log k) in the biopartitioning micellar chromatography (BMC) of 79 heterogeneous pesticides. Six descriptors were obtained using BA, and the selected descriptors were then applied for model development using multiple linear regression (MLR). The descriptor selection was also performed using stepwise, genetic algorithm and simulated annealing methods, MLR was applied for model development, and the results were compared with those obtained from shuffling CV-BA. The results showed that shuffling CV-BA can be applied as a powerful descriptor selection method. A support vector machine (SVM) was also applied for model development using the six descriptors selected by BA. The statistical results obtained using SVM were better than those obtained using MLR: the root mean square error (RMSE) and correlation coefficient (R) for the whole data set (training and test) were 0.1863 and 0.9426, respectively, with shuffling CV-BA-MLR, and 0.0704 and 0.9922, respectively, with shuffling CV-BA-SVM.
Investigating a memory-based account of negative priming: support for selection-feature mismatch.
MacDonald, P A; Joordens, S
2000-08-01
Using typical and modified negative priming tasks, the selection-feature mismatch account of negative priming was tested. In the modified task, participants performed selections on the basis of a semantic feature (e.g., referent size). This procedure has been shown to enhance negative priming (P. A. MacDonald, S. Joordens, & K. N. Seergobin, 1999). Across 3 experiments, negative priming occurred only when the repeated item mismatched in terms of the feature used as the basis for selections. When the repeated item was congruent on the selection feature across the prime and probe displays, positive priming arose. This pattern of results appeared in both the ignored- and the attended-repetition conditions. Negative priming does not result from previously ignoring an item. These findings strongly support the selection-feature mismatch account of negative priming and refute both the distractor inhibition and the episodic-retrieval explanations.
NASA Technical Reports Server (NTRS)
Eigen, D. J.; Fromm, F. R.; Northouse, R. A.
1974-01-01
A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
Spectral band selection for classification of soil organic matter content
NASA Technical Reports Server (NTRS)
Henderson, Tracey L.; Szilagyi, Andrea; Baumgardner, Marion F.; Chen, Chih-Chien Thomas; Landgrebe, David A.
1989-01-01
This paper describes the spectral-band-selection (SBS) algorithm of Chen and Landgrebe (1987, 1988, and 1989) and uses the algorithm to classify the organic matter content in the earth's surface soil. The effectiveness of the algorithm was evaluated by comparing the results of classifying soil organic matter using SBS bands with those obtained using Landsat MSS bands and TM bands, showing that the algorithm was successful in finding important spectral bands for classification of organic matter content. Using the calculated bands, the probabilities of correct classification for climate-stratified data were found to range from 0.910 to 0.980.
Lin, Jingjing; Jing, Honglei
2016-01-01
The artificial immune system is one of the most recently introduced computational intelligence methods, inspired by the biological immune system. Most immune-system-inspired algorithms are based on the clonal selection principle and are known as clonal selection algorithms (CSAs). When coping with complex optimization problems characterized by multimodality, high dimensionality, rotation, and composition, traditional CSAs often suffer from premature convergence and unsatisfactory accuracy. To address these issues, a recombination operator inspired by biological combinatorial recombination is first proposed. The recombination operator generates promising candidate solutions to enhance the search ability of the CSA by fusing information from randomly chosen parents. Furthermore, a modified hypermutation operator is introduced to construct more promising and efficient candidate solutions. A set of 16 commonly used benchmark functions is adopted to test the effectiveness and efficiency of the recombination and hypermutation operators. Comparisons with the classic CSA, the CSA with the recombination operator (RCSA), and the CSA with the recombination and modified hypermutation operators (RHCSA) demonstrate that the proposed algorithm significantly improves the performance of the classic CSA. Moreover, comparison with state-of-the-art algorithms shows that the proposed algorithm is quite competitive. PMID:27698662
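For readers unfamiliar with the operators being modified, the following generic clonal selection loop shows where recombination and hypermutation enter; it is a plain illustration on the sphere function, not the authors' RHCSA (clone counts, mutation scaling and the blend recombination are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                                     # objective to minimise (sphere function)
    return np.sum(x * x, axis=-1)

def clonal_selection(dim=10, pop=30, n_clones=5, gens=200, lo=-5.0, hi=5.0):
    P = rng.uniform(lo, hi, size=(pop, dim))
    for _ in range(gens):
        P = P[np.argsort(f(P))]               # rank antibodies by affinity (lower f = better)
        # recombination: blend pairs of randomly chosen parents
        i, j = rng.integers(0, pop, size=(2, pop))
        lam = rng.uniform(0, 1, size=(pop, 1))
        R = lam * P[i] + (1 - lam) * P[j]
        # cloning + hypermutation: better antibodies mutate less
        rank = np.arange(pop) / pop
        clones = np.repeat(P, n_clones, axis=0)
        scale = np.repeat(np.exp(-2.0 * (1.0 - rank)), n_clones)[:, None]
        clones = np.clip(clones + scale * rng.normal(size=clones.shape), lo, hi)
        # survivor selection over parents, recombinants and clones
        allP = np.vstack([P, R, clones])
        P = allP[np.argsort(f(allP))[:pop]]
    return P[0], f(P[0])

best, best_val = clonal_selection()
print("best objective value:", best_val)
```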
Carvalho, Gustavo A.; Minnett, Peter J.; Banzon, Viva F.; Baringer, Warner; Heil, Cynthia A.
2011-01-01
We present a simple algorithm to identify Karenia brevis blooms in the Gulf of Mexico along the west coast of Florida in satellite imagery. It is based on an empirical analysis of collocated matchups of satellite and in situ measurements. The results of this Empirical Approach are compared to those of a Bio-optical Technique – taken from the published literature – and the Operational Method currently implemented by the NOAA Harmful Algal Bloom Forecasting System for K. brevis blooms. These three algorithms are evaluated using a multi-year MODIS data set (from July, 2002 to October, 2006) and a long-term in situ database. Matchup pairs, consisting of remotely-sensed ocean color parameters and near-coincident field measurements of K. brevis concentration, are used to assess the accuracy of the algorithms. Fair evaluation of the algorithms was only possible in the central west Florida shelf (i.e. between 25.75°N and 28.25°N) during the boreal Summer and Fall months (i.e. July to December) due to the availability of valid cloud-free matchups. Even though the predictive values of the three algorithms are similar, the statistical measure of success in red tide identification (defined as cell counts in excess of 1.5 × 10^4 cells L^-1) varied considerably (sensitivity—Empirical: 86%; Bio-optical: 77%; Operational: 26%), as did their effectiveness in identifying non-bloom cases (specificity—Empirical: 53%; Bio-optical: 65%; Operational: 84%). As the Operational Method had an elevated frequency of false-negative cases (i.e. presented low accuracy in detecting known red tides), and because of the considerable overlap between the optical characteristics of the red tide and non-bloom population, only the other two algorithms underwent a procedure for further inspecting possible detection improvements. Both optimized versions of the Empirical and Bio-optical algorithms performed similarly, being equally specific and sensitive (~70% for both) and showing low levels of uncertainties (i.e. few cases of false-negatives and false-positives: ~30%)—improved positive predictive values (~60%) were also observed along with good negative predictive values (~80%). PMID:22180667
Alfa, Michelle J; Sepehri, Shadi
2013-01-01
BACKGROUND: There has been a growing interest in developing an appropriate laboratory diagnostic algorithm for Clostridium difficile, mainly as a result of increases in both the number and severity of cases of C difficile infection in the past decade. A C difficile diagnostic algorithm is necessary because diagnostic kits, mostly for the detection of toxins A and B or glutamate dehydrogenase (GDH) antigen, are not sufficient as stand-alone assays for optimal diagnosis of C difficile infection. In addition, conventional reference methods for C difficile detection (eg, toxigenic culture and cytotoxin neutralization [CTN] assays) are not routinely practiced in diagnostic laboratory settings. OBJECTIVE: To review the four-step algorithm used at Diagnostic Services of Manitoba sites for the laboratory diagnosis of toxigenic C difficile. RESULT: One year of retrospective C difficile data using the proposed algorithm was reported. Of 5695 stool samples tested, 9.1% (n=517) had toxigenic C difficile. Sixty per cent (310 of 517) of toxigenic C difficile stools were detected following the first two steps of the algorithm. CTN confirmation of GDH-positive, toxin A- and B-negative assays resulted in detection of an additional 37.7% (198 of 517) of toxigenic C difficile. Culture of the third specimen, from patients who had two previous negative specimens, detected an additional 2.32% (12 of 517) of toxigenic C difficile samples. DISCUSSION: Using GDH antigen as the screening and toxin A and B as confirmatory test for C difficile, 85% of specimens were reported negative or positive within 4 h. Without CTN confirmation for GDH antigen and toxin A and B discordant results, 37% (195 of 517) of toxigenic C difficile stools would have been missed. Following the algorithm, culture was needed for only 2.72% of all specimens submitted for C difficile testing. CONCLUSION: The overview of the data illustrated the significance of each stage of this four-step C difficile algorithm and emphasized the value of using CTN assay and culture as parts of an algorithm that ensures accurate diagnosis of toxigenic C difficile. PMID:24421808
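Written as a decision function, the four-step flow reads as below; this is a condensed paraphrase of the workflow in the abstract for illustration, not the laboratory's full operating procedure (reporting nuances such as when a third specimen is requested are simplified):

```python
def c_difficile_workup(gdh_positive, toxin_ab_positive,
                       ctn_positive=None, third_specimen_culture_positive=None):
    # Step 1: GDH antigen screen
    if not gdh_positive:
        return "negative (GDH screen)"
    # Step 2: toxin A/B confirmation
    if toxin_ab_positive:
        return "positive (GDH + toxin A/B)"
    # Step 3: CTN assay resolves GDH-positive / toxin-negative discordance
    if ctn_positive:
        return "positive (CTN confirmation)"
    # Step 4: culture of a third specimen after two previous negative specimens
    if third_specimen_culture_positive:
        return "positive (culture of third specimen)"
    return "negative"

print(c_difficile_workup(gdh_positive=True, toxin_ab_positive=False, ctn_positive=True))
```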
NASA Astrophysics Data System (ADS)
Qian, Tingting; Wang, Lianlian; Lu, Guanghua
2017-07-01
Radar correlated imaging (RCI) introduces optical correlated imaging technology into traditional microwave imaging and has attracted widespread attention recently. Conventional RCI methods neglect the structural information of complex extended targets, which limits the quality of the recovered result; thus, a novel algorithm combining a negative exponential restraint with total variation (NER-TV) is proposed in this paper for extended target imaging. Sparsity is measured by a sequential order-one negative exponential function, and the 2D total variation technique is then introduced to formulate a novel optimization problem for extended target imaging. The well-proven alternating direction method of multipliers is applied to solve the new problem. Experimental results show that the proposed algorithm can efficiently achieve high-resolution imaging of extended targets.
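The abstract does not state the cost function explicitly; under the usual sparse-recovery reading, the NER-TV objective plausibly takes a form along these lines (the symbols y for the echo vector, S for the radar reference matrix, sigma for the scattering coefficients, and the parameters delta, lambda, epsilon are assumptions for illustration):

\[
\hat{\boldsymbol{\sigma}} \;=\; \arg\min_{\boldsymbol{\sigma}} \;\sum_{i}\Bigl(1 - e^{-|\sigma_i|/\delta}\Bigr) \;+\; \lambda\,\mathrm{TV}(\boldsymbol{\sigma}) \qquad \text{s.t.}\quad \lVert \mathbf{y} - \mathbf{S}\boldsymbol{\sigma} \rVert_2 \le \varepsilon ,
\]

where the negative exponential term approaches the l0 pseudo-norm as delta tends to zero and the 2D total variation term promotes piecewise-smooth extended targets; ADMM is then a natural choice for splitting the two non-smooth terms.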
A Food Chain Algorithm for Capacitated Vehicle Routing Problem with Recycling in Reverse Logistics
NASA Astrophysics Data System (ADS)
Song, Qiang; Gao, Xuexia; Santos, Emmanuel T.
2015-12-01
This paper introduces the capacitated vehicle routing problem with recycling in reverse logistics, and designs a food chain algorithm for it. Some illustrative examples are selected to conduct simulation and comparison. Numerical results show that the performance of the food chain algorithm is better than the genetic algorithm, particle swarm optimization as well as quantum evolutionary algorithm.
Predictive Model of Linear Antimicrobial Peptides Active against Gram-Negative Bacteria.
Vishnepolsky, Boris; Gabrielian, Andrei; Rosenthal, Alex; Hurt, Darrell E; Tartakovsky, Michael; Managadze, Grigol; Grigolava, Maya; Makhatadze, George I; Pirtskhalava, Malak
2018-05-29
Antimicrobial peptides (AMPs) have been identified as a potential new class of anti-infectives for drug development. Many computational methods attempt to predict AMPs. Most of them can only predict whether a peptide will show any antimicrobial potency, but to the best of our knowledge, there are no tools which can predict antimicrobial potency against particular strains. Here we present a predictive model of linear AMPs active against particular Gram-negative strains, relying on a semi-supervised machine-learning approach with a density-based clustering algorithm. The algorithm distinguishes well between peptides active against a particular strain and others which may also be active but not against the considered strain. The available AMP prediction tools cannot carry out this task. The prediction tool based on the algorithm suggested herein is available at https://dbaasp.org.
NASA Astrophysics Data System (ADS)
Rahnamay Naeini, M.; Sadegh, M.; AghaKouchak, A.; Hsu, K. L.; Sorooshian, S.; Yang, T.
2017-12-01
Meta-heuristic optimization algorithms have gained a great deal of attention in a wide variety of fields. The simplicity and flexibility of these algorithms, along with their robustness, make them attractive tools for solving optimization problems. Different optimization methods, however, hold algorithm-specific strengths and limitations. The performance of each individual algorithm obeys the "No-Free-Lunch" theorem, which means that no single algorithm can consistently outperform all others across all possible optimization problems. From the user's perspective, it is a tedious process to compare, validate, and select the best-performing algorithm for a specific problem or a set of test cases. In this study, we introduce a new hybrid optimization framework, entitled Shuffled Complex-Self Adaptive Hybrid EvoLution (SC-SAHEL), which combines the strengths of different evolutionary algorithms (EAs) in a parallel computing scheme, and allows users to select the most suitable algorithm tailored to the problem at hand. The concept of SC-SAHEL is to execute different EAs as separate parallel search cores, and let all participating EAs compete during the course of the search. The newly developed SC-SAHEL algorithm is designed to automatically select the best-performing algorithm for the given optimization problem. The algorithm is effective in finding the global optimum for several strenuous benchmark test functions, and computationally efficient as compared to individual EAs. We benchmark the proposed SC-SAHEL algorithm over 29 conceptual test functions and two real-world case studies - one hydropower reservoir model and one hydrological model (SAC-SMA). Results show that the proposed framework outperforms individual EAs in an absolute majority of the test problems, and can provide results competitive with the fittest EA, along with more comprehensive information during the search. The proposed framework is also flexible for merging additional EAs, boundary-handling techniques, and sampling schemes, and has good potential to be used in optimal operation and management of water-energy systems.
USDA-ARS's Scientific Manuscript database
Palmer amaranth (Amaranthus palmeri S. Wats.) invasion negatively impacts cotton (Gossypium hirsutum L.) production systems throughout the United States. The objective of this study was to evaluate canopy hyperspectral narrowband data as input into the random forest machine learning algorithm to dis...
Tag SNP selection via a genetic algorithm.
Mahdevar, Ghasem; Zahiri, Javad; Sadeghi, Mehdi; Nowzari-Dalini, Abbas; Ahrabian, Hayedeh
2010-10-01
Single Nucleotide Polymorphisms (SNPs) provide valuable information on human evolutionary history and may lead us to identify genetic variants responsible for human complex diseases. Unfortunately, molecular haplotyping methods are costly, laborious, and time consuming; therefore, algorithms for constructing full haplotype patterns from a small amount of available data through computational methods (the tag SNP selection problem) are convenient and attractive. This problem is proved to be NP-hard, so heuristic methods may be useful. In this paper we present a heuristic method based on a genetic algorithm to find reasonable solutions within acceptable time. The algorithm was tested on a variety of simulated and experimental data. In comparison with the exact algorithm, based on a brute force approach, results show that our method can obtain optimal solutions in almost all cases and runs much faster than the exact algorithm when the number of SNP sites is large. Our software is available upon request to the corresponding author.
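A toy genetic algorithm for this kind of subset selection is sketched below. The fitness is a simplified stand-in (tag sets are rewarded when every non-tag SNP is strongly correlated with at least one tag, and penalised for size) rather than the paper's haplotype-reconstruction criterion; the data, r^2 threshold and GA settings are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n_ind, n_snp = 100, 40
genotypes = rng.integers(0, 2, size=(n_ind, n_snp)).astype(float)   # binary SNP matrix
r2 = np.corrcoef(genotypes.T) ** 2

def fitness(mask, r2_thresh=0.8):
    tags = np.flatnonzero(mask)
    if tags.size == 0:
        return -np.inf
    covered = r2[:, tags].max(axis=1) >= r2_thresh      # each SNP tagged by at least one selected SNP
    return covered.sum() - 0.5 * tags.size              # coverage reward, size penalty

def ga(pop_size=40, gens=100, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, n_snp))
    for _ in range(gens):
        fit = np.array([fitness(ind) for ind in pop])
        a, b = rng.integers(0, pop_size, size=(2, pop_size))         # tournament selection
        parents = np.where((fit[a] > fit[b])[:, None], pop[a], pop[b])
        children = parents.copy()
        cut = rng.integers(1, n_snp, size=pop_size // 2)             # one-point crossover
        for k, c in enumerate(cut):
            children[2 * k, c:], children[2 * k + 1, c:] = (
                parents[2 * k + 1, c:].copy(), parents[2 * k, c:].copy())
        children ^= (rng.random(children.shape) < p_mut)             # bit-flip mutation
        pop = children
    fit = np.array([fitness(ind) for ind in pop])
    return np.flatnonzero(pop[np.argmax(fit)])

print("selected tag SNPs:", ga())
```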
Genetic Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization
NASA Technical Reports Server (NTRS)
Holst, Terry L.
2004-01-01
A genetic algorithm approach suitable for solving multi-objective optimization problems is described and evaluated using a series of aerodynamic shape optimization problems. Several new features including two variations of a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. A new masking array capability is included allowing any gene or gene subset to be eliminated as decision variables from the design space. This allows determination of the effect of a single gene or gene subset on the pareto optimal solution. Results indicate that the genetic algorithm optimization approach is flexible in application and reliable. The binning selection algorithms generally provide pareto front quality enhancements and moderate convergence efficiency improvements for most of the problems solved.
Genetic Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization
NASA Technical Reports Server (NTRS)
Holst, Terry L.
2005-01-01
A genetic algorithm approach suitable for solving multi-objective problems is described and evaluated using a series of aerodynamic shape optimization problems. Several new features including two variations of a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding Pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. A new masking array capability is included allowing any gene or gene subset to be eliminated as decision variables from the design space. This allows determination of the effect of a single gene or gene subset on the Pareto optimal solution. Results indicate that the genetic algorithm optimization approach is flexible in application and reliable. The binning selection algorithms generally provide Pareto front quality enhancements and moderate convergence efficiency improvements for most of the problems solved.
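For reference, the Pareto-optimality test that both binning selection variants feed into can be written in a few lines (minimisation assumed; the objective values are synthetic, and this helper is not taken from the report):

```python
import numpy as np

def non_dominated(F):
    """Return indices of designs not dominated by any other (F: designs x objectives, minimise)."""
    keep = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        if dominates_i.any():
            keep[i] = False
    return np.flatnonzero(keep)

F = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]])
print(non_dominated(F))     # design 2 is dominated by design 1; the rest form the Pareto front
```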
An OMIC biomarker detection algorithm TriVote and its application in methylomic biomarker detection.
Xu, Cheng; Liu, Jiamei; Yang, Weifeng; Shu, Yayun; Wei, Zhipeng; Zheng, Weiwei; Feng, Xin; Zhou, Fengfeng
2018-04-01
Transcriptomic and methylomic patterns represent two major OMIC data sources impacted by both inheritable genetic information and environmental factors, and have been widely used as disease diagnosis and prognosis biomarkers. Modern transcriptomic and methylomic profiling technologies detect the status of tens of thousands or even millions of probing residues in the human genome, and introduce a major computational challenge for the existing feature selection algorithms. This study proposes a three-step feature selection algorithm, TriVote, to detect a subset of transcriptomic or methylomic residues with highly accurate binary classification performance. TriVote outperforms both filter and wrapper feature selection algorithms with both higher classification accuracy and smaller feature number on 17 transcriptomes and two methylomes. Biological functions of the methylome biomarkers detected by TriVote were discussed for their disease associations. An easy-to-use Python package is also released to facilitate the further applications.
An intelligent identification algorithm for the monoclonal picking instrument
NASA Astrophysics Data System (ADS)
Yan, Hua; Zhang, Rongfu; Yuan, Xujun; Wang, Qun
2017-11-01
Traditional colony picking is mainly performed manually, which is inefficient and subjective. Therefore, it is important to develop an automatic monoclonal-picking instrument. The critical stage of automatic monoclonal picking and intelligent optimal selection is the intelligent identification algorithm. An auto-screening algorithm based on the Support Vector Machine (SVM) is proposed in this paper; it uses supervised learning combined with colony morphological characteristics to classify colonies accurately. Furthermore, from the basic morphological features of a colony, the system computes a series of morphological parameters step by step. Through the establishment of a maximal-margin classifier, and based on analysis of the growth trend of the colony, the selection of monoclonal colonies is carried out. The experimental results showed that the auto-screening algorithm could screen out the regular colonies from the others, meeting the requirements on the various parameters.
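A minimal version of the SVM screening step might look as follows; the morphological feature names, labels and data are hypothetical placeholders for the parameters the instrument derives from colony images:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# columns: area, circularity, mean intensity, distance to nearest neighbouring colony
X = rng.normal(loc=[120.0, 0.9, 0.6, 30.0], scale=[30.0, 0.05, 0.1, 10.0], size=(200, 4))
y = (X[:, 1] > 0.88) & (X[:, 3] > 25.0)        # synthetic "regular and well separated" label

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X, y)
new_colony = np.array([[150.0, 0.93, 0.55, 40.0]])
print("pick" if clf.predict(new_colony)[0] else "skip")
```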
NASA Astrophysics Data System (ADS)
Ben Abdessalem, Anis; Dervilis, Nikolaos; Wagg, David; Worden, Keith
2018-01-01
This paper introduces the use of the approximate Bayesian computation (ABC) algorithm for model selection and parameter estimation in structural dynamics. ABC is a likelihood-free method typically used when the likelihood function is either intractable or cannot be expressed in closed form. To circumvent the evaluation of the likelihood function, simulation from a forward model is at the core of the ABC algorithm. The algorithm offers the possibility of using different metrics and summary statistics representative of the data to carry out Bayesian inference. The efficacy of the algorithm in structural dynamics is demonstrated through three different illustrative examples of nonlinear system identification: cubic and cubic-quintic models, the Bouc-Wen model and the Duffing oscillator. The obtained results suggest that ABC is a promising alternative for dealing with model selection and parameter estimation issues, specifically for systems with complex behaviours.
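The likelihood-free idea is easy to see in a rejection-sampling sketch on a toy oscillator: draw a parameter from the prior, simulate the forward model, and accept the draw when its summary statistics fall within a tolerance of the observed ones. The forward model, summary statistics, prior and tolerance below are illustrative and much simpler than the paper's Bouc-Wen or Duffing examples:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_model(k, n=200, dt=0.01):
    """Free decay of x'' + 0.1 x' + k x = 0 from x(0)=1, integrated semi-implicitly."""
    x, v, xs = 1.0, 0.0, []
    for _ in range(n):
        v += (-0.1 * v - k * x) * dt
        x += v * dt
        xs.append(x)
    return np.array(xs)

def summary(x):
    first_zero_crossing = np.argmax(x[1:] * x[:-1] < 0)
    return np.array([x.std(), float(first_zero_crossing)])

k_true = 25.0
observed = summary(forward_model(k_true) + 0.01 * rng.normal(size=200))

accepted = []
while len(accepted) < 100:
    k = rng.uniform(1.0, 100.0)                                      # prior draw
    if np.linalg.norm(summary(forward_model(k)) - observed) < 2.0:   # tolerance epsilon
        accepted.append(k)
print("approximate posterior mean of k:", round(float(np.mean(accepted)), 1))
```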
Utilization of Ancillary Data Sets for SMAP Algorithm Development and Product Generation
NASA Technical Reports Server (NTRS)
ONeill, P.; Podest, E.; Njoku, E.
2011-01-01
Algorithms being developed for the Soil Moisture Active Passive (SMAP) mission require a variety of both static and ancillary data. The selection of the most appropriate source for each ancillary data parameter is driven by a number of considerations, including accuracy, latency, availability, and consistency across all SMAP products and with SMOS (Soil Moisture Ocean Salinity). It is anticipated that initial selection of all ancillary datasets, which are needed for ongoing algorithm development activities on the SMAP algorithm testbed at JPL, will be completed within the year. These datasets will be updated as new or improved sources become available, and all selections and changes will be documented for the benefit of the user community. Wise choices in ancillary data will help to enable SMAP to provide new global measurements of soil moisture and freeze/thaw state at the targeted accuracy necessary to tackle hydrologically-relevant societal issues.
Negated bio-events: analysis and identification
2013-01-01
Background Negation occurs frequently in scientific literature, especially in biomedical literature. It has previously been reported that around 13% of sentences found in biomedical research articles contain negation. Historically, the main motivation for identifying negated events has been to ensure their exclusion from lists of extracted interactions. However, recently, there has been a growing interest in negative results, which has resulted in negation detection being identified as a key challenge in biomedical relation extraction. In this article, we focus on the problem of identifying negated bio-events, given gold standard event annotations. Results We have conducted a detailed analysis of three open access bio-event corpora containing negation information (i.e., GENIA Event, BioInfer and BioNLP’09 ST), and have identified the main types of negated bio-events. We have analysed the key aspects of a machine learning solution to the problem of detecting negated events, including selection of negation cues, feature engineering and the choice of learning algorithm. Combining the best solutions for each aspect of the problem, we propose a novel framework for the identification of negated bio-events. We have evaluated our system on each of the three open access corpora mentioned above. The performance of the system significantly surpasses the best results previously reported on the BioNLP’09 ST corpus, and achieves even better results on the GENIA Event and BioInfer corpora, both of which contain more varied and complex events. Conclusions Recently, in the field of biomedical text mining, the development and enhancement of event-based systems has received significant interest. The ability to identify negated events is a key performance element for these systems. We have conducted the first detailed study on the analysis and identification of negated bio-events. Our proposed framework can be integrated with state-of-the-art event extraction systems. The resulting systems will be able to extract bio-events with attached polarities from textual documents, which can serve as the foundation for more elaborate systems that are able to detect mutually contradicting bio-events. PMID:23323936
A fast fully constrained geometric unmixing of hyperspectral images
NASA Astrophysics Data System (ADS)
Zhou, Xin; Li, Xiao-run; Cui, Jian-tao; Zhao, Liao-ying; Zheng, Jun-peng
2014-11-01
A great challenge in hyperspectral image analysis is decomposing a mixed pixel into a collection of endmembers and their corresponding abundance fractions. This paper presents an improved implementation of the Barycentric Coordinate approach to unmix hyperspectral images, integrated with the Most-Negative Remove Projection method to meet the abundance sum-to-one constraint (ASC) and the abundance non-negativity constraint (ANC). The original Barycentric Coordinate approach interprets the endmember unmixing problem as a simplex volume ratio problem, which is solved by calculating the determinants of two augmented matrices: one consists of all the endmembers, and the other consists of the to-be-unmixed pixel and all the endmembers except the one corresponding to the specific abundance being estimated. In this paper, we first modify the Barycentric Coordinate approach by bringing in the matrix determinant lemma to simplify the unmixing process, so that the calculation contains only linear matrix and vector operations. This avoids computing a matrix determinant for every pixel, as the original algorithm does. At the end of this step, the estimated abundances meet the ASC constraint. Then, the Most-Negative Remove Projection method is used to make the abundance fractions meet the full constraints. The algorithm is demonstrated on both synthetic and real images. The resulting algorithm yields abundance maps similar to those obtained by FCLS, while running faster owing to its computational simplicity.
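The simplification referred to above rests on the matrix determinant lemma: for an invertible matrix A and column vectors u, v,

\[
\det\!\bigl(\mathbf{A} + \mathbf{u}\mathbf{v}^{\mathsf{T}}\bigr) \;=\; \bigl(1 + \mathbf{v}^{\mathsf{T}}\mathbf{A}^{-1}\mathbf{u}\bigr)\,\det(\mathbf{A}),
\]

so once det(A) and A^{-1} have been computed for the endmember matrix, the determinant of each pixel-augmented matrix reduces to an inner product, i.e. to the linear matrix and vector operations mentioned in the abstract (how the paper forms the rank-one update is not spelled out here, so the exact factorisation is an assumption).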
Trajectory data privacy protection based on differential privacy mechanism
NASA Astrophysics Data System (ADS)
Gu, Ke; Yang, Lihao; Liu, Yongzhi; Liao, Niandong
2018-05-01
In this paper, we propose a trajectory data privacy protection scheme based on the differential privacy mechanism. In the proposed scheme, the algorithm first selects the protected points from the user’s trajectory data; secondly, it forms polygons from the protected points and the adjacent, frequently accessed points selected from the accessing-point database, and then calculates the polygon centroids; finally, noise is added to the polygon centroids by the differential privacy method, the polygon centroids replace the protected points, and the algorithm constructs and issues the new trajectory data. The experiments show that the proposed algorithm runs fast, the privacy protection of the scheme is effective, and the data usability of the scheme is high.
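The last step, perturbing a centroid with the Laplace mechanism, is shown below; the privacy budget epsilon and the sensitivity bound are illustrative values, and the point-selection and polygon-construction steps of the scheme are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_mechanism(value, sensitivity, epsilon):
    """Add Laplace(0, sensitivity/epsilon) noise to each coordinate."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=np.shape(value))

polygon = np.array([[39.90, 116.39],      # protected point plus nearby frequently accessed points
                    [39.91, 116.41],      # (toy latitude/longitude values)
                    [39.89, 116.42]])
centroid = polygon.mean(axis=0)
noisy_centroid = laplace_mechanism(centroid, sensitivity=0.01, epsilon=0.5)
print(centroid, "->", noisy_centroid)
```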
An AK-LDMeans algorithm based on image clustering
NASA Astrophysics Data System (ADS)
Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan
2018-03-01
Clustering is an effective analytical technique for handling unlabeled data for value mining. Its ultimate goal is to label unclassified data quickly and correctly. We use road-map images in current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm that automatically locks the value of K by designing the Kcost fold line, and then uses the long-distance high-density method to select the clustering centers, replacing the traditional initial clustering center selection method; this further improves the efficiency and accuracy of the traditional K-Means algorithm. The experimental results are compared with those of current clustering algorithms. The algorithm can provide a useful reference in the fields of image processing, machine vision and data mining.
Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants.
Yousef, Malik; Saçar Demirci, Müşerref Duygu; Khalifa, Waleed; Allmer, Jens
2016-01-01
MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently, computational miRNA detection is mainly performed using machine learning, and in particular two-class classification. For machine learning, the miRNAs need to be parametrized, and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems preferable to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to a 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6%, thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.
Conroy-Beam, Daniel; Buss, David M.
2016-01-01
Prior mate preference research has focused on the content of mate preferences. Yet in real life, people must select mates among potentials who vary along myriad dimensions. How do people incorporate information on many different mate preferences in order to choose which partner to pursue? Here, in Study 1, we compare seven candidate algorithms for integrating multiple mate preferences in a competitive agent-based model of human mate choice evolution. This model shows that a Euclidean algorithm is the most evolvable solution to the problem of selecting fitness-beneficial mates. Next, across three studies of actual couples (Study 2: n = 214; Study 3: n = 259; Study 4: n = 294) we apply the Euclidean algorithm toward predicting mate preference fulfillment overall and preference fulfillment as a function of mate value. Consistent with the hypothesis that mate preferences are integrated according to a Euclidean algorithm, we find that actual mates lie close in multidimensional preference space to the preferences of their partners. Moreover, this Euclidean preference fulfillment is greater for people who are higher in mate value, highlighting theoretically-predictable individual differences in who gets what they want. These new Euclidean tools have important implications for understanding real-world dynamics of mate selection. PMID:27276030
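The Euclidean integration idea reduces to a distance computation in a shared trait space, as in this toy example (trait dimensions and values are made up for illustration):

```python
import numpy as np

preferences = np.array([6.0, 8.0, 5.0, 7.0])        # ideal levels on four trait dimensions
candidates = np.array([[5.5, 7.0, 6.0, 6.5],
                       [8.0, 4.0, 5.0, 9.0],
                       [6.0, 8.5, 4.5, 7.5]])

distances = np.linalg.norm(candidates - preferences, axis=1)
# Smaller distance = better preference fulfilment; pursue the closest candidate.
print("distances:", np.round(distances, 2), "best candidate:", int(np.argmin(distances)))
```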
Hemmateenejad, Bahram; Akhond, Morteza; Miri, Ramin; Shamsipur, Mojtaba
2003-01-01
A QSAR algorithm, principal component-genetic algorithm-artificial neural network (PC-GA-ANN), has been applied to a set of newly synthesized calcium channel blockers, which are of special interest because of their role in cardiac diseases. A data set of 124 1,4-dihydropyridines bearing different ester substituents at the C-3 and C-5 positions of the dihydropyridine ring and nitroimidazolyl, phenylimidazolyl, and methylsulfonylimidazolyl groups at the C-4 position with known Ca(2+) channel binding affinities was employed in this study. Ten different sets of descriptors (837 descriptors) were calculated for each molecule. The principal component analysis was used to compress the descriptor groups into principal components. The most significant descriptors of each set were selected and used as input for the ANN. The genetic algorithm (GA) was used for the selection of the best set of extracted principal components. A feed forward artificial neural network with a back-propagation of error algorithm was used to process the nonlinear relationship between the selected principal components and biological activity of the dihydropyridines. A comparison between PC-GA-ANN and routine PC-ANN shows that the first model yields better prediction ability.
A clinical decision-making algorithm for penicillin allergy.
Soria, Angèle; Autegarden, Elodie; Amsler, Emmanuelle; Gaouar, Hafida; Vial, Amandine; Francès, Camille; Autegarden, Jean-Eric
2017-12-01
About 10% of subjects report suspected penicillin allergy, but 85-90% of these patients are not truly allergic and could safely receive beta-lactam antibiotics. Objective: To design and validate a clinical decision-making algorithm, based on anamnesis (chronology, severity, and duration of the suspected allergic reactions) and reaching 100% sensitivity and negative predictive value, to assess the allergy risk related to a penicillin prescription in general practice. All patients were included prospectively and explored according to ENDA/EAACI recommendations. Results of the penicillin allergy work-up (gold standard) were compared with results of the algorithm. The allergological work-up diagnosed penicillin hypersensitivity in 41/259 patients (15.8%) [95% CI: 11.5-20.3]. Three of these patients were diagnosed as having immediate-type hypersensitivity to penicillin, but had been misdiagnosed as low-risk patients by the clinical algorithm. Thus, the sensitivity and negative predictive value of the algorithm were 92.7% [95% CI: 80.1-98.5] and 96.3% [95% CI: 89.6-99.2], respectively, and the probability that a patient with true penicillin allergy had been misclassified was 3.7% [95% CI: 0.8-10.4]. Although the risk of misclassification is low, we cannot recommend the use of this algorithm in general practice. However, the algorithm can be useful in emergency situations in hospital settings. Key messages: True penicillin allergy is considerably less frequent than alleged penicillin allergy (15.8%; 41 of the 259 patients with suspected penicillin allergy). A clinical algorithm based on the patient's clinical history of the supposed allergic event to penicillin misclassified 3/41 (3.7%) truly allergic patients.
Efficient Variable Selection Method for Exposure Variables on Binary Data
NASA Astrophysics Data System (ADS)
Ohno, Manabu; Tarumi, Tomoyuki
In this paper, we propose a new variable selection method for "robust" exposure variables. We define "robust" as the property that the same variable is selected from both the original data and perturbed data. There are few studies of effective methods for this kind of selection. The problem of selecting exposure variables is almost the same as that of extracting correlation rules, apart from the robustness requirement. [Brin 97] suggested that correlation rules can be extracted efficiently on binary data using the chi-squared statistic of a contingency table, which has a monotone property. However, the chi-squared value itself does not have the monotone property, so as the dimension increases even a completely independent variable set is easily judged to be dependent, and the method is not usable for selecting robust exposure variables. To select robust independent variables, we assume an anti-monotone property for independent variables and use the apriori algorithm. The apriori algorithm is one of the algorithms that find association rules in market basket data; it exploits the anti-monotone property of the support defined for association rules. Independence does not completely satisfy the anti-monotone property on the AIC of the independence probability model, but the tendency toward anti-monotonicity is strong. Therefore, variables selected under the anti-monotone property on the AIC are robust. Our method judges whether a given variable is an exposure variable for the independent variable using comparisons of the AIC. Our numerical experiments show that our method can select robust exposure variables efficiently and precisely.
Adaptive Electronic Camouflage Using Texture Synthesis
2012-04-01
algorithm begins by computing the GLCMs, G_IN and G_OUT, of the input image (e.g., image of local environment) and output image (randomly generated...respectively. The algorithm randomly selects a pixel from the output image and cycles its gray-level through all values. For each value, G_OUT is updated... The value of the selected pixel is permanently changed to the gray-level value that minimizes the error between G_IN and G_OUT. Without selecting a
Construction project selection with the use of fuzzy preference relation
NASA Astrophysics Data System (ADS)
Ibadov, Nabi
2016-06-01
In this article, the author describes the problem of construction project variant selection during the pre-investment phase. As a solution, an algorithm based on the fuzzy preference relation is presented. The article provides an example of the algorithm used for selection of the best variant of a construction project. The choice is made based on criteria such as: net present value (NPV), level of technological difficulty, financing possibilities, and level of organizational difficulty.
Unlocking the spatial inversion of large scanning magnetic microscopy datasets
NASA Astrophysics Data System (ADS)
Myre, J. M.; Lascu, I.; Andrade Lima, E.; Feinberg, J. M.; Saar, M. O.; Weiss, B. P.
2013-12-01
Modern scanning magnetic microscopy provides the ability to perform high-resolution, ultra-high sensitivity moment magnetometry, with spatial resolutions better than 10^-4 m and magnetic moments as weak as 10^-16 Am^2. These microscopy capabilities have enhanced numerous magnetic studies, including investigations of the paleointensity of the Earth's magnetic field, shock magnetization and demagnetization of impacts, magnetostratigraphy, the magnetic record in speleothems, and the records of ancient core dynamos of planetary bodies. A common component among many studies utilizing scanning magnetic microscopy is solving an inverse problem to determine the non-negative magnitude of the magnetic moments that produce the measured component of the magnetic field. The two most frequently used methods to solve this inverse problem are classic fast Fourier techniques in the frequency domain and non-negative least squares (NNLS) methods in the spatial domain. Although Fourier techniques are extremely fast, they typically violate non-negativity and it is difficult to implement constraints associated with the space domain. NNLS methods do not violate non-negativity, but have typically been computation time prohibitive for samples of practical size or resolution. Existing NNLS methods use multiple techniques to attain tractable computation. To reduce computation time in the past, typically sample size or scan resolution would have to be reduced. Similarly, multiple inversions of smaller sample subdivisions can be performed, although this frequently results in undesirable artifacts at subdivision boundaries. Dipole interactions can also be filtered to only compute interactions above a threshold which enables the use of sparse methods through artificial sparsity. To improve upon existing spatial domain techniques, we present the application of the TNT algorithm, named TNT as it is a "dynamite" non-negative least squares algorithm which enhances the performance and accuracy of spatial domain inversions. We show that the TNT algorithm reduces the execution time of spatial domain inversions from months to hours and that inverse solution accuracy is improved as the TNT algorithm naturally produces solutions with small norms. Using sIRM and NRM measures of multiple synthetic and natural samples we show that the capabilities of the TNT algorithm allow very large samples to be inverted without the need for alternative techniques to make the problems tractable. Ultimately, the TNT algorithm enables accurate spatial domain analysis of scanning magnetic microscopy data on an accelerated time scale that renders spatial domain analyses tractable for numerous studies, including searches for the best fit of unidirectional magnetization direction and high-resolution step-wise magnetization and demagnetization.
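In miniature, the spatial-domain inversion is a non-negative least squares problem: recover moments m >= 0 from a field map b given a forward matrix G. SciPy's standard NNLS solver stands in below for the TNT algorithm, and the forward matrix is a made-up toy rather than a dipole Green's function:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_meas, n_sources = 120, 40
G = np.abs(rng.normal(size=(n_meas, n_sources)))        # toy forward operator
m_true = np.zeros(n_sources)
m_true[rng.choice(n_sources, 5, replace=False)] = rng.uniform(1.0, 3.0, size=5)
b = G @ m_true + 0.01 * rng.normal(size=n_meas)         # synthetic field map with noise

m_est, residual_norm = nnls(G, b)                       # solves min ||G m - b|| s.t. m >= 0
print("moments above 0.1:", int(np.count_nonzero(m_est > 0.1)), "residual:", round(residual_norm, 3))
```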
Probabilistic pathway construction.
Yousofshahi, Mona; Lee, Kyongbum; Hassoun, Soha
2011-07-01
Expression of novel synthesis pathways in host organisms amenable to genetic manipulations has emerged as an attractive metabolic engineering strategy to overproduce natural products, biofuels, biopolymers and other commercially useful metabolites. We present a pathway construction algorithm for identifying viable synthesis pathways compatible with balanced cell growth. Rather than exhaustive exploration, we investigate probabilistic selection of reactions to construct the pathways. Three different selection schemes are investigated for the selection of reactions: high metabolite connectivity, low connectivity and uniformly random. For all case studies, which involved a diverse set of target metabolites, the uniformly random selection scheme resulted in the highest average maximum yield. When compared to an exhaustive search enumerating all possible reaction routes, our probabilistic algorithm returned nearly identical distributions of yields, while requiring far less computing time (minutes vs. years). The pathways identified by our algorithm have previously been confirmed in the literature as viable, high-yield synthesis routes. Prospectively, our algorithm could facilitate the design of novel, non-native synthesis routes by efficiently exploring the diversity of biochemical transformations in nature. Copyright © 2011 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Carreno, Victor A.
2002-01-01
The KB3D algorithm is a pairwise conflict detection and resolution (CD&R) algorithm. It detects and generates trajectory vectoring for an aircraft which has been predicted to be in an airspace minima violation within a given look-ahead time. It has been proven, using mechanized theorem proving techniques, that for a pair of aircraft, KB3D produces at least one vectoring solution and that all solutions produced are correct. Although solutions produced by the algorithm are mathematically correct, they might not be physically executable by an aircraft or might not solve multiple aircraft conflicts. This paper describes a simple solution selection method which assesses all solutions generated by KB3D and determines the solution to be executed. The solution selection method and KB3D are evaluated using a simulation in which N aircraft fly in a free-flight environment and each aircraft in the simulation uses KB3D to maintain separation. Specifically, the solution selection method filters KB3D solutions which are procedurally undesirable or physically not executable and uses a predetermined criteria for selection.
Selective Iterative Waterfilling for Digital Subscriber Lines
NASA Astrophysics Data System (ADS)
Xu, Yang; Le-Ngoc, Tho; Panigrahi, Saswat
2007-12-01
This paper presents a high-performance, low-complexity, quasi-distributed dynamic spectrum management (DSM) algorithm suitable for DSL systems. We analytically demonstrate that the rate degradation of the distributed iterative waterfilling (IW) algorithm in near-far scenarios is caused by the insufficient utilization of all available frequency and power resources due to its nature of noncooperative game theoretic formulation. Inspired by this observation, we propose the selective IW (SIW) algorithm that can considerably alleviate the performance degradation of IW by applying IW selectively to different groups of users over different frequency bands so that all the available resources can be fully utilized. For [InlineEquation not available: see fulltext.] users, the proposed SIW algorithm needs at most [InlineEquation not available: see fulltext.] times the complexity of the IW algorithm, and is much simpler than the centralized optimal spectrum balancing (OSB), while it can offer a rate performance much better than that of the IW and close to the maximum possible rate region computed by the OSB in realistic near-far DSL scenarios. Furthermore, its predominantly distributed structure makes it suitable for DSL implementation.
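For context, the per-user waterfilling step that both IW and SIW iterate is shown below; the bisection search for the water level and the example numbers are an illustrative implementation, not the SIW grouping logic:

```python
import numpy as np

def waterfill(noise_over_gain, total_power, iters=60):
    """Allocate power p_k = max(mu - n_k, 0) so that sum(p_k) equals the budget."""
    lo, hi = 0.0, noise_over_gain.max() + total_power
    for _ in range(iters):                    # bisection on the water level mu
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - noise_over_gain, 0.0).sum() > total_power:
            hi = mu
        else:
            lo = mu
    return np.maximum(lo - noise_over_gain, 0.0)

levels = np.array([0.2, 0.5, 1.0, 3.0])       # per-tone noise / |H|^2 (toy values)
p = waterfill(levels, total_power=2.0)
print(np.round(p, 3), "sum =", round(float(p.sum()), 3))
```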
Adaptive firefly algorithm: parameter analysis and its application.
Cheung, Ngaam J; Ding, Xue-Ming; Shen, Hong-Bin
2014-01-01
As a nature-inspired search algorithm, the firefly algorithm (FA) has several control parameters, which may have great effects on its performance. In this study, we investigate the parameter selection and adaptation strategies in a modified firefly algorithm, the adaptive firefly algorithm (AdaFa). There are three strategies in AdaFa: (1) a distance-based light absorption coefficient; (2) a gray coefficient enabling fireflies to share difference information from attractive ones efficiently; and (3) five different dynamic strategies for the randomization parameter. Promising selections of parameters in the strategies are analyzed to guarantee the efficient performance of AdaFa. AdaFa is validated over widely used benchmark functions, and the numerical experiments and statistical tests yield useful conclusions on the strategies and the parameter selections affecting the performance of AdaFa. When applied to a real-world problem, protein tertiary structure prediction, the results demonstrated that the improved variants can rebuild the tertiary structure with an average root mean square deviation of less than 0.4 Å and 1.5 Å from the native constraints under noise-free conditions and with 10% Gaussian white noise, respectively.
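The basic firefly move that these strategies adapt is compact enough to show in full; the fixed beta0, gamma and alpha values and the sphere objective are placeholders for the quantities AdaFa adapts:

```python
import numpy as np

rng = np.random.default_rng(0)

def firefly_step(X, obj, beta0=1.0, gamma=1.0, alpha=0.05):
    brightness = -obj(X)                      # brighter firefly = lower objective value
    X_new = X.copy()
    for i in range(len(X)):
        for j in range(len(X)):
            if brightness[j] > brightness[i]:                 # i is attracted to brighter j
                beta = beta0 * np.exp(-gamma * np.sum((X[i] - X[j]) ** 2))
                X_new[i] += beta * (X[j] - X[i]) + alpha * (rng.random(X.shape[1]) - 0.5)
    return X_new

sphere = lambda X: np.sum(X ** 2, axis=1)
X = rng.uniform(-2.0, 2.0, size=(15, 5))
for _ in range(50):
    X = firefly_step(X, sphere)
print("best objective value:", float(sphere(X).min()))
```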
Video error concealment using block matching and frequency selective extrapolation algorithms
NASA Astrophysics Data System (ADS)
P. K., Rajani; Khaparde, Arti
2017-06-01
Error Concealment (EC) is a technique at the decoder side to hide transmission errors. It is done by analyzing the spatial or temporal information from available video frames. Recovering distorted video is important because video is used in applications such as video telephony, video conferencing, TV, DVD, internet video streaming, and video games. Retransmission-based and resilience-based methods are also used for error removal, but these methods add delay and redundant data, so error concealment is the preferred option for error hiding. In this paper, the Block Matching error concealment algorithm is compared with the Frequency Selective Extrapolation algorithm. Both approaches are evaluated on video frames with manually introduced errors as input. The parameters used for objective quality measurement were PSNR (Peak Signal to Noise Ratio) and SSIM (Structural Similarity Index). The original video frames along with the error video frames are compared for both error concealment algorithms. According to the simulation results, Frequency Selective Extrapolation shows better quality measures, with 48% higher PSNR and 94% higher SSIM than the Block Matching algorithm.
An investigation of messy genetic algorithms
NASA Technical Reports Server (NTRS)
Goldberg, David E.; Deb, Kalyanmoy; Korb, Bradley
1990-01-01
Genetic algorithms (GAs) are search procedures based on the mechanics of natural selection and natural genetics. They combine the use of string codings or artificial chromosomes and populations with the selective and juxtapositional power of reproduction and recombination to provide a surprisingly powerful search heuristic in many problems. Despite their empirical success, there has been a long-standing objection to the use of GAs in arbitrarily difficult problems, and a new approach was launched to address it. Results on a 30-bit, order-three-deception problem were obtained using a new type of genetic algorithm called a messy genetic algorithm (mGA). Messy genetic algorithms combine the use of variable-length strings, a two-phase selection scheme, and messy genetic operators to effect a solution to the fixed-coding problem of standard simple GAs. The results of the study of mGAs in problems with nonuniform subfunction scale and size are presented. The mGA approach is summarized, covering both its operation and the theory of its use. Experiments on problems of varying scale, varying building-block size, and combined varying scale and size are presented.
Development of the 7-Item Binge-Eating Disorder Screener (BEDS-7)
Herman, Barry K.; Deal, Linda S.; DiBenedetti, Dana B.; Nelson, Lauren; Fehnel, Sheri E.; Brown, T. Michelle
2016-01-01
Objective: Develop a brief, patient-reported screening tool designed to identify individuals with probable binge-eating disorder (BED) for further evaluation or referral to specialists. Methods: Items were developed on the basis of the DSM-5 diagnostic criteria, existing tools, and input from 3 clinical experts (January 2014). Items were then refined in cognitive debriefing interviews with participants self-reporting BED characteristics (March 2014) and piloted in a multisite, cross-sectional, prospective, noninterventional study consisting of a semistructured diagnostic interview (to diagnose BED) and administration of the pilot Binge-Eating Disorder Screener (BEDS), Binge Eating Scale (BES), and RAND 36-Item Short-Form Health Survey (RAND-36) (June 2014–July 2014). The sensitivity and specificity of classification algorithms (formed from the pilot BEDS item-level responses) in predicting BED diagnosis were evaluated. The final algorithm was selected to minimize false negatives and false positives, while utilizing the fewest number of BEDS items. Results: Starting with the initial BEDS item pool (20 items), the 13-item pilot BEDS resulted from the cognitive debriefing interviews (n = 13). Of the 97 participants in the noninterventional study, 16 were diagnosed with BED (10/62 female, 16%; 6/35 male, 17%). Seven BEDS items (BEDS-7) yielded 100% sensitivity and 38.7% specificity. Participants correctly identified (true positives) had poorer BES scores and RAND-36 scores than participants identified as true negatives. Conclusions: Implementation of the brief, patient-reported BEDS-7 in real-world clinical practice is expected to promote better understanding of BED characteristics and help physicians identify patients who may have BED. PMID:27486542
Rönnberg, Jerker; Danielsson, Henrik; Rudner, Mary; Arlinger, Stig; Sternäng, Ola; Wahlin, Ake; Nilsson, Lars-Göran
2011-04-01
To test the relationship between degree of hearing loss and different memory systems in hearing aid users. Structural equation modeling (SEM) was used to study the relationship between auditory and visual acuity and different cognitive and memory functions in an age-heterogeneous subsample of 160 hearing aid users without dementia, drawn from the Swedish prospective cohort aging study known as Betula (L.-G. Nilsson et al., 1997). Hearing loss was selectively and negatively related to episodic and semantic long-term memory (LTM) but not short-term memory (STM) performance. This held true for both ears, even when age was accounted for. Visual acuity alone, or in combination with auditory acuity, did not contribute to any acceptable SEM solution. The overall relationships between hearing loss and memory systems were predicted by the ease of language understanding model (J. Rönnberg, 2003), but the exact mechanisms of episodic memory decline in hearing aid users (i.e., mismatch/disuse, attentional resources, or information degradation) remain open for further experiments. The hearing aid industry should strive to design signal processing algorithms that are cognition friendly.
Compromise Approach-Based Genetic Algorithm for Constrained Multiobjective Portfolio Selection Model
NASA Astrophysics Data System (ADS)
Li, Jun
In this paper, fuzzy set theory is incorporated into a multiobjective portfolio selection model that takes into account three criteria for investors: return, risk and liquidity. The cardinality constraint, the buy-in threshold constraint and the round-lot constraints are considered in the proposed model. To overcome the difficulty of evaluating a large set of efficient solutions and selecting the best one on the non-dominated surface, a compromise approach-based genetic algorithm is presented to obtain a compromised solution for the proposed constrained multiobjective portfolio selection model.
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to the advancement in sensor technology, growing large medical image data sets have the ability to visualize the anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value, which in practice are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.
Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao
2015-04-01
Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnoses and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting highly discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select a good subset of informative genes relevant to the classification. In the proposed algorithm, firstly, the Fisher-Markov selector is used to choose a fixed number of genes. Secondly, to make biogeography based optimization suitable for the feature selection problem, a discrete migration model and a discrete mutation model are proposed to balance the exploration and exploitation ability. Then, discrete biogeography based optimization, which we call DBBO, is constructed by integrating the discrete migration model and the discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are evaluated with the 10-fold cross-validation method. In order to show the effectiveness and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer dataset benchmarks. Compared with the genetic algorithm, particle swarm optimization, the differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better than or at least comparable with previous methods from the literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Wang, Fu; Liu, Bo; Zhang, Lijia; Xin, Xiangjun; Tian, Qinghua; Zhang, Qi; Rao, Lan; Tian, Feng; Luo, Biao; Liu, Yingjun; Tang, Bao
2016-10-01
Elastic optical networks are considered to be a promising technology for future high-speed networks. In this paper, we propose a routing and spectrum assignment (RSA) algorithm based on ant colony optimization of the minimum consecutiveness loss (ACO-MCL). Based on the effect of the spectrum consecutiveness loss on the pheromone in the ant colony optimization, the path and spectrum with the minimal impact on the network are selected for the service request. When an ant arrives at the destination node from the source node along a path, we assume that this path is selected for the request. We calculate the consecutiveness loss of candidate-neighbor link pairs along this path after the routing and spectrum assignment. Then, the network updates the pheromone according to the value of the consecutiveness loss. We save the path with the smallest value. After multiple iterations of the ant colony optimization, the finally selected path is assigned to the request. The algorithms are simulated in different networks. The results show that the ACO-MCL algorithm performs better in blocking probability and spectrum efficiency than other algorithms. Moreover, the ACO-MCL algorithm can effectively decrease spectrum fragmentation and enhance available spectrum consecutiveness. Compared with other algorithms, the ACO-MCL algorithm can reduce the blocking rate by at least 5.9% under heavy load.
Ye, Fei; Lou, Xin Yuan; Sun, Lin Fu
2017-01-01
This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy to simultaneously perform parameter tuning for the SVM and feature selection. In the improved FOA, the chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. In addition, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm procedure to search for the optimal solution in both the whole solution space and the local solution space containing the fruit fly swarm location. In an evaluation based on a group of ten benchmark problems, the proposed algorithm's performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, this algorithm is successfully applied in an SVM to perform both parameter tuning and feature selection to solve real-world classification problems. This method is called the chaotic fruit fly optimization algorithm (CIFOA)-SVM and has been shown to be a more robust and effective optimization method than other well-known methods, particularly in terms of solving the medical diagnosis problem and the credit card problem.
Genetic algorithms and their use in Geophysical Problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parker, Paul B.
1999-04-01
Genetic algorithms (GAs), global optimization methods that mimic Darwinian evolution, are well suited to the nonlinear inverse problems of geophysics. A standard genetic algorithm selects the best or ''fittest'' models from a ''population'' and then applies operators such as crossover and mutation in order to combine the most successful characteristics of each model and produce fitter models. More sophisticated operators have been developed, but the standard GA usually provides a robust and efficient search. Although the choice of parameter settings such as crossover and mutation rate may depend largely on the type of problem being solved, numerous results show that certain parameter settings produce optimal performance for a wide range of problems and difficulties. In particular, a low (about half of the inverse of the population size) mutation rate is crucial for optimal results, but the choice of crossover method and rate do not seem to affect performance appreciably. Optimal efficiency is usually achieved with smaller (< 50) populations. Lastly, tournament selection appears to be the best choice of selection methods due to its simplicity and its autoscaling properties. However, if a proportional selection method is used such as roulette wheel selection, fitness scaling is a necessity, and a high scaling factor (> 2.0) should be used for the best performance. Three case studies are presented in which genetic algorithms are used to invert for crustal parameters. The first is an inversion for basement depth at Yucca mountain using gravity data, the second an inversion for velocity structure in the crust of the south island of New Zealand using receiver functions derived from teleseismic events, and the third is a similar receiver function inversion for crustal velocities beneath the Mendocino Triple Junction region of Northern California. The inversions demonstrate that genetic algorithms are effective in solving problems with reasonably large numbers of free parameters and with computationally expensive objective function calculations. More sophisticated techniques are presented for special problems. Niching and island model algorithms are introduced as methods to find multiple, distinct solutions to the nonunique problems that are typically seen in geophysics. Finally, hybrid algorithms are investigated as a way to improve the efficiency of the standard genetic algorithm.
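As a concrete illustration of the parameter guidance above (binary tournament selection and a mutation rate of roughly half the inverse of the population size), here is a minimal, generic binary GA sketch; it is not the geophysical inversion code, and the one-max fitness function is only a stand-in for an objective such as misfit between observed and modeled data.

import numpy as np

def ga_maximize(fitness, n_bits, pop_size=40, generations=200, p_cross=0.7, seed=0):
    """Minimal binary GA with binary tournament selection; the mutation rate is
    set to roughly half the inverse of the population size, as recommended above."""
    rng = np.random.default_rng(seed)
    p_mut = 0.5 / pop_size
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    for _ in range(generations):
        fit = np.array([fitness(ind) for ind in pop])
        new_pop = []
        for _ in range(pop_size):
            # Binary tournament selection of two parents.
            a, b = rng.integers(0, pop_size, 2), rng.integers(0, pop_size, 2)
            p1 = pop[a[np.argmax(fit[a])]]
            p2 = pop[b[np.argmax(fit[b])]]
            # Single-point crossover.
            if rng.random() < p_cross:
                cut = rng.integers(1, n_bits)
                child = np.concatenate([p1[:cut], p2[cut:]])
            else:
                child = p1.copy()
            # Bitwise mutation.
            flip = rng.random(n_bits) < p_mut
            child[flip] ^= 1
            new_pop.append(child)
        pop = np.array(new_pop)
    fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(fit)], fit.max()

# Example: maximize the number of ones (one-max).
print(ga_maximize(lambda ind: int(ind.sum()), n_bits=30))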
NASA Astrophysics Data System (ADS)
Zhang, Lei; Yang, Fengbao; Ji, Linna; Lv, Sheng
2018-01-01
Diverse image fusion methods perform differently, and each method has advantages and disadvantages compared with the others. One notion is that the advantages of different image fusion methods can be effectively combined. A multiple-algorithm parallel fusion method based on algorithmic complementarity and synergy is proposed. First, in view of the characteristics of the different algorithms and the difference-features among images, an index vector-based feature similarity is proposed to define the degree of complementarity and synergy. This proposed index vector is a reliable evidence indicator for algorithm selection. Second, the algorithms with a high degree of complementarity and synergy are selected. Then, the different degrees of the various features and the infrared intensity images are used as the initial weights for nonnegative matrix factorization (NMF). This avoids the randomness of the NMF initialization. Finally, the fused images of the different algorithms are integrated using the NMF because of its excellent data fusing performance on independent features. Experimental results demonstrate that the visual effect and objective evaluation index of the fused images obtained using the proposed method are better than those obtained using traditional methods. The proposed method retains all the advantages that the individual fusion algorithms have.
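The step of seeding NMF with precomputed weights instead of a random start can be sketched with scikit-learn's custom initialization; the matrices below are arbitrary nonnegative stand-ins, not the feature-derived weights described in the paper.

import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(64, 64)))        # stand-in for stacked, nonnegative image data
k = 4

# Precomputed nonnegative factors replace random initialization, removing the
# initialization randomness that the paper's method avoids.
W0 = np.abs(rng.normal(size=(64, k))) + 1e-3
H0 = np.abs(rng.normal(size=(k, 64))) + 1e-3

model = NMF(n_components=k, init="custom", max_iter=500)
W = model.fit_transform(X, W=W0, H=H0)
H = model.components_
print("reconstruction error:", model.reconstruction_err_)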
O'Boyle, Noel M; Palmer, David S; Nigsch, Florian; Mitchell, John BO
2008-01-01
Background: We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024–1029). We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581–590) of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset. Results: Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6°C and R2 of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, ε of 0.21) and an RMSE of 45.1°C and R2 of 0.54. This model outperforms a kNN model (RMSE of 48.3°C, R2 of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5°C, R2 of 0.55). However it is much less prone to bias at the extremes of the range of melting points as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. Conclusion: With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors. PMID:18959785
Why don’t you use Evolutionary Algorithms in Big Data?
NASA Astrophysics Data System (ADS)
Stanovov, Vladimir; Brester, Christina; Kolehmainen, Mikko; Semenkina, Olga
2017-02-01
In this paper we raise the question of using evolutionary algorithms in the area of Big Data processing. We show that evolutionary algorithms provide evident advantages due to their high scalability and flexibility, their ability to solve global optimization problems and optimize several criteria at the same time for feature selection, instance selection and other data reduction problems. In particular, we consider the usage of evolutionary algorithms with all kinds of machine learning tools, such as neural networks and fuzzy systems. All our examples prove that Evolutionary Machine Learning is becoming more and more important in data analysis and we expect to see the further development of this field especially in respect to Big Data.
Sensor Data Quality and Angular Rate Down-Selection Algorithms on SLS EM-1
NASA Technical Reports Server (NTRS)
Park, Thomas; Smith, Austin; Oliver, T. Emerson
2018-01-01
The NASA Space Launch System Block 1 launch vehicle is equipped with an Inertial Navigation System (INS) and multiple Rate Gyro Assemblies (RGA) that are used in the Guidance, Navigation, and Control (GN&C) algorithms. The INS provides the inertial position, velocity, and attitude of the vehicle along with both angular rate and specific force measurements. Additionally, multiple sets of co-located rate gyros supply angular rate data. The collection of angular rate data, taken along the launch vehicle, is used to separate out vehicle motion from flexible body dynamics. Since the system architecture uses redundant sensors, the capability was developed to evaluate the health (or validity) of the independent measurements. A suite of Sensor Data Quality (SDQ) algorithms is responsible for assessing the angular rate data from the redundant sensors. When failures are detected, SDQ will take the appropriate action and disqualify or remove faulted sensors from forward processing. Additionally, the SDQ algorithms contain logic for down-selecting the angular rate data used by the GNC software from the set of healthy measurements. This paper explores the trades and analyses that were performed in selecting a set of robust fault-detection algorithms included in the GN&C flight software. These trades included both an assessment of hardware-provided health and status data as well as an evaluation of different algorithms based on time-to-detection, type of failures detected, and probability of detecting false positives. We then provide an overview of the algorithms used for both fault-detection and measurement down selection. We next discuss the role of trajectory design, flexible-body models, and vehicle response to off-nominal conditions in setting the detection thresholds. Lastly, we present lessons learned from software integration and hardware-in-the-loop testing.
A Distributed and Energy-Efficient Algorithm for Event K-Coverage in Underwater Sensor Networks
Jiang, Peng; Xu, Yiming; Liu, Jun
2017-01-01
For event dynamic K-coverage algorithms, each management node selects its assistant node by using a greedy algorithm without considering the residual energy and situations in which a node is selected by several events. This approach affects network energy consumption and balance. Therefore, this study proposes a distributed and energy-efficient event K-coverage algorithm (DEEKA). After the network achieves 1-coverage, the nodes that detect the same event compete for the role of event management node based on the number of candidate nodes, the average residual energy, and the distance to the event. Second, each management node estimates the probability of its neighbor nodes' being selected by the event it manages from the distance level, the residual energy level, and the number of dynamic coverage events of these nodes. Third, each management node establishes an optimization model that takes the expected energy consumption, the residual energy variance of its neighbor nodes, and the detection performance for the events it manages as objectives. Finally, each management node uses a constrained non-dominated sorting genetic algorithm (NSGA-II) to obtain the Pareto set of the model and the best strategy via the technique for order preference by similarity to an ideal solution (TOPSIS). The algorithm first considers the effect of harsh underwater environments on information collection and transmission. It also considers the residual energy of a node and situations in which the node is selected by several other events. Simulation results show that, unlike the on-demand variable sensing K-coverage algorithm, DEEKA balances and reduces network energy consumption, thereby prolonging the network's best service quality and lifetime. PMID:28106837
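The final TOPSIS step (picking one compromise strategy from the NSGA-II Pareto set) can be sketched as follows; the criteria, weights, and scores are illustrative placeholders rather than values from the paper.

import numpy as np

def topsis(scores, weights, benefit):
    """Rank alternatives (rows) over criteria (columns) by closeness to the
    ideal solution. benefit[j] is True if criterion j is to be maximized."""
    m = scores / np.linalg.norm(scores, axis=0)      # vector-normalize each criterion
    v = m * weights
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    worst = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - worst, axis=1)
    closeness = d_neg / (d_pos + d_neg)
    return np.argsort(-closeness), closeness

# Example: 3 candidate strategies scored on (expected energy use, residual-energy
# variance, detection performance); the first two are costs, the last a benefit.
scores = np.array([[2.0, 0.5, 0.90],
                   [1.5, 0.8, 0.70],
                   [2.5, 0.3, 0.95]])
order, c = topsis(scores, weights=np.array([0.4, 0.3, 0.3]),
                  benefit=np.array([False, False, True]))
print(order, c)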
Landry, Marie L; Ferguson, David; Topal, Jeffrey
2014-01-01
Simplexa Clostridium difficile universal direct PCR, a real-time PCR assay for the detection of the C. difficile toxin B (tcdB) gene using the 3M integrated cycler, was compared with a two-step algorithm which includes the C. Diff Chek-60 glutamate dehydrogenase (GDH) antigen assay followed by cytotoxin neutralization. Three hundred forty-two liquid or semisolid stools submitted for diagnostic C. difficile testing, 171 GDH antigen positive and 171 GDH antigen negative, were selected for the study. All samples were tested by the C. Diff Chek-60 GDH antigen assay, cytotoxin neutralization, and Simplexa direct PCR. Of 171 GDH-positive samples, 4 were excluded (from patients on therapy or from whom duplicate samples were obtained) and 88 were determined to be true positives for toxigenic C. difficile. Of the 88, 67 (76.1%) were positive by the two-step method and 86 (97.7%) were positive by PCR. Seventy-nine were positive by the GDH antigen assay only. Of 171 GDH antigen-negative samples, none were positive by PCR. One antigen-negative sample positive by the cytotoxin assay only was deemed a false positive based on chart review. Simplexa C. difficile universal direct PCR was significantly more sensitive for detecting toxigenic C. difficile bacteria than cytotoxin neutralization (P = 0.0002). However, most PCR-positive/cytotoxin-negative patients did not have clear C. difficile disease. The estimated cost avoidance provided by a more rapid molecular diagnosis was outweighed by the cost of isolating and treating PCR-positive/cytotoxin-negative patients. The costs, clinical consequences, and impact on nosocomial transmission of treating and/or isolating patients positive for toxigenic C. difficile by PCR but negative for in vivo toxin production merit further study.
He, Xianmin; Wei, Qing; Sun, Meiqian; Fu, Xuping; Fan, Sichang; Li, Yao
2006-05-01
Biological techniques such as array comparative genomic hybridization (CGH), fluorescent in situ hybridization (FISH) and Affymetrix single nucleotide polymorphism (SNP) arrays have been used to detect cytogenetic aberrations. However, on a genomic scale, these techniques are labor intensive and time consuming. Comparative genomic microarray analysis (CGMA) has been used to identify cytogenetic changes in hepatocellular carcinoma (HCC) using gene expression microarray data. However, the CGMA algorithm cannot give precise localization of aberrations, fails to identify small cytogenetic changes, and exhibits false negatives and positives. Locally un-weighted smoothing cytogenetic aberration prediction (LS-CAP), based on local smoothing and the binomial distribution, can be expected to address these problems. The LS-CAP algorithm was built and used on HCC microarray profiles. Eighteen cytogenetic abnormalities were identified, among them 5 were reported previously, and 12 were proven by CGH studies. LS-CAP effectively reduced the false negatives and positives, and precisely located small fragments with cytogenetic aberrations.
Mehrabi, Saeed; Krishnan, Anand; Roch, Alexandra M; Schmidt, Heidi; Li, DingCheng; Kesterson, Joe; Beesley, Chris; Dexter, Paul; Schmidt, Max; Palakal, Mathew; Liu, Hongfang
2015-01-01
In this study we have developed a rule-based natural language processing (NLP) system to identify patients with a family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. The family history identification precision was consistent across the institutions, shifting from 88.9% on the Indiana University (IU) dataset to 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. The family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6% and 82.6%, respectively. Negation detection resulted in a precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customization of the algorithm on the new dataset improves its performance.
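As an illustration of the kind of rule a negation-detection component applies, here is a toy NegEx-style check (cue words within a fixed window before the finding); the cue list, window size, and function are hypothetical and are not the authors' UIMA implementation.

import re

# Toy rule: a finding is negated if a negation cue appears within a few tokens
# before it in the same sentence (illustrative only).
NEGATION_CUES = r"\b(no|denies|denied|negative for|without|not)\b"

def is_negated(sentence, finding, window=6):
    tokens = sentence.lower().split()
    target = finding.lower().split()[0]
    for i, tok in enumerate(tokens):
        if target in tok:
            context = " ".join(tokens[max(0, i - window):i])
            return re.search(NEGATION_CUES, context) is not None
    return False

print(is_negated("Family history is negative for pancreatic cancer.", "pancreatic cancer"))  # True
print(is_negated("Mother had pancreatic cancer.", "pancreatic cancer"))                      # False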
Comparison of Two Phenotypic Algorithms To Detect Carbapenemase-Producing Enterobacteriaceae
Dortet, Laurent; Bernabeu, Sandrine; Gonzalez, Camille
2017-01-01
A novel algorithm designed for the screening of carbapenemase-producing Enterobacteriaceae (CPE), based on faropenem and temocillin disks, was compared to that of the Committee of the Antibiogram of the French Society of Microbiology (CA-SFM), which is based on ticarcillin-clavulanate, imipenem, and temocillin disks. The two algorithms presented comparable negative predictive values (98.6% versus 97.5%) for CPE screening among carbapenem-nonsusceptible Enterobacteriaceae. However, since 46.2% (n = 49) of the CPE were correctly identified as OXA-48-like producers by the faropenem/temocillin-based algorithm, it significantly decreased the number of complementary tests needed (42.2% versus 62.6% with the CA-SFM algorithm). PMID:28607010
Fernandez-Lozano, C.; Canto, C.; Gestal, M.; Andrade-Garda, J. M.; Rabuñal, J. R.; Dorado, J.; Pazos, A.
2013-01-01
Given the background of the use of neural networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: the Support Vector Machine (SVM). Therefore, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, by using the SVM as the fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected. PMID:24453933
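A wrapper of this kind can be sketched as a binary-mask GA whose fitness is cross-validated SVM accuracy; the sketch below uses the scikit-learn wine dataset as a stand-in for the apple juice data and a deliberately simple truncation-selection GA, so it illustrates the idea rather than the authors' hybrid model.

import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)   # stand-in for the apple-juice measurements
rng = np.random.default_rng(0)
n_feat, pop_size, gens = X.shape[1], 20, 15

def fitness(mask):
    # Cross-validated SVM accuracy on the selected variables is the GA fitness.
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel="rbf", C=1.0), X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(pop_size, n_feat)).astype(bool)
for _ in range(gens):
    fit = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(-fit)[: pop_size // 2]]          # truncation selection
    children = parents.copy()
    cut = rng.integers(1, n_feat, size=len(children))
    for c, k, mate in zip(children, cut, parents[::-1]):      # one-point crossover
        c[k:] = mate[k:]
    children ^= rng.random(children.shape) < (1.0 / n_feat)   # bit-flip mutation
    pop = np.vstack([parents, children])

fit = np.array([fitness(m) for m in pop])
best = pop[np.argmax(fit)]
print("selected features:", np.flatnonzero(best), "CV accuracy:", fit.max())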
Federal Register 2010, 2011, 2012, 2013, 2014
2010-10-28
... Change The Exchange proposes to modify the wording of Rule 6.12 relating to the C2 matching algorithm... matching algorithm and subsequently overlay certain priorities over the selected base algorithm. There are currently two base algorithms: price-time (often referred to as first in, first out or FIFO) in which...
NASA Astrophysics Data System (ADS)
Tereshin, Alexander A.; Usilin, Sergey A.; Arlazarov, Vladimir V.
2018-04-01
This paper studies the problem of multi-class object detection in a video stream with Viola-Jones cascades. An adaptive algorithm for selecting a Viola-Jones cascade, based on a greedy choice strategy for the N-armed bandit problem, is proposed. The efficiency of the algorithm is shown on the problem of detection and recognition of bank card logos in a video stream. The proposed algorithm can be effectively used in document localization and identification, recognition of road scene elements, localization and tracking of lengthy objects, and for solving other problems of rigid object detection in heterogeneous data flows. The computational efficiency of the algorithm makes it possible to use it both on personal computers and on mobile devices based on processors with low power consumption.
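One common greedy strategy for an N-armed bandit of this sort is epsilon-greedy selection over per-cascade reward estimates; the sketch below stubs out the detector calls and illustrates only the selection logic, not the paper's exact policy.

import random

class EpsilonGreedyCascadeSelector:
    """Pick which of N cascades to run on the next frame, favouring the one
    with the highest observed detection rate (epsilon-greedy bandit)."""
    def __init__(self, n_cascades, epsilon=0.1):
        self.eps = epsilon
        self.counts = [0] * n_cascades
        self.values = [0.0] * n_cascades   # running mean reward per cascade

    def select(self):
        if random.random() < self.eps:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Usage sketch: reward = 1 if the chosen cascade fired a detection on the frame.
selector = EpsilonGreedyCascadeSelector(n_cascades=3)
for frame in range(100):
    arm = selector.select()
    detected = random.random() < [0.1, 0.5, 0.3][arm]   # stubbed detector outcome
    selector.update(arm, 1.0 if detected else 0.0)
print(selector.values)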
Ogawa, Takahiro; Haseyama, Miki
2013-03-01
A missing texture reconstruction method based on an error reduction (ER) algorithm, including a novel estimation scheme for Fourier transform magnitudes, is presented in this brief. In our method, the Fourier transform magnitude is estimated for a target patch including missing areas, and the missing intensities are estimated by retrieving its phase based on the ER algorithm. Specifically, by monitoring errors converged in the ER algorithm, known patches whose Fourier transform magnitudes are similar to that of the target patch are selected from the target image. In the second step, the Fourier transform magnitude of the target patch is estimated from those of the selected known patches and their corresponding errors. Consequently, by using the ER algorithm, we can estimate both the Fourier transform magnitudes and phases to reconstruct the missing areas.
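The core ER iteration (impose the estimated Fourier magnitude, then restore the known pixels) can be sketched in a few lines of NumPy; this toy version assumes the magnitude estimate is already given (here, the true magnitude), whereas estimating it from similar known patches is the paper's actual contribution.

import numpy as np

def error_reduction(patch, known_mask, target_magnitude, iters=200):
    """ER iteration: enforce the estimated Fourier magnitude, then restore the
    known pixels; the missing pixels converge to a phase-consistent estimate."""
    est = patch.copy()
    est[~known_mask] = patch[known_mask].mean()        # crude start for missing pixels
    for _ in range(iters):
        spectrum = np.fft.fft2(est)
        phase = np.exp(1j * np.angle(spectrum))
        est = np.real(np.fft.ifft2(target_magnitude * phase))
        est[known_mask] = patch[known_mask]            # spatial-domain constraint
    return est

# Toy usage: hide a block of a random patch and use the true magnitude as the estimate.
rng = np.random.default_rng(0)
patch = rng.random((32, 32))
mask = np.ones_like(patch, dtype=bool)
mask[10:20, 10:20] = False
recon = error_reduction(patch, mask, np.abs(np.fft.fft2(patch)))
print("MSE on missing pixels:", float(np.mean((recon[~mask] - patch[~mask]) ** 2)))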
NASA Technical Reports Server (NTRS)
Shaffer, Scott; Dunbar, R. Scott; Hsiao, S. Vincent; Long, David G.
1989-01-01
The NASA Scatterometer, NSCAT, is an active spaceborne radar designed to measure the normalized radar backscatter coefficient (sigma0) of the ocean surface. These measurements can, in turn, be used to infer the surface vector wind over the ocean using a geophysical model function. Several ambiguous wind vectors result because of the nature of the model function. A median-filter-based ambiguity removal algorithm will be used by the NSCAT ground data processor to select the best wind vector from the set of ambiguous wind vectors. This process is commonly known as dealiasing or ambiguity removal. The baseline NSCAT ambiguity removal algorithm and the method used to select the set of optimum parameter values are described. An extensive simulation of the NSCAT instrument and ground data processor provides a means of testing the resulting tuned algorithm. This simulation generates the ambiguous wind-field vectors expected from the instrument as it orbits over a set of realistic mesoscale wind fields. The ambiguous wind field is then dealiased using the median-based ambiguity removal algorithm. Performance is measured by comparison of the unambiguous wind fields with the true wind fields. Results have shown that the median-filter-based ambiguity removal algorithm satisfies NSCAT mission requirements.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gongzhang, R.; Xiao, B.; Lardner, T.
2014-02-18
This paper presents a robust frequency diversity based algorithm for clutter reduction in ultrasonic A-scan waveforms. The performance of conventional spectral-temporal techniques like Split Spectrum Processing (SSP) is highly dependent on parameter selection, especially when the signal to noise ratio (SNR) is low. Although spatial beamforming offers noise reduction with less sensitivity to parameter variation, phased array techniques are not always available. The proposed algorithm first selects an ascending series of frequency bands. A signal is reconstructed for each selected band, and a defect is taken to be present where all frequency components have a uniform sign. Combining all reconstructed signals through averaging gives a probability profile of potential defect positions. To facilitate data collection and validate the proposed algorithm, Full Matrix Capture is applied on austenitic steel and high nickel alloy (HNA) samples with 5 MHz transducer arrays. When processing A-scan signals with unrefined parameters, the proposed algorithm enhances SNR by 20 dB for both samples and consequently, defects are more visible in B-scan images created from the large number of A-scan traces. Importantly, the proposed algorithm is considered robust, while SSP is shown to fail on the austenitic steel data and achieves less SNR enhancement on the HNA data.
Artificial bee colony algorithm for single-trial electroencephalogram analysis.
Hsu, Wei-Yen; Hu, Ya-Ping
2015-04-01
In this study, we propose an analysis system combined with feature selection to further improve the classification accuracy of single-trial electroencephalogram (EEG) data. Acquiring event-related brain potential data from the sensorimotor cortices, the system comprises artifact and background noise removal, feature extraction, feature selection, and feature classification. First, the artifacts and background noise are removed automatically by means of independent component analysis and a surface Laplacian filter, respectively. Several potential features, such as band power, autoregressive model coefficients, and coherence and phase-locking value, are then extracted for subsequent classification. Next, the artificial bee colony (ABC) algorithm is used to select features from the aforementioned feature combination. Finally, the selected subfeatures are classified by a support vector machine. In comparisons with and without artifact removal and feature selection, and against a genetic algorithm, on single-trial EEG data from 6 subjects, the results indicate that the proposed system is promising and suitable for brain-computer interface applications. © EEG and Clinical Neuroscience Society (ECNS) 2014.
FSMRank: feature selection algorithm for learning to rank.
Lai, Han-Jiang; Pan, Yan; Tang, Yong; Yu, Rong
2013-06-01
In recent years, there has been growing interest in learning to rank. The introduction of feature selection into different learning problems has been proven effective. These facts motivate us to investigate the problem of feature selection for learning to rank. We propose a joint convex optimization formulation which minimizes ranking errors while simultaneously conducting feature selection. This optimization formulation provides a flexible framework in which we can easily incorporate various importance measures and similarity measures of the features. To solve this optimization problem, we use Nesterov's approach to derive an accelerated gradient algorithm with a fast convergence rate O(1/T^2). We further develop a generalization bound for the proposed optimization problem using the Rademacher complexities. Extensive experimental evaluations are conducted on the public LETOR benchmark datasets. The results demonstrate that the proposed method shows: 1) significant ranking performance gain compared to several feature selection baselines for ranking, and 2) very competitive performance compared to several state-of-the-art learning-to-rank algorithms.
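For readers unfamiliar with the accelerated gradient scheme, the generic Nesterov update that achieves the O(1/T^2) rate on a smooth convex objective looks as follows; the least-squares example is a stand-in, not the FSMRank ranking objective.

import numpy as np

def nesterov_accelerated_gradient(grad, x0, step, iters=2000):
    """Generic Nesterov accelerated gradient: O(1/T^2) convergence on smooth
    convex objectives (illustrative; not the FSMRank objective itself)."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_new = y - step * grad(y)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # momentum extrapolation
        x, t = x_new, t_new
    return x

# Example: minimize ||Ax - b||^2 with step 1/L, where L = 2 * ||A||_2^2.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 10)), rng.normal(size=50)
L = 2.0 * np.linalg.norm(A, 2) ** 2
x_hat = nesterov_accelerated_gradient(lambda x: 2.0 * A.T @ (A @ x - b), np.zeros(10), 1.0 / L)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print("distance to least-squares solution:", float(np.linalg.norm(x_hat - x_star)))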
The Impact of Receiving the Same Items on Consecutive Computer Adaptive Test Administrations.
ERIC Educational Resources Information Center
O'Neill, Thomas; Lunz, Mary E.; Thiede, Keith
2000-01-01
Studied item exposure in a computerized adaptive test when the item selection algorithm presents examinees with questions they were asked in a previous test administration. Results with 178 repeat examinees on a medical technologists' test indicate that the combined use of an adaptive algorithm to select items and latent trait theory to estimate…
Competitive Parallel Processing For Compression Of Data
NASA Technical Reports Server (NTRS)
Diner, Daniel B.; Fender, Antony R. H.
1990-01-01
Momentarily-best compression algorithm selected. Proposed competitive-parallel-processing system compresses data for transmission in channel of limited bandwidth. Likely application for compression lies in high-resolution, stereoscopic color-television broadcasting. Data from information-rich source like color-television camera compressed by several processors, each operating with different algorithm. Referee processor selects momentarily-best compressed output.
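In software, the same referee idea can be sketched with standard-library codecs racing on each block and the smallest output winning; this is only an analogy to the proposed hardware system, and the choice of zlib, bz2 and lzma is arbitrary.

import bz2, lzma, zlib
from concurrent.futures import ThreadPoolExecutor

def compress_competitively(block: bytes) -> tuple[str, bytes]:
    """Run several compressors on the same block in parallel and let a 'referee'
    keep the momentarily smallest output."""
    codecs = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
    with ThreadPoolExecutor(max_workers=len(codecs)) as pool:
        futures = {name: pool.submit(fn, block) for name, fn in codecs.items()}
        results = {name: fut.result() for name, fut in futures.items()}
    winner = min(results, key=lambda name: len(results[name]))
    return winner, results[winner]

name, payload = compress_competitively(b"highly redundant scan line " * 200)
print(name, len(payload))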
Report on the Development of the Advanced Encryption Standard (AES).
Nechvatal, J; Barker, E; Bassham, L; Burr, W; Dworkin, M; Foti, J; Roback, E
2001-01-01
In 1997, the National Institute of Standards and Technology (NIST) initiated a process to select a symmetric-key encryption algorithm to be used to protect sensitive (unclassified) Federal information in furtherance of NIST's statutory responsibilities. In 1998, NIST announced the acceptance of 15 candidate algorithms and requested the assistance of the cryptographic research community in analyzing the candidates. This analysis included an initial examination of the security and efficiency characteristics for each algorithm. NIST reviewed the results of this preliminary research and selected MARS, RC6™, Rijndael, Serpent and Twofish as finalists. Having reviewed further public analysis of the finalists, NIST has decided to propose Rijndael as the Advanced Encryption Standard (AES). The research results and rationale for this selection are documented in this report.
NASA Astrophysics Data System (ADS)
Siami, Mohammad; Gholamian, Mohammad Reza; Basiri, Javad
2014-10-01
Nowadays, credit scoring is one of the most important topics in the banking sector. Credit scoring models have been widely used to facilitate the process of credit assessment. In this paper, the locally linear model tree algorithm (LOLIMOT) was applied to evaluate its performance in predicting customers' credit status. The algorithm is improved and adapted to the credit scoring domain by means of data fusion and feature selection techniques. Two real-world credit data sets - Australian and German - from the UCI machine learning database were selected to demonstrate the performance of our new classifier. The analytical results indicate that the improved LOLIMOT significantly increases the prediction accuracy.
Setting objective thresholds for rare event detection in flow cytometry
Richards, Adam J.; Staats, Janet; Enzor, Jennifer; McKinnon, Katherine; Frelinger, Jacob; Denny, Thomas N.; Weinhold, Kent J.; Chan, Cliburn
2014-01-01
The accurate identification of rare antigen-specific cytokine-positive cells from peripheral blood mononuclear cells (PBMC) after antigenic stimulation in an intracellular staining (ICS) flow cytometry assay is challenging, as cytokine-positive events may be fairly diffusely distributed and lack an obvious separation from the negative population. Traditionally, the approach by flow operators has been to manually set a positivity threshold to partition events into cytokine-positive and cytokine-negative. This approach suffers from subjectivity and inconsistency across different flow operators. The use of statistical clustering methods does not remove the need to find an objective threshold between positive and negative events, since consistent identification of rare event subsets is highly challenging for automated algorithms, especially when there is distributional overlap between the positive and negative events ("smear"). We present a new approach, based on the Fβ measure, that is similar to manual thresholding in providing a hard cutoff, but has the advantage of being determined objectively. The performance of this algorithm is compared with results obtained by expert visual gating. Several ICS data sets from the External Quality Assurance Program Oversight Laboratory (EQAPOL) proficiency program were used to make the comparisons. We first show that visually determined thresholds are difficult to reproduce and pose a problem when comparing results across operators or laboratories, and we illustrate the problems that occur with the use of commonly employed clustering algorithms. In contrast, a single parameterization of the Fβ method performs consistently across different centers, samples, and instruments because it optimizes the precision/recall tradeoff by using both negative and positive controls. PMID:24727143
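The core idea (choose the hard cutoff that maximizes an F-beta score computed from positive and negative control events) can be sketched as a simple threshold scan; the simulated marker intensities and the beta value below are illustrative and do not reproduce the paper's tuned parameterization.

import numpy as np
from sklearn.metrics import fbeta_score

def best_fbeta_threshold(intensity, is_positive, beta=1.0, n_grid=200):
    """Scan candidate cutoffs on a 1-D marker intensity and return the one that
    maximizes F-beta against labeled positive/negative control events."""
    grid = np.quantile(intensity, np.linspace(0.0, 1.0, n_grid))
    scores = [fbeta_score(is_positive, (intensity >= t).astype(int),
                          beta=beta, zero_division=0) for t in grid]
    k = int(np.argmax(scores))
    return grid[k], scores[k]

# Toy controls: negatives near 0, a rare positive population shifted upward with overlap ("smear").
rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, 5000)
pos = rng.normal(2.5, 1.2, 50)
intensity = np.concatenate([neg, pos])
labels = np.concatenate([np.zeros(len(neg), dtype=int), np.ones(len(pos), dtype=int)])
cutoff, score = best_fbeta_threshold(intensity, labels, beta=1.0)
print(f"cutoff={cutoff:.2f}, F-beta={score:.2f}")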
Specialized proteasome subunits play an essential role in thymic selection of CD8+ T cells
Kincaid, Eleanor Z.; Murata, Shigeo; Tanaka, Keiji; Rock, Kenneth L.
2016-01-01
The cells that stimulate positive selection express different specialized proteasome β-subunits than all other cells, including those involved in negative selection. Mice that lack all four specialized proteasome β-subunits, and therefore express only constitutive proteasomes in all cells, had a profound defect in the generation of CD8+ T cells. While a defect in positive selection would reflect an inability to generate the appropriate positively selecting peptides, a block at negative selection would point to the potential need to switch peptides between positive and negative selection to avoid the two processes often cancelling each other out. We found that the block in T cell development occurred around the checkpoints of positive and, surprisingly, also negative selection. PMID:27294792
Ließ, Mareike; Schmidt, Johannes; Glaser, Bruno
2016-01-01
Tropical forests are significant carbon sinks and their soils' carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas, whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is of high importance to improve the expected poor model results in case of low predictor-response correlations. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms (random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines) were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms, including the model tuning and predictor selection, were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged between 0.2 and 17.7 kg m-2, displaying a huge variability, with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models' predictive performance for all five machine learning algorithms. The rather low number of selected predictors favours forward over backward selection procedures. Choosing predictors based on their individual performance was outperformed by the two procedures that accounted for predictor interaction.
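The general workflow (predictor selection and model tuning compared inside repeated cross-validation) can be sketched with scikit-learn; the synthetic regression data, the filter-style SelectKBest step and the small parameter grid below are placeholders for the 236 terrain and satellite predictors and the actual selection strategies and tuning ranges used in the study.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in for many correlated predictors and SOC stock observations.
X, y = make_regression(n_samples=300, n_features=236, n_informative=15, noise=10.0, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_regression)),      # simple filter-style predictor selection
    ("brt", GradientBoostingRegressor(random_state=0)),    # boosted regression trees
])
grid = GridSearchCV(
    pipe,
    param_grid={"select__k": [10, 30, 60],
                "brt__n_estimators": [100, 300],
                "brt__learning_rate": [0.05, 0.1]},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="r2",
)
# Outer cross-validation gives an honest estimate of the tuned, selected model.
scores = cross_val_score(grid, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=1), scoring="r2")
print("nested-CV R2: %.2f +/- %.2f" % (scores.mean(), scores.std()))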
Abràmoff, Michael David; Lou, Yiyue; Erginay, Ali; Clarida, Warren; Amelon, Ryan; Folk, James C; Niemeijer, Meindert
2016-10-01
To compare the performance of a deep-learning enhanced algorithm for automated detection of diabetic retinopathy (DR) with the previously published performance of that algorithm, the Iowa Detection Program (IDP), without deep learning components, on the same publicly available set of fundus images and the previously reported consensus reference standard set by three US Board certified retinal specialists. We used the previously reported consensus reference standard of referable DR (rDR), defined as International Clinical Classification of Diabetic Retinopathy moderate, severe nonproliferative (NPDR), proliferative DR, and/or macular edema (ME). Neither the Messidor-2 images nor the three retinal specialists setting the Messidor-2 reference standard were used for training IDx-DR version X2.1. Sensitivity, specificity, negative predictive value, area under the curve (AUC), and their confidence intervals (CIs) were calculated. Sensitivity was 96.8% (95% CI: 93.3%-98.8%), specificity was 87.0% (95% CI: 84.2%-89.4%), with 6/874 false negatives, resulting in a negative predictive value of 99.0% (95% CI: 97.8%-99.6%). No cases of severe NPDR, PDR, or ME were missed. The AUC was 0.980 (95% CI: 0.968-0.992). Sensitivity was not statistically different from published IDP sensitivity, which had a CI of 94.4% to 99.3%, but specificity was significantly better than the published IDP specificity CI of 55.7% to 63.0%. A deep-learning enhanced algorithm for the automated detection of DR achieves significantly better performance than a previously reported, otherwise essentially identical, algorithm that does not employ deep learning. Deep learning enhanced algorithms have the potential to improve the efficiency of DR screening, and thereby to prevent visual loss and blindness from this devastating disease.
Solution Path for Pin-SVM Classifiers With Positive and Negative $\\tau $ Values.
Huang, Xiaolin; Shi, Lei; Suykens, Johan A K
2017-07-01
Applying the pinball loss in a support vector machine (SVM) classifier results in pin-SVM. The pinball loss is characterized by a parameter τ. Its value is related to the quantile level, and different τ values are suitable for different problems. In this paper, we establish an algorithm to find the entire solution path for pin-SVM with different τ values. This algorithm is based on the fact that the optimal solution to pin-SVM is continuous and piecewise linear with respect to τ. We also show that the nonnegativity constraint on τ is not necessary, i.e., τ can be extended to negative values. First, in some applications, a negative τ leads to better accuracy. Second, τ = -1 corresponds to a simple solution that links SVM and the classical kernel rule. The solution for τ = -1 can be obtained directly and then be used as a starting point of the solution path. The proposed method efficiently traverses τ values through the solution path, and then achieves good performance by a suitable τ. In particular, τ = 0 corresponds to C-SVM, meaning that the traversal algorithm can output a result at least as good as C-SVM with respect to validation error.
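The pinball loss on the margin residual u = 1 - y*f(x) is commonly written as u for u >= 0 and -τ*u otherwise, so τ = 0 recovers the hinge loss of C-SVM and τ = -1 gives a linear loss; the NumPy sketch below uses this textbook form, which may differ in notation from the paper.

import numpy as np

def pinball_loss(u, tau):
    """Pinball loss on the margin residual u = 1 - y*f(x): u for u >= 0, -tau*u otherwise.
    tau = 0 recovers the hinge loss of C-SVM; tau = -1 gives a linear loss linked
    to the classical kernel rule."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, u, -tau * u)

u = np.linspace(-2, 2, 5)
for tau in (-1.0, 0.0, 0.5):
    print(tau, pinball_loss(u, tau))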
Accounting for False Positive HIV Tests: Is Visceral Leishmaniasis Responsible?
Shanks, Leslie; Ritmeijer, Koert; Piriou, Erwan; Siddiqui, M Ruby; Kliescikova, Jarmila; Pearce, Neil; Ariti, Cono; Muluneh, Libsework; Masiga, Johnson; Abebe, Almaz
2015-01-01
Co-infection with HIV and visceral leishmaniasis is an important consideration in treatment of either disease in endemic areas. Diagnosis of HIV in resource-limited settings relies on rapid diagnostic tests used together in an algorithm. A limitation of the HIV diagnostic algorithm is that it is vulnerable to falsely positive reactions due to cross reactivity. It has been postulated that visceral leishmaniasis (VL) infection can increase this risk of false positive HIV results. This cross-sectional study compared the risk of false positive HIV results between VL patients and non-VL individuals. Participants were recruited from 2 sites in Ethiopia. The Ethiopian algorithm of a tiebreaker using 3 rapid diagnostic tests (RDTs) was used to test for HIV. The gold standard test was the Western Blot, with indeterminate results resolved by PCR testing. Every RDT screen positive individual was included for testing with the gold standard along with 10% of all negatives. The final analysis included 89 VL and 405 non-VL patients. HIV prevalence was found to be 12.8% (47/367) in the VL group compared to 7.9% (200/2526) in the non-VL group. The RDT algorithm in the VL group yielded 47 positives, 4 false positives, and 38 negatives. The same algorithm for those without VL had 200 positives, 14 false positives, and 191 negatives. Specificity and positive predictive value for the group with VL were lower than for the non-VL group; however, the difference was not found to be significant (p = 0.52 and p = 0.76, respectively). The test algorithm yielded a high number of HIV false positive results. However, we were unable to demonstrate a significant difference between groups with and without VL disease. This suggests that the presence of endemic visceral leishmaniasis alone cannot account for the high number of false positive HIV results in our study.
Hybrid feature selection algorithm using symmetrical uncertainty and a harmony search algorithm
NASA Astrophysics Data System (ADS)
Salameh Shreem, Salam; Abdullah, Salwani; Nazri, Mohd Zakree Ahmad
2016-04-01
Microarray technology can be used as an efficient diagnostic system to recognise diseases such as tumours or to discriminate between different types of cancers in normal tissues. This technology has received increasing attention from the bioinformatics community because of its potential in designing powerful decision-making tools for cancer diagnosis. However, the presence of thousands or tens of thousands of genes affects the predictive accuracy of this technology from the perspective of classification. Thus, a key issue in microarray data is identifying or selecting the smallest possible set of genes from the input data that can achieve good predictive accuracy for classification. In this work, we propose a two-stage selection algorithm for gene selection problems in microarray data-sets called the symmetrical uncertainty filter and harmony search algorithm wrapper (SU-HSA). Experimental results show that the SU-HSA is better than HSA in isolation for all data-sets in terms of the accuracy and achieves a lower number of genes on 6 out of 10 instances. Furthermore, the comparison with state-of-the-art methods shows that our proposed approach is able to obtain 5 (out of 10) new best results in terms of the number of selected genes and competitive results in terms of the classification accuracy.
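The filter stage rests on symmetrical uncertainty, SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)). A minimal sketch for discrete (pre-discretised) gene expression values follows; the ranking loop and the harmony-search wrapper stage of SU-HSA are not reproduced here.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    h_xy = entropy([f"{a}|{b}" for a, b in zip(x, y)])  # joint entropy via pair labels
    mi = h_x + h_y - h_xy
    return 2.0 * mi / (h_x + h_y) if (h_x + h_y) > 0 else 0.0

# Toy discretised expression levels for one gene (x) and class labels (y).
print(symmetrical_uncertainty([0, 0, 1, 1, 2, 2], ["a", "a", "b", "b", "b", "b"]))
```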
New development of the image matching algorithm
NASA Astrophysics Data System (ADS)
Zhang, Xiaoqiang; Feng, Zhao
2018-04-01
To study image matching algorithms, their four elements are described, i.e., similarity measurement, feature space, search space and search strategy. Four common indices for evaluating an image matching algorithm are described, i.e., matching accuracy, matching efficiency, robustness and universality. The paper then describes the principles of image matching algorithms based on gray values, on features, on frequency-domain analysis, on neural networks and on semantic recognition, and analyzes their characteristics and latest research achievements. Finally, the development trend of image matching algorithms is discussed. This study is significant for algorithm improvement, new algorithm design and algorithm selection in practice.
NASA Technical Reports Server (NTRS)
Hedgley, D. R.
1978-01-01
An efficient algorithm for selecting the degree of a polynomial that defines a curve that best approximates a data set was presented. This algorithm was applied to both oscillatory and nonoscillatory data without loss of generality.
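The report's own selection criterion is not spelled out in the abstract; the sketch below only illustrates the generic idea of picking a polynomial degree by held-out error, with made-up cubic data, and is not the NASA algorithm itself.

```python
import numpy as np

def select_degree(x, y, max_degree=10, val_frac=0.3, seed=0):
    """Pick the polynomial degree with the lowest held-out squared error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(len(x) * val_frac)
    val, train = idx[:n_val], idx[n_val:]
    best_deg, best_err = 0, np.inf
    for deg in range(max_degree + 1):
        coeffs = np.polyfit(x[train], y[train], deg)
        err = np.mean((np.polyval(coeffs, x[val]) - y[val]) ** 2)
        if err < best_err:
            best_deg, best_err = deg, err
    return best_deg

# Made-up cubic data with noise; the selected degree should be close to 3.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 100)
y = 2 * x**3 - x + 0.05 * rng.standard_normal(x.size)
print(select_degree(x, y))
```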
Blind beam-hardening correction from Poisson measurements
NASA Astrophysics Data System (ADS)
Gu, Renliang; Dogandžić, Aleksandar
2016-02-01
We develop a sparse image reconstruction method for Poisson-distributed polychromatic X-ray computed tomography (CT) measurements under the blind scenario where the material of the inspected object and the incident energy spectrum are unknown. We employ our mass-attenuation spectrum parameterization of the noiseless measurements and express the mass-attenuation spectrum as a linear combination of B-spline basis functions of order one. A block coordinate-descent algorithm is developed for constrained minimization of a penalized Poisson negative log-likelihood (NLL) cost function, where constraints and penalty terms ensure nonnegativity of the spline coefficients and nonnegativity and sparsity of the density map image; the image sparsity is imposed using a convex total-variation (TV) norm penalty term. This algorithm alternates between a Nesterov proximal-gradient (NPG) step for estimating the density map image and a limited-memory Broyden-Fletcher-Goldfarb-Shanno with box constraints (L-BFGS-B) step for estimating the incident-spectrum parameters. To accelerate convergence of the density-map NPG steps, we apply function restart and a step-size selection scheme that accounts for varying local Lipschitz constants of the Poisson NLL. Real X-ray CT reconstruction examples demonstrate the performance of the proposed scheme.
An Iris Segmentation Algorithm based on Edge Orientation for Off-angle Iris Recognition
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karakaya, Mahmut; Barstow, Del R; Santos-Villalobos, Hector J
Iris recognition is known as one of the most accurate and reliable biometrics. However, the accuracy of iris recognition systems depends on the quality of data capture and is negatively affected by several factors such as angle, occlusion, and dilation. In this paper, we present a segmentation algorithm for off-angle iris images that uses edge detection, edge elimination, edge classification, and ellipse fitting techniques. In our approach, we first detect all candidate edges in the iris image by using the Canny edge detector; this collection contains edges from the iris and pupil boundaries as well as eyelashes, eyelids, iris texture, etc. Edge orientation is used to eliminate the edges that cannot be part of the iris or pupil. Then, we classify the remaining edge points into two sets as pupil edges and iris edges. Finally, we randomly generate subsets of iris and pupil edge points, fit ellipses for each subset, select ellipses with similar parameters, and average to form the resultant ellipses. Based on the results from real experiments, the proposed method shows effectiveness in segmentation for off-angle iris images.
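The Canny and ellipse-fitting building blocks are readily available in OpenCV; the sketch below runs them on a synthetic image and omits the paper's edge-orientation elimination, edge classification and subset-averaging steps.

```python
import cv2
import numpy as np

# Synthetic stand-in for an eye image: a dark rotated ellipse on a grey field.
img = np.full((240, 320), 170, dtype=np.uint8)
cv2.ellipse(img, (160, 120), (60, 40), 25, 0, 360, 60, -1)

edges = cv2.Canny(cv2.GaussianBlur(img, (5, 5), 0), 50, 150)
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
for c in sorted(contours, key=len, reverse=True)[:3]:
    if len(c) >= 5:                                    # fitEllipse needs >= 5 points
        (cx, cy), (major, minor), angle = cv2.fitEllipse(c)
        print(f"centre=({cx:.1f}, {cy:.1f}) axes=({major:.1f}, {minor:.1f}) angle={angle:.1f}")
```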
Zhao, Li; Chen, Chunxia; Li, Bei; Dong, Li; Guo, Yingqiang; Xiao, Xijun; Zhang, Eryong; Qin, Li
2014-01-01
To study the performance of pharmacogenetics-based warfarin dosing algorithms in the initial and the stable warfarin treatment phases in a cohort of Han-Chinese patients undergoing mechanical heart valve replacement. We searched PubMed, Chinese National Knowledge Infrastructure and Wanfang databases for selecting pharmacogenetics-based warfarin dosing models. Patients with mechanical heart valve replacement were consecutively recruited between March 2012 and July 2012. The predicted warfarin dose of each patient was calculated and compared with the observed initial and stable warfarin doses. The percentage of patients whose predicted dose fell within 20% of their actual therapeutic dose (percentage within 20%), and the mean absolute error (MAE) were utilized to evaluate the predictive accuracy of all the selected algorithms. A total of 8 algorithms, including the Du, Huang, Miao, Wei, Zhang, Lou, Gage, and International Warfarin Pharmacogenetics Consortium (IWPC) models, were tested in 181 patients. The MAE of the Gage, IWPC and 6 Han-Chinese pharmacogenetics-based warfarin dosing algorithms was less than 0.6 mg/day, and the percentage within 20% exceeded 45% for all of the selected models in both the initial and the stable treatment stages. When patients were stratified according to the warfarin dose range, all of the equations demonstrated better performance in the ideal-dose range (1.88-4.38 mg/day) than the low-dose range (<1.88 mg/day). Among the 8 algorithms compared, the algorithms of Wei, Huang, and Miao showed a lower MAE and higher percentage within 20% in both the initial and the stable warfarin dose prediction and in the low-dose and the ideal-dose ranges. All of the selected pharmacogenetics-based warfarin dosing regimens performed similarly in our cohort. However, the algorithms of Wei, Huang, and Miao showed a better potential for warfarin prediction in the initial and the stable treatment phases in Han-Chinese patients undergoing mechanical heart valve replacement.
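The two accuracy measures used above are straightforward to compute; the sketch below uses invented dose values, not the study cohort.

```python
import numpy as np

def dosing_accuracy(predicted, observed):
    """MAE (mg/day) and the percentage of patients whose predicted dose
    falls within 20% of the observed therapeutic dose."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    mae = np.mean(np.abs(predicted - observed))
    within_20 = 100.0 * np.mean(np.abs(predicted - observed) <= 0.2 * observed)
    return mae, within_20

# Invented doses (mg/day), purely to exercise the function.
print(dosing_accuracy([2.5, 3.0, 1.8, 4.0], [2.8, 3.1, 1.2, 3.9]))
```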
Ning, Jing; Chen, Yong; Piao, Jin
2017-07-01
Publication bias occurs when the published research results are systematically unrepresentative of the population of studies that have been conducted, and is a potential threat to meaningful meta-analysis. The Copas selection model provides a flexible framework for correcting estimates and offers considerable insight into the publication bias. However, maximizing the observed likelihood under the Copas selection model is challenging because the observed data contain very little information on the latent variable. In this article, we study a Copas-like selection model and propose an expectation-maximization (EM) algorithm for estimation based on the full likelihood. Empirical simulation studies show that the EM algorithm and its associated inferential procedure perform well and avoid the non-convergence problem when maximizing the observed likelihood.
Comparison of Different EHG Feature Selection Methods for the Detection of Preterm Labor
Alamedine, D.; Khalil, M.; Marque, C.
2013-01-01
Numerous types of linear and nonlinear features have been extracted from the electrohysterogram (EHG) in order to classify labor and pregnancy contractions. As a result, the number of available features is now very large. The goal of this study is to reduce the number of features by selecting only the relevant ones which are useful for solving the classification problem. This paper presents three methods for feature subset selection that can be applied to choose the best subsets for classifying labor and pregnancy contractions: an algorithm using the Jeffrey divergence (JD) distance, a sequential forward selection (SFS) algorithm, and a binary particle swarm optimization (BPSO) algorithm. The two last methods are based on a classifier and were tested with three types of classifiers. These methods have allowed us to identify common features which are relevant for contraction classification. PMID:24454536
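A greedy sequential forward selection wrapped around a classifier, in the spirit of the SFS method above, can be sketched as follows; the dataset, classifier and stopping size are placeholders rather than the EHG features and classifiers of the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def sfs(X, y, n_features, estimator=None, cv=5):
    """Greedy sequential forward selection by cross-validated accuracy."""
    estimator = estimator or SVC()
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_features and remaining:
        scored = [(np.mean(cross_val_score(estimator, X[:, selected + [j]], y, cv=cv)), j)
                  for j in remaining]
        _, best_j = max(scored)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Placeholder dataset standing in for EHG features and labour/pregnancy labels.
X, y = make_classification(n_samples=200, n_features=15, n_informative=4, random_state=0)
print(sfs(X, y, n_features=3))
```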
2017-01-01
In this paper, we propose a new automatic hyperparameter selection approach for determining the optimal network configuration (network structure and hyperparameters) for deep neural networks (DNNs) using particle swarm optimization (PSO) in combination with a steepest gradient descent algorithm. In the proposed approach, network configurations were coded as a set of real-number m-dimensional vectors as the individuals of the PSO algorithm in the search procedure. During the search procedure, the PSO algorithm is employed to search for optimal network configurations via the particles moving in a finite search space, and the steepest gradient descent algorithm is used to train the DNN classifier with a few training epochs (to find a local optimal solution) during the population evaluation of PSO. After the optimization scheme, the steepest gradient descent algorithm is performed with more epochs and the final solutions (pbest and gbest) of the PSO algorithm to train a final ensemble model and individual DNN classifiers, respectively. The local search ability of the steepest gradient descent algorithm and the global search capabilities of the PSO algorithm are exploited to determine an optimal solution that is close to the global optimum. We constructed several experiments on hand-written characters and biological activity prediction datasets to show that the DNN classifiers trained by the network configurations expressed by the final solutions of the PSO algorithm, employed to construct an ensemble model and individual classifier, outperform the random approach in terms of the generalization performance. Therefore, the proposed approach can be regarded as an alternative tool for automatic network structure and parameter selection for deep neural networks. PMID:29236718
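A minimal particle swarm optimizer over a box-constrained search space is sketched below; it illustrates the PSO component only and does not reproduce the paper's encoding of network configurations or the gradient-descent training step.

```python
import numpy as np

def pso(f, lower, upper, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise f over the box [lower, upper] with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    dim = len(lower)
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)]
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)]
    return gbest, float(pbest_val.min())

# Toy objective standing in for validation error as a function of two
# continuous hyperparameters encoded in a vector.
best, val = pso(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                lower=np.array([-5.0, -5.0]), upper=np.array([5.0, 5.0]))
print(best, val)
```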
An Evaluation of an Algorithm for Linear Inequalities and Its Applications
NASA Technical Reports Server (NTRS)
Jurgensen, J.
1973-01-01
An algorithm is presented for obtaining a solution alpha to a set of inequalities (A alpha) > 0, where A is an N x m matrix and alpha is an m-vector. If the set of inequalities is consistent, then the algorithm is guaranteed to arrive at a solution in a finite number of steps. Also, if in the iteration a negative vector is obtained, then the initial set of inequalities is inconsistent, and the iteration is terminated.
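One classical iteration for systems of the form A alpha > 0 is a perceptron-style update that adds the violated rows; the sketch below shows that generic scheme and is not necessarily the exact algorithm evaluated in the report.

```python
import numpy as np

def solve_inequalities(A, max_iter=10000):
    """Seek alpha with A @ alpha > 0 by repeatedly adding the violated rows."""
    A = np.asarray(A, dtype=float)
    alpha = np.zeros(A.shape[1])
    for _ in range(max_iter):
        violated = A @ alpha <= 0
        if not violated.any():
            return alpha                 # all inequalities strictly satisfied
        alpha = alpha + A[violated].sum(axis=0)
    return None                          # no solution found within the budget

print(solve_inequalities(np.array([[1.0, 1.0], [1.0, -1.0], [0.5, 2.0]])))
```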
Andersson, Richard; Larsson, Linnea; Holmqvist, Kenneth; Stridh, Martin; Nyström, Marcus
2017-04-01
Almost all eye-movement researchers use algorithms to parse raw data and detect distinct types of eye movement events, such as fixations, saccades, and pursuit, and then base their results on these. Surprisingly, these algorithms are rarely evaluated. We evaluated the classifications of ten eye-movement event detection algorithms, on data from an SMI HiSpeed 1250 system, and compared them to manual ratings of two human experts. The evaluation focused on fixations, saccades, and post-saccadic oscillations. The evaluation used both event duration parameters, and sample-by-sample comparisons to rank the algorithms. The resulting event durations varied substantially as a function of what algorithm was used. This evaluation differed from previous evaluations by considering a relatively large set of algorithms, multiple events, and data from both static and dynamic stimuli. The main conclusion is that current detectors of only fixations and saccades work reasonably well for static stimuli, but barely better than chance for dynamic stimuli. Differing results across evaluation methods make it difficult to select one winner for fixation detection. For saccade detection, however, the algorithm by Larsson, Nyström and Stridh (IEEE Transaction on Biomedical Engineering, 60(9):2484-2493,2013) outperforms all algorithms in data from both static and dynamic stimuli. The data also show how improperly selected algorithms applied to dynamic data misestimate fixation and saccade properties.
Walters, K; Hardoon, S; Petersen, I; Iliffe, S; Omar, R Z; Nazareth, I; Rait, G
2016-01-21
Existing dementia risk scores require collection of additional data from patients, limiting their use in practice. Routinely collected healthcare data have the potential to assess dementia risk without the need to collect further information. Our objective was to develop and validate a 5-year dementia risk score derived from primary healthcare data. We used data from general practices in The Health Improvement Network (THIN) database from across the UK, randomly selecting 377 practices for a development cohort and identifying 930,395 patients aged 60-95 years without a recording of dementia, cognitive impairment or memory symptoms at baseline. We developed risk algorithm models for two age groups (60-79 and 80-95 years). An external validation was conducted by validating the model on a separate cohort of 264,224 patients from 95 randomly chosen THIN practices that did not contribute to the development cohort. Our main outcome was 5-year risk of first recorded dementia diagnosis. Potential predictors included sociodemographic, cardiovascular, lifestyle and mental health variables. Dementia incidence was 1.88 (95% CI, 1.83-1.93) and 16.53 (95% CI, 16.15-16.92) per 1000 PYAR for those aged 60-79 (n = 6017) and 80-95 years (n = 7104), respectively. Predictors for those aged 60-79 included age, sex, social deprivation, smoking, BMI, heavy alcohol use, anti-hypertensive drugs, diabetes, stroke/TIA, atrial fibrillation, aspirin, depression. The discrimination and calibration of the risk algorithm were good for the 60-79 years model; D statistic 2.03 (95% CI, 1.95-2.11), C index 0.84 (95% CI, 0.81-0.87), and calibration slope 0.98 (95% CI, 0.93-1.02). The algorithm had a high negative predictive value, but lower positive predictive value at most risk thresholds. Discrimination and calibration were poor for the 80-95 years model. Routinely collected data predicts 5-year risk of recorded diagnosis of dementia for those aged 60-79, but not those aged 80+. This algorithm can identify higher risk populations for dementia in primary care. The risk score has a high negative predictive value and may be most helpful in 'ruling out' those at very low risk from further testing or intensive preventative activities.
Qutub, M O; AlBaz, N; Hawken, P; Anoos, A
2011-01-01
To evaluate usefulness of applying either the two-step algorithm (Ag-EIAs and CCNA) or the three-step algorithm (all three assays) for better confirmation of toxigenic Clostridium difficile. The antigen enzyme immunoassays (Ag-EIAs) can accurately identify the glutamate dehydrogenase antigen of toxigenic and nontoxigenic Clostridium difficile. Therefore, it is used in combination with a toxin-detecting assay [cell line culture neutralization assay (CCNA), or the enzyme immunoassays for toxins A and B (TOX-A/BII EIA)] to provide specific evidence of Clostridium difficile-associated diarrhoea. A total of 151 nonformed stool specimens were tested by Ag-EIAs, TOX-A/BII EIA, and CCNA. All tests were performed according to the manufacturer's instructions and the results of Ag-EIAs and TOX-A/BII EIA were read using a spectrophotometer at a wavelength of 450 nm. A total of 61 (40.7%), 38 (25.3%), and 52 (34.7%) specimens tested positive with Ag-EIA, TOX-A/BII EIA, and CCNA, respectively. Overall, the sensitivity, specificity, negative predictive value, and positive predictive value for Ag-EIA were 94%, 87%, 96.6%, and 80.3%, respectively. Whereas for TOX-A/BII EIA, the sensitivity, specificity, negative predictive value, and positive predictive value were 73.1%, 100%, 87.5%, and 100%, respectively. With the two-step algorithm, all 61 Ag-EIAs-positive cases required 2 days for confirmation. With the three-step algorithm, 37 (60.7%) cases were reported immediately, and the remaining 24 (39.3%) required further testing by CCNA. By applying the two-step algorithm, the workload and cost could be reduced by 28.2% compared with the three-step algorithm. The two-step algorithm is the most practical for accurately detecting toxigenic Clostridium difficile, but it is time-consuming.
NASA Astrophysics Data System (ADS)
Pötzi, W.; Veronig, A. M.; Temmer, M.
2018-06-01
In the framework of the Space Situational Awareness program of the European Space Agency (ESA/SSA), an automatic flare detection system was developed at Kanzelhöhe Observatory (KSO). The system has been in operation since mid-2013. The event detection algorithm was upgraded in September 2017. All data back to 2014 were reprocessed using the new algorithm. In order to evaluate both algorithms, we apply verification measures that are commonly used for forecast validation. In order to overcome the problem of rare events, which biases the verification measures, we introduce a new event-based method. We divide the timeline of the Hα observations into positive events (flaring periods) and negative events (quiet periods), independent of the length of each event. In total, 329 positive and negative events were detected between 2014 and 2016. The hit rate for the new algorithm reached 96% (just five events were missed), with a false-alarm ratio of 17%. This is a significant improvement of the algorithm, as the original system had a hit rate of 85% and a false-alarm ratio of 33%. The true skill score and the Heidke skill score both reach values of 0.8 for the new algorithm; originally, they were at 0.5. The mean flare positions are accurate within ± 1 heliographic degree for both algorithms, and the peak times improve from a mean difference of 1.7 ± 2.9 minutes to 1.3 ± 2.3 minutes. The flare start times, which had been systematically late by about 3 minutes as determined by the original algorithm, now match the visual inspection within -0.47 ± 4.10 minutes.
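The verification measures quoted above follow from the standard 2x2 event contingency table; the counts in the example are illustrative only, not the KSO statistics.

```python
def verification_scores(a, b, c, d):
    """a = hits, b = false alarms, c = misses, d = correct negatives."""
    pod = a / (a + c)                    # hit rate (probability of detection)
    far = b / (a + b)                    # false-alarm ratio
    pofd = b / (b + d)                   # probability of false detection
    tss = pod - pofd                     # true skill score
    hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    return {"hit rate": pod, "false-alarm ratio": far, "TSS": tss, "HSS": hss}

# Illustrative counts only (not the KSO event statistics).
print(verification_scores(a=120, b=25, c=5, d=179))
```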
Chan, An-Wen; Fung, Kinwah; Tran, Jennifer M; Kitchen, Jessica; Austin, Peter C; Weinstock, Martin A; Rochon, Paula A
2016-10-01
Keratinocyte carcinoma (nonmelanoma skin cancer) accounts for substantial burden in terms of high incidence and health care costs but is excluded by most cancer registries in North America. Administrative health insurance claims databases offer an opportunity to identify these cancers using diagnosis and procedural codes submitted for reimbursement purposes. To apply recursive partitioning to derive and validate a claims-based algorithm for identifying keratinocyte carcinoma with high sensitivity and specificity. Retrospective study using population-based administrative databases linked to 602 371 pathology episodes from a community laboratory for adults residing in Ontario, Canada, from January 1, 1992, to December 31, 2009. The final analysis was completed in January 2016. We used recursive partitioning (classification trees) to derive an algorithm based on health insurance claims. The performance of the derived algorithm was compared with 5 prespecified algorithms and validated using an independent academic hospital clinic data set of 2082 patients seen in May and June 2011. Sensitivity, specificity, positive predictive value, and negative predictive value using the histopathological diagnosis as the criterion standard. We aimed to achieve maximal specificity, while maintaining greater than 80% sensitivity. Among 602 371 pathology episodes, 131 562 (21.8%) had a diagnosis of keratinocyte carcinoma. Our final derived algorithm outperformed the 5 simple prespecified algorithms and performed well in both community and hospital data sets in terms of sensitivity (82.6% and 84.9%, respectively), specificity (93.0% and 99.0%, respectively), positive predictive value (76.7% and 69.2%, respectively), and negative predictive value (95.0% and 99.6%, respectively). Algorithm performance did not vary substantially during the 18-year period. This algorithm offers a reliable mechanism for ascertaining keratinocyte carcinoma for epidemiological research in the absence of cancer registry data. Our findings also demonstrate the value of recursive partitioning in deriving valid claims-based algorithms.
VDA, a Method of Choosing a Better Algorithm with Fewer Validations
Kluger, Yuval
2011-01-01
The multitude of bioinformatics algorithms designed for performing a particular computational task presents end-users with the problem of selecting the most appropriate computational tool for analyzing their biological data. The choice of the best available method is often based on expensive experimental validation of the results. We propose an approach to design validation sets for method comparison and performance assessment that are effective in terms of cost and discrimination power. Validation Discriminant Analysis (VDA) is a method for designing a minimal validation dataset to allow reliable comparisons between the performances of different algorithms. Implementation of our VDA approach achieves this reduction by selecting predictions that maximize the minimum Hamming distance between algorithmic predictions in the validation set. We show that VDA can be used to correctly rank algorithms according to their performances. These results are further supported by simulations and by realistic algorithmic comparisons in silico. VDA is a novel, cost-efficient method for minimizing the number of validation experiments necessary for reliable performance estimation and fair comparison between algorithms. Our VDA software is available at http://sourceforge.net/projects/klugerlab/files/VDA/ PMID:22046256
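The max-min Hamming idea can be illustrated with a greedy selection over a binary prediction matrix; this is a sketch of the criterion only, not the released VDA software.

```python
import numpy as np
from itertools import combinations

def min_pairwise_hamming(P, items):
    sub = P[:, items]
    return min(int(np.sum(sub[i] != sub[j]))
               for i, j in combinations(range(P.shape[0]), 2))

def greedy_validation_set(P, k):
    """Greedily pick k items so the smallest pairwise Hamming distance
    between the algorithms' predictions on the chosen items is large."""
    chosen = []
    for _ in range(k):
        candidates = [j for j in range(P.shape[1]) if j not in chosen]
        best = max(candidates, key=lambda j: min_pairwise_hamming(P, chosen + [j]))
        chosen.append(best)
    return chosen

# Random binary predictions of 4 hypothetical algorithms on 30 items.
P = np.random.default_rng(0).integers(0, 2, size=(4, 30))
print(greedy_validation_set(P, 5))
```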
A Parameter Subset Selection Algorithm for Mixed-Effects Models
Schmidt, Kathleen L.; Smith, Ralph C.
2016-01-01
Mixed-effects models are commonly used to statistically model phenomena that include attributes associated with a population or general underlying mechanism as well as effects specific to individuals or components of the general mechanism. This can include individual effects associated with data from multiple experiments. However, the parameterizations used to incorporate the population and individual effects are often unidentifiable in the sense that parameters are not uniquely specified by the data. As a result, the current literature focuses on model selection, by which insensitive parameters are fixed or removed from the model. Model selection methods that employ information criteria are applicable to both linear and nonlinear mixed-effects models, but such techniques are limited in that they are computationally prohibitive for large problems due to the number of possible models that must be tested. To limit the scope of possible models for model selection via information criteria, we introduce a parameter subset selection (PSS) algorithm for mixed-effects models, which orders the parameters by their significance. In conclusion, we provide examples to verify the effectiveness of the PSS algorithm and to test the performance of mixed-effects model selection that makes use of parameter subset selection.
Learning Behavior Characterization with Multi-Feature, Hierarchical Activity Sequences
ERIC Educational Resources Information Center
Ye, Cheng; Segedy, James R.; Kinnebrew, John S.; Biswas, Gautam
2015-01-01
This paper discusses Multi-Feature Hierarchical Sequential Pattern Mining, MFH-SPAM, a novel algorithm that efficiently extracts patterns from students' learning activity sequences. This algorithm extends an existing sequential pattern mining algorithm by dynamically selecting the level of specificity for hierarchically-defined features…
Thompson, William K; Rasmussen, Luke V; Pacheco, Jennifer A; Peissig, Peggy L; Denny, Joshua C; Kho, Abel N; Miller, Aaron; Pathak, Jyotishman
2012-01-01
The development of Electronic Health Record (EHR)-based phenotype selection algorithms is a non-trivial and highly iterative process involving domain experts and informaticians. To make it easier to port algorithms across institutions, it is desirable to represent them using an unambiguous formal specification language. For this purpose we evaluated the recently developed National Quality Forum (NQF) information model designed for EHR-based quality measures: the Quality Data Model (QDM). We selected 9 phenotyping algorithms that had been previously developed as part of the eMERGE consortium and translated them into QDM format. Our study concluded that the QDM contains several core elements that make it a promising format for EHR-driven phenotyping algorithms for clinical research. However, we also found areas in which the QDM could be usefully extended, such as representing information extracted from clinical text, and the ability to handle algorithms that do not consist of Boolean combinations of criteria.
EMD self-adaptive selecting relevant modes algorithm for FBG spectrum signal
NASA Astrophysics Data System (ADS)
Chen, Yong; Wu, Chun-ting; Liu, Huan-lin
2017-07-01
Noise may reduce the demodulation accuracy of the fiber Bragg grating (FBG) sensing signal and thus affect the quality of sensing detection. Thus, the recovery of a signal from observed noisy data is necessary. In this paper, a precise self-adaptive algorithm for selecting relevant modes is proposed to remove the noise of the signal. Empirical mode decomposition (EMD) is first used to decompose a signal into a set of modes. Pseudo-mode cancellation is introduced to identify and eliminate false modes, and then the mutual information (MI) of partial modes is calculated. MI is used to estimate the critical point between high- and low-frequency components. Simulation results show that the proposed algorithm estimates the critical point more accurately than the traditional algorithms for the FBG spectral signal. Meanwhile, compared to similar algorithms, the signal-to-noise ratio of the signal can be improved by more than 10 dB after processing by the proposed algorithm, and the correlation coefficient can be increased by 0.5, demonstrating a better de-noising effect.
Conception of discrete systems decomposition algorithm using p-invariants and hypergraphs
NASA Astrophysics Data System (ADS)
Stefanowicz, Ł.
2016-09-01
In this article, the author presents an idea for a decomposition algorithm for discrete systems described by Petri nets using p-invariants. The decomposition process is significant from the point of view of discrete system design, because it allows the separation of smaller sequential parts. The proposed algorithm uses a modified Martinez-Silva method as well as the author's selection algorithm. The developed method is a good complement to classical decomposition algorithms using graphs and hypergraphs.
NASA Technical Reports Server (NTRS)
Brewin, Robert J.W.; Sathyendranath, Shubha; Muller, Dagmar; Brockmann, Carsten; Deschamps, Pierre-Yves; Devred, Emmanuel; Doerffer, Roland; Fomferra, Norman; Franz, Bryan; Grant, Mike;
2013-01-01
Satellite-derived remote-sensing reflectance (Rrs) can be used for mapping biogeochemically relevant variables, such as the chlorophyll concentration and the Inherent Optical Properties (IOPs) of the water, at global scale for use in climate-change studies. Prior to generating such products, suitable algorithms have to be selected that are appropriate for the purpose. Algorithm selection needs to account for both qualitative and quantitative requirements. In this paper we develop an objective methodology designed to rank the quantitative performance of a suite of bio-optical models. The objective classification is applied using the NASA bio-Optical Marine Algorithm Dataset (NOMAD). Using in situ Rrs as input to the models, the performance of eleven semianalytical models, as well as five empirical chlorophyll algorithms and an empirical diffuse attenuation coefficient algorithm, is ranked for spectrally-resolved IOPs, chlorophyll concentration and the diffuse attenuation coefficient at 489 nm. The sensitivity of the objective classification and the uncertainty in the ranking are tested using a Monte-Carlo approach (bootstrapping). Results indicate that the performance of the semi-analytical models varies depending on the product and wavelength of interest. For chlorophyll retrieval, empirical algorithms perform better than semi-analytical models, in general. The performance of these empirical models reflects either their immunity to scale errors or instrument noise in Rrs data, or simply that the data used for model parameterisation were not independent of NOMAD. Nonetheless, uncertainty in the classification suggests that the performance of some semi-analytical algorithms at retrieving chlorophyll is comparable with the empirical algorithms. For phytoplankton absorption at 443 nm, some semi-analytical models also perform with similar accuracy to an empirical model. We discuss the potential biases, limitations and uncertainty in the approach, as well as additional qualitative considerations for algorithm selection for climate-change studies. Our classification has the potential to be routinely implemented, such that the performance of emerging algorithms can be compared with existing algorithms as they become available. In the long-term, such an approach will further aid algorithm development for ocean-colour studies.
Unified selective sorting approach to analyse multi-electrode extracellular data
NASA Astrophysics Data System (ADS)
Veerabhadrappa, R.; Lim, C. P.; Nguyen, T. T.; Berk, M.; Tye, S. J.; Monaghan, P.; Nahavandi, S.; Bhatti, A.
2016-06-01
Extracellular data analysis has become a quintessential method for understanding the neurophysiological responses to stimuli. This demands stringent techniques owing to the complicated nature of the recording environment. In this paper, we highlight the challenges in extracellular multi-electrode recording and data analysis as well as the limitations pertaining to some of the currently employed methodologies. To address some of the challenges, we present a unified algorithm in the form of selective sorting. Selective sorting is modelled around hypothesized generative model, which addresses the natural phenomena of spikes triggered by an intricate neuronal population. The algorithm incorporates Cepstrum of Bispectrum, ad hoc clustering algorithms, wavelet transforms, least square and correlation concepts which strategically tailors a sequence to characterize and form distinctive clusters. Additionally, we demonstrate the influence of noise modelled wavelets to sort overlapping spikes. The algorithm is evaluated using both raw and synthesized data sets with different levels of complexity and the performances are tabulated for comparison using widely accepted qualitative and quantitative indicators.
Singer, Y
1997-08-01
A constant rebalanced portfolio is an asset allocation algorithm which keeps the same distribution of wealth among a set of assets along a period of time. Recently, there has been work on on-line portfolio selection algorithms which are competitive with the best constant rebalanced portfolio determined in hindsight (Cover, 1991; Helmbold et al., 1996; Cover and Ordentlich, 1996). By their nature, these algorithms employ the assumption that high returns can be achieved using a fixed asset allocation strategy. However, stock markets are far from being stationary and in many cases the wealth achieved by a constant rebalanced portfolio is much smaller than the wealth achieved by an ad hoc investment strategy that adapts to changes in the market. In this paper we present an efficient portfolio selection algorithm that is able to track a changing market. We also describe a simple extension of the algorithm for the case of a general transaction cost, including the transactions cost models recently investigated in (Blum and Kalai, 1997). We provide a simple analysis of the competitiveness of the algorithm and check its performance on real stock data from the New York Stock Exchange accumulated during a 22-year period. On this data, our algorithm outperforms all the algorithms referenced above, with and without transaction costs.
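The wealth achieved by a constant rebalanced portfolio is simply the product over periods of the portfolio's return on the per-period price relatives; the figures below are invented and only exercise the formula.

```python
import numpy as np

def crp_wealth(b, X):
    """Final wealth of a constant rebalanced portfolio with allocation b,
    given per-period price relatives X (rows = periods, columns = assets)."""
    return float(np.prod(X @ b))

# Invented price relatives for two assets over three periods.
X = np.array([[1.01, 0.98],
              [0.97, 1.05],
              [1.02, 1.00]])
print(crp_wealth(np.array([0.5, 0.5]), X))
```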
Algorithm for AEEG data selection leading to wireless and long term epilepsy monitoring.
Casson, Alexander J; Yates, David C; Patel, Shyam; Rodriguez-Villegas, Esther
2007-01-01
High quality, wireless ambulatory EEG (AEEG) systems that can operate over extended periods of time are not currently feasible due to the high power consumption of wireless transmitters. Previous work has thus proposed data reduction by only transmitting sections of data that contain candidate epileptic activity. This paper investigates algorithms by which this data selection can be carried out. It is essential that the algorithm is low power and that all possible features are identified, even at the expense of more false detections. Given this, a brief review of spike detection algorithms is carried out with a view to using these algorithms to drive the data reduction process. A CWT-based algorithm is deemed most suitable for use; it is described in detail and its performance tested. It is found that over 90% of expert marked spikes are identified whilst giving a 40% reduction in the amount of data to be transmitted and analysed. The performance varies with the recording duration in response to each detection and this effect is also investigated. The proposed algorithm will form the basis of a new AEEG system that allows wireless and longer-term epilepsy monitoring.
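A CWT-based screening step can be sketched with PyWavelets (an assumed dependency); the wavelet, scales, threshold and section length below are illustrative choices, not the parameters of the reviewed algorithm.

```python
import numpy as np
import pywt

def select_sections(eeg, fs, scales=np.arange(2, 32), window_s=2.0, k=4.0):
    """Flag fixed-length sections whose CWT response exceeds a robust threshold."""
    coefs, _ = pywt.cwt(eeg, scales, "mexh", sampling_period=1.0 / fs)
    detail = np.max(np.abs(coefs), axis=0)            # strongest response per sample
    thresh = k * np.median(np.abs(detail))            # crude amplitude threshold
    win = int(window_s * fs)
    return [bool(np.any(detail[s:s + win] > thresh))  # True = transmit this section
            for s in range(0, len(eeg), win)]

# Synthetic background EEG with one injected spike-like transient.
fs = 200.0
rng = np.random.default_rng(0)
eeg = 20.0 * rng.standard_normal(int(10 * fs))
eeg[400:404] += 150.0
print(select_sections(eeg, fs))
```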
Local connected fractal dimension analysis in gill of fish experimentally exposed to toxicants.
Manera, Maurizio; Giari, Luisa; De Pasquale, Joseph A; Sayyaf Dezfuli, Bahram
2016-06-01
An operator-neutral method was implemented to objectively assess European seabass, Dicentrarchus labrax (Linnaeus, 1758), gill pathology after experimental exposure to cadmium (Cd) and terbuthylazine (TBA) for 24 and 48 h. An algorithm-derived local connected fractal dimension (LCFD) frequency measure was used in this comparative analysis. Canonical variates analysis (CVA) and linear discriminant analysis (LDA) were used to evaluate the discrimination power of the method among exposure classes (unexposed, Cd exposed, TBA exposed). Misclassification, sensitivity and specificity, both with original and cross-validated cases, were determined. LCFD frequencies enhanced the differences among classes, which were visually selected after their means, respective variances and the differences between Cd- and TBA-exposed means, with respect to the unexposed mean, were analyzed by scatter plots. Selected frequencies were then scanned by means of LDA, stepwise analysis, and Mahalanobis distance to detect the most discriminative frequencies out of the ten originally selected. Discrimination resulted in 91.7% of cross-validated cases correctly classified (22 out of 24 total cases), with sensitivity and specificity, respectively, of 95.5% (1 false negative with respect to 21 truly positive cases) and 75% (1 false positive with respect to 3 truly negative cases). CVA with convex hull polygons ensured prompt, visually intuitive discrimination among exposure classes and graphically supported the false positive case. The combined use of semithin sections, which enhanced the visual evaluation of the overall lamellar structure; of LCFD analysis, which objectively detected local variation in complexity, without the possible bias connected to human personnel; and of CVA/LDA could be an objective, sensitive and specific approach to study fish gill lamellar pathology. Furthermore, this approach enabled discrimination with sufficient confidence between exposure classes or pathological states and avoided misdiagnosis.
NASA Astrophysics Data System (ADS)
Shahini Shamsabadi, Salar
A web-based PAVEment MONitoring system, PAVEMON, is a GIS-oriented platform for accommodating, representing, and leveraging data from a multi-modal mobile sensor system. The sensor system consists of acoustic, optical, electromagnetic, and GPS sensors and is capable of producing as much as 1 terabyte of data per day. Multi-channel raw sensor data (microphone, accelerometer, tire pressure sensor, video) and processed results (road profile, crack density, international roughness index, micro texture depth, etc.) are outputs of this sensor system. By correlating the sensor measurements and positioning data collected in tight time synchronization, PAVEMON attaches a spatial component to all the datasets. These spatially indexed outputs are placed into an Oracle database which integrates seamlessly with PAVEMON's web-based system. The web-based system of PAVEMON consists of two major modules: 1) a GIS module for visualizing and spatial analysis of pavement condition information layers, and 2) a decision-support module for managing maintenance and repair (M&R) activities and predicting future budget needs. PAVEMON weaves together sensor data with third-party climate and traffic information from the National Oceanic and Atmospheric Administration (NOAA) and Long Term Pavement Performance (LTPP) databases for an organized data-driven approach to conduct pavement management activities. PAVEMON deals with heterogeneous and redundant observations by fusing them for jointly-derived higher-confidence results. A prominent example of the fusion algorithms developed within PAVEMON is a data fusion algorithm used for estimating the overall pavement conditions in terms of ASTM's Pavement Condition Index (PCI). PAVEMON predicts PCI by undertaking a statistical fusion approach and selecting a subset of all the sensor measurements. Other fusion algorithms include noise-removal algorithms to remove false negatives in the sensor data in addition to fusion algorithms developed for identifying features on the road. PAVEMON offers an ideal research and monitoring platform for rapid, intelligent and comprehensive evaluation of tomorrow's transportation infrastructure based on up-to-date data from heterogeneous sensor systems.
Harman, David J; Ryder, Stephen D; James, Martin W; Jelpke, Matthew; Ottey, Dominic S; Wilkes, Emilie A; Card, Timothy R; Aithal, Guruprasad P; Guha, Indra Neil
2015-05-03
To assess the feasibility of a novel diagnostic algorithm targeting patients with risk factors for chronic liver disease in a community setting. Prospective cross-sectional study. Two primary care practices (adult patient population 10,479) in Nottingham, UK. Adult patients (aged 18 years or over) fulfilling one or more selected risk factors for developing chronic liver disease: (1) hazardous alcohol use, (2) type 2 diabetes or (3) persistently elevated alanine aminotransferase (ALT) liver function enzyme with negative serology. A serial biomarker algorithm, using a simple blood-based marker (aspartate aminotransferase:ALT ratio for hazardous alcohol users, BARD score for other risk groups) and subsequently liver stiffness measurement using transient elastography (TE). Diagnosis of clinically significant liver disease (defined as liver stiffness ≥8 kPa); definitive diagnosis of liver cirrhosis. We identified 920 patients with the defined risk factors, of whom 504 agreed to undergo investigation. A normal blood biomarker was found in 62 patients (12.3%), who required no further investigation. Subsequently, 378 patients agreed to undergo TE, of whom 98 (26.8% of valid scans) had elevated liver stiffness. Importantly, 71/98 (72.4%) patients with elevated liver stiffness had normal liver enzymes and would be missed by traditional investigation algorithms. We identified 11 new patients with definite cirrhosis, representing a 140% increase in the number of diagnosed cases in this population. A non-invasive liver investigation algorithm based in a community setting is feasible to implement. Targeting risk factors using a non-invasive biomarker approach identified a substantial number of patients with previously undetected cirrhosis. The diagnostic algorithm utilised for this study can be found on clinicaltrials.gov (NCT02037867), and is part of a continuing longitudinal cohort study.
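The serial pathway can be written as a small decision function using the 8 kPa cut-off quoted above; the biomarker step is abstracted to a boolean because its threshold differs by risk group (AST:ALT ratio versus BARD score), so this is a schematic, not clinical guidance.

```python
from typing import Optional

def liver_pathway(biomarker_abnormal: bool, liver_stiffness_kpa: Optional[float]) -> str:
    """Schematic of the serial pathway: blood biomarker first, then TE >= 8 kPa."""
    if not biomarker_abnormal:
        return "no further investigation"
    if liver_stiffness_kpa is None:
        return "refer for transient elastography"
    return ("clinically significant liver disease suspected"
            if liver_stiffness_kpa >= 8.0
            else "significant liver disease unlikely")

print(liver_pathway(True, 9.3))
```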
Efficient experimental design of high-fidelity three-qubit quantum gates via genetic programming
NASA Astrophysics Data System (ADS)
Devra, Amit; Prabhu, Prithviraj; Singh, Harpreet; Arvind; Dorai, Kavita
2018-03-01
We have designed efficient quantum circuits for the three-qubit Toffoli (controlled-controlled-NOT) and the Fredkin (controlled-SWAP) gate, optimized via genetic programming methods. The gates thus obtained were experimentally implemented on a three-qubit NMR quantum information processor, with a high fidelity. Toffoli and Fredkin gates in conjunction with the single-qubit Hadamard gates form a universal gate set for quantum computing and are an essential component of several quantum algorithms. Genetic algorithms are stochastic search algorithms based on the logic of natural selection and biological genetics and have been widely used for quantum information processing applications. We devised a new selection mechanism within the genetic algorithm framework to select individuals from a population. We call this mechanism the "Luck-Choose" mechanism and were able to achieve faster convergence to a solution using this mechanism, as compared to existing selection mechanisms. The optimization was performed under the constraint that the experimentally implemented pulses are of short duration and can be implemented with high fidelity. We demonstrate the advantage of our pulse sequences by comparing our results with existing experimental schemes and other numerical optimization methods.
Sensor Data Quality and Angular Rate Down-Selection Algorithms on SLS EM-1
NASA Technical Reports Server (NTRS)
Park, Thomas; Oliver, Emerson; Smith, Austin
2018-01-01
The NASA Space Launch System Block 1 launch vehicle is equipped with an Inertial Navigation System (INS) and multiple Rate Gyro Assemblies (RGA) that are used in the Guidance, Navigation, and Control (GN&C) algorithms. The INS provides the inertial position, velocity, and attitude of the vehicle along with both angular rate and specific force measurements. Additionally, multiple sets of co-located rate gyros supply angular rate data. The collection of angular rate data, taken along the launch vehicle, is used to separate out vehicle motion from flexible body dynamics. Since the system architecture uses redundant sensors, the capability was developed to evaluate the health (or validity) of the independent measurements. A suite of Sensor Data Quality (SDQ) algorithms is responsible for assessing the angular rate data from the redundant sensors. When failures are detected, SDQ will take the appropriate action and disqualify or remove faulted sensors from forward processing. Additionally, the SDQ algorithms contain logic for down-selecting the angular rate data used by the GN&C software from the set of healthy measurements. This paper provides an overview of the algorithms used for both fault-detection and measurement down selection.
Localization of non-linearly modeled autonomous mobile robots using out-of-sequence measurements.
Besada-Portas, Eva; Lopez-Orozco, Jose A; Lanillos, Pablo; de la Cruz, Jesus M
2012-01-01
This paper presents a state of the art of the estimation algorithms dealing with Out-of-Sequence (OOS) measurements for non-linearly modeled systems. The state of the art includes a critical analysis of the algorithm properties that takes into account the applicability of these techniques to autonomous mobile robot navigation based on the fusion of the measurements provided, delayed and OOS, by multiple sensors. Besides, it shows a representative example of the use of one of the most computationally efficient approaches in the localization module of the control software of a real robot (which has non-linear dynamics, and linear and non-linear sensors) and compares its performance against other approaches. The simulated results obtained with the selected OOS algorithm shows the computational requirements that each sensor of the robot imposes to it. The real experiments show how the inclusion of the selected OOS algorithm in the control software lets the robot successfully navigate in spite of receiving many OOS measurements. Finally, the comparison highlights that not only is the selected OOS algorithm among the best performing ones of the comparison, but it also has the lowest computational and memory cost.
NASA Astrophysics Data System (ADS)
Zaiwani, B. E.; Zarlis, M.; Efendi, S.
2018-03-01
This research improves on an earlier hybridization of the Fuzzy Analytic Hierarchy Process (FAHP) with the Fuzzy Technique for Order Preference by Similarity to Ideal Solution (FTOPSIS), which selected the best bank chief inspector based on several qualitative and quantitative criteria with various priorities. To improve on that work, a hybridization of the FAHP algorithm with the Fuzzy Multiple Attribute Decision Making - Simple Additive Weighting (FMADM-SAW) algorithm was adopted, applying FAHP to the weighting process and SAW to the ranking process to determine employee promotion at a government institution. The improved average Efficiency Rate (ER) is 85.24%, compared with 77.82% in the previous research. Keywords: Ranking and Selection, Fuzzy AHP, Fuzzy TOPSIS, FMADM-SAW.
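The SAW ranking step is a weighted sum over normalised criteria; the decision matrix and weights below are invented, and the FAHP weighting stage is not reproduced.

```python
import numpy as np

def saw_rank(decision_matrix, weights):
    """Normalise benefit-type criteria and rank alternatives by weighted sum."""
    M = np.asarray(decision_matrix, dtype=float)
    R = M / M.max(axis=0)                     # simple benefit normalisation
    scores = R @ np.asarray(weights, dtype=float)
    return np.argsort(-scores), scores        # best alternative first

# Invented candidate scores on three criteria with FAHP-style weights.
order, scores = saw_rank([[70, 80, 90], [85, 75, 60], [90, 95, 70]],
                         weights=[0.5, 0.3, 0.2])
print(order, scores)
```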
Genetic algorithm based fuzzy control of spacecraft autonomous rendezvous
NASA Technical Reports Server (NTRS)
Karr, C. L.; Freeman, L. M.; Meredith, D. L.
1990-01-01
The U.S. Bureau of Mines is currently investigating ways to combine the control capabilities of fuzzy logic with the learning capabilities of genetic algorithms. Fuzzy logic allows for the uncertainty inherent in most control problems to be incorporated into conventional expert systems. Although fuzzy logic based expert systems have been used successfully for controlling a number of physical systems, the selection of acceptable fuzzy membership functions has generally been a subjective decision. High performance fuzzy membership functions for a fuzzy logic controller that manipulates a mathematical model simulating the autonomous rendezvous of spacecraft are learned using a genetic algorithm, a search technique based on the mechanics of natural genetics. The membership functions learned by the genetic algorithm provide for a more efficient fuzzy logic controller than membership functions selected by the authors for the rendezvous problem. Thus, genetic algorithms are potentially an effective and structured approach for learning fuzzy membership functions.
NASA Astrophysics Data System (ADS)
Furlong, Cosme; Pryputniewicz, Ryszard J.
2002-06-01
Effective suppression of speckle noise content in interferometric data images can help in improving accuracy and resolution of the results obtained with interferometric optical metrology techniques. In this paper, novel speckle noise reduction algorithms based on the discrete wavelet transformation are presented. The algorithms proceed by: (a) estimating the noise level contained in the interferograms of interest, (b) selecting wavelet families, (c) applying the wavelet transformation using the selected families, (d) wavelet thresholding, and (e) applying the inverse wavelet transformation, producing denoised interferograms. The algorithms are applied to the different stages of the processing procedures utilized for generation of quantitative speckle correlation interferometry data of fiber-optic based opto-electronic holography (FOBOEH) techniques, allowing identification of optimal processing conditions. It is shown that wavelet algorithms are effective for speckle noise reduction while preserving image features otherwise faded with other algorithms.
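A minimal sketch of steps (b)-(e) is given below, assuming NumPy and PyWavelets are available. The wavelet family (db4), decomposition level, and universal-threshold rule are illustrative choices rather than the settings used in the paper.

```python
# Minimal sketch of steps (b)-(e): wavelet decomposition, soft thresholding,
# and reconstruction of a noisy interferogram. Requires numpy and PyWavelets;
# the wavelet family, level, and threshold rule are illustrative choices.
import numpy as np
import pywt

def wavelet_denoise(image, wavelet="db4", level=3):
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    # Estimate the noise level from the finest diagonal detail band (step a).
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(image.size))      # universal threshold
    new_coeffs = [coeffs[0]]                               # keep approximation
    for details in coeffs[1:]:
        new_coeffs.append(tuple(pywt.threshold(d, thresh, mode="soft")
                                for d in details))
    return pywt.waverec2(new_coeffs, wavelet)

noisy = np.random.rand(128, 128)            # stand-in for an interferogram
denoised = wavelet_denoise(noisy)
print(denoised.shape)
```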
Algorithmic Mechanism Design of Evolutionary Computation.
Pei, Yan
2015-01-01
We consider algorithmic design, enhancement, and improvement of evolutionary computation as a mechanism design problem. All individuals or several groups of individuals can be considered as self-interested agents. The individuals in evolutionary computation can manipulate parameter settings and operations to satisfy their own preferences, which are defined by the evolutionary computation algorithm designer, rather than by following a fixed algorithm rule. Evolutionary computation algorithm designers or self-adaptive methods should construct proper rules and mechanisms for all agents (individuals) to conduct their evolutionary behaviour correctly in order to reliably achieve the desired and preset objective(s). As a case study, we propose a formal framework for parameter setting, strategy selection, and algorithmic design of evolutionary computation by considering the Nash strategy equilibrium of a mechanism design in the search process. The evaluation results demonstrate the efficiency of the framework. This principle can be implemented in any evolutionary computation algorithm that needs to consider strategy selection issues in its optimization process. The final objective of our work is to solve evolutionary computation design as an algorithmic mechanism design problem and to establish its fundamentals from this perspective. This paper is a first step towards achieving this objective by implementing a strategy equilibrium solution (such as a Nash equilibrium) in an evolutionary computation algorithm.
NASA Astrophysics Data System (ADS)
Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao
2017-03-01
Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. Through input variable selection, irrelevant or redundant variables are eliminated and a suitable subset of variables is identified as the input of a model. Input variable selection also simplifies the model structure and improves computational efficiency. This paper describes the procedures of input variable selection for data-driven models for the measurement of liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, including partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS), are applied in this study. Typical data-driven models incorporating support vector machines (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM-based data-driven models and sensitivity analysis. The validation and analysis results suggest that the input variables selected by the PMI algorithm provide more effective information for the models to measure liquid mass flowrate, while the IIS algorithm provides fewer but more effective variables for the models to predict gas volume fraction.
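The filter-style selection idea can be sketched with plain mutual information as a simplified stand-in for partial mutual information, followed by an SVM model, as below. The synthetic data, the number of retained variables, and the SVR settings are assumptions for illustration only.

```python
# Simplified sketch of filter-style input variable selection using plain
# mutual information (a stand-in for the paper's partial mutual information),
# followed by an SVM regression model. Data here are synthetic.
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))                 # candidate input variables
y = 2.0 * X[:, 0] - X[:, 3] + 0.5 * X[:, 7] + 0.1 * rng.normal(size=300)

mi = mutual_info_regression(X, y, random_state=0)
selected = np.argsort(mi)[::-1][:3]            # keep the 3 most informative
print("selected variables:", selected)

score = cross_val_score(SVR(C=10.0), X[:, selected], y, cv=5).mean()
print("CV R^2 with selected inputs:", round(score, 3))
```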
Optimal design of low-density SNP arrays for genomic prediction: algorithm and applications
USDA-ARS's Scientific Manuscript database
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for their optimal design. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optim...
Integrated feature extraction and selection for neuroimage classification
NASA Astrophysics Data System (ADS)
Fan, Yong; Shen, Dinggang
2009-02-01
Feature extraction and selection are of great importance in neuroimage classification for identifying informative features and reducing feature dimensionality, which are generally implemented as two separate steps. This paper presents an integrated feature extraction and selection algorithm with two iterative steps: constrained subspace learning based feature extraction and support vector machine (SVM) based feature selection. The subspace learning based feature extraction focuses on the brain regions with higher possibility of being affected by the disease under study, while the possibility of brain regions being affected by disease is estimated by the SVM based feature selection, in conjunction with SVM classification. This algorithm can not only take into account the inter-correlation among different brain regions, but also overcome the limitation of traditional subspace learning based feature extraction methods. To achieve robust performance and optimal selection of parameters involved in feature extraction, selection, and classification, a bootstrapping strategy is used to generate multiple versions of training and testing sets for parameter optimization, according to the classification performance measured by the area under the ROC (receiver operating characteristic) curve. The integrated feature extraction and selection method is applied to a structural MR image based Alzheimer's disease (AD) study with 98 non-demented and 100 demented subjects. Cross-validation results indicate that the proposed algorithm can improve performance of the traditional subspace learning based classification.
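A simplified stand-in for the iterative SVM-based feature selection described above is recursive feature elimination with a linear SVM, sketched below on synthetic data; the feature counts and parameters are illustrative, not those of the neuroimaging study.

```python
# A simplified stand-in for the paper's iterative SVM-based feature selection:
# recursive feature elimination with a linear SVM on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=198, n_features=50, n_informative=8,
                           random_state=0)     # stand-in for imaging features
svm = SVC(kernel="linear", C=1.0)
selector = RFE(svm, n_features_to_select=10, step=5).fit(X, y)
X_sel = X[:, selector.support_]

auc = cross_val_score(svm, X_sel, y, cv=5, scoring="roc_auc").mean()
print("mean AUC with 10 selected features:", round(auc, 3))
```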
Report on the Development of the Advanced Encryption Standard (AES)
Nechvatal, James; Barker, Elaine; Bassham, Lawrence; Burr, William; Dworkin, Morris; Foti, James; Roback, Edward
2001-01-01
In 1997, the National Institute of Standards and Technology (NIST) initiated a process to select a symmetric-key encryption algorithm to be used to protect sensitive (unclassified) Federal information in furtherance of NIST’s statutory responsibilities. In 1998, NIST announced the acceptance of 15 candidate algorithms and requested the assistance of the cryptographic research community in analyzing the candidates. This analysis included an initial examination of the security and efficiency characteristics for each algorithm. NIST reviewed the results of this preliminary research and selected MARS, RC6™, Rijndael, Serpent and Twofish as finalists. Having reviewed further public analysis of the finalists, NIST has decided to propose Rijndael as the Advanced Encryption Standard (AES). The research results and rationale for this selection are documented in this report. PMID:27500035
de Almeida, Valber Elias; de Araújo Gomes, Adriano; de Sousa Fernandes, David Douglas; Goicoechea, Héctor Casimiro; Galvão, Roberto Kawakami Harrop; Araújo, Mario Cesar Ugulino
2018-05-01
This paper proposes a new variable selection method for nonlinear multivariate calibration, combining the Successive Projections Algorithm for interval selection (iSPA) with the Kernel Partial Least Squares (Kernel-PLS) modelling technique. The proposed iSPA-Kernel-PLS algorithm is employed in a case study involving a Vis-NIR spectrometric dataset with complex nonlinear features. The analytical problem consists of determining Brix and sucrose content in samples from a sugar production system, on the basis of transflectance spectra. As compared to full-spectrum Kernel-PLS, the iSPA-Kernel-PLS models involve a smaller number of variables and display statistically significant superiority in terms of accuracy and/or bias in the predictions. Published by Elsevier B.V.
NASA Astrophysics Data System (ADS)
Kostrzewa, Daniel; Josiński, Henryk
2016-06-01
The expanded Invasive Weed Optimization algorithm (exIWO) is an optimization metaheuristic modelled on the original IWO version, which is inspired by the dynamic growth of a weed colony. The authors of the present paper have modified the exIWO algorithm by introducing a set of both deterministic and non-deterministic strategies for selecting individuals. The goal of the project was to evaluate the modified exIWO by testing its usefulness for the optimization of multidimensional numerical functions. The optimized functions, Griewank, Rastrigin, and Rosenbrock, are frequently used as benchmarks because of their characteristics.
A Constraint-Based Planner for Data Production
NASA Technical Reports Server (NTRS)
Pang, Wanlin; Golden, Keith
2005-01-01
This paper presents a graph-based backtracking algorithm designed to support constraint-based planning in data production domains. The algorithm performs backtracking at two nested levels: the outer backtracking follows the structure of the planning graph to select planner subgoals and the actions that achieve them, while the inner backtracking searches within the subproblem associated with a selected action to find action parameter values. We show that this algorithm works well in a planner applied to automating data production in an ecological forecasting system. We also discuss how the idea of multi-level backtracking may improve the efficiency of solving semi-structured constraint problems.
Vivekanandan, T; Sriman Narayana Iyengar, N Ch
2017-11-01
Enormous data growth in multiple domains has posed a great challenge for data processing and analysis techniques. In the healthcare system in particular, traditional record maintenance has been replaced, and it is vital to develop a model that can handle the huge amount of e-healthcare data efficiently. In this paper, the challenging tasks of selecting critical features from the enormous set of available features and diagnosing heart disease are carried out. Feature selection is one of the most widely used pre-processing steps in classification problems. A modified differential evolution (DE) algorithm is used to perform feature selection for cardiovascular disease and optimization of the selected features. Of the 10 available strategies for the traditional DE algorithm, the seventh strategy, represented by DE/rand/2/exp, is considered for the comparative study. The performance analysis of the developed modified DE strategy is given in this paper. With the selected critical features, prediction of heart disease is carried out using fuzzy AHP and a feed-forward neural network. Various performance measures of integrating the modified differential evolution algorithm with fuzzy AHP and a feed-forward neural network in the prediction of heart disease are evaluated in this paper. The accuracy of the proposed hybrid model is 83%, which is higher than that of some other existing models. In addition, the prediction time of the proposed hybrid model is evaluated and shows promising results. Copyright © 2017 Elsevier Ltd. All rights reserved.
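The DE/rand/2/exp strategy mentioned above combines a two-difference-vector mutation with exponential crossover. The sketch below applies it to a toy continuous objective rather than the paper's feature selection problem; F, CR, and the population size are illustrative.

```python
# Sketch of DE/rand/2 mutation with exponential crossover (DE/rand/2/exp),
# applied to a toy continuous objective, not the feature selection problem.
import numpy as np

rng = np.random.default_rng(1)

def de_rand_2_exp(pop, i, F=0.5, CR=0.9):
    """Build a trial vector for individual i using DE/rand/2/exp."""
    n, d = pop.shape
    r = rng.choice([k for k in range(n) if k != i], size=5, replace=False)
    # Base vector plus two scaled difference vectors.
    mutant = pop[r[0]] + F * (pop[r[1]] + pop[r[2]] - pop[r[3]] - pop[r[4]])
    trial = pop[i].copy()
    j = rng.integers(d)
    L = 0
    while True:                                  # exponential crossover
        trial[j] = mutant[j]
        j = (j + 1) % d
        L += 1
        if rng.random() >= CR or L >= d:
            break
    return trial

def sphere(x):
    return float(np.sum(x ** 2))

pop = rng.uniform(-5, 5, size=(20, 10))
for _ in range(200):
    for i in range(len(pop)):
        trial = de_rand_2_exp(pop, i)
        if sphere(trial) < sphere(pop[i]):       # greedy selection
            pop[i] = trial
print("best value:", round(min(sphere(x) for x in pop), 6))
```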
Evaluation of Anomaly Detection Capability for Ground-Based Pre-Launch Shuttle Operations. Chapter 8
NASA Technical Reports Server (NTRS)
Martin, Rodney Alexander
2010-01-01
This chapter will provide a thorough end-to-end description of the process for evaluating three different data-driven algorithms for anomaly detection, in order to select the best candidate for deployment as part of a suite of IVHM (Integrated Vehicle Health Management) technologies. These algorithms were deemed to be sufficiently mature to be considered viable candidates for deployment in support of the maiden launch of Ares I-X, the successor to the Space Shuttle for NASA's Constellation program. Data-driven algorithms are just one of three different types being deployed. The other two types of algorithms being deployed include a "rule-based" expert system and a "model-based" system. Within these two categories, the deployable candidates have already been selected based upon qualitative factors such as flight heritage. For the rule-based system, SHINE (Spacecraft High-speed Inference Engine) has been selected for deployment; it is a component of BEAM (Beacon-based Exception Analysis for Multimissions), a patented technology developed at NASA's JPL (Jet Propulsion Laboratory), and serves to aid in the management and identification of operational modes. For the "model-based" system, a commercially available package developed by QSI (Qualtech Systems, Inc.), TEAMS (Testability Engineering and Maintenance System), has been selected for deployment to aid in diagnosis. In the context of this particular deployment, distinctions among the use of the terms "data-driven," "rule-based," and "model-based" can be found in the literature. Although three different categories of algorithms have been selected for deployment, our main focus in this chapter will be on the evaluation of the three candidates for data-driven anomaly detection. These algorithms will be evaluated on their capability for robustly detecting incipient faults or failures in the ground-based phase of pre-launch Space Shuttle operations, rather than on heritage as in previous studies. Robust detection will allow for the achievement of pre-specified minimum false alarm and/or missed detection rates in the selection of alert thresholds. All algorithms will also be optimized with respect to an aggregation of these same criteria. Our study relies upon the use of Shuttle data to act as a proxy for, and in preparation for application to, Ares I-X data, which uses a very similar hardware platform for the subsystems that are being targeted (the TVC - Thrust Vector Control - subsystem for the SRB (Solid Rocket Booster)).
Positive-unlabeled learning for disease gene identification
Yang, Peng; Li, Xiao-Li; Mei, Jian-Ping; Kwoh, Chee-Keong; Ng, See-Kiong
2012-01-01
Background: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a non-disease gene set does not exist) to build classifiers to identify new disease genes from the unknown genes. However, such classifiers are actually built from a noisy negative set N, as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could. Result: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm PUDI (PU learning for disease gene identification) to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. Weighted support vector machines are then used to build a multi-level classifier based on the four training sets and the positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm outperformed the existing methods significantly. Conclusion: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as an unlabeled set U instead of a negative set N. Given that many machine learning problems in biomedical research do involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification. Availability and implementation: The executable program and data are available at http://www1.i2r.a-star.edu.sg/∼xlli/PUDI/PUDI.html. Contact: xlli@i2r.a-star.edu.sg or yang0293@e.ntu.edu.sg Supplementary information: Supplementary Data are available at Bioinformatics online. PMID:22923290
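The weighted-SVM idea underlying PU learning can be sketched by treating unlabeled examples as negatives with a reduced weight, as below. This is a simplification of PUDI, which first partitions U into RN, LP, LN and WN; the synthetic data and weights are assumptions.

```python
# Minimal sketch of the weighted-SVM idea behind PU learning: treat unlabeled
# examples as negatives but with a smaller weight than the known positives.
# This is a simplification of PUDI (which first partitions U into subsets).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y_true = make_classification(n_samples=500, n_features=20, weights=[0.8],
                                random_state=0)
# Hide most positives: only about 30% of true positives are labeled (set P).
rng = np.random.default_rng(0)
labeled = (y_true == 1) & (rng.random(len(y_true)) < 0.3)
y_pu = np.where(labeled, 1, 0)                 # 1 = P, 0 = unlabeled U

weights = np.where(y_pu == 1, 1.0, 0.3)        # down-weight the noisy "negatives"
clf = SVC(kernel="linear", probability=True)
clf.fit(X, y_pu, sample_weight=weights)

# Rank unlabeled examples by their estimated probability of being positive.
scores = clf.predict_proba(X[y_pu == 0])[:, 1]
print("top-5 candidate positives (indices within U):",
      np.argsort(scores)[::-1][:5])
```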
Lou, Xin Yuan; Sun, Lin Fu
2017-01-01
This paper proposes a new support vector machine (SVM) optimization scheme based on an improved chaotic fruit fly optimization algorithm (FOA) with a mutation strategy to simultaneously perform parameter tuning for the SVM and feature selection. In the improved FOA, the chaotic particle initializes the fruit fly swarm location and replaces the expression of distance for the fruit fly to find the food source. In addition, the proposed mutation strategy uses two distinct generative mechanisms for new food sources at the osphresis phase, allowing the algorithm to search for the optimal solution both in the whole solution space and within the local solution space containing the fruit fly swarm location. In an evaluation based on a group of ten benchmark problems, the proposed algorithm’s performance is compared with that of other well-known algorithms, and the results support the superiority of the proposed algorithm. Moreover, this algorithm is successfully applied in an SVM to perform both parameter tuning for the SVM and feature selection to solve real-world classification problems. This method, called chaotic fruit fly optimization algorithm (CIFOA)-SVM, has been shown to be a more robust and effective optimization method than other well-known methods, particularly in terms of solving the medical diagnosis problem and the credit card problem. PMID:28369096
A Scalable O(N) Algorithm for Large-Scale Parallel First-Principles Molecular Dynamics Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Osei-Kuffuor, Daniel; Fattebert, Jean-Luc
2014-01-01
Traditional algorithms for first-principles molecular dynamics (FPMD) simulations only gain a modest capability increase from current petascale computers, due to their O(N^3) complexity and their heavy use of global communications. To address this issue, we are developing a truly scalable O(N) complexity FPMD algorithm, based on density functional theory (DFT), which avoids global communications. The computational model uses a general nonorthogonal orbital formulation for the DFT energy functional, which requires knowledge of selected elements of the inverse of the associated overlap matrix. We present a scalable algorithm for approximately computing selected entries of the inverse of the overlap matrix, based on an approximate inverse technique, by inverting local blocks corresponding to principal submatrices of the global overlap matrix. The new FPMD algorithm exploits sparsity and uses nearest neighbor communication to provide a computational scheme capable of extreme scalability. Accuracy is controlled by the mesh spacing of the finite difference discretization, the size of the localization regions in which the electronic orbitals are confined, and a cutoff beyond which the entries of the overlap matrix can be omitted when computing selected entries of its inverse. We demonstrate the algorithm's excellent parallel scaling for up to O(100K) atoms on O(100K) processors, with a wall-clock time of O(1) minute per molecular dynamics time step.
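The local-block idea, inverting principal submatrices restricted to each orbital's neighborhood to approximate selected entries of the inverse overlap matrix, can be sketched in dense NumPy as below. The small matrix and neighborhood definition are illustrative; the actual code operates on distributed sparse matrices.

```python
# Sketch of the local-block idea: approximate selected entries of inv(S) by
# inverting principal submatrices restricted to each orbital's neighborhood.
# Purely illustrative; the real algorithm works on distributed sparse data.
import numpy as np

def selected_inverse(S, neighborhoods):
    """neighborhoods[i] is the list of orbital indices local to orbital i."""
    approx = {}
    for i, idx in enumerate(neighborhoods):
        idx = np.asarray(idx)
        block = S[np.ix_(idx, idx)]            # principal submatrix
        inv_block = np.linalg.inv(block)
        local_i = int(np.where(idx == i)[0][0])
        for local_j, j in enumerate(idx):
            approx[(i, j)] = inv_block[local_i, local_j]
    return approx

# A small, diagonally dominant "overlap" matrix with localized structure.
n = 8
S = np.eye(n) + 0.1 * np.diag(np.ones(n - 1), 1) + 0.1 * np.diag(np.ones(n - 1), -1)
neigh = [[j for j in range(max(0, i - 2), min(n, i + 3))] for i in range(n)]
approx = selected_inverse(S, neigh)
exact = np.linalg.inv(S)
print("max error on selected entries:",
      max(abs(v - exact[i, j]) for (i, j), v in approx.items()))
```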
Classification of radiolarian images with hand-crafted and deep features
NASA Astrophysics Data System (ADS)
Keçeli, Ali Seydi; Kaya, Aydın; Keçeli, Seda Uzunçimen
2017-12-01
Radiolarians are planktonic protozoa and are important biostratigraphic and paleoenvironmental indicators for paleogeographic reconstructions. Radiolarian paleontology remains a low-cost and one of the most convenient ways to date deep ocean sediments. Traditional methods for identifying radiolarians are time-consuming and cannot scale to the granularity or scope necessary for large-scale studies. Automated image classification will allow these analyses to be made promptly. In this study, a method for automatic radiolarian image classification is proposed on Scanning Electron Microscope (SEM) images of radiolarians to ease species identification of fossilized radiolarians. The proposed method uses both hand-crafted features, such as invariant moments, wavelet moments, Gabor features and basic morphological features, and deep features obtained from a pre-trained Convolutional Neural Network (CNN). Feature selection is applied over the deep features to reduce their high dimensionality. Classification outcomes are analyzed to compare hand-crafted features, deep features, and their combinations. Results show that the deep features obtained from a pre-trained CNN are more discriminative compared to hand-crafted ones. Additionally, feature selection reduces the computational cost of the classification algorithms and has no negative effect on classification accuracy.
A quasi-likelihood approach to non-negative matrix factorization
Devarajan, Karthik; Cheung, Vincent C.K.
2017-01-01
A unified approach to non-negative matrix factorization based on the theory of generalized linear models is proposed. This approach embeds a variety of statistical models, including the exponential family, within a single theoretical framework and provides a unified view of such factorizations from the perspective of quasi-likelihood. Using this framework, a family of algorithms for handling signal-dependent noise is developed and its convergence proven using the Expectation-Maximization algorithm. In addition, a measure to evaluate the goodness-of-fit of the resulting factorization is described. The proposed methods allow modeling of non-linear effects via appropriate link functions and are illustrated using an application in biomedical signal processing. PMID:27348511
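A classical special case of the signal-dependent-noise family is the generalized Kullback-Leibler (Poisson-type) cost, for which the multiplicative updates below are standard. This sketch is not the paper's full quasi-likelihood framework; the rank, iteration count, and data are illustrative.

```python
# Sketch of multiplicative updates for NMF under the generalized KL divergence,
# a classical special case of the signal-dependent-noise family (Poisson-type).
import numpy as np

def nmf_kl(V, rank, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0, keepdims=True).T + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1, keepdims=True).T + eps)
    return W, H

V = np.abs(np.random.default_rng(1).random((30, 40)))   # non-negative data
W, H = nmf_kl(V, rank=5)
print("reconstruction error:", round(float(np.linalg.norm(V - W @ H)), 4))
```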
The Roadmaker's algorithm for the discrete pulse transform.
Laurie, Dirk P
2011-02-01
The discrete pulse transform (DPT) is a decomposition of an observed signal into a sum of pulses, i.e., signals that are constant on a connected set and zero elsewhere. Originally developed for 1-D signal processing, the DPT has recently been generalized to more dimensions. Applications in image processing are currently being investigated. The time required to compute the DPT as originally defined via the successive application of LULU operators (members of a class of minimax filters studied by Rohwer) has been a severe drawback to its applicability. This paper introduces a fast method for obtaining such a decomposition, called the Roadmaker's algorithm because it involves filling pits and razing bumps. It acts selectively only on those features actually present in the signal, flattening them in order of increasing size by subtracting an appropriate positive or negative pulse, which is then appended to the decomposition. The implementation described here covers 1-D signal as well as 2-D and 3-D image processing in a single framework. This is achieved by considering the signal or image as a function defined on a graph, with the geometry specified by the edges of the graph. Whenever a feature is flattened, nodes in the graph are merged, until eventually only one node remains. At that stage, a new set of edges for the same nodes as the graph, forming a tree structure, defines the obtained decomposition. The Roadmaker's algorithm is shown to be equivalent to the DPT in the sense of obtaining the same decomposition. However, its simpler operators are not in general equivalent to the LULU operators in situations where those operators are not applied successively. A by-product of the Roadmaker's algorithm is that it yields a proof of the so-called Highlight Conjecture, stated as an open problem in 2006. We pay particular attention to algorithmic details and complexity, including a demonstration that in the 1-D case, and also in the case of a complete graph, the Roadmaker's algorithm has optimal complexity: it runs in time O(m), where m is the number of arcs in the graph.
Caso, Giuseppe; de Nardis, Luca; di Benedetto, Maria-Gabriella
2015-10-30
The weighted k-nearest neighbors (WkNN) algorithm is by far the most popular choice in the design of fingerprinting indoor positioning systems based on WiFi received signal strength (RSS). WkNN estimates the position of a target device by selecting k reference points (RPs) based on the similarity of their fingerprints with the measured RSS values. The position of the target device is then obtained as a weighted sum of the positions of the k RPs. Two-step WkNN positioning algorithms were recently proposed, in which RPs are divided into clusters using the affinity propagation clustering algorithm, and one representative for each cluster is selected. Only cluster representatives are then considered during the position estimation, leading to a significant computational complexity reduction compared to traditional, flat WkNN. Flat and two-step WkNN share the issue of properly selecting the similarity metric so as to guarantee good positioning accuracy: in two-step WkNN, in particular, the metric impacts three different steps in the position estimation, namely cluster formation, cluster selection, and RP selection and weighting. So far, however, the only similarity metric considered in the literature was the one proposed in the original formulation of the affinity propagation algorithm. This paper fills this gap by comparing different metrics and, based on this comparison, proposes a novel mixed approach in which different metrics are adopted in the different steps of the position estimation procedure. The analysis is supported by an extensive experimental campaign carried out in a multi-floor 3D indoor positioning testbed. The impact of similarity metrics and their combinations on the structure and size of the resulting clusters, on 3D positioning accuracy and on computational complexity is investigated. Results show that the adoption of metrics different from the one proposed in the original affinity propagation algorithm and, in particular, the combination of different metrics can significantly improve the positioning accuracy while preserving the efficiency in computational complexity typical of two-step algorithms.
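The core flat WkNN estimate can be sketched as below: compute a dissimilarity between the measured RSS vector and each reference-point fingerprint, keep the k closest RPs, and average their positions with inverse-distance weights. Fingerprints, positions, and the weighting rule are illustrative assumptions.

```python
# Minimal sketch of flat WkNN position estimation from RSS fingerprints.
# Fingerprints, positions, and the inverse-distance weighting are illustrative.
import numpy as np

def wknn(rss, fingerprints, positions, k=3, eps=1e-6):
    """rss: measured RSS vector; fingerprints: (n_RP, n_AP); positions: (n_RP, 2)."""
    d = np.linalg.norm(fingerprints - rss, axis=1)     # dissimilarity per RP
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + eps)                       # inverse-distance weights
    w /= w.sum()
    return w @ positions[nearest]

rng = np.random.default_rng(0)
fingerprints = rng.uniform(-90, -40, size=(25, 6))     # 25 RPs, 6 access points
positions = rng.uniform(0, 20, size=(25, 2))
measured = fingerprints[7] + rng.normal(0, 2, size=6)  # noisy sample near RP 7
print("estimated position:", wknn(measured, fingerprints, positions).round(2))
print("true RP position:  ", positions[7].round(2))
```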
Selective object encryption for privacy protection
NASA Astrophysics Data System (ADS)
Zhou, Yicong; Panetta, Karen; Cherukuri, Ravindranath; Agaian, Sos
2009-05-01
This paper introduces a new recursive sequence called the truncated P-Fibonacci sequence, its corresponding binary code called the truncated Fibonacci p-code and a new bit-plane decomposition method using the truncated Fibonacci p-code. In addition, a new lossless image encryption algorithm is presented that can encrypt a selected object using this new decomposition method for privacy protection. The user has the flexibility (1) to define the object to be protected as an object in an image or in a specific part of the image, a selected region of an image, or an entire image, (2) to utilize any new or existing method for edge detection or segmentation to extract the selected object from an image or a specific part/region of the image, (3) to select any new or existing method for the shuffling process. The algorithm can be used in many different areas such as wireless networking, mobile phone services and applications in homeland security and medical imaging. Simulation results and analysis verify that the algorithm shows good performance in object/image encryption and can withstand plaintext attacks.
Real Time Optima Tracking Using Harvesting Models of the Genetic Algorithm
NASA Technical Reports Server (NTRS)
Baskaran, Subbiah; Noever, D.
1999-01-01
Tracking optima in real-time propulsion control, particularly for non-stationary optimization problems, is a challenging task. Several approaches have been put forward for such a study, including the numerical method called the genetic algorithm. In brief, this approach is built upon Darwinian-style competition between numerical alternatives displayed in the form of binary strings, or by analogy to 'pseudogenes'. Breeding of improved solutions is an often-cited parallel to natural selection in evolutionary or soft computing. In this report we present our results of applying a novel model of a genetic algorithm for tracking optima in propulsion engineering and in real-time control. We specialize the algorithm to mission profiling and planning optimizations, both to select reduced propulsion needs through trajectory planning and to explore time or fuel conservation strategies.
NASA Technical Reports Server (NTRS)
Pierson, W. J., Jr.
1984-01-01
Backscatter measurements at upwind and crosswind are simulated for five incidence angles by means of the SASS-1 model function. The effects of communication noise and attitude errors are simulated by Monte Carlo methods, and the winds are recovered by both the Sum of Squares (SOS) algorithm and a Maximum Likelihood Estimator (MLE). The SOS algorithm is shown to fail for light enough winds at all incidence angles and to fail to show areas of calm, because backscatter estimates that were negative or that produced incorrect values of Kp greater than one were discarded. The MLE performs well for all input backscatter estimates and returns calm when both are negative. The use of the SOS algorithm is shown to have introduced errors in the SASS-1 model function that, in part, cancel out the errors that result from using it, but that also cause disagreement with other data sources such as the AAFE circle flight data at light winds. Implications for future scatterometer systems are given.
Wen, Jianming
2012-09-01
A recent thermal ghost imaging experiment implemented by Wu's group [Chin. Phys. Lett. 279, 074216 (2012)] showed that both positive and negative images can be constructed by applying a novel algorithm. This algorithm allows the images to be formed with the use of partial measurements from the reference arm (which never even passes through the object), conditioned on the object arm. In this paper, we present a simple theory that explains the experimental observation and provides an in-depth understanding of conventional ghost imaging. In particular, we theoretically show that the visibility of images formed through such an algorithm is not bounded by the standard value 1/3. In fact, it can ideally approach unity (with reduced imaging quality). Thus, the algorithm described here not only offers an alternative way to decode the spatial correlation of thermal light, but also mimics a "bandpass filter" that removes the constant background so that the visibility or imaging contrast is improved. We further show that, conditioned on one still object present in the test arm, it is possible to construct the object's image by sampling the available reference data.
An optimal algorithm for reconstructing images from binary measurements
NASA Astrophysics Data System (ADS)
Yang, Feng; Lu, Yue M.; Sbaiz, Luciano; Vetterli, Martin
2010-01-01
We have studied a camera with a very large number of binary pixels referred to as the gigavision camera [1] or the gigapixel digital film camera [2, 3]. Potential advantages of this new camera design include improved dynamic range, thanks to its logarithmic sensor response curve, and reduced exposure time in low light conditions, due to its highly sensitive photon detection mechanism. We use a maximum likelihood estimator (MLE) to reconstruct a high quality conventional image from the binary sensor measurements of the gigavision camera. We prove that when the threshold T is "1", the negative log-likelihood function is convex. Therefore, the optimal solution can be achieved using convex optimization. Based on filter bank techniques, fast algorithms are given for computing the gradient and the product of a vector with the Hessian matrix of the negative log-likelihood function. We show that with a minor change, our algorithm also works for estimating conventional images from multiple binary images. Numerical experiments with synthetic 1-D signals and images verify the effectiveness and quality of the proposed algorithm. Experimental results also show that estimation performance can be improved by increasing the oversampling factor or the number of binary images.
Multispectral iris recognition based on group selection and game theory
NASA Astrophysics Data System (ADS)
Ahmad, Foysal; Roy, Kaushik
2017-05-01
A commercially available iris recognition system uses only a narrow band of the near infrared spectrum (700-900 nm) while iris images captured in the wide range of 405 nm to 1550 nm offer potential benefits to enhance recognition performance of an iris biometric system. The novelty of this research is that a group selection algorithm based on coalition game theory is explored to select the best patch subsets. In this algorithm, patches are divided into several groups based on their maximum contribution in different groups. Shapley values are used to evaluate the contribution of patches in different groups. Results show that this group selection based iris recognition
Diagnosis of Chronic Kidney Disease Based on Support Vector Machine by Feature Selection Methods.
Polat, Huseyin; Danaei Mehr, Homay; Cetin, Aydin
2017-04-01
As Chronic Kidney Disease progresses slowly, early detection and effective treatment are the only cure to reduce the mortality rate. Machine learning techniques are gaining significance in medical diagnosis because of their classification ability with high accuracy rates. The accuracy of classification algorithms depends on the use of correct feature selection algorithms to reduce the dimension of datasets. In this study, the Support Vector Machine classification algorithm was used to diagnose Chronic Kidney Disease. To diagnose the disease, two essential types of feature selection methods, namely wrapper and filter approaches, were chosen to reduce the dimension of the Chronic Kidney Disease dataset. In the wrapper approach, the classifier subset evaluator with the greedy stepwise search engine and the wrapper subset evaluator with the Best First search engine were used. In the filter approach, the correlation feature selection subset evaluator with the greedy stepwise search engine and the filtered subset evaluator with the Best First search engine were used. The results showed that the Support Vector Machine classifier using the filtered subset evaluator with the Best First search engine feature selection method has a higher accuracy rate (98.5%) in the diagnosis of Chronic Kidney Disease compared to the other selected methods.
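A filter-style pipeline of the kind described above can be sketched with scikit-learn: rank features with a univariate filter, keep a subset, and classify with an SVM. Synthetic data stand in for the Chronic Kidney Disease dataset, and the number of retained features is an arbitrary choice.

```python
# Sketch of a filter-style pipeline: rank features with a univariate filter,
# keep a subset, and classify with an SVM. Synthetic data stand in for the
# Chronic Kidney Disease dataset; the number of kept features is illustrative.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=24, n_informative=10,
                           random_state=0)
pipe = make_pipeline(StandardScaler(),
                     SelectKBest(f_classif, k=10),     # filter step
                     SVC(kernel="rbf", C=1.0))
acc = cross_val_score(pipe, X, y, cv=10).mean()
print("10-fold CV accuracy:", round(acc, 3))
```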
False-negative syphilis treponemal enzyme immunoassay results in an HIV-infected case-patient.
Katz, Alan R; Komeya, Alan Y; Tomas, Juval E
2017-06-01
We present a case report of a false-negative syphilis treponemal enzyme immunoassay test result in an HIV-infected male. While treponemal tests are widely considered to be more sensitive and specific than non-treponemal tests, our findings point to potential challenges using the reverse sequence syphilis screening algorithm.
A Program That Acquires Language Using Positive and Negative Feedback.
ERIC Educational Resources Information Center
Brand, James
1987-01-01
Describes the language learning program "Acquire," which is a sample of grammar induction. It is a learning algorithm based on a pattern-matching scheme, using both a positive and negative network to reduce overgeneration. Language learning programs may be useful as tutorials for learning the syntax of a foreign language. (Author/LMO)
Interim Calibration Report for the SMMR Simulator
NASA Technical Reports Server (NTRS)
Gloersen, P.; Cavalieri, D.
1979-01-01
The calibration data obtained during the fall 1978 Nimbus-G underflight mission with the scanning multichannel microwave radiometer (SMMR) simulator on board the NASA CV-990 aircraft were analyzed and an interim calibration algorithm was developed. Data selected for the analysis consisted of in flight sky, first-year sea ice, and open water observations, as well as ground based observations of fixed targets with varied temperatures of selected instrument components. For most of the SMMR channels, a good fit to the selected data set was obtained with the algorithm.
Application of genetic algorithms in nonlinear heat conduction problems.
Kadri, Muhammad Bilal; Khan, Waqar A
2014-01-01
Genetic algorithms are employed to optimize dimensionless temperature in nonlinear heat conduction problems. Three common geometries are selected for the analysis and the concept of minimum entropy generation is used to determine the optimum temperatures under the same constraints. The thermal conductivity is assumed to vary linearly with temperature while internal heat generation is assumed to be uniform. The dimensionless governing equations are obtained for each selected geometry and the dimensionless temperature distributions are obtained using MATLAB. It is observed that GA gives the minimum dimensionless temperature in each selected geometry.
NASA Astrophysics Data System (ADS)
Farrell, Kathryn; Oden, J. Tinsley; Faghihi, Danial
2015-08-01
A general adaptive modeling algorithm for selection and validation of coarse-grained models of atomistic systems is presented. A Bayesian framework is developed to address uncertainties in parameters, data, and model selection. Algorithms for computing output sensitivities to parameter variances, model evidence and posterior model plausibilities for given data, and for computing what are referred to as Occam Categories in reference to a rough measure of model simplicity, make up components of the overall approach. Computational results are provided for representative applications.
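A crude stand-in for ranking candidate model classes by evidence is the BIC approximation sketched below, which converts approximate log-evidences into posterior plausibilities under a uniform model prior. The polynomial model classes and data are hypothetical and far simpler than the coarse-grained atomistic models considered in the paper.

```python
# Toy sketch of ranking candidate models by approximate evidence (via BIC) and
# converting to posterior plausibilities under a uniform model prior.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = 1.0 + 2.0 * x - 1.5 * x ** 2 + rng.normal(0, 0.05, size=x.size)

log_evidence = []
for degree in (1, 2, 3, 5):                       # candidate model classes
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = resid.var()
    n, k = x.size, degree + 1
    log_like = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    log_evidence.append(log_like - 0.5 * k * np.log(n))   # -BIC/2 approximation

log_evidence = np.array(log_evidence)
plaus = np.exp(log_evidence - log_evidence.max())
plaus /= plaus.sum()                              # posterior model plausibilities
print(dict(zip((1, 2, 3, 5), plaus.round(3))))
```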
Unsupervised online classifier in sleep scoring for sleep deprivation studies.
Libourel, Paul-Antoine; Corneyllie, Alexandra; Luppi, Pierre-Hervé; Chouvet, Guy; Gervasoni, Damien
2015-05-01
This study was designed to evaluate an unsupervised adaptive algorithm for real-time detection of sleep and wake states in rodents. We designed a Bayesian classifier that automatically extracts electroencephalogram (EEG) and electromyogram (EMG) features and categorizes non-overlapping 5-s epochs into one of the three major sleep and wake states without any human supervision. This sleep-scoring algorithm is coupled online with a new device to perform selective paradoxical sleep deprivation (PSD). Controlled laboratory settings for chronic polygraphic sleep recordings and selective PSD. Ten adult Sprague-Dawley rats instrumented for chronic polysomnographic recordings. The performance of the algorithm is evaluated by comparison with the score obtained by a human expert reader. Online detection of PS is then validated with a PSD protocol with duration of 72 hours. Our algorithm gave a high concordance with human scoring with an average κ coefficient > 70%. Notably, the specificity to detect PS reached 92%. Selective PSD using real-time detection of PS strongly reduced PS amounts, leaving only brief PS bouts necessary for the detection of PS in EEG and EMG signals (4.7 ± 0.7% over 72 h, versus 8.9 ± 0.5% in baseline), and was followed by a significant PS rebound (23.3 ± 3.3% over 150 minutes). Our fully unsupervised data-driven algorithm overcomes some limitations of the other automated methods such as the selection of representative descriptors or threshold settings. When used online and coupled with our sleep deprivation device, it represents a better option for selective PSD than other methods like the tedious gentle handling or the platform method. © 2015 Associated Professional Sleep Societies, LLC.
Enhancing Privacy in Participatory Sensing Applications with Multidimensional Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Groat, Michael; Forrest, Stephanie; Horey, James L
2012-01-01
Participatory sensing applications rely on individuals to share local and personal data with others to produce aggregated models and knowledge. In this setting, privacy is an important consideration, and lack of privacy could discourage widespread adoption of many exciting applications. We present a privacy-preserving participatory sensing scheme for multidimensional data which uses negative surveys. Multidimensional data, such as vectors of attributes that include location and environment fields, pose a particular challenge for privacy protection and are common in participatory sensing applications. When reporting data in a negative survey, an individual participant randomly selects a value from the set complement of the sensed data value, once for each dimension, and returns the negative values to a central collection server. Using algorithms described in this paper, the server can reconstruct the probability density functions of the original distributions of sensed values without knowing the participants' actual data. As a consequence, complicated encryption and key management schemes are avoided, conserving energy. We study trade-offs between accuracy and privacy, and their relationships to the number of dimensions, categories, and participants. We introduce dimensional adjustment, a method that reduces the magnification of error associated with earlier work. Two simulation scenarios illustrate how the approach can protect the privacy of a participant's multidimensional data while allowing useful population information to be aggregated.
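For a single categorical dimension, the reconstruction step can be sketched as below: if each participant reports a category drawn uniformly from the complement of the true one, the original proportions follow from the standard unbiased estimator p_i = 1 - (t - 1) y_i / n. The category count and true distribution are illustrative; the multidimensional scheme applies this per dimension.

```python
# Sketch of reconstructing category proportions from a one-dimensional
# negative survey: each participant reports a category chosen uniformly from
# the complement of their true category.
import numpy as np

rng = np.random.default_rng(0)
t = 4                                           # number of categories
true_p = np.array([0.5, 0.2, 0.2, 0.1])
n = 10_000

true_vals = rng.choice(t, size=n, p=true_p)
# Each participant reports a uniformly chosen category other than the true one.
reported = (true_vals + rng.integers(1, t, size=n)) % t

counts = np.bincount(reported, minlength=t)
p_hat = 1.0 - (t - 1) * counts / n              # reconstruct the original pdf
print("true:     ", true_p)
print("estimated:", p_hat.round(3))
```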
Potential for false positive HIV test results with the serial rapid HIV testing algorithm.
Baveewo, Steven; Kamya, Moses R; Mayanja-Kizza, Harriet; Fatch, Robin; Bangsberg, David R; Coates, Thomas; Hahn, Judith A; Wanyenze, Rhoda K
2012-03-19
Rapid HIV tests provide same-day results and are widely used in HIV testing programs in areas with limited personnel and laboratory infrastructure. The Uganda Ministry of Health currently recommends the serial rapid testing algorithm with Determine, STAT-PAK, and Uni-Gold for diagnosis of HIV infection. Using this algorithm, individuals who test positive on Determine, negative to STAT-PAK and positive to Uni-Gold are reported as HIV positive. We conducted further testing on this subgroup of samples using qualitative DNA PCR to assess the potential for false positive tests in this situation. Of the 3388 individuals who were tested, 984 were HIV positive on two consecutive tests, and 29 were considered positive by a tiebreaker (positive on Determine, negative on STAT-PAK, and positive on Uni-Gold). However, when the 29 samples were further tested using qualitative DNA PCR, 14 (48.2%) were HIV negative. Although this study was not primarily designed to assess the validity of rapid HIV tests and thus only a subset of the samples were retested, the findings show a potential for false positive HIV results in the subset of individuals who test positive when a tiebreaker test is used in serial testing. These findings highlight a need for confirmatory testing for this category of individuals.
Firefly as a novel swarm intelligence variable selection method in spectroscopy.
Goodarzi, Mohammad; dos Santos Coelho, Leandro
2014-12-10
A critical step in multivariate calibration is wavelength selection, which is used to build models with better prediction performance when applied to spectral data. Many feature selection techniques have been developed to date. Among the different types of feature selection techniques, those based on swarm intelligence optimization methodologies are particularly interesting, since they simulate animal and insect behavior to, e.g., find the shortest path between a food source and the nest. Decisions are made collectively, leading to more robust models that are less prone to falling into local minima during the optimization cycle. This paper presents a novel feature selection approach for spectroscopic data, leading to more robust calibration models. The performance of the firefly algorithm, a swarm intelligence paradigm, was evaluated and compared with the genetic algorithm and particle swarm optimization. All three techniques were coupled with partial least squares (PLS) and applied to three spectroscopic data sets. They demonstrate improved prediction results in comparison to a PLS model built using all wavelengths. Results show that the firefly algorithm, as a novel swarm paradigm, leads to a lower number of selected wavelengths while the prediction performance of the built PLS model stays the same. Copyright © 2014. Published by Elsevier B.V.
Variables selection methods in near-infrared spectroscopy.
Xiaobo, Zou; Jiewen, Zhao; Povey, Malcolm J W; Holmes, Mel; Hanpin, Mao
2010-05-14
Near-infrared (NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields, such as the petrochemical, pharmaceutical, environmental, clinical, agricultural, food and biomedical sectors during the past 15 years. A NIR spectrum of a sample is typically measured by modern scanning instruments at hundreds of equally spaced wavelengths. The large number of spectral variables in most data sets encountered in NIR spectral chemometrics often renders the prediction of a dependent variable unreliable. Recently, considerable effort has been directed towards developing and evaluating different procedures that objectively identify variables which contribute useful information and/or eliminate variables containing mostly noise. This review focuses on variable selection methods in NIR spectroscopy. Selection methods include some classical approaches, such as the manual approach (knowledge-based selection) and "Univariate" and "Sequential" selection methods; sophisticated methods such as the successive projections algorithm (SPA) and uninformative variable elimination (UVE); elaborate search-based strategies such as simulated annealing (SA), artificial neural networks (ANN) and genetic algorithms (GAs); and interval-based algorithms such as interval partial least squares (iPLS), windows PLS and iterative PLS. Wavelength selection with B-splines, Kalman filtering, Fisher's weights and Bayesian approaches is also mentioned. Finally, the websites of some variable selection software and toolboxes for non-commercial use are given. Copyright 2010 Elsevier B.V. All rights reserved.
Dynamic Grover search: applications in recommendation systems and optimization problems
NASA Astrophysics Data System (ADS)
Chakrabarty, Indranil; Khan, Shahzor; Singh, Vanshdeep
2017-06-01
In recent years, the Grover search algorithm (Proceedings, 28th annual ACM symposium on the theory of computing, pp. 212-219, 1996) has, by exploiting quantum parallelism, revolutionized the solution of a huge class of NP problems in comparison with classical systems. In this work, we explore the idea of extending the Grover search algorithm to approximate algorithms. We analyze the applicability of Grover search to processing an unstructured database with a dynamic selection function, in contrast to the static selection function used in the original work (Grover in Proceedings, 28th annual ACM symposium on the theory of computing, pp. 212-219, 1996). We show that this alteration allows us to extend the application of Grover search to the field of randomized search algorithms. Further, we use the dynamic Grover search algorithm to define the goals for a recommendation system, based on which we propose a recommendation algorithm that uses a binomial similarity distribution space, giving a quadratic speedup over traditional classical unstructured recommendation systems. Finally, we show how dynamic Grover search can be used to tackle a wide range of optimization problems, improving the complexity over existing optimization algorithms.
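The effect of a (possibly changing) selection function can be illustrated with a small classical state-vector simulation of Grover iterations, sketched below. The marked-item predicate, problem size, and iteration count formula floor(pi/4 * sqrt(N/M)) are standard textbook choices, not the paper's recommendation algorithm.

```python
# A small state-vector sketch of Grover/amplitude amplification with a
# selection function that can change between runs. This is a classical
# simulation for illustration, not a quantum implementation.
import numpy as np

def grover(n_qubits, is_marked, n_iter=None):
    N = 2 ** n_qubits
    marked = np.array([is_marked(i) for i in range(N)])
    M = int(marked.sum())
    if n_iter is None:
        n_iter = int(np.floor(np.pi / 4 * np.sqrt(N / M)))
    state = np.full(N, 1.0 / np.sqrt(N))
    for _ in range(n_iter):
        state[marked] *= -1                      # oracle: phase flip marked items
        state = 2 * state.mean() - state         # diffusion about the mean
    return state, n_iter

# Dynamic selection: the predicate can be swapped between invocations.
state, k = grover(8, is_marked=lambda i: i in {37, 201})
probs = state ** 2
print(k, "iterations; most likely items:", np.argsort(probs)[-2:])
```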
Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin
2013-11-13
A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results.
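The similarity measure underlying FNCC is a normalized cross correlation between on-line RSS samples and reference-point fingerprints; a minimal sketch is given below. The data, the per-sample averaging, and the absence of the Kalman/map filtering stage are simplifying assumptions.

```python
# Sketch of normalized cross correlation between on-line RSS samples and
# reference-point fingerprints, the similarity measure underlying FNCC.
# Data and the averaging step are illustrative.
import numpy as np

def ncc(a, b, eps=1e-12):
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

rng = np.random.default_rng(0)
n_rp, n_ap, n_samples = 20, 5, 8
reference = rng.uniform(-90, -40, size=(n_rp, n_ap))       # RP mean fingerprints
online = reference[4] + rng.normal(0, 2, size=(n_samples, n_ap))

# Correlate every on-line sample with every reference point, then average.
similarity = np.array([np.mean([ncc(s, reference[r]) for s in online])
                       for r in range(n_rp)])
print("best matching reference point:", int(similarity.argmax()))
```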