Sample records for function prediction algorithms

  1. A traveling salesman approach for predicting protein functions.

    PubMed

    Johnson, Olin; Liu, Jing

    2006-10-12

    Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm 1 on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems.

  2. A traveling salesman approach for predicting protein functions

    PubMed Central

    Johnson, Olin; Liu, Jing

    2006-01-01

    Background Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Results Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm [1] on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Conclusion Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems. PMID:17147783

  3. Parametric Bayesian priors and better choice of negative examples improve protein function prediction.

    PubMed

    Youngs, Noah; Penfold-Brown, Duncan; Drew, Kevin; Shasha, Dennis; Bonneau, Richard

    2013-05-01

    Computational biologists have demonstrated the utility of using machine learning methods to predict protein function from an integration of multiple genome-wide data types. Yet, even the best performing function prediction algorithms rely on heuristics for important components of the algorithm, such as choosing negative examples (proteins without a given function) or determining key parameters. The improper choice of negative examples, in particular, can hamper the accuracy of protein function prediction. We present a novel approach for choosing negative examples, using a parameterizable Bayesian prior computed from all observed annotation data, which also generates priors used during function prediction. We incorporate this new method into the GeneMANIA function prediction algorithm and demonstrate improved accuracy of our algorithm over current top-performing function prediction methods on the yeast and mouse proteomes across all metrics tested. Code and Data are available at: http://bonneaulab.bio.nyu.edu/funcprop.html

  4. Characterization and prediction of the backscattered form function of an immersed cylindrical shell using hybrid fuzzy clustering and bio-inspired algorithms.

    PubMed

    Agounad, Said; Aassif, El Houcein; Khandouch, Younes; Maze, Gérard; Décultot, Dominique

    2018-02-01

    The acoustic scattering of a plane wave by an elastic cylindrical shell is studied. A new approach is developed to predict the form function of an immersed cylindrical shell of the radius ratio b/a ('b' is the inner radius and 'a' is the outer radius). The prediction of the backscattered form function is investigated by a combined approach between fuzzy clustering algorithms and bio-inspired algorithms. Four famous fuzzy clustering algorithms: the fuzzy c-means (FCM), the Gustafson-Kessel algorithm (GK), the fuzzy c-regression model (FCRM) and the Gath-Geva algorithm (GG) are combined with particle swarm optimization and genetic algorithm. The symmetric and antisymmetric circumferential waves A, S 0 , A 1 , S 1 and S 2 are investigated in a reduced frequency (k 1 a) range extends over 0.1

  5. A review of predictive coding algorithms.

    PubMed

    Spratling, M W

    2017-03-01

    Predictive coding is a leading theory of how the brain performs probabilistic inference. However, there are a number of distinct algorithms which are described by the term "predictive coding". This article provides a concise review of these different predictive coding algorithms, highlighting their similarities and differences. Five algorithms are covered: linear predictive coding which has a long and influential history in the signal processing literature; the first neuroscience-related application of predictive coding to explaining the function of the retina; and three versions of predictive coding that have been proposed to model cortical function. While all these algorithms aim to fit a generative model to sensory data, they differ in the type of generative model they employ, in the process used to optimise the fit between the model and sensory data, and in the way that they are related to neurobiology. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access tomore » precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.« less

  7. SIFTER search: a web server for accurate phylogeny-based protein function prediction

    DOE PAGES

    Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

    2015-05-15

    We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access tomore » precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.« less

  8. The effect of machine learning regression algorithms and sample size on individualized behavioral prediction with functional connectivity features.

    PubMed

    Cui, Zaixu; Gong, Gaolang

    2018-06-02

    Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. Training radial basis function networks for wind speed prediction using PSO enhanced differential search optimizer

    PubMed Central

    2018-01-01

    This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training of neural networks is still a challenging exercise in machine learning domain. Traditional training algorithms in general suffer and trap in local optima and lead to premature convergence, which makes them ineffective when applied for datasets with diverse features. Training algorithms based on evolutionary computations are becoming popular due to their robust nature in overcoming the drawbacks of the traditional algorithms. Accordingly, this paper proposes a hybrid training procedure with differential search (DS) algorithm functionally integrated with the particle swarm optimization (PSO). To surmount the local trapping of the search procedure, a new population initialization scheme is proposed using Logistic chaotic sequence, which enhances the population diversity and aid the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analysis on publicly available 7 benchmark datasets are performed. Subsequently, experiments were conducted on a practical application case for wind speed prediction to expound the superiority of the proposed RBF training algorithm in terms of prediction accuracy. PMID:29768463

  10. Training radial basis function networks for wind speed prediction using PSO enhanced differential search optimizer.

    PubMed

    Rani R, Hannah Jessie; Victoire T, Aruldoss Albert

    2018-01-01

    This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training of neural networks is still a challenging exercise in machine learning domain. Traditional training algorithms in general suffer and trap in local optima and lead to premature convergence, which makes them ineffective when applied for datasets with diverse features. Training algorithms based on evolutionary computations are becoming popular due to their robust nature in overcoming the drawbacks of the traditional algorithms. Accordingly, this paper proposes a hybrid training procedure with differential search (DS) algorithm functionally integrated with the particle swarm optimization (PSO). To surmount the local trapping of the search procedure, a new population initialization scheme is proposed using Logistic chaotic sequence, which enhances the population diversity and aid the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analysis on publicly available 7 benchmark datasets are performed. Subsequently, experiments were conducted on a practical application case for wind speed prediction to expound the superiority of the proposed RBF training algorithm in terms of prediction accuracy.

  11. Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction.

    PubMed

    Stojanova, Daniela; Ceci, Michelangelo; Malerba, Donato; Dzeroski, Saso

    2013-09-26

    Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverages on this hierarchical organization where instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlines most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in predictive accuracy of learned classifiers. This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function. Our newly developed method for HMC takes into account network information in the learning phase: When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/ descriptions, functional annotation schemes, and PPI networks: Best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.

  12. Research on wind field algorithm of wind lidar based on BP neural network and grey prediction

    NASA Astrophysics Data System (ADS)

    Chen, Yong; Chen, Chun-Li; Luo, Xiong; Zhang, Yan; Yang, Ze-hou; Zhou, Jie; Shi, Xiao-ding; Wang, Lei

    2018-01-01

    This paper uses the BP neural network and grey algorithm to forecast and study radar wind field. In order to reduce the residual error in the wind field prediction which uses BP neural network and grey algorithm, calculating the minimum value of residual error function, adopting the residuals of the gray algorithm trained by BP neural network, using the trained network model to forecast the residual sequence, using the predicted residual error sequence to modify the forecast sequence of the grey algorithm. The test data show that using the grey algorithm modified by BP neural network can effectively reduce the residual value and improve the prediction precision.

  13. Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

    PubMed

    Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S

    2017-01-01

    Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.

  14. Prediction of dynamical systems by symbolic regression

    NASA Astrophysics Data System (ADS)

    Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.

    2016-07-01

    We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.

  15. Real-time prediction and gating of respiratory motion using an extended Kalman filter and Gaussian process regression

    NASA Astrophysics Data System (ADS)

    Bukhari, W.; Hong, S.-M.

    2015-01-01

    Motion-adaptive radiotherapy aims to deliver a conformal dose to the target tumour with minimal normal tissue exposure by compensating for tumour motion in real time. The prediction as well as the gating of respiratory motion have received much attention over the last two decades for reducing the targeting error of the treatment beam due to respiratory motion. In this article, we present a real-time algorithm for predicting and gating respiratory motion that utilizes a model-based and a model-free Bayesian framework by combining them in a cascade structure. The algorithm, named EKF-GPR+, implements a gating function without pre-specifying a particular region of the patient’s breathing cycle. The algorithm first employs an extended Kalman filter (LCM-EKF) to predict the respiratory motion and then uses a model-free Gaussian process regression (GPR) to correct the error of the LCM-EKF prediction. The GPR is a non-parametric Bayesian algorithm that yields predictive variance under Gaussian assumptions. The EKF-GPR+ algorithm utilizes the predictive variance from the GPR component to capture the uncertainty in the LCM-EKF prediction error and systematically identify breathing points with a higher probability of large prediction error in advance. This identification allows us to pause the treatment beam over such instances. EKF-GPR+ implements the gating function by using simple calculations based on the predictive variance with no additional detection mechanism. A sparse approximation of the GPR algorithm is employed to realize EKF-GPR+ in real time. Extensive numerical experiments are performed based on a large database of 304 respiratory motion traces to evaluate EKF-GPR+. The experimental results show that the EKF-GPR+ algorithm effectively reduces the prediction error in a root-mean-square (RMS) sense by employing the gating function, albeit at the cost of a reduced duty cycle. As an example, EKF-GPR+ reduces the patient-wise RMS error to 37%, 39% and 42% in percent ratios relative to no prediction for a duty cycle of 80% at lookahead lengths of 192 ms, 384 ms and 576 ms, respectively. The experiments also confirm that EKF-GPR+ controls the duty cycle with reasonable accuracy.

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dall-Anese, Emiliano; Simonetto, Andrea

    This paper focuses on the design of online algorithms based on prediction-correction steps to track the optimal solution of a time-varying constrained problem. Existing prediction-correction methods have been shown to work well for unconstrained convex problems and for settings where obtaining the inverse of the Hessian of the cost function can be computationally affordable. The prediction-correction algorithm proposed in this paper addresses the limitations of existing methods by tackling constrained problems and by designing a first-order prediction step that relies on the Hessian of the cost function (and do not require the computation of its inverse). Analytical results are establishedmore » to quantify the tracking error. Numerical simulations corroborate the analytical results and showcase performance and benefits of the algorithms.« less

  17. Experimental identification of specificity determinants in the domain linker of a LacI/GalR protein: bioinformatics-based predictions generate true positives and false negatives.

    PubMed

    Meinhardt, Sarah; Swint-Kruse, Liskin

    2008-12-01

    In protein families, conserved residues often contribute to a common general function, such as DNA-binding. However, unique attributes for each homolog (e.g. recognition of alternative DNA sequences) must arise from variation in other functionally-important positions. The locations of these "specificity determinant" positions are obscured amongst the background of varied residues that do not make significant contributions to either structure or function. To isolate specificity determinants, a number of bioinformatics algorithms have been developed. When applied to the LacI/GalR family of transcription regulators, several specificity determinants are predicted in the 18 amino acids that link the DNA-binding and regulatory domains. However, results from alternative algorithms are only in partial agreement with each other. Here, we experimentally evaluate these predictions using an engineered repressor comprising the LacI DNA-binding domain, the LacI linker, and the GalR regulatory domain (LLhG). "Wild-type" LLhG has altered DNA specificity and weaker lacO(1) repression compared to LacI or a similar LacI:PurR chimera. Next, predictions of linker specificity determinants were tested, using amino acid substitution and in vivo repression assays to assess functional change. In LLhG, all predicted sites are specificity determinants, as well as three sites not predicted by any algorithm. Strategies are suggested for diminishing the number of false negative predictions. Finally, individual substitutions at LLhG specificity determinants exhibited a broad range of functional changes that are not predicted by bioinformatics algorithms. Results suggest that some variants have altered affinity for DNA, some have altered allosteric response, and some appear to have changed specificity for alternative DNA ligands.

  18. Systematically Differentiating Functions for Alternatively Spliced Isoforms through Integrating RNA-seq Data

    PubMed Central

    Menon, Rajasree; Wen, Yuchen; Omenn, Gilbert S.; Kretzler, Matthias; Guan, Yuanfang

    2013-01-01

    Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires ‘ground-truth’ functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the ‘responsible’ isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the ‘responsible’ isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions. PMID:24244129

  19. Training the Recurrent neural network by the Fuzzy Min-Max algorithm for fault prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zemouri, Ryad; Racoceanu, Daniel; Zerhouni, Noureddine

    2009-03-05

    In this paper, we present a training technique of a Recurrent Radial Basis Function neural network for fault prediction. We use the Fuzzy Min-Max technique to initialize the k-center of the RRBF neural network. The k-means algorithm is then applied to calculate the centers that minimize the mean square error of the prediction task. The performances of the k-means algorithm are then boosted by the Fuzzy Min-Max technique.

  20. Network-based ranking methods for prediction of novel disease associated microRNAs.

    PubMed

    Le, Duc-Hau

    2015-10-01

    Many studies have shown roles of microRNAs on human disease and a number of computational methods have been proposed to predict such associations by ranking candidate microRNAs according to their relevance to a disease. Among them, machine learning-based methods usually have a limitation in specifying non-disease microRNAs as negative training samples. Meanwhile, network-based methods are becoming dominant since they well exploit a "disease module" principle in microRNA functional similarity networks. Of which, random walk with restart (RWR) algorithm-based method is currently state-of-the-art. The use of this algorithm was inspired from its success in predicting disease gene because the "disease module" principle also exists in protein interaction networks. Besides, many algorithms designed for webpage ranking have been successfully applied in ranking disease candidate genes because web networks share topological properties with protein interaction networks. However, these algorithms have not yet been utilized for disease microRNA prediction. We constructed microRNA functional similarity networks based on shared targets of microRNAs, and then we integrated them with a microRNA functional synergistic network, which was recently identified. After analyzing topological properties of these networks, in addition to RWR, we assessed the performance of (i) PRINCE (PRIoritizatioN and Complex Elucidation), which was proposed for disease gene prediction; (ii) PageRank with Priors (PRP) and K-Step Markov (KSM), which were used for studying web networks; and (iii) a neighborhood-based algorithm. Analyses on topological properties showed that all microRNA functional similarity networks are small-worldness and scale-free. The performance of each algorithm was assessed based on average AUC values on 35 disease phenotypes and average rankings of newly discovered disease microRNAs. As a result, the performance on the integrated network was better than that on individual ones. In addition, the performance of PRINCE, PRP and KSM was comparable with that of RWR, whereas it was worst for the neighborhood-based algorithm. Moreover, all the algorithms were stable with the change of parameters. Final, using the integrated network, we predicted six novel miRNAs (i.e., hsa-miR-101, hsa-miR-181d, hsa-miR-192, hsa-miR-423-3p, hsa-miR-484 and hsa-miR-98) associated with breast cancer. Network-based ranking algorithms, which were successfully applied for either disease gene prediction or for studying social/web networks, can be also used effectively for disease microRNA prediction. Copyright © 2015 Elsevier Ltd. All rights reserved.

  1. Automatic measurement of voice onset time using discriminative structured prediction.

    PubMed

    Sonderegger, Morgan; Keshet, Joseph

    2012-12-01

    A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectral and temporal cues used by human VOT annotators. The algorithm is applied to initial voiceless stops from four corpora, representing different types of speech. Using several evaluation methods, the algorithm's performance is near human intertranscriber reliability, and compares favorably with previous work. Furthermore, the algorithm's performance is minimally affected by training and testing on different corpora, and remains essentially constant as the amount of training data is reduced to 50-250 manually labeled examples, demonstrating the method's practical applicability to new datasets.

  2. Predicting the random drift of MEMS gyroscope based on K-means clustering and OLS RBF Neural Network

    NASA Astrophysics Data System (ADS)

    Wang, Zhen-yu; Zhang, Li-jie

    2017-10-01

    Measure error of the sensor can be effectively compensated with prediction. Aiming at large random drift error of MEMS(Micro Electro Mechanical System))gyroscope, an improved learning algorithm of Radial Basis Function(RBF) Neural Network(NN) based on K-means clustering and Orthogonal Least-Squares (OLS) is proposed in this paper. The algorithm selects the typical samples as the initial cluster centers of RBF NN firstly, candidates centers with K-means algorithm secondly, and optimizes the candidate centers with OLS algorithm thirdly, which makes the network structure simpler and makes the prediction performance better. Experimental results show that the proposed K-means clustering OLS learning algorithm can predict the random drift of MEMS gyroscope effectively, the prediction error of which is 9.8019e-007°/s and the prediction time of which is 2.4169e-006s

  3. Which method of posttraumatic stress disorder classification best predicts psychosocial function in children with traumatic brain injury?

    PubMed

    Iselin, Greg; Le Brocque, Robyne; Kenardy, Justin; Anderson, Vicki; McKinlay, Lynne

    2010-10-01

    Controversy surrounds the classification of posttraumatic stress disorder (PTSD), particularly in children and adolescents with traumatic brain injury (TBI). In these populations, it is difficult to differentiate TBI-related organic memory loss from dissociative amnesia. Several alternative PTSD classification algorithms have been proposed for use with children. This paper investigates DSM-IV-TR and alternative PTSD classification algorithms, including and excluding the dissociative amnesia item, in terms of their ability to predict psychosocial function following pediatric TBI. A sample of 184 children aged 6-14 years were recruited following emergency department presentation and/or hospital admission for TBI. PTSD was assessed via semi-structured clinical interview (CAPS-CA) with the child at 3 months post-injury. Psychosocial function was assessed using the parent report CHQ-PF50. Two alternative classification algorithms, the PTSD-AA and 2 of 3 algorithms, reached statistical significance. While the inclusion of the dissociative amnesia item increased prevalence rates across algorithms, it generally resulted in weaker associations with psychosocial function. The PTSD-AA algorithm appears to have the strongest association with psychosocial function following TBI in children and adolescents. Removing the dissociative amnesia item from the diagnostic algorithm generally results in improved validity. Copyright 2010 Elsevier Ltd. All rights reserved.

  4. A Novel RSSI Prediction Using Imperialist Competition Algorithm (ICA), Radial Basis Function (RBF) and Firefly Algorithm (FFA) in Wireless Networks

    PubMed Central

    Goudarzi, Shidrokh; Haslina Hassan, Wan; Abdalla Hashim, Aisha-Hassan; Soleymani, Seyed Ahmad; Anisi, Mohammad Hossein; Zakaria, Omar M.

    2016-01-01

    This study aims to design a vertical handover prediction method to minimize unnecessary handovers for a mobile node (MN) during the vertical handover process. This relies on a novel method for the prediction of a received signal strength indicator (RSSI) referred to as IRBF-FFA, which is designed by utilizing the imperialist competition algorithm (ICA) to train the radial basis function (RBF), and by hybridizing with the firefly algorithm (FFA) to predict the optimal solution. The prediction accuracy of the proposed IRBF–FFA model was validated by comparing it to support vector machines (SVMs) and multilayer perceptron (MLP) models. In order to assess the model’s performance, we measured the coefficient of determination (R2), correlation coefficient (r), root mean square error (RMSE) and mean absolute percentage error (MAPE). The achieved results indicate that the IRBF–FFA model provides more precise predictions compared to different ANNs, namely, support vector machines (SVMs) and multilayer perceptron (MLP). The performance of the proposed model is analyzed through simulated and real-time RSSI measurements. The results also suggest that the IRBF–FFA model can be applied as an efficient technique for the accurate prediction of vertical handover. PMID:27438600

  5. A Novel RSSI Prediction Using Imperialist Competition Algorithm (ICA), Radial Basis Function (RBF) and Firefly Algorithm (FFA) in Wireless Networks.

    PubMed

    Goudarzi, Shidrokh; Haslina Hassan, Wan; Abdalla Hashim, Aisha-Hassan; Soleymani, Seyed Ahmad; Anisi, Mohammad Hossein; Zakaria, Omar M

    2016-01-01

    This study aims to design a vertical handover prediction method to minimize unnecessary handovers for a mobile node (MN) during the vertical handover process. This relies on a novel method for the prediction of a received signal strength indicator (RSSI) referred to as IRBF-FFA, which is designed by utilizing the imperialist competition algorithm (ICA) to train the radial basis function (RBF), and by hybridizing with the firefly algorithm (FFA) to predict the optimal solution. The prediction accuracy of the proposed IRBF-FFA model was validated by comparing it to support vector machines (SVMs) and multilayer perceptron (MLP) models. In order to assess the model's performance, we measured the coefficient of determination (R2), correlation coefficient (r), root mean square error (RMSE) and mean absolute percentage error (MAPE). The achieved results indicate that the IRBF-FFA model provides more precise predictions compared to different ANNs, namely, support vector machines (SVMs) and multilayer perceptron (MLP). The performance of the proposed model is analyzed through simulated and real-time RSSI measurements. The results also suggest that the IRBF-FFA model can be applied as an efficient technique for the accurate prediction of vertical handover.

  6. Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records

    PubMed Central

    Weiss, Jeremy C; Page, David; Peissig, Peggy L; Natarajan, Sriraam; McCarty, Catherine

    2013-01-01

    Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SRL algorithm, relational functional gradient boosting, outperforms propositional learners particularly in the medically-relevant high recall region. We observe that both SRL algorithms predict outcomes better than their propositional analogs and suggest how our methods can augment current epidemiological practices. PMID:25360347

  7. Minimalist ensemble algorithms for genome-wide protein localization prediction.

    PubMed

    Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun

    2012-07-03

    Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.

  8. Minimalist ensemble algorithms for genome-wide protein localization prediction

    PubMed Central

    2012-01-01

    Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391

  9. An O(n(5)) algorithm for MFE prediction of kissing hairpins and 4-chains in nucleic acids.

    PubMed

    Chen, Ho-Lin; Condon, Anne; Jabbari, Hosna

    2009-06-01

    Efficient methods for prediction of minimum free energy (MFE) nucleic secondary structures are widely used, both to better understand structure and function of biological RNAs and to design novel nano-structures. Here, we present a new algorithm for MFE secondary structure prediction, which significantly expands the class of structures that can be handled in O(n(5)) time. Our algorithm can handle H-type pseudoknotted structures, kissing hairpins, and chains of four overlapping stems, as well as nested substructures of these types.

  10. A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

    PubMed

    Ni, Qianwu; Chen, Lei

    2017-01-01

    Correct prediction of protein structural class is beneficial to investigation on protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, based on various features, it is still a great challenge to select proper classification algorithm and extract essential features to participate in classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physiochemical features were adopted to represent features and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, thirtyeight algorithms were executed on a dataset, in which proteins were represented by features in the set. The predicted classes yielded by these algorithms and true class of each protein were collected to construct a dataset, which were analyzed by mRMR method, yielding an algorithm list. From the algorithm list, the algorithm was taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using single algorithm and other models that only adopt feature selection procedure or algorithm selection procedure. The feature selection procedure or algorithm selection procedure are really helpful for building an ensemble prediction model that can yield a better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  11. Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xi, T; Jones, I M; Mohrenweiser, H W

    2003-11-03

    Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant From Tolerant (SIFT) classified 226 of 508 variants (44%) as ''Intolerant''. Polymorphism Phenotyping (PolyPhen) classed 165 of 489 amino acid substitutions (34%) as ''Probably or Possibly Damaging''. Another 9-15% of the variants were classed as ''Potentially Intolerant or Damaging''. The results from the two algorithms are highly associated, with concordance in predicted impact observed for {approx}62% of themore » variants. Twenty one to thirty one percent of the variant proteins are predicted to exhibit reduced activity by both algorithms. These variants occur at slightly lower individual allele frequency than do the variants classified as ''Tolerant'' or ''Benign''. Both algorithms correctly predicted the impact of 26 functionally characterized amino acid substitutions in the APE1 protein on biochemical activity, with one exception. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.« less

  12. Analysis of Bioactive Amino Acids from Fish Hydrolysates with a New Bioinformatic Intelligent System Approach.

    PubMed

    Elaziz, Mohamed Abd; Hemdan, Ahmed Monem; Hassanien, AboulElla; Oliva, Diego; Xiong, Shengwu

    2017-09-07

    The current economics of the fish protein industry demand rapid, accurate and expressive prediction algorithms at every step of protein production especially with the challenge of global climate change. This help to predict and analyze functional and nutritional quality then consequently control food allergies in hyper allergic patients. As, it is quite expensive and time-consuming to know these concentrations by the lab experimental tests, especially to conduct large-scale projects. Therefore, this paper introduced a new intelligent algorithm using adaptive neuro-fuzzy inference system based on whale optimization algorithm. This algorithm is used to predict the concentration levels of bioactive amino acids in fish protein hydrolysates at different times during the year. The whale optimization algorithm is used to determine the optimal parameters in adaptive neuro-fuzzy inference system. The results of proposed algorithm are compared with others and it is indicated the higher performance of the proposed algorithm.

  13. Prediction-Correction Algorithms for Time-Varying Constrained Optimization

    DOE PAGES

    Simonetto, Andrea; Dall'Anese, Emiliano

    2017-07-26

    This article develops online algorithms to track solutions of time-varying constrained optimization problems. Particularly, resembling workhorse Kalman filtering-based approaches for dynamical systems, the proposed methods involve prediction-correction steps to provably track the trajectory of the optimal solutions of time-varying convex problems. The merits of existing prediction-correction methods have been shown for unconstrained problems and for setups where computing the inverse of the Hessian of the cost function is computationally affordable. This paper addresses the limitations of existing methods by tackling constrained problems and by designing first-order prediction steps that rely on the Hessian of the cost function (and do notmore » require the computation of its inverse). In addition, the proposed methods are shown to improve the convergence speed of existing prediction-correction methods when applied to unconstrained problems. Numerical simulations corroborate the analytical results and showcase performance and benefits of the proposed algorithms. A realistic application of the proposed method to real-time control of energy resources is presented.« less

  14. From synthetic coiled coils to functional proteins: automated design of a receptor for the calmodulin-binding domain of calcineurin.

    PubMed

    Ghirlanda, G; Lear, J D; Lombardi, A; DeGrado, W F

    1998-08-14

    A series of synthetic receptors capable of binding to the calmodulin-binding domain of calcineurin (CN393-414) was designed, synthesized and characterized. The design was accomplished by docking CN393-414 against a two-helix receptor, using an idealized three-stranded coiled coil as a starting geometry. The sequence of the receptor was chosen using a side-chain re-packing program, which employed a genetic algorithm to select potential binders from a total of 7.5x10(6) possible sequences. A total of 25 receptors were prepared, representing 13 sequences predicted by the algorithm as well as 12 related sequences that were not predicted. The receptors were characterized by CD spectroscopy, analytical ultracentrifugation, and binding assays. The receptors predicted by the algorithm bound CN393-414 with apparent dissociation constants ranging from 0.2 microM to >50 microM. Many of the receptors that were not predicted by the algorithm also bound to CN393-414. Methods to circumvent this problem and to improve the automated design of functional proteins are discussed. Copyright 1998 Academic Press

  15. Multistage classification of multispectral Earth observational data: The design approach

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator); Muasher, M. J.; Landgrebe, D. A.

    1981-01-01

    An algorithm is proposed which predicts the optimal features at every node in a binary tree procedure. The algorithm estimates the probability of error by approximating the area under the likelihood ratio function for two classes and taking into account the number of training samples used in estimating each of these two classes. Some results on feature selection techniques, particularly in the presence of a very limited set of training samples, are presented. Results comparing probabilities of error predicted by the proposed algorithm as a function of dimensionality as compared to experimental observations are shown for aircraft and LANDSAT data. Results are obtained for both real and simulated data. Finally, two binary tree examples which use the algorithm are presented to illustrate the usefulness of the procedure.

  16. Conditional nonlinear optimal perturbations based on the particle swarm optimization and their applications to the predictability problems

    NASA Astrophysics Data System (ADS)

    Zheng, Qin; Yang, Zubin; Sha, Jianxin; Yan, Jun

    2017-02-01

    In predictability problem research, the conditional nonlinear optimal perturbation (CNOP) describes the initial perturbation that satisfies a certain constraint condition and causes the largest prediction error at the prediction time. The CNOP has been successfully applied in estimation of the lower bound of maximum predictable time (LBMPT). Generally, CNOPs are calculated by a gradient descent algorithm based on the adjoint model, which is called ADJ-CNOP. This study, through the two-dimensional Ikeda model, investigates the impacts of the nonlinearity on ADJ-CNOP and the corresponding precision problems when using ADJ-CNOP to estimate the LBMPT. Our conclusions are that (1) when the initial perturbation is large or the prediction time is long, the strong nonlinearity of the dynamical model in the prediction variable will lead to failure of the ADJ-CNOP method, and (2) when the objective function has multiple extreme values, ADJ-CNOP has a large probability of producing local CNOPs, hence making a false estimation of the LBMPT. Furthermore, the particle swarm optimization (PSO) algorithm, one kind of intelligent algorithm, is introduced to solve this problem. The method using PSO to compute CNOP is called PSO-CNOP. The results of numerical experiments show that even with a large initial perturbation and long prediction time, or when the objective function has multiple extreme values, PSO-CNOP can always obtain the global CNOP. Since the PSO algorithm is a heuristic search algorithm based on the population, it can overcome the impact of nonlinearity and the disturbance from multiple extremes of the objective function. In addition, to check the estimation accuracy of the LBMPT presented by PSO-CNOP and ADJ-CNOP, we partition the constraint domain of initial perturbations into sufficiently fine grid meshes and take the LBMPT obtained by the filtering method as a benchmark. The result shows that the estimation presented by PSO-CNOP is closer to the true value than the one by ADJ-CNOP with the forecast time increasing.

  17. Real-time prediction and gating of respiratory motion in 3D space using extended Kalman filters and Gaussian process regression network

    NASA Astrophysics Data System (ADS)

    Bukhari, W.; Hong, S.-M.

    2016-03-01

    The prediction as well as the gating of respiratory motion have received much attention over the last two decades for reducing the targeting error of the radiation treatment beam due to respiratory motion. In this article, we present a real-time algorithm for predicting respiratory motion in 3D space and realizing a gating function without pre-specifying a particular phase of the patient’s breathing cycle. The algorithm, named EKF-GPRN+ , first employs an extended Kalman filter (EKF) independently along each coordinate to predict the respiratory motion and then uses a Gaussian process regression network (GPRN) to correct the prediction error of the EKF in 3D space. The GPRN is a nonparametric Bayesian algorithm for modeling input-dependent correlations between the output variables in multi-output regression. Inference in GPRN is intractable and we employ variational inference with mean field approximation to compute an approximate predictive mean and predictive covariance matrix. The approximate predictive mean is used to correct the prediction error of the EKF. The trace of the approximate predictive covariance matrix is utilized to capture the uncertainty in EKF-GPRN+ prediction error and systematically identify breathing points with a higher probability of large prediction error in advance. This identification enables us to pause the treatment beam over such instances. EKF-GPRN+ implements a gating function by using simple calculations based on the trace of the predictive covariance matrix. Extensive numerical experiments are performed based on a large database of 304 respiratory motion traces to evaluate EKF-GPRN+ . The experimental results show that the EKF-GPRN+ algorithm reduces the patient-wise prediction error to 38%, 40% and 40% in root-mean-square, compared to no prediction, at lookahead lengths of 192 ms, 384 ms and 576 ms, respectively. The EKF-GPRN+ algorithm can further reduce the prediction error by employing the gating function, albeit at the cost of reduced duty cycle. The error reduction allows the clinical target volume to planning target volume (CTV-PTV) margin to be reduced, leading to decreased normal-tissue toxicity and possible dose escalation. The CTV-PTV margin is also evaluated to quantify clinical benefits of EKF-GPRN+ prediction.

  18. A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

    PubMed Central

    Peña-Castillo, Lourdes; Tasan, Murat; Myers, Chad L; Lee, Hyunju; Joshi, Trupti; Zhang, Chao; Guan, Yuanfang; Leone, Michele; Pagnani, Andrea; Kim, Wan Kyu; Krumpelman, Chase; Tian, Weidong; Obozinski, Guillaume; Qi, Yanjun; Mostafavi, Sara; Lin, Guan Ning; Berriz, Gabriel F; Gibbons, Francis D; Lanckriet, Gert; Qiu, Jian; Grant, Charles; Barutcuoglu, Zafer; Hill, David P; Warde-Farley, David; Grouios, Chris; Ray, Debajyoti; Blake, Judith A; Deng, Minghua; Jordan, Michael I; Noble, William S; Morris, Quaid; Klein-Seetharaman, Judith; Bar-Joseph, Ziv; Chen, Ting; Sun, Fengzhu; Troyanskaya, Olga G; Marcotte, Edward M; Xu, Dong; Hughes, Timothy R; Roth, Frederick P

    2008-01-01

    Background: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated. Results: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%. Conclusion: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized. PMID:18613946

  19. New correction procedures for the fast field program which extend its range

    NASA Technical Reports Server (NTRS)

    West, M.; Sack, R. A.

    1990-01-01

    A fast field program (FFP) algorithm was developed based on the method of Lee et al., for the prediction of sound pressure level from low frequency, high intensity sources. In order to permit accurate predictions at distances greater than 2 km, new correction procedures have had to be included in the algorithm. Certain functions, whose Hankel transforms can be determined analytically, are subtracted from the depth dependent Green's function. The distance response is then obtained as the sum of these transforms and the Fast Fourier Transformation (FFT) of the residual k dependent function. One procedure, which permits the elimination of most complex exponentials, has allowed significant changes in the structure of the FFP algorithm, which has resulted in a substantial reduction in computation time.

  20. Disk storage management for LHCb based on Data Popularity estimator

    NASA Astrophysics Data System (ADS)

    Hushchyn, Mikhail; Charpentier, Philippe; Ustyuzhanin, Andrey

    2015-12-01

    This paper presents an algorithm providing recommendations for optimizing the LHCb data storage. The LHCb data storage system is a hybrid system. All datasets are kept as archives on magnetic tapes. The most popular datasets are kept on disks. The algorithm takes the dataset usage history and metadata (size, type, configuration etc.) to generate a recommendation report. This article presents how we use machine learning algorithms to predict future data popularity. Using these predictions it is possible to estimate which datasets should be removed from disk. We use regression algorithms and time series analysis to find the optimal number of replicas for datasets that are kept on disk. Based on the data popularity and the number of replicas optimization, the algorithm minimizes a loss function to find the optimal data distribution. The loss function represents all requirements for data distribution in the data storage system. We demonstrate how our algorithm helps to save disk space and to reduce waiting times for jobs using this data.

  1. Mortality risk score prediction in an elderly population using machine learning.

    PubMed

    Rose, Sherri

    2013-03-01

    Standard practice for prediction often relies on parametric regression methods. Interesting new methods from the machine learning literature have been introduced in epidemiologic studies, such as random forest and neural networks. However, a priori, an investigator will not know which algorithm to select and may wish to try several. Here I apply the super learner, an ensembling machine learning approach that combines multiple algorithms into a single algorithm and returns a prediction function with the best cross-validated mean squared error. Super learning is a generalization of stacking methods. I used super learning in the Study of Physical Performance and Age-Related Changes in Sonomans (SPPARCS) to predict death among 2,066 residents of Sonoma, California, aged 54 years or more during the period 1993-1999. The super learner for predicting death (risk score) improved upon all single algorithms in the collection of algorithms, although its performance was similar to that of several algorithms. Super learner outperformed the worst algorithm (neural networks) by 44% with respect to estimated cross-validated mean squared error and had an R2 value of 0.201. The improvement of super learner over random forest with respect to R2 was approximately 2-fold. Alternatives for risk score prediction include the super learner, which can provide improved performance.

  2. An improved shuffled frog leaping algorithm based evolutionary framework for currency exchange rate prediction

    NASA Astrophysics Data System (ADS)

    Dash, Rajashree

    2017-11-01

    Forecasting purchasing power of one currency with respect to another currency is always an interesting topic in the field of financial time series prediction. Despite the existence of several traditional and computational models for currency exchange rate forecasting, there is always a need for developing simpler and more efficient model, which will produce better prediction capability. In this paper, an evolutionary framework is proposed by using an improved shuffled frog leaping (ISFL) algorithm with a computationally efficient functional link artificial neural network (CEFLANN) for prediction of currency exchange rate. The model is validated by observing the monthly prediction measures obtained for three currency exchange data sets such as USD/CAD, USD/CHF, and USD/JPY accumulated within same period of time. The model performance is also compared with two other evolutionary learning techniques such as Shuffled frog leaping algorithm and Particle Swarm optimization algorithm. Practical analysis of results suggest that, the proposed model developed using the ISFL algorithm with CEFLANN network is a promising predictor model for currency exchange rate prediction compared to other models included in the study.

  3. Prediction of Unsteady Aerodynamic Coefficients at High Angles of Attack

    NASA Technical Reports Server (NTRS)

    Pamadi, Bandu N.; Murphy, Patrick C.; Klein, Vladislav; Brandon, Jay M.

    2001-01-01

    The nonlinear indicial response method is used to model the unsteady aerodynamic coefficients in the low speed longitudinal oscillatory wind tunnel test data of the 0.1 scale model of the F-16XL aircraft. Exponential functions are used to approximate the deficiency function in the indicial response. Using one set of oscillatory wind tunnel data and parameter identification method, the unknown parameters in the exponential functions are estimated. The genetic algorithm is used as a least square minimizing algorithm. The assumed model structures and parameter estimates are validated by comparing the predictions with other sets of available oscillatory wind tunnel test data.

  4. Estimation of State Transition Probabilities: A Neural Network Model

    NASA Astrophysics Data System (ADS)

    Saito, Hiroshi; Takiyama, Ken; Okada, Masato

    2015-12-01

    Humans and animals can predict future states on the basis of acquired knowledge. This prediction of the state transition is important for choosing the best action, and the prediction is only possible if the state transition probability has already been learned. However, how our brains learn the state transition probability is unknown. Here, we propose a simple algorithm for estimating the state transition probability by utilizing the state prediction error. We analytically and numerically confirmed that our algorithm is able to learn the probability completely with an appropriate learning rate. Furthermore, our learning rule reproduced experimentally reported psychometric functions and neural activities in the lateral intraparietal area in a decision-making task. Thus, our algorithm might describe the manner in which our brains learn state transition probabilities and predict future states.

  5. Long-range prediction of Indian summer monsoon rainfall using data mining and statistical approaches

    NASA Astrophysics Data System (ADS)

    H, Vathsala; Koolagudi, Shashidhar G.

    2017-10-01

    This paper presents a hybrid model to better predict Indian summer monsoon rainfall. The algorithm considers suitable techniques for processing dense datasets. The proposed three-step algorithm comprises closed itemset generation-based association rule mining for feature selection, cluster membership for dimensionality reduction, and simple logistic function for prediction. The application of predicting rainfall into flood, excess, normal, deficit, and drought based on 36 predictors consisting of land and ocean variables is presented. Results show good accuracy in the considered study period of 37years (1969-2005).

  6. A new algorithm for finding survival coefficients employed in reliability equations

    NASA Technical Reports Server (NTRS)

    Bouricius, W. G.; Flehinger, B. J.

    1973-01-01

    Product reliabilities are predicted from past failure rates and reasonable estimate of future failure rates. Algorithm is used to calculate probability that product will function correctly. Algorithm sums the probabilities of each survival pattern and number of permutations for that pattern, over all possible ways in which product can survive.

  7. Motion prediction of a non-cooperative space target

    NASA Astrophysics Data System (ADS)

    Zhou, Bang-Zhao; Cai, Guo-Ping; Liu, Yun-Meng; Liu, Pan

    2018-01-01

    Capturing a non-cooperative space target is a tremendously challenging research topic. Effective acquisition of motion information of the space target is the premise to realize target capture. In this paper, motion prediction of a free-floating non-cooperative target in space is studied and a motion prediction algorithm is proposed. In order to predict the motion of the free-floating non-cooperative target, dynamic parameters of the target must be firstly identified (estimated), such as inertia, angular momentum and kinetic energy and so on; then the predicted motion of the target can be acquired by substituting these identified parameters into the Euler's equations of the target. Accurate prediction needs precise identification. This paper presents an effective method to identify these dynamic parameters of a free-floating non-cooperative target. This method is based on two steps, (1) the rough estimation of the parameters is computed using the motion observation data to the target, and (2) the best estimation of the parameters is found by an optimization method. In the optimization problem, the objective function is based on the difference between the observed and the predicted motion, and the interior-point method (IPM) is chosen as the optimization algorithm, which starts at the rough estimate obtained in the first step and finds a global minimum to the objective function with the guidance of objective function's gradient. So the speed of IPM searching for the global minimum is fast, and an accurate identification can be obtained in time. The numerical results show that the proposed motion prediction algorithm is able to predict the motion of the target.

  8. Prediction of protein-protein interaction network using a multi-objective optimization approach.

    PubMed

    Chowdhury, Archana; Rakshit, Pratyusha; Konar, Amit

    2016-06-01

    Protein-Protein Interactions (PPIs) are very important as they coordinate almost all cellular processes. This paper attempts to formulate PPI prediction problem in a multi-objective optimization framework. The scoring functions for the trial solution deal with simultaneous maximization of functional similarity, strength of the domain interaction profiles, and the number of common neighbors of the proteins predicted to be interacting. The above optimization problem is solved using the proposed Firefly Algorithm with Nondominated Sorting. Experiments undertaken reveal that the proposed PPI prediction technique outperforms existing methods, including gene ontology-based Relative Specific Similarity, multi-domain-based Domain Cohesion Coupling method, domain-based Random Decision Forest method, Bagging with REP Tree, and evolutionary/swarm algorithm-based approaches, with respect to sensitivity, specificity, and F1 score.

  9. Algorithm for Lossless Compression of Calibrated Hyperspectral Imagery

    NASA Technical Reports Server (NTRS)

    Kiely, Aaron B.; Klimesh, Matthew A.

    2010-01-01

    A two-stage predictive method was developed for lossless compression of calibrated hyperspectral imagery. The first prediction stage uses a conventional linear predictor intended to exploit spatial and/or spectral dependencies in the data. The compressor tabulates counts of the past values of the difference between this initial prediction and the actual sample value. To form the ultimate predicted value, in the second stage, these counts are combined with an adaptively updated weight function intended to capture information about data regularities introduced by the calibration process. Finally, prediction residuals are losslessly encoded using adaptive arithmetic coding. Algorithms of this type are commonly tested on a readily available collection of images from the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) hyperspectral imager. On the standard calibrated AVIRIS hyperspectral images that are most widely used for compression benchmarking, the new compressor provides more than 0.5 bits/sample improvement over the previous best compression results. The algorithm has been implemented in Mathematica. The compression algorithm was demonstrated as beneficial on 12-bit calibrated AVIRIS images.

  10. Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery

    PubMed Central

    Liu, Jiangang; Jolly, Robert A.; Smith, Aaron T.; Searfoss, George H.; Goldstein, Keith M.; Uversky, Vladimir N.; Dunker, Keith; Li, Shuyu; Thomas, Craig E.; Wei, Tao

    2011-01-01

    Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high dimensional and high noise genomic data is prone to over-fitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model intends to predict that can make it challenging to interpret the modeling results. To address these issues, we developed a novel algorithm, Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By repeatedly enforcing that the sample number is larger than the transcript number, in each iteration of modeling and testing, PPEA reduces the potential risk of overfitting. We show with three different cases studies that: (1) PPEA can quickly derive a reliable rank order of predictive power of individual transcripts in a relatively small number of iterations, (2) the top ranked transcripts tend to be functionally related to the phenotype they are intended to predict, (3) using only the most predictive top ranked transcripts greatly facilitates development of multiplex assay such as qRT-PCR as a biomarker, and (4) more importantly, we were able to demonstrate that a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype as their expression changes distinguished adverse from nonadverse effects of compounds in completely independent tests. Thus, we believe that the PPEA model effectively addresses the over-fitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses. PMID:21935387

  11. CUFID-query: accurate network querying through random walk based network flow estimation.

    PubMed

    Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

    2017-12-28

    Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. Through extensive performance evaluation based on biological networks with known functional modules, we show that CUFID-query outperforms the existing state-of-the-art algorithms in terms of prediction accuracy and biological significance of the predictions.

  12. Negative Example Selection for Protein Function Prediction: The NoGO Database

    PubMed Central

    Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis

    2014-01-01

    Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state of the art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html). PMID:24922051

  13. Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics.

    PubMed

    Mahmood, Khalid; Jung, Chol-Hee; Philip, Gayle; Georgeson, Peter; Chung, Jessica; Pope, Bernard J; Park, Daniel J

    2017-05-16

    Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools. Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets. These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as reported prior. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools.

  14. Anopheles gambiae genome reannotation through synthesis of ab initio and comparative gene prediction algorithms

    PubMed Central

    Li, Jun; Riehle, Michelle M; Zhang, Yan; Xu, Jiannong; Oduol, Frederick; Gomez, Shawn M; Eiglmeier, Karin; Ueberheide, Beatrix M; Shabanowitz, Jeffrey; Hunt, Donald F; Ribeiro, José MC; Vernick, Kenneth D

    2006-01-01

    Background Complete genome annotation is a necessary tool as Anopheles gambiae researchers probe the biology of this potent malaria vector. Results We reannotate the A. gambiae genome by synthesizing comparative and ab initio sets of predicted coding sequences (CDSs) into a single set using an exon-gene-union algorithm followed by an open-reading-frame-selection algorithm. The reannotation predicts 20,970 CDSs supported by at least two lines of evidence, and it lowers the proportion of CDSs lacking start and/or stop codons to only approximately 4%. The reannotated CDS set includes a set of 4,681 novel CDSs not represented in the Ensembl annotation but with EST support, and another set of 4,031 Ensembl-supported genes that undergo major structural and, therefore, probably functional changes in the reannotated set. The quality and accuracy of the reannotation was assessed by comparison with end sequences from 20,249 full-length cDNA clones, and evaluation of mass spectrometry peptide hit rates from an A. gambiae shotgun proteomic dataset confirms that the reannotated CDSs offer a high quality protein database for proteomics. We provide a functional proteomics annotation, ReAnoXcel, obtained by analysis of the new CDSs through the AnoXcel pipeline, which allows functional comparisons of the CDS sets within the same bioinformatic platform. CDS data are available for download. Conclusion Comprehensive A. gambiae genome reannotation is achieved through a combination of comparative and ab initio gene prediction algorithms. PMID:16569258

  15. Multi-agent cooperation rescue algorithm based on influence degree and state prediction

    NASA Astrophysics Data System (ADS)

    Zheng, Yanbin; Ma, Guangfu; Wang, Linlin; Xi, Pengxue

    2018-04-01

    Aiming at the multi-agent cooperative rescue in disaster, a multi-agent cooperative rescue algorithm based on impact degree and state prediction is proposed. Firstly, based on the influence of the information in the scene on the collaborative task, the influence degree function is used to filter the information. Secondly, using the selected information to predict the state of the system and Agent behavior. Finally, according to the result of the forecast, the cooperative behavior of Agent is guided and improved the efficiency of individual collaboration. The simulation results show that this algorithm can effectively solve the cooperative rescue problem of multi-agent and ensure the efficient completion of the task.

  16. Development of Artificial Neural Network Model for Diesel Fuel Properties Prediction using Vibrational Spectroscopy.

    PubMed

    Bolanča, Tomislav; Marinović, Slavica; Ukić, Sime; Jukić, Ante; Rukavina, Vinko

    2012-06-01

    This paper describes development of artificial neural network models which can be used to correlate and predict diesel fuel properties from several FTIR-ATR absorbances and Raman intensities as input variables. Multilayer feed forward and radial basis function neural networks have been used to rapid and simultaneous prediction of cetane number, cetane index, density, viscosity, distillation temperatures at 10% (T10), 50% (T50) and 90% (T90) recovery, contents of total aromatics and polycyclic aromatic hydrocarbons of commercial diesel fuels. In this study two-phase training procedures for multilayer feed forward networks were applied. While first phase training algorithm was constantly the back propagation one, two second phase training algorithms were varied and compared, namely: conjugate gradient and quasi Newton. In case of radial basis function network, radial layer was trained using K-means radial assignment algorithm and three different radial spread algorithms: explicit, isotropic and K-nearest neighbour. The number of hidden layer neurons and experimental data points used for the training set have been optimized for both neural networks in order to insure good predictive ability by reducing unnecessary experimental work. This work shows that developed artificial neural network models can determine main properties of diesel fuels simultaneously based on a single and fast IR or Raman measurement.

  17. De novo protein structure prediction by dynamic fragment assembly and conformational space annealing.

    PubMed

    Lee, Juyong; Lee, Jinhyuk; Sasaki, Takeshi N; Sasai, Masaki; Seok, Chaok; Lee, Jooyoung

    2011-08-01

    Ab initio protein structure prediction is a challenging problem that requires both an accurate energetic representation of a protein structure and an efficient conformational sampling method for successful protein modeling. In this article, we present an ab initio structure prediction method which combines a recently suggested novel way of fragment assembly, dynamic fragment assembly (DFA) and conformational space annealing (CSA) algorithm. In DFA, model structures are scored by continuous functions constructed based on short- and long-range structural restraint information from a fragment library. Here, DFA is represented by the full-atom model by CHARMM with the addition of the empirical potential of DFIRE. The relative contributions between various energy terms are optimized using linear programming. The conformational sampling was carried out with CSA algorithm, which can find low energy conformations more efficiently than simulated annealing used in the existing DFA study. The newly introduced DFA energy function and CSA sampling algorithm are implemented into CHARMM. Test results on 30 small single-domain proteins and 13 template-free modeling targets of the 8th Critical Assessment of protein Structure Prediction show that the current method provides comparable and complementary prediction results to existing top methods. Copyright © 2011 Wiley-Liss, Inc.

  18. An Impact-Location Estimation Algorithm for Subsonic Uninhabited Aircraft

    NASA Technical Reports Server (NTRS)

    Bauer, Jeffrey E.; Teets, Edward

    1997-01-01

    An impact-location estimation algorithm is being used at the NASA Dryden Flight Research Center to support range safety for uninhabited aerial vehicle flight tests. The algorithm computes an impact location based on the descent rate, mass, and altitude of the vehicle and current wind information. The predicted impact location is continuously displayed on the range safety officer's moving map display so that the flightpath of the vehicle can be routed to avoid ground assets if the flight must be terminated. The algorithm easily adapts to different vehicle termination techniques and has been shown to be accurate to the extent required to support range safety for subsonic uninhabited aerial vehicles. This paper describes how the algorithm functions, how the algorithm is used at NASA Dryden, and how various termination techniques are handled by the algorithm. Other approaches to predicting the impact location and the reasons why they were not selected for real-time implementation are also discussed.

  19. Prediction model for peninsular Indian summer monsoon rainfall using data mining and statistical approaches

    NASA Astrophysics Data System (ADS)

    Vathsala, H.; Koolagudi, Shashidhar G.

    2017-01-01

    In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combine data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include southern oscillation and Indian Ocean dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good.

  20. Predictability of the Lagrangian Motion in the Upper Ocean

    NASA Astrophysics Data System (ADS)

    Piterbarg, L. I.; Griffa, A.; Griffa, A.; Mariano, A. J.; Ozgokmen, T. M.; Ryan, E. H.

    2001-12-01

    The complex non-linear dynamics of the upper ocean leads to chaotic behavior of drifter trajectories in the ocean. Our study is focused on estimating the predictability limit for the position of an individual Lagrangian particle or a particle cluster based on the knowledge of mean currents and observations of nearby particles (predictors). The Lagrangian prediction problem, besides being a fundamental scientific problem, is also of great importance for practical applications such as search and rescue operations and for modeling the spread of fish larvae. A stochastic multi-particle model for the Lagrangian motion has been rigorously formulated and is a generalization of the well known "random flight" model for a single particle. Our model is mathematically consistent and includes a few easily interpreted parameters, such as the Lagrangian velocity decorrelation time scale, the turbulent velocity variance, and the velocity decorrelation radius, that can be estimated from data. The top Lyapunov exponent for an isotropic version of the model is explicitly expressed as a function of these parameters enabling us to approximate the predictability limit to first order. Lagrangian prediction errors for two new prediction algorithms are evaluated against simple algorithms and each other and are used to test the predictability limits of the stochastic model for isotropic turbulence. The first algorithm is based on a Kalman filter and uses the developed stochastic model. Its implementation for drifter clusters in both the Tropical Pacific and Adriatic Sea, showed good prediction skill over a period of 1-2 weeks. The prediction error is primarily a function of the data density, defined as the number of predictors within a velocity decorrelation spatial scale from the particle to be predicted. The second algorithm is model independent and is based on spatial regression considerations. Preliminary results, based on simulated, as well as, real data, indicate that it performs better than the Kalman-based algorithm in strong shear flows. An important component of our research is the optimal predictor location problem; Where should floats be launched in order to minimize the Lagrangian prediction error? Preliminary Lagrangian sampling results for different flow scenarios will be presented.

  1. Predicting the continuum between corridors and barriers to animal movements using Step Selection Functions and Randomized Shortest Paths.

    PubMed

    Panzacchi, Manuela; Van Moorter, Bram; Strand, Olav; Saerens, Marco; Kivimäki, Ilkka; St Clair, Colleen C; Herfindal, Ivar; Boitani, Luigi

    2016-01-01

    The loss, fragmentation and degradation of habitat everywhere on Earth prompts increasing attention to identifying landscape features that support animal movement (corridors) or impedes it (barriers). Most algorithms used to predict corridors assume that animals move through preferred habitat either optimally (e.g. least cost path) or as random walkers (e.g. current models), but neither extreme is realistic. We propose that corridors and barriers are two sides of the same coin and that animals experience landscapes as spatiotemporally dynamic corridor-barrier continua connecting (separating) functional areas where individuals fulfil specific ecological processes. Based on this conceptual framework, we propose a novel methodological approach that uses high-resolution individual-based movement data to predict corridor-barrier continua with increased realism. Our approach consists of two innovations. First, we use step selection functions (SSF) to predict friction maps quantifying corridor-barrier continua for tactical steps between consecutive locations. Secondly, we introduce to movement ecology the randomized shortest path algorithm (RSP) which operates on friction maps to predict the corridor-barrier continuum for strategic movements between functional areas. By modulating the parameter Ѳ, which controls the trade-off between exploration and optimal exploitation of the environment, RSP bridges the gap between algorithms assuming optimal movements (when Ѳ approaches infinity, RSP is equivalent to LCP) or random walk (when Ѳ → 0, RSP → current models). Using this approach, we identify migration corridors for GPS-monitored wild reindeer (Rangifer t. tarandus) in Norway. We demonstrate that reindeer movement is best predicted by an intermediate value of Ѳ, indicative of a movement trade-off between optimization and exploration. Model calibration allows identification of a corridor-barrier continuum that closely fits empirical data and demonstrates that RSP outperforms models that assume either optimality or random walk. The proposed approach models the multiscale cognitive maps by which animals likely navigate real landscapes and generalizes the most common algorithms for identifying corridors. Because suboptimal, but non-random, movement strategies are likely widespread, our approach has the potential to predict more realistic corridor-barrier continua for a wide range of species. © 2015 The Authors. Journal of Animal Ecology © 2015 British Ecological Society.

  2. Linear and nonlinear trending and prediction for AVHRR time series data

    NASA Technical Reports Server (NTRS)

    Smid, J.; Volf, P.; Slama, M.; Palus, M.

    1995-01-01

    The variability of AVHRR calibration coefficient in time was analyzed using algorithms of linear and non-linear time series analysis. Specifically we have used the spline trend modeling, autoregressive process analysis, incremental neural network learning algorithm and redundancy functional testing. The analysis performed on available AVHRR data sets revealed that (1) the calibration data have nonlinear dependencies, (2) the calibration data depend strongly on the target temperature, (3) both calibration coefficients and the temperature time series can be modeled, in the first approximation, as autonomous dynamical systems, (4) the high frequency residuals of the analyzed data sets can be best modeled as an autoregressive process of the 10th degree. We have dealt with a nonlinear identification problem and the problem of noise filtering (data smoothing). The system identification and filtering are significant problems for AVHRR data sets. The algorithms outlined in this study can be used for the future EOS missions. Prediction and smoothing algorithms for time series of calibration data provide a functional characterization of the data. Those algorithms can be particularly useful when calibration data are incomplete or sparse.

  3. Prediction of cancer proteins by integrating protein interaction, domain frequency, and domain interaction data using machine learning algorithms.

    PubMed

    Huang, Chien-Hung; Peng, Huai-Shun; Ng, Ka-Lok

    2015-01-01

    Many proteins are known to be associated with cancer diseases. It is quite often that their precise functional role in disease pathogenesis remains unclear. A strategy to gain a better understanding of the function of these proteins is to make use of a combination of different aspects of proteomics data types. In this study, we extended Aragues's method by employing the protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performances were benchmarked based on three kinds of experiments as follows: (I) using individual algorithm, (II) combining algorithms, and (III) combining the same classification types of algorithms. When compared with Aragues's method, our proposed methods, that is, machine learning algorithm and voting with the majority, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm can achieve a hit ratio of 89.4% and 72.8% for lung cancer dataset and lung cancer microarray study, respectively. It is anticipated that the current research could help understand disease mechanisms and diagnosis.

  4. Prediction of Cancer Proteins by Integrating Protein Interaction, Domain Frequency, and Domain Interaction Data Using Machine Learning Algorithms

    PubMed Central

    2015-01-01

    Many proteins are known to be associated with cancer diseases. It is quite often that their precise functional role in disease pathogenesis remains unclear. A strategy to gain a better understanding of the function of these proteins is to make use of a combination of different aspects of proteomics data types. In this study, we extended Aragues's method by employing the protein-protein interaction (PPI) data, domain-domain interaction (DDI) data, weighted domain frequency score (DFS), and cancer linker degree (CLD) data to predict cancer proteins. Performances were benchmarked based on three kinds of experiments as follows: (I) using individual algorithm, (II) combining algorithms, and (III) combining the same classification types of algorithms. When compared with Aragues's method, our proposed methods, that is, machine learning algorithm and voting with the majority, are significantly superior in all seven performance measures. We demonstrated the accuracy of the proposed method on two independent datasets. The best algorithm can achieve a hit ratio of 89.4% and 72.8% for lung cancer dataset and lung cancer microarray study, respectively. It is anticipated that the current research could help understand disease mechanisms and diagnosis. PMID:25866773

  5. Noniterative algorithm for improving the accuracy of a multicolor-light-emitting-diode-based colorimeter.

    PubMed

    Yang, Pao-Keng

    2012-05-01

    We present a noniterative algorithm to reliably reconstruct the spectral reflectance from discrete reflectance values measured by using multicolor light emitting diodes (LEDs) as probing light sources. The proposed algorithm estimates the spectral reflectance by a linear combination of product functions of the detector's responsivity function and the LEDs' line-shape functions. After introducing suitable correction, the resulting spectral reflectance was found to be free from the spectral-broadening effect due to the finite bandwidth of LED. We analyzed the data for a real sample and found that spectral reflectance with enhanced resolution gives a more accurate prediction in the color measurement.

  6. Noniterative algorithm for improving the accuracy of a multicolor-light-emitting-diode-based colorimeter

    NASA Astrophysics Data System (ADS)

    Yang, Pao-Keng

    2012-05-01

    We present a noniterative algorithm to reliably reconstruct the spectral reflectance from discrete reflectance values measured by using multicolor light emitting diodes (LEDs) as probing light sources. The proposed algorithm estimates the spectral reflectance by a linear combination of product functions of the detector's responsivity function and the LEDs' line-shape functions. After introducing suitable correction, the resulting spectral reflectance was found to be free from the spectral-broadening effect due to the finite bandwidth of LED. We analyzed the data for a real sample and found that spectral reflectance with enhanced resolution gives a more accurate prediction in the color measurement.

  7. Utilizing Machine Learning and Automated Performance Metrics to Evaluate Robot-Assisted Radical Prostatectomy Performance and Predict Outcomes.

    PubMed

    Hung, Andrew J; Chen, Jian; Che, Zhengping; Nilanon, Tanachat; Jarc, Anthony; Titus, Micha; Oh, Paul J; Gill, Inderbir S; Liu, Yan

    2018-05-01

    Surgical performance is critical for clinical outcomes. We present a novel machine learning (ML) method of processing automated performance metrics (APMs) to evaluate surgical performance and predict clinical outcomes after robot-assisted radical prostatectomy (RARP). We trained three ML algorithms utilizing APMs directly from robot system data (training material) and hospital length of stay (LOS; training label) (≤2 days and >2 days) from 78 RARP cases, and selected the algorithm with the best performance. The selected algorithm categorized the cases as "Predicted as expected LOS (pExp-LOS)" and "Predicted as extended LOS (pExt-LOS)." We compared postoperative outcomes of the two groups (Kruskal-Wallis/Fisher's exact tests). The algorithm then predicted individual clinical outcomes, which we compared with actual outcomes (Spearman's correlation/Fisher's exact tests). Finally, we identified five most relevant APMs adopted by the algorithm during predicting. The "Random Forest-50" (RF-50) algorithm had the best performance, reaching 87.2% accuracy in predicting LOS (73 cases as "pExp-LOS" and 5 cases as "pExt-LOS"). The "pExp-LOS" cases outperformed the "pExt-LOS" cases in surgery time (3.7 hours vs 4.6 hours, p = 0.007), LOS (2 days vs 4 days, p = 0.02), and Foley duration (9 days vs 14 days, p = 0.02). Patient outcomes predicted by the algorithm had significant association with the "ground truth" in surgery time (p < 0.001, r = 0.73), LOS (p = 0.05, r = 0.52), and Foley duration (p < 0.001, r = 0.45). The five most relevant APMs, adopted by the RF-50 algorithm in predicting, were largely related to camera manipulation. To our knowledge, ours is the first study to show that APMs and ML algorithms may help assess surgical RARP performance and predict clinical outcomes. With further accrual of clinical data (oncologic and functional data), this process will become increasingly relevant and valuable in surgical assessment and training.

  8. A parallel strategy for predicting the secondary structure of polycistronic microRNAs.

    PubMed

    Han, Dianwei; Tang, Guiliang; Zhang, Jun

    2013-01-01

    The biogenesis of a functional microRNA is largely dependent on the secondary structure of the microRNA precursor (pre-miRNA). Recently, it has been shown that microRNAs are present in the genome as the form of polycistronic transcriptional units in plants and animals. It will be important to design efficient computational methods to predict such structures for microRNA discovery and its applications in gene silencing. In this paper, we propose a parallel algorithm based on the master-slave architecture to predict the secondary structure from an input sequence. We conducted some experiments to verify the effectiveness of our parallel algorithm. The experimental results show that our algorithm is able to produce the optimal secondary structure of polycistronic microRNAs.

  9. Predicting intensity ranks of peptide fragment ions.

    PubMed

    Frank, Ari M

    2009-05-01

    Accurate modeling of peptide fragmentation is necessary for the development of robust scoring functions for peptide-spectrum matches, which are the cornerstone of MS/MS-based identification algorithms. Unfortunately, peptide fragmentation is a complex process that can involve several competing chemical pathways, which makes it difficult to develop generative probabilistic models that describe it accurately. However, the vast amounts of MS/MS data being generated now make it possible to use data-driven machine learning methods to develop discriminative ranking-based models that predict the intensity ranks of a peptide's fragment ions. We use simple sequence-based features that get combined by a boosting algorithm into models that make peak rank predictions with high accuracy. In an accompanying manuscript, we demonstrate how these prediction models are used to significantly improve the performance of peptide identification algorithms. The models can also be useful in the design of optimal multiple reaction monitoring (MRM) transitions, in cases where there is insufficient experimental data to guide the peak selection process. The prediction algorithm can also be run independently through PepNovo+, which is available for download from http://bix.ucsd.edu/Software/PepNovo.html.

  10. Predicting Intensity Ranks of Peptide Fragment Ions

    PubMed Central

    Frank, Ari M.

    2009-01-01

    Accurate modeling of peptide fragmentation is necessary for the development of robust scoring functions for peptide-spectrum matches, which are the cornerstone of MS/MS-based identification algorithms. Unfortunately, peptide fragmentation is a complex process that can involve several competing chemical pathways, which makes it difficult to develop generative probabilistic models that describe it accurately. However, the vast amounts of MS/MS data being generated now make it possible to use data-driven machine learning methods to develop discriminative ranking-based models that predict the intensity ranks of a peptide's fragment ions. We use simple sequence-based features that get combined by a boosting algorithm in to models that make peak rank predictions with high accuracy. In an accompanying manuscript, we demonstrate how these prediction models are used to significantly improve the performance of peptide identification algorithms. The models can also be useful in the design of optimal MRM transitions, in cases where there is insufficient experimental data to guide the peak selection process. The prediction algorithm can also be run independently through PepNovo+, which is available for download from http://bix.ucsd.edu/Software/PepNovo.html. PMID:19256476

  11. A New Ensemble Canonical Correlation Prediction Scheme for Seasonal Precipitation

    NASA Technical Reports Server (NTRS)

    Kim, Kyu-Myong; Lau, William K. M.; Li, Guilong; Shen, Samuel S. P.; Lau, William K. M. (Technical Monitor)

    2001-01-01

    Department of Mathematical Sciences, University of Alberta, Edmonton, Canada This paper describes the fundamental theory of the ensemble canonical correlation (ECC) algorithm for the seasonal climate forecasting. The algorithm is a statistical regression sch eme based on maximal correlation between the predictor and predictand. The prediction error is estimated by a spectral method using the basis of empirical orthogonal functions. The ECC algorithm treats the predictors and predictands as continuous fields and is an improvement from the traditional canonical correlation prediction. The improvements include the use of area-factor, estimation of prediction error, and the optimal ensemble of multiple forecasts. The ECC is applied to the seasonal forecasting over various parts of the world. The example presented here is for the North America precipitation. The predictor is the sea surface temperature (SST) from different ocean basins. The Climate Prediction Center's reconstructed SST (1951-1999) is used as the predictor's historical data. The optimally interpolated global monthly precipitation is used as the predictand?s historical data. Our forecast experiments show that the ECC algorithm renders very high skill and the optimal ensemble is very important to the high value.

  12. Predicting mining activity with parallel genetic algorithms

    USGS Publications Warehouse

    Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.; ,; ,

    2005-01-01

    We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.

  13. Distributed model predictive control for constrained nonlinear systems with decoupled local dynamics.

    PubMed

    Zhao, Meng; Ding, Baocang

    2015-03-01

    This paper considers the distributed model predictive control (MPC) of nonlinear large-scale systems with dynamically decoupled subsystems. According to the coupled state in the overall cost function of centralized MPC, the neighbors are confirmed and fixed for each subsystem, and the overall objective function is disassembled into each local optimization. In order to guarantee the closed-loop stability of distributed MPC algorithm, the overall compatibility constraint for centralized MPC algorithm is decomposed into each local controller. The communication between each subsystem and its neighbors is relatively low, only the current states before optimization and the optimized input variables after optimization are being transferred. For each local controller, the quasi-infinite horizon MPC algorithm is adopted, and the global closed-loop system is proven to be exponentially stable. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  14. Implementation of a combined algorithm designed to increase the reliability of information systems: simulation modeling

    NASA Astrophysics Data System (ADS)

    Popov, A.; Zolotarev, V.; Bychkov, S.

    2016-11-01

    This paper examines the results of experimental studies of a previously submitted combined algorithm designed to increase the reliability of information systems. The data that illustrates the organization and conduct of the studies is provided. Within the framework of a comparison of As a part of the study conducted, the comparison of the experimental data of simulation modeling and the data of the functioning of the real information system was made. The hypothesis of the homogeneity of the logical structure of the information systems was formulated, thus enabling to reconfigure the algorithm presented, - more specifically, to transform it into the model for the analysis and prediction of arbitrary information systems. The results presented can be used for further research in this direction. The data of the opportunity to predict the functioning of the information systems can be used for strategic and economic planning. The algorithm can be used as a means for providing information security.

  15. Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits.

    PubMed

    Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H

    2006-01-01

    Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.

  16. Medical chart validation of an algorithm for identifying multiple sclerosis relapse in healthcare claims.

    PubMed

    Chastek, Benjamin J; Oleen-Burkey, Merrikay; Lopez-Bresnahan, Maria V

    2010-01-01

    Relapse is a common measure of disease activity in relapsing-remitting multiple sclerosis (MS). The objective of this study was to test the content validity of an operational algorithm for detecting relapse in claims data. A claims-based relapse detection algorithm was tested by comparing its detection rate over a 1-year period with relapses identified based on medical chart review. According to the algorithm, MS patients in a US healthcare claims database who had either (1) a primary claim for MS during hospitalization or (2) a corticosteroid claim following a MS-related outpatient visit were designated as having a relapse. Patient charts were examined for explicit indication of relapse or care suggestive of relapse. Positive and negative predictive values were calculated. Medical charts were reviewed for 300 MS patients, half of whom had a relapse according to the algorithm. The claims-based criteria correctly classified 67.3% of patients with relapses (positive predictive value) and 70.0% of patients without relapses (negative predictive value; kappa 0.373: p < 0.001). Alternative algorithms did not improve on the predictive value of the operational algorithm. Limitations of the algorithm include lack of differentiation between relapsing-remitting MS and other types, and that it does not incorporate measures of function and disability. The claims-based algorithm appeared to successfully detect moderate-to-severe MS relapse. This validated definition can be applied to future claims-based MS studies.

  17. Detecting REM sleep from the finger: an automatic REM sleep algorithm based on peripheral arterial tone (PAT) and actigraphy.

    PubMed

    Herscovici, Sarah; Pe'er, Avivit; Papyan, Surik; Lavie, Peretz

    2007-02-01

    Scoring of REM sleep based on polysomnographic recordings is a laborious and time-consuming process. The growing number of ambulatory devices designed for cost-effective home-based diagnostic sleep recordings necessitates the development of a reliable automatic REM sleep detection algorithm that is not based on the traditional electroencephalographic, electrooccolographic and electromyographic recordings trio. This paper presents an automatic REM detection algorithm based on the peripheral arterial tone (PAT) signal and actigraphy which are recorded with an ambulatory wrist-worn device (Watch-PAT100). The PAT signal is a measure of the pulsatile volume changes at the finger tip reflecting sympathetic tone variations. The algorithm was developed using a training set of 30 patients recorded simultaneously with polysomnography and Watch-PAT100. Sleep records were divided into 5 min intervals and two time series were constructed from the PAT amplitudes and PAT-derived inter-pulse periods in each interval. A prediction function based on 16 features extracted from the above time series that determines the likelihood of detecting a REM epoch was developed. The coefficients of the prediction function were determined using a genetic algorithm (GA) optimizing process tuned to maximize a price function depending on the sensitivity, specificity and agreement of the algorithm in comparison with the gold standard of polysomnographic manual scoring. Based on a separate validation set of 30 patients overall sensitivity, specificity and agreement of the automatic algorithm to identify standard 30 s epochs of REM sleep were 78%, 92%, 89%, respectively. Deploying this REM detection algorithm in a wrist worn device could be very useful for unattended ambulatory sleep monitoring. The innovative method of optimization using a genetic algorithm has been proven to yield robust results in the validation set.

  18. Genetic algorithm based adaptive neural network ensemble and its application in predicting carbon flux

    USGS Publications Warehouse

    Xue, Y.; Liu, S.; Hu, Y.; Yang, J.; Chen, Q.

    2007-01-01

    To improve the accuracy in prediction, Genetic Algorithm based Adaptive Neural Network Ensemble (GA-ANNE) is presented. Intersections are allowed between different training sets based on the fuzzy clustering analysis, which ensures the diversity as well as the accuracy of individual Neural Networks (NNs). Moreover, to improve the accuracy of the adaptive weights of individual NNs, GA is used to optimize the cluster centers. Empirical results in predicting carbon flux of Duke Forest reveal that GA-ANNE can predict the carbon flux more accurately than Radial Basis Function Neural Network (RBFNN), Bagging NN ensemble, and ANNE. ?? 2007 IEEE.

  19. Searching for discrimination rules in protease proteolytic cleavage activity using genetic programming with a min-max scoring function.

    PubMed

    Yang, Zheng Rong; Thomson, Rebecca; Hodgman, T Charles; Dry, Jonathan; Doyle, Austin K; Narayanan, Ajit; Wu, XiKun

    2003-11-01

    This paper presents an algorithm which is able to extract discriminant rules from oligopeptides for protease proteolytic cleavage activity prediction. The algorithm is developed using genetic programming. Three important components in the algorithm are a min-max scoring function, the reverse Polish notation (RPN) and the use of minimum description length. The min-max scoring function is developed using amino acid similarity matrices for measuring the similarity between an oligopeptide and a rule, which is a complex algebraic equation of amino acids rather than a simple pattern sequence. The Fisher ratio is then calculated on the scoring values using the class label associated with the oligopeptides. The discriminant ability of each rule can therefore be evaluated. The use of RPN makes the evolutionary operations simpler and therefore reduces the computational cost. To prevent overfitting, the concept of minimum description length is used to penalize over-complicated rules. A fitness function is therefore composed of the Fisher ratio and the use of minimum description length for an efficient evolutionary process. In the application to four protease datasets (Trypsin, Factor Xa, Hepatitis C Virus and HIV protease cleavage site prediction), our algorithm is superior to C5, a conventional method for deriving decision trees.

  20. Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach.

    PubMed

    Zakov, Shay; Tsur, Dekel; Ziv-Ukelson, Michal

    2011-08-18

    RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. In more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a similar dynamic programming approach as that presented by Nussinov and Jacobson in the late 70's, which typically yields cubic worst case running time algorithms. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data. We study Valiant's classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant's approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant's technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars. The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms.

  1. Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach

    PubMed Central

    2011-01-01

    Background RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. In more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a similar dynamic programming approach as that presented by Nussinov and Jacobson in the late 70's, which typically yields cubic worst case running time algorithms. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data. Results We study Valiant's classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant's approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant's technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars. Conclusions The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms. PMID:21851589

  2. A Stochastic Framework for Evaluating Seizure Prediction Algorithms Using Hidden Markov Models

    PubMed Central

    Wong, Stephen; Gardner, Andrew B.; Krieger, Abba M.; Litt, Brian

    2007-01-01

    Responsive, implantable stimulation devices to treat epilepsy are now in clinical trials. New evidence suggests that these devices may be more effective when they deliver therapy before seizure onset. Despite years of effort, prospective seizure prediction, which could improve device performance, remains elusive. In large part, this is explained by lack of agreement on a statistical framework for modeling seizure generation and a method for validating algorithm performance. We present a novel stochastic framework based on a three-state hidden Markov model (HMM) (representing interictal, preictal, and seizure states) with the feature that periods of increased seizure probability can transition back to the interictal state. This notion reflects clinical experience and may enhance interpretation of published seizure prediction studies. Our model accommodates clipped EEG segments and formalizes intuitive notions regarding statistical validation. We derive equations for type I and type II errors as a function of the number of seizures, duration of interictal data, and prediction horizon length and we demonstrate the model’s utility with a novel seizure detection algorithm that appeared to predicted seizure onset. We propose this framework as a vital tool for designing and validating prediction algorithms and for facilitating collaborative research in this area. PMID:17021032

  3. A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

    PubMed Central

    Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

    2006-01-01

    The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943

  4. SHARPEN-systematic hierarchical algorithms for rotamers and proteins on an extended network.

    PubMed

    Loksha, Ilya V; Maiolo, James R; Hong, Cheng W; Ng, Albert; Snow, Christopher D

    2009-04-30

    Algorithms for discrete optimization of proteins play a central role in recent advances in protein structure prediction and design. We wish to improve the resources available for computational biologists to rapidly prototype such algorithms and to easily scale these algorithms to many processors. To that end, we describe the implementation and use of two new open source resources, citing potential benefits over existing software. We discuss CHOMP, a new object-oriented library for macromolecular optimization, and SHARPEN, a framework for scaling CHOMP scripts to many computers. These tools allow users to develop new algorithms for a variety of applications including protein repacking, protein-protein docking, loop rebuilding, or homology model remediation. Particular care was taken to allow modular energy function design; protein conformations may currently be scored using either the OPLSaa molecular mechanical energy function or an all-atom semiempirical energy function employed by Rosetta. (c) 2009 Wiley Periodicals, Inc.

  5. RNA design using simulated SHAPE data.

    PubMed

    Lotfi, Mohadeseh; Zare-Mirakabad, Fatemeh; Montaseri, Soheila

    2018-05-03

    It has long been established that in addition to being involved in protein translation, RNA plays essential roles in numerous other cellular processes, including gene regulation and DNA replication. Such roles are known to be dictated by higher-order structures of RNA molecules. It is therefore of prime importance to find an RNA sequence that can fold to acquire a particular function that is desirable for use in pharmaceuticals and basic research. The challenge of finding an RNA sequence for a given structure is known as the RNA design problem. Although there are several algorithms to solve this problem, they mainly consider hard constraints, such as minimum free energy, to evaluate the predicted sequences. Recently, SHAPE data has emerged as a new soft constraint for RNA secondary structure prediction. To take advantage of this new experimental constraint, we report here a new method for accurate design of RNA sequences based on their secondary structures using SHAPE data as pseudo-free energy. We then compare our algorithm with four others: INFO-RNA, ERD, MODENA and RNAifold 2.0. Our algorithm precisely predicts 26 out of 29 new sequences for the structures extracted from the Rfam dataset, while the other four algorithms predict no more than 22 out of 29. The proposed algorithm is comparable to the above algorithms on RNA-SSD datasets, where they can predict up to 33 appropriate sequences for RNA secondary structures out of 34.

  6. An Approximation of the Error Backpropagation Algorithm in a Predictive Coding Network with Local Hebbian Synaptic Plasticity

    PubMed Central

    Whittington, James C. R.; Bogacz, Rafal

    2017-01-01

    To efficiently learn from feedback, cortical networks need to update synaptic weights on multiple levels of cortical hierarchy. An effective and well-known algorithm for computing such changes in synaptic weights is the error backpropagation algorithm. However, in this algorithm, the change in synaptic weights is a complex function of weights and activities of neurons not directly connected with the synapse being modified, whereas the changes in biological synapses are determined only by the activity of presynaptic and postsynaptic neurons. Several models have been proposed that approximate the backpropagation algorithm with local synaptic plasticity, but these models require complex external control over the network or relatively complex plasticity rules. Here we show that a network developed in the predictive coding framework can efficiently perform supervised learning fully autonomously, employing only simple local Hebbian plasticity. Furthermore, for certain parameters, the weight change in the predictive coding model converges to that of the backpropagation algorithm. This suggests that it is possible for cortical networks with simple Hebbian synaptic plasticity to implement efficient learning algorithms in which synapses in areas on multiple levels of hierarchy are modified to minimize the error on the output. PMID:28333583

  7. An Approximation of the Error Backpropagation Algorithm in a Predictive Coding Network with Local Hebbian Synaptic Plasticity.

    PubMed

    Whittington, James C R; Bogacz, Rafal

    2017-05-01

    To efficiently learn from feedback, cortical networks need to update synaptic weights on multiple levels of cortical hierarchy. An effective and well-known algorithm for computing such changes in synaptic weights is the error backpropagation algorithm. However, in this algorithm, the change in synaptic weights is a complex function of weights and activities of neurons not directly connected with the synapse being modified, whereas the changes in biological synapses are determined only by the activity of presynaptic and postsynaptic neurons. Several models have been proposed that approximate the backpropagation algorithm with local synaptic plasticity, but these models require complex external control over the network or relatively complex plasticity rules. Here we show that a network developed in the predictive coding framework can efficiently perform supervised learning fully autonomously, employing only simple local Hebbian plasticity. Furthermore, for certain parameters, the weight change in the predictive coding model converges to that of the backpropagation algorithm. This suggests that it is possible for cortical networks with simple Hebbian synaptic plasticity to implement efficient learning algorithms in which synapses in areas on multiple levels of hierarchy are modified to minimize the error on the output.

  8. DIANA-microT web server: elucidating microRNA functions through target prediction.

    PubMed

    Maragkakis, M; Reczko, M; Simossis, V A; Alexiou, P; Papadopoulos, G L; Dalamagas, T; Giannopoulos, G; Goumas, G; Koukis, E; Kourtis, K; Vergoulis, T; Koziris, N; Sellis, T; Tsanakas, P; Hatzigeorgiou, A G

    2009-07-01

    Computational microRNA (miRNA) target prediction is one of the key means for deciphering the role of miRNAs in development and disease. Here, we present the DIANA-microT web server as the user interface to the DIANA-microT 3.0 miRNA target prediction algorithm. The web server provides extensive information for predicted miRNA:target gene interactions with a user-friendly interface, providing extensive connectivity to online biological resources. Target gene and miRNA functions may be elucidated through automated bibliographic searches and functional information is accessible through Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The web server offers links to nomenclature, sequence and protein databases, and users are facilitated by being able to search for targeted genes using different nomenclatures or functional features, such as the genes possible involvement in biological pathways. The target prediction algorithm supports parameters calculated individually for each miRNA:target gene interaction and provides a signal-to-noise ratio and a precision score that helps in the evaluation of the significance of the predicted results. Using a set of miRNA targets recently identified through the pSILAC method, the performance of several computational target prediction programs was assessed. DIANA-microT 3.0 achieved there with 66% the highest ratio of correctly predicted targets over all predicted targets. The DIANA-microT web server is freely available at www.microrna.gr/microT.

  9. Automated diagnoses of attention deficit hyperactive disorder using magnetic resonance imaging.

    PubMed

    Eloyan, Ani; Muschelli, John; Nebel, Mary Beth; Liu, Han; Han, Fang; Zhao, Tuo; Barber, Anita D; Joel, Suresh; Pekar, James J; Mostofsky, Stewart H; Caffo, Brian

    2012-01-01

    Successful automated diagnoses of attention deficit hyperactive disorder (ADHD) using imaging and functional biomarkers would have fundamental consequences on the public health impact of the disease. In this work, we show results on the predictability of ADHD using imaging biomarkers and discuss the scientific and diagnostic impacts of the research. We created a prediction model using the landmark ADHD 200 data set focusing on resting state functional connectivity (rs-fc) and structural brain imaging. We predicted ADHD status and subtype, obtained by behavioral examination, using imaging data, intelligence quotients and other covariates. The novel contributions of this manuscript include a thorough exploration of prediction and image feature extraction methodology on this form of data, including the use of singular value decompositions (SVDs), CUR decompositions, random forest, gradient boosting, bagging, voxel-based morphometry, and support vector machines as well as important insights into the value, and potentially lack thereof, of imaging biomarkers of disease. The key results include the CUR-based decomposition of the rs-fc-fMRI along with gradient boosting and the prediction algorithm based on a motor network parcellation and random forest algorithm. We conjecture that the CUR decomposition is largely diagnosing common population directions of head motion. Of note, a byproduct of this research is a potential automated method for detecting subtle in-scanner motion. The final prediction algorithm, a weighted combination of several algorithms, had an external test set specificity of 94% with sensitivity of 21%. The most promising imaging biomarker was a correlation graph from a motor network parcellation. In summary, we have undertaken a large-scale statistical exploratory prediction exercise on the unique ADHD 200 data set. The exercise produced several potential leads for future scientific exploration of the neurological basis of ADHD.

  10. Automated diagnoses of attention deficit hyperactive disorder using magnetic resonance imaging

    PubMed Central

    Eloyan, Ani; Muschelli, John; Nebel, Mary Beth; Liu, Han; Han, Fang; Zhao, Tuo; Barber, Anita D.; Joel, Suresh; Pekar, James J.; Mostofsky, Stewart H.; Caffo, Brian

    2012-01-01

    Successful automated diagnoses of attention deficit hyperactive disorder (ADHD) using imaging and functional biomarkers would have fundamental consequences on the public health impact of the disease. In this work, we show results on the predictability of ADHD using imaging biomarkers and discuss the scientific and diagnostic impacts of the research. We created a prediction model using the landmark ADHD 200 data set focusing on resting state functional connectivity (rs-fc) and structural brain imaging. We predicted ADHD status and subtype, obtained by behavioral examination, using imaging data, intelligence quotients and other covariates. The novel contributions of this manuscript include a thorough exploration of prediction and image feature extraction methodology on this form of data, including the use of singular value decompositions (SVDs), CUR decompositions, random forest, gradient boosting, bagging, voxel-based morphometry, and support vector machines as well as important insights into the value, and potentially lack thereof, of imaging biomarkers of disease. The key results include the CUR-based decomposition of the rs-fc-fMRI along with gradient boosting and the prediction algorithm based on a motor network parcellation and random forest algorithm. We conjecture that the CUR decomposition is largely diagnosing common population directions of head motion. Of note, a byproduct of this research is a potential automated method for detecting subtle in-scanner motion. The final prediction algorithm, a weighted combination of several algorithms, had an external test set specificity of 94% with sensitivity of 21%. The most promising imaging biomarker was a correlation graph from a motor network parcellation. In summary, we have undertaken a large-scale statistical exploratory prediction exercise on the unique ADHD 200 data set. The exercise produced several potential leads for future scientific exploration of the neurological basis of ADHD. PMID:22969709

  11. Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development.

    PubMed

    Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

    2009-11-01

    Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullman's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.

  12. 3D Protein structure prediction with genetic tabu search algorithm

    PubMed Central

    2010-01-01

    Background Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task. Results In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill-climbing capability in the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively. PMID:20522256

  13. Hidden state prediction: a modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses.

    PubMed

    Zaneveld, Jesse R R; Thurber, Rebecca L V

    2014-01-01

    Complex symbioses between animal or plant hosts and their associated microbiotas can involve thousands of species and millions of genes. Because of the number of interacting partners, it is often impractical to study all organisms or genes in these host-microbe symbioses individually. Yet new phylogenetic predictive methods can use the wealth of accumulated data on diverse model organisms to make inferences into the properties of less well-studied species and gene families. Predictive functional profiling methods use evolutionary models based on the properties of studied relatives to put bounds on the likely characteristics of an organism or gene that has not yet been studied in detail. These techniques have been applied to predict diverse features of host-associated microbial communities ranging from the enzymatic function of uncharacterized genes to the gene content of uncultured microorganisms. We consider these phylogenetically informed predictive techniques from disparate fields as examples of a general class of algorithms for Hidden State Prediction (HSP), and argue that HSP methods have broad value in predicting organismal traits in a variety of contexts, including the study of complex host-microbe symbioses.

  14. Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction

    PubMed Central

    2014-01-01

    Background Network-based learning algorithms for automated function prediction (AFP) are negatively affected by the limited coverage of experimental data and limited a priori known functional annotations. As a consequence their application to model organisms is often restricted to well characterized biological processes and pathways, and their effectiveness with poorly annotated species is relatively limited. A possible solution to this problem might consist in the construction of big networks including multiple species, but this in turn poses challenging computational problems, due to the scalability limitations of existing algorithms and the main memory requirements induced by the construction of big networks. Distributed computation or the usage of big computers could in principle respond to these issues, but raises further algorithmic problems and require resources not satisfiable with simple off-the-shelf computers. Results We propose a novel framework for scalable network-based learning of multi-species protein functions based on both a local implementation of existing algorithms and the adoption of innovative technologies: we solve “locally” the AFP problem, by designing “vertex-centric” implementations of network-based algorithms, but we do not give up thinking “globally” by exploiting the overall topology of the network. This is made possible by the adoption of secondary memory-based technologies that allow the efficient use of the large memory available on disks, thus overcoming the main memory limitations of modern off-the-shelf computers. This approach has been applied to the analysis of a large multi-species network including more than 300 species of bacteria and to a network with more than 200,000 proteins belonging to 13 Eukaryotic species. To our knowledge this is the first work where secondary-memory based network analysis has been applied to multi-species function prediction using biological networks with hundreds of thousands of proteins. Conclusions The combination of these algorithmic and technological approaches makes feasible the analysis of large multi-species networks using ordinary computers with limited speed and primary memory, and in perspective could enable the analysis of huge networks (e.g. the whole proteomes available in SwissProt), using well-equipped stand-alone machines. PMID:24843788

  15. Distributed Function Mining for Gene Expression Programming Based on Fast Reduction.

    PubMed

    Deng, Song; Yue, Dong; Yang, Le-chan; Fu, Xiong; Feng, Ya-zhou

    2016-01-01

    For high-dimensional and massive data sets, traditional centralized gene expression programming (GEP) or improved algorithms lead to increased run-time and decreased prediction accuracy. To solve this problem, this paper proposes a new improved algorithm called distributed function mining for gene expression programming based on fast reduction (DFMGEP-FR). In DFMGEP-FR, fast attribution reduction in binary search algorithms (FAR-BSA) is proposed to quickly find the optimal attribution set, and the function consistency replacement algorithm is given to solve integration of the local function model. Thorough comparative experiments for DFMGEP-FR, centralized GEP and the parallel gene expression programming algorithm based on simulated annealing (parallel GEPSA) are included in this paper. For the waveform, mushroom, connect-4 and musk datasets, the comparative results show that the average time-consumption of DFMGEP-FR drops by 89.09%%, 88.85%, 85.79% and 93.06%, respectively, in contrast to centralized GEP and by 12.5%, 8.42%, 9.62% and 13.75%, respectively, compared with parallel GEPSA. Six well-studied UCI test data sets demonstrate the efficiency and capability of our proposed DFMGEP-FR algorithm for distributed function mining.

  16. Analysis of energy-based algorithms for RNA secondary structure prediction

    PubMed Central

    2012-01-01

    Background RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Conclusions Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets. PMID:22296803

  17. Analysis of energy-based algorithms for RNA secondary structure prediction.

    PubMed

    Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H

    2012-02-01

    RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.

  18. A novel acenocoumarol pharmacogenomic dosing algorithm for the Greek population of EU-PACT trial.

    PubMed

    Ragia, Georgia; Kolovou, Vana; Kolovou, Genovefa; Konstantinides, Stavros; Maltezos, Efstratios; Tavridou, Anna; Tziakas, Dimitrios; Maitland-van der Zee, Anke H; Manolopoulos, Vangelis G

    2017-01-01

    To generate and validate a pharmacogenomic-guided (PG) dosing algorithm for acenocoumarol in the Greek population. To compare its performance with other PG algorithms developed for the Greek population. A total of 140 Greek patients participants of the EU-PACT trial for acenocoumarol, a randomized clinical trial that prospectively compared the effect of a PG dosing algorithm with a clinical dosing algorithm on the percentage of time within INR therapeutic range, who reached acenocoumarol stable dose were included in the study. CYP2C9 and VKORC1 genotypes, age and weight affected acenocoumarol dose and predicted 53.9% of its variability. EU-PACT PG algorithm overestimated acenocoumarol dose across all different CYP2C9/VKORC1 functional phenotype bins (predicted dose vs stable dose in normal responders 2.31 vs 2.00 mg/day, p = 0.028, in sensitive responders 1.72 vs 1.50 mg/day, p = 0.003, in highly sensitive responders 1.39 vs 1.00 mg/day, p = 0.029). The PG algorithm previously developed for the Greek population overestimated the dose in normal responders (2.51 vs 2.00 mg/day, p < 0.001). Ethnic-specific dosing algorithm is suggested for better prediction of acenocoumarol dosage requirements in patients of Greek origin.

  19. System identification of an unmanned quadcopter system using MRAN neural

    NASA Astrophysics Data System (ADS)

    Pairan, M. F.; Shamsudin, S. S.

    2017-12-01

    This project presents the performance analysis of the radial basis function neural network (RBF) trained with Minimal Resource Allocating Network (MRAN) algorithm for real-time identification of quadcopter. MRAN’s performance is compared with the RBF with Constant Trace algorithm for 2500 input-output pair data sampling. MRAN utilizes adding and pruning hidden neuron strategy to obtain optimum RBF structure, increase prediction accuracy and reduce training time. The results indicate that MRAN algorithm produces fast training time and more accurate prediction compared with standard RBF. The model proposed in this paper is capable of identifying and modelling a nonlinear representation of the quadcopter flight dynamics.

  20. Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space

    PubMed Central

    Karnik, Rahul; Beer, Michael A.

    2015-01-01

    The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs. PMID:26465884

  1. Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space.

    PubMed

    Karnik, Rahul; Beer, Michael A

    2015-01-01

    The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.

  2. Developing a local least-squares support vector machines-based neuro-fuzzy model for nonlinear and chaotic time series prediction.

    PubMed

    Miranian, A; Abdollahzade, M

    2013-02-01

    Local modeling approaches, owing to their ability to model different operating regimes of nonlinear systems and processes by independent local models, seem appealing for modeling, identification, and prediction applications. In this paper, we propose a local neuro-fuzzy (LNF) approach based on the least-squares support vector machines (LSSVMs). The proposed LNF approach employs LSSVMs, which are powerful in modeling and predicting time series, as local models and uses hierarchical binary tree (HBT) learning algorithm for fast and efficient estimation of its parameters. The HBT algorithm heuristically partitions the input space into smaller subdomains by axis-orthogonal splits. In each partitioning, the validity functions automatically form a unity partition and therefore normalization side effects, e.g., reactivation, are prevented. Integration of LSSVMs into the LNF network as local models, along with the HBT learning algorithm, yield a high-performance approach for modeling and prediction of complex nonlinear time series. The proposed approach is applied to modeling and predictions of different nonlinear and chaotic real-world and hand-designed systems and time series. Analysis of the prediction results and comparisons with recent and old studies demonstrate the promising performance of the proposed LNF approach with the HBT learning algorithm for modeling and prediction of nonlinear and chaotic systems and time series.

  3. [The application of gene expression programming in the diagnosis of heart disease].

    PubMed

    Dai, Wenbin; Zhang, Yuntao; Gao, Xingyu

    2009-02-01

    GEP (Gene expression programming) is a new genetic algorithm, and it has been proved to be excellent in function finding. In this paper, for the purpose of setting up a diagnostic model, GEP is used to deal with the data of heart disease. Eight variables, Sex, Chest pain, Blood pressure, Angina, Peak, Slope, Colored vessels and Thal, are picked out of thirteen variables to form a classified function. This function is used to predict a forecasting set of 100 samples, and the accuracy is 87%. Other algorithms such as SVM (Support vector machine) are applied to the same data and the forecasting results show that GEP is better than other algorithms.

  4. Assessment of the Clinical Relevance of BRCA2 Missense Variants by Functional and Computational Approaches.

    PubMed

    Guidugli, Lucia; Shimelis, Hermela; Masica, David L; Pankratz, Vernon S; Lipton, Gary B; Singh, Namit; Hu, Chunling; Monteiro, Alvaro N A; Lindor, Noralane M; Goldgar, David E; Karchin, Rachel; Iversen, Edwin S; Couch, Fergus J

    2018-01-17

    Many variants of uncertain significance (VUS) have been identified in BRCA2 through clinical genetic testing. VUS pose a significant clinical challenge because the contribution of these variants to cancer risk has not been determined. We conducted a comprehensive assessment of VUS in the BRCA2 C-terminal DNA binding domain (DBD) by using a validated functional assay of BRCA2 homologous recombination (HR) DNA-repair activity and defined a classifier of variant pathogenicity. Among 139 variants evaluated, 54 had ≥99% probability of pathogenicity, and 73 had ≥95% probability of neutrality. Functional assay results were compared with predictions of variant pathogenicity from the Align-GVGD protein-sequence-based prediction algorithm, which has been used for variant classification. Relative to the HR assay, Align-GVGD significantly (p < 0.05) over-predicted pathogenic variants. We subsequently combined functional and Align-GVGD prediction results in a Bayesian hierarchical model (VarCall) to estimate the overall probability of pathogenicity for each VUS. In addition, to predict the effects of all other BRCA2 DBD variants and to prioritize variants for functional studies, we used the endoPhenotype-Optimized Sequence Ensemble (ePOSE) algorithm to train classifiers for BRCA2 variants by using data from the HR functional assay. Together, the results show that systematic functional assays in combination with in silico predictors of pathogenicity provide robust tools for clinical annotation of BRCA2 VUS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  5. MOlecular MAterials Property Prediction Package (MOMAP) 1.0: a software package for predicting the luminescent properties and mobility of organic functional materials

    NASA Astrophysics Data System (ADS)

    Niu, Yingli; Li, Wenqiang; Peng, Qian; Geng, Hua; Yi, Yuanping; Wang, Linjun; Nan, Guangjun; Wang, Dong; Shuai, Zhigang

    2018-04-01

    MOlecular MAterials Property Prediction Package (MOMAP) is a software toolkit for molecular materials property prediction. It focuses on luminescent properties and charge mobility properties. This article contains a brief descriptive introduction of key features, theoretical models and algorithms of the software, together with examples that illustrate the performance. First, we present the theoretical models and algorithms for molecular luminescent properties calculation, which includes the excited-state radiative/non-radiative decay rate constant and the optical spectra. Then, a multi-scale simulation approach and its algorithm for the molecular charge mobility are described. This approach is based on hopping model and combines with Kinetic Monte Carlo and molecular dynamics simulations, and it is especially applicable for describing a large category of organic semiconductors, whose inter-molecular electronic coupling is much smaller than intra-molecular charge reorganisation energy.

  6. Pole-placement Predictive Functional Control for under-damped systems with real numbers algebra.

    PubMed

    Zabet, K; Rossiter, J A; Haber, R; Abdullah, M

    2017-11-01

    This paper presents the new algorithm of PP-PFC (Pole-placement Predictive Functional Control) for stable, linear under-damped higher-order processes. It is shown that while conventional PFC aims to get first-order exponential behavior, this is not always straightforward with significant under-damped modes and hence a pole-placement PFC algorithm is proposed which can be tuned more precisely to achieve the desired dynamics, but exploits complex number algebra and linear combinations in order to deliver guarantees of stability and performance. Nevertheless, practical implementation is easier by avoiding complex number algebra and hence a modified formulation of the PP-PFC algorithm is also presented which utilises just real numbers while retaining the key attributes of simple algebra, coding and tuning. The potential advantages are demonstrated with numerical examples and real-time control of a laboratory plant. Copyright © 2017 ISA. All rights reserved.

  7. Prediction of Aerodynamic Coefficients for Wind Tunnel Data using a Genetic Algorithm Optimized Neural Network

    NASA Technical Reports Server (NTRS)

    Rajkumar, T.; Aragon, Cecilia; Bardina, Jorge; Britten, Roy

    2002-01-01

    A fast, reliable way of predicting aerodynamic coefficients is produced using a neural network optimized by a genetic algorithm. Basic aerodynamic coefficients (e.g. lift, drag, pitching moment) are modelled as functions of angle of attack and Mach number. The neural network is first trained on a relatively rich set of data from wind tunnel tests of numerical simulations to learn an overall model. Most of the aerodynamic parameters can be well-fitted using polynomial functions. A new set of data, which can be relatively sparse, is then supplied to the network to produce a new model consistent with the previous model and the new data. Because the new model interpolates realistically between the sparse test data points, it is suitable for use in piloted simulations. The genetic algorithm is used to choose a neural network architecture to give best results, avoiding over-and under-fitting of the test data.

  8. Development of an Evolutionary Algorithm for the ab Initio Discovery of Two-Dimensional Materials

    NASA Astrophysics Data System (ADS)

    Revard, Benjamin Charles

    Crystal structure prediction is an important first step on the path toward computational materials design. Increasingly robust methods have become available in recent years for computing many materials properties, but because properties are largely a function of crystal structure, the structure must be known before these methods can be brought to bear. In addition, structure prediction is particularly useful for identifying low-energy structures of subperiodic materials, such as two-dimensional (2D) materials, which may adopt unexpected structures that differ from those of the corresponding bulk phases. Evolutionary algorithms, which are heuristics for global optimization inspired by biological evolution, have proven to be a fruitful approach for tackling the problem of crystal structure prediction. This thesis describes the development of an improved evolutionary algorithm for structure prediction and several applications of the algorithm to predict the structures of novel low-energy 2D materials. The first part of this thesis contains an overview of evolutionary algorithms for crystal structure prediction and presents our implementation, including details of extending the algorithm to search for clusters, wires, and 2D materials, improvements to efficiency when running in parallel, improved composition space sampling, and the ability to search for partial phase diagrams. We then present several applications of the evolutionary algorithm to 2D systems, including InP, the C-Si and Sn-S phase diagrams, and several group-IV dioxides. This thesis makes use of the Cornell graduate school's "papers" option. Chapters 1 and 3 correspond to the first-author publications of Refs. [131] and [132], respectively, and chapter 2 will soon be submitted as a first-author publication. The material in chapter 4 is taken from Ref. [144], in which I share joint first-authorship. In this case I have included only my own contributions.

  9. sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides

    PubMed Central

    Luo, Heng; Ye, Hao; Ng, Hui Wen; Sakkiah, Sugunadevi; Mendrick, Donna L.; Hong, Huixiao

    2016-01-01

    Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. This algorithm can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system. PMID:27558848

  10. Prediction of thermal conductivity of polyvinylpyrrolidone (PVP) electrospun nanocomposite fibers using artificial neural network and prey-predator algorithm.

    PubMed

    Khan, Waseem S; Hamadneh, Nawaf N; Khan, Waqar A

    2017-01-01

    In this study, multilayer perception neural network (MLPNN) was employed to predict thermal conductivity of PVP electrospun nanocomposite fibers with multiwalled carbon nanotubes (MWCNTs) and Nickel Zinc ferrites [(Ni0.6Zn0.4) Fe2O4]. This is the second attempt on the application of MLPNN with prey predator algorithm for the prediction of thermal conductivity of PVP electrospun nanocomposite fibers. The prey predator algorithm was used to train the neural networks to find the best models. The best models have the minimal of sum squared error between the experimental testing data and the corresponding models results. The minimal error was found to be 0.0028 for MWCNTs model and 0.00199 for Ni-Zn ferrites model. The predicted artificial neural networks (ANNs) responses were analyzed statistically using z-test, correlation coefficient, and the error functions for both inclusions. The predicted ANN responses for PVP electrospun nanocomposite fibers were compared with the experimental data and were found in good agreement.

  11. Voidage correction algorithm for unresolved Euler-Lagrange simulations

    NASA Astrophysics Data System (ADS)

    Askarishahi, Maryam; Salehi, Mohammad-Sadegh; Radl, Stefan

    2018-04-01

    The effect of grid coarsening on the predicted total drag force and heat exchange rate in dense gas-particle flows is investigated using Euler-Lagrange (EL) approach. We demonstrate that grid coarsening may reduce the predicted total drag force and exchange rate. Surprisingly, exchange coefficients predicted by the EL approach deviate more significantly from the exact value compared to results of Euler-Euler (EE)-based calculations. The voidage gradient is identified as the root cause of this peculiar behavior. Consequently, we propose a correction algorithm based on a sigmoidal function to predict the voidage experienced by individual particles. Our correction algorithm can significantly improve the prediction of exchange coefficients in EL models, which is tested for simulations involving Euler grid cell sizes between 2d_p and 12d_p . It is most relevant in simulations of dense polydisperse particle suspensions featuring steep voidage profiles. For these suspensions, classical approaches may result in an error of the total exchange rate of up to 30%.

  12. Multi-instance multi-label distance metric learning for genome-wide protein function prediction.

    PubMed

    Xu, Yonghui; Min, Huaqing; Song, Hengjie; Wu, Qingyao

    2016-08-01

    Multi-instance multi-label (MIML) learning has been proven to be effective for the genome-wide protein function prediction problems where each training example is associated with not only multiple instances but also multiple class labels. To find an appropriate MIML learning method for genome-wide protein function prediction, many studies in the literature attempted to optimize objective functions in which dissimilarity between instances is measured using the Euclidean distance. But in many real applications, Euclidean distance may be unable to capture the intrinsic similarity/dissimilarity in feature space and label space. Unlike other previous approaches, in this paper, we propose to learn a multi-instance multi-label distance metric learning framework (MIMLDML) for genome-wide protein function prediction. Specifically, we learn a Mahalanobis distance to preserve and utilize the intrinsic geometric information of both feature space and label space for MIML learning. In addition, we try to deal with the sparsely labeled data by giving weight to the labeled data. Extensive experiments on seven real-world organisms covering the biological three-domain system (i.e., archaea, bacteria, and eukaryote; Woese et al., 1990) show that the MIMLDML algorithm is superior to most state-of-the-art MIML learning algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. Imbalanced multi-modal multi-label learning for subcellular localization prediction of human proteins with both single and multiple sites.

    PubMed

    He, Jianjun; Gu, Hong; Liu, Wenqi

    2012-01-01

    It is well known that an important step toward understanding the functions of a protein is to determine its subcellular location. Although numerous prediction algorithms have been developed, most of them typically focused on the proteins with only one location. In recent years, researchers have begun to pay attention to the subcellular localization prediction of the proteins with multiple sites. However, almost all the existing approaches have failed to take into account the correlations among the locations caused by the proteins with multiple sites, which may be the important information for improving the prediction accuracy of the proteins with multiple sites. In this paper, a new algorithm which can effectively exploit the correlations among the locations is proposed by using gaussian process model. Besides, the algorithm also can realize optimal linear combination of various feature extraction technologies and could be robust to the imbalanced data set. Experimental results on a human protein data set show that the proposed algorithm is valid and can achieve better performance than the existing approaches.

  14. Robust Kalman filter design for predictive wind shear detection

    NASA Technical Reports Server (NTRS)

    Stratton, Alexander D.; Stengel, Robert F.

    1991-01-01

    Severe, low-altitude wind shear is a threat to aviation safety. Airborne sensors under development measure the radial component of wind along a line directly in front of an aircraft. In this paper, optimal estimation theory is used to define a detection algorithm to warn of hazardous wind shear from these sensors. To achieve robustness, a wind shear detection algorithm must distinguish threatening wind shear from less hazardous gustiness, despite variations in wind shear structure. This paper presents statistical analysis methods to refine wind shear detection algorithm robustness. Computational methods predict the ability to warn of severe wind shear and avoid false warning. Comparative capability of the detection algorithm as a function of its design parameters is determined, identifying designs that provide robust detection of severe wind shear.

  15. FindPrimaryPairs: An efficient algorithm for predicting element-transferring reactant/product pairs in metabolic networks.

    PubMed

    Steffensen, Jon Lund; Dufault-Thompson, Keith; Zhang, Ying

    2018-01-01

    The metabolism of individual organisms and biological communities can be viewed as a network of metabolites connected to each other through chemical reactions. In metabolic networks, chemical reactions transform reactants into products, thereby transferring elements between these metabolites. Knowledge of how elements are transferred through reactant/product pairs allows for the identification of primary compound connections through a metabolic network. However, such information is not readily available and is often challenging to obtain for large reaction databases or genome-scale metabolic models. In this study, a new algorithm was developed for automatically predicting the element-transferring reactant/product pairs using the limited information available in the standard representation of metabolic networks. The algorithm demonstrated high efficiency in analyzing large datasets and provided accurate predictions when benchmarked with manually curated data. Applying the algorithm to the visualization of metabolic networks highlighted pathways of primary reactant/product connections and provided an organized view of element-transferring biochemical transformations. The algorithm was implemented as a new function in the open source software package PSAMM in the release v0.30 (https://zhanglab.github.io/psamm/).

  16. Load balancing prediction method of cloud storage based on analytic hierarchy process and hybrid hierarchical genetic algorithm.

    PubMed

    Zhou, Xiuze; Lin, Fan; Yang, Lvqing; Nie, Jing; Tan, Qian; Zeng, Wenhua; Zhang, Nian

    2016-01-01

    With the continuous expansion of the cloud computing platform scale and rapid growth of users and applications, how to efficiently use system resources to improve the overall performance of cloud computing has become a crucial issue. To address this issue, this paper proposes a method that uses an analytic hierarchy process group decision (AHPGD) to evaluate the load state of server nodes. Training was carried out by using a hybrid hierarchical genetic algorithm (HHGA) for optimizing a radial basis function neural network (RBFNN). The AHPGD makes the aggregative indicator of virtual machines in cloud, and become input parameters of predicted RBFNN. Also, this paper proposes a new dynamic load balancing scheduling algorithm combined with a weighted round-robin algorithm, which uses the predictive periodical load value of nodes based on AHPPGD and RBFNN optimized by HHGA, then calculates the corresponding weight values of nodes and makes constant updates. Meanwhile, it keeps the advantages and avoids the shortcomings of static weighted round-robin algorithm.

  17. Empirical scoring functions for advanced protein-ligand docking with PLANTS.

    PubMed

    Korb, Oliver; Stützle, Thomas; Exner, Thomas E

    2009-01-01

    In this paper we present two empirical scoring functions, PLANTS(CHEMPLP) and PLANTS(PLP), designed for our docking algorithm PLANTS (Protein-Ligand ANT System), which is based on ant colony optimization (ACO). They are related, regarding their functional form, to parts of already published scoring functions and force fields. The parametrization procedure described here was able to identify several parameter settings showing an excellent performance for the task of pose prediction on two test sets comprising 298 complexes in total. Up to 87% of the complexes of the Astex diverse set and 77% of the CCDC/Astex clean listnc (noncovalently bound complexes of the clean list) could be reproduced with root-mean-square deviations of less than 2 A with respect to the experimentally determined structures. A comparison with the state-of-the-art docking tool GOLD clearly shows that this is, especially for the druglike Astex diverse set, an improvement in pose prediction performance. Additionally, optimized parameter settings for the search algorithm were identified, which can be used to balance pose prediction reliability and search speed.

  18. Model predictive controller design for boost DC-DC converter using T-S fuzzy cost function

    NASA Astrophysics Data System (ADS)

    Seo, Sang-Wha; Kim, Yong; Choi, Han Ho

    2017-11-01

    This paper proposes a Takagi-Sugeno (T-S) fuzzy method to select cost function weights of finite control set model predictive DC-DC converter control algorithms. The proposed method updates the cost function weights at every sample time by using T-S type fuzzy rules derived from the common optimal control engineering knowledge that a state or input variable with an excessively large magnitude can be penalised by increasing the weight corresponding to the variable. The best control input is determined via the online optimisation of the T-S fuzzy cost function for all the possible control input sequences. This paper implements the proposed model predictive control algorithm in real time on a Texas Instruments TMS320F28335 floating-point Digital Signal Processor (DSP). Some experimental results are given to illuminate the practicality and effectiveness of the proposed control system under several operating conditions. The results verify that our method can yield not only good transient and steady-state responses (fast recovery time, small overshoot, zero steady-state error, etc.) but also insensitiveness to abrupt load or input voltage parameter variations.

  19. A Formally Verified Conflict Detection Algorithm for Polynomial Trajectories

    NASA Technical Reports Server (NTRS)

    Narkawicz, Anthony; Munoz, Cesar

    2015-01-01

    In air traffic management, conflict detection algorithms are used to determine whether or not aircraft are predicted to lose horizontal and vertical separation minima within a time interval assuming a trajectory model. In the case of linear trajectories, conflict detection algorithms have been proposed that are both sound, i.e., they detect all conflicts, and complete, i.e., they do not present false alarms. In general, for arbitrary nonlinear trajectory models, it is possible to define detection algorithms that are either sound or complete, but not both. This paper considers the case of nonlinear aircraft trajectory models based on polynomial functions. In particular, it proposes a conflict detection algorithm that precisely determines whether, given a lookahead time, two aircraft flying polynomial trajectories are in conflict. That is, it has been formally verified that, assuming that the aircraft trajectories are modeled as polynomial functions, the proposed algorithm is both sound and complete.

  20. Out-of-Home Placement Decision-Making and Outcomes in Child Welfare: A Longitudinal Study

    PubMed Central

    McClelland, Gary M.; Weiner, Dana A.; Jordan, Neil; Lyons, John S.

    2015-01-01

    After children enter the child welfare system, subsequent out-of-home placement decisions and their impact on children’s well-being are complex and under-researched. This study examined two placement decision-making models: a multidisciplinary team approach, and a decision support algorithm using a standardized assessment. Based on 3,911 placement records in the Illinois child welfare system over 4 years, concordant (agreement) and discordant (disagreement) decisions between the two models were compared. Concordant decisions consistently predicted improvement in children’s well-being regardless of placement type. Discordant decisions showed greater variability. In general, placing children in settings less restrictive than the algorithm suggested (“under-placing”) was associated with less severe baseline functioning but also less improvement over time than placing children according to the algorithm. “Over-placing” children in settings more restrictive than the algorithm recommended was associated with more severe baseline functioning but fewer significant results in rate of improvement than predicted by concordant decisions. The importance of placement decision-making on policy, restrictiveness of placement, and delivery of treatments and services in child welfare are discussed. PMID:24677172

  1. Predicting human protein function with multi-task deep neural networks.

    PubMed

    Fa, Rui; Cozzetto, Domenico; Wan, Cen; Jones, David T

    2018-01-01

    Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in the area of deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers upon which are stacked in parallel as many independent modules (additional hidden layers with their own output units) as the number of output GO terms (the tasks). MTDNN learns individual tasks partially using shared representations and partially from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or homology transfers. More importantly, the results show that MTDNN binary classification accuracy is higher than alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement is not linearly correlated with the number of tasks in MTDNN, but medium size models provide more improvement in our case. One of advantages of MTDNN is that given a set of features, there is no requirement for MTDNN to have a bootstrap feature selection procedure as what traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. On the other hand, there is still large room for deep learning techniques to further enhance prediction ability.

  2. More About Vector Adaptive/Predictive Coding Of Speech

    NASA Technical Reports Server (NTRS)

    Jedrey, Thomas C.; Gersho, Allen

    1992-01-01

    Report presents additional information about digital speech-encoding and -decoding system described in "Vector Adaptive/Predictive Encoding of Speech" (NPO-17230). Summarizes development of vector adaptive/predictive coding (VAPC) system and describes basic functions of algorithm. Describes refinements introduced enabling receiver to cope with errors. VAPC algorithm implemented in integrated-circuit coding/decoding processors (codecs). VAPC and other codecs tested under variety of operating conditions. Tests designed to reveal effects of various background quiet and noisy environments and of poor telephone equipment. VAPC found competitive with and, in some respects, superior to other 4.8-kb/s codecs and other codecs of similar complexity.

  3. Research prioritization through prediction of future impact on biomedical science: a position paper on inference-analytics.

    PubMed

    Ganapathiraju, Madhavi K; Orii, Naoki

    2013-08-30

    Advances in biotechnology have created "big-data" situations in molecular and cellular biology. Several sophisticated algorithms have been developed that process big data to generate hundreds of biomedical hypotheses (or predictions). The bottleneck to translating this large number of biological hypotheses is that each of them needs to be studied by experimentation for interpreting its functional significance. Even when the predictions are estimated to be very accurate, from a biologist's perspective, the choice of which of these predictions is to be studied further is made based on factors like availability of reagents and resources and the possibility of formulating some reasonable hypothesis about its biological relevance. When viewed from a global perspective, say from that of a federal funding agency, ideally the choice of which prediction should be studied would be made based on which of them can make the most translational impact. We propose that algorithms be developed to identify which of the computationally generated hypotheses have potential for high translational impact; this way, funding agencies and scientific community can invest resources and drive the research based on a global view of biomedical impact without being deterred by local view of feasibility. In short, data-analytic algorithms analyze big-data and generate hypotheses; in contrast, the proposed inference-analytic algorithms analyze these hypotheses and rank them by predicted biological impact. We demonstrate this through the development of an algorithm to predict biomedical impact of protein-protein interactions (PPIs) which is estimated by the number of future publications that cite the paper which originally reported the PPI. This position paper describes a new computational problem that is relevant in the era of big-data and discusses the challenges that exist in studying this problem, highlighting the need for the scientific community to engage in this line of research. The proposed class of algorithms, namely inference-analytic algorithms, is necessary to ensure that resources are invested in translating those computational outcomes that promise maximum biological impact. Application of this concept to predict biomedical impact of PPIs illustrates not only the concept, but also the challenges in designing these algorithms.

  4. A flexible ontology for inference of emergent whole cell function from relationships between subcellular processes.

    PubMed

    Hansen, Jens; Meretzky, David; Woldesenbet, Simeneh; Stolovitzky, Gustavo; Iyengar, Ravi

    2017-12-18

    Whole cell responses arise from coordinated interactions between diverse human gene products functioning within various pathways underlying sub-cellular processes (SCP). Lower level SCPs interact to form higher level SCPs, often in a context specific manner to give rise to whole cell function. We sought to determine if capturing such relationships enables us to describe the emergence of whole cell functions from interacting SCPs. We developed the Molecular Biology of the Cell Ontology based on standard cell biology and biochemistry textbooks and review articles. Currently, our ontology contains 5,384 genes, 753 SCPs and 19,180 expertly curated gene-SCP associations. Our algorithm to populate the SCPs with genes enables extension of the ontology on demand and the adaption of the ontology to the continuously growing cell biological knowledge. Since whole cell responses most often arise from the coordinated activity of multiple SCPs, we developed a dynamic enrichment algorithm that flexibly predicts SCP-SCP relationships beyond the current taxonomy. This algorithm enables us to identify interactions between SCPs as a basis for higher order function in a context dependent manner, allowing us to provide a detailed description of how SCPs together can give rise to whole cell functions. We conclude that this ontology can, from omics data sets, enable the development of detailed SCP networks for predictive modeling of emergent whole cell functions.

  5. Actigraphy features for predicting mobility disability in older adults

    USDA-ARS?s Scientific Manuscript database

    Actigraphy has attracted much attention for assessing physical activity in the past decade. Many algorithms have been developed to automate the analysis process, but none has targeted a general model to discover related features for detecting or predicting mobility function, or more specifically, mo...

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simonetto, Andrea; Dall'Anese, Emiliano

    This article develops online algorithms to track solutions of time-varying constrained optimization problems. Particularly, resembling workhorse Kalman filtering-based approaches for dynamical systems, the proposed methods involve prediction-correction steps to provably track the trajectory of the optimal solutions of time-varying convex problems. The merits of existing prediction-correction methods have been shown for unconstrained problems and for setups where computing the inverse of the Hessian of the cost function is computationally affordable. This paper addresses the limitations of existing methods by tackling constrained problems and by designing first-order prediction steps that rely on the Hessian of the cost function (and do notmore » require the computation of its inverse). In addition, the proposed methods are shown to improve the convergence speed of existing prediction-correction methods when applied to unconstrained problems. Numerical simulations corroborate the analytical results and showcase performance and benefits of the proposed algorithms. A realistic application of the proposed method to real-time control of energy resources is presented.« less

  7. Common spaceborne multicomputer operating system and development environment

    NASA Technical Reports Server (NTRS)

    Craymer, L. G.; Lewis, B. F.; Hayes, P. J.; Jones, R. L.

    1994-01-01

    A preliminary technical specification for a multicomputer operating system is developed. The operating system is targeted for spaceborne flight missions and provides a broad range of real-time functionality, dynamic remote code-patching capability, and system fault tolerance and long-term survivability features. Dataflow concepts are used for representing application algorithms. Functional features are included to ensure real-time predictability for a class of algorithms which require data-driven execution on an iterative steady state basis. The development environment supports the development of algorithm code, design of control parameters, performance analysis, simulation of real-time dataflow applications, and compiling and downloading of the resulting application.

  8. A Class of Prediction-Correction Methods for Time-Varying Convex Optimization

    NASA Astrophysics Data System (ADS)

    Simonetto, Andrea; Mokhtari, Aryan; Koppel, Alec; Leus, Geert; Ribeiro, Alejandro

    2016-09-01

    This paper considers unconstrained convex optimization problems with time-varying objective functions. We propose algorithms with a discrete time-sampling scheme to find and track the solution trajectory based on prediction and correction steps, while sampling the problem data at a constant rate of $1/h$, where $h$ is the length of the sampling interval. The prediction step is derived by analyzing the iso-residual dynamics of the optimality conditions. The correction step adjusts for the distance between the current prediction and the optimizer at each time step, and consists either of one or multiple gradient steps or Newton steps, which respectively correspond to the gradient trajectory tracking (GTT) or Newton trajectory tracking (NTT) algorithms. Under suitable conditions, we establish that the asymptotic error incurred by both proposed methods behaves as $O(h^2)$, and in some cases as $O(h^4)$, which outperforms the state-of-the-art error bound of $O(h)$ for correction-only methods in the gradient-correction step. Moreover, when the characteristics of the objective function variation are not available, we propose approximate gradient and Newton tracking algorithms (AGT and ANT, respectively) that still attain these asymptotical error bounds. Numerical simulations demonstrate the practical utility of the proposed methods and that they improve upon existing techniques by several orders of magnitude.

  9. Computational gene expression profiling under salt stress reveals patterns of co-expression

    PubMed Central

    Sanchita; Sharma, Ashok

    2016-01-01

    Plants respond differently to environmental conditions. Among various abiotic stresses, salt stress is a condition where excess salt in soil causes inhibition of plant growth. To understand the response of plants to the stress conditions, identification of the responsible genes is required. Clustering is a data mining technique used to group the genes with similar expression. The genes of a cluster show similar expression and function. We applied clustering algorithms on gene expression data of Solanum tuberosum showing differential expression in Capsicum annuum under salt stress. The clusters, which were common in multiple algorithms were taken further for analysis. Principal component analysis (PCA) further validated the findings of other cluster algorithms by visualizing their clusters in three-dimensional space. Functional annotation results revealed that most of the genes were involved in stress related responses. Our findings suggest that these algorithms may be helpful in the prediction of the function of co-expressed genes. PMID:26981411

  10. Hidden state prediction: a modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses

    PubMed Central

    Zaneveld, Jesse R. R.; Thurber, Rebecca L. V.

    2014-01-01

    Complex symbioses between animal or plant hosts and their associated microbiotas can involve thousands of species and millions of genes. Because of the number of interacting partners, it is often impractical to study all organisms or genes in these host-microbe symbioses individually. Yet new phylogenetic predictive methods can use the wealth of accumulated data on diverse model organisms to make inferences into the properties of less well-studied species and gene families. Predictive functional profiling methods use evolutionary models based on the properties of studied relatives to put bounds on the likely characteristics of an organism or gene that has not yet been studied in detail. These techniques have been applied to predict diverse features of host-associated microbial communities ranging from the enzymatic function of uncharacterized genes to the gene content of uncultured microorganisms. We consider these phylogenetically informed predictive techniques from disparate fields as examples of a general class of algorithms for Hidden State Prediction (HSP), and argue that HSP methods have broad value in predicting organismal traits in a variety of contexts, including the study of complex host-microbe symbioses. PMID:25202302

  11. A dynamical system view of cerebellar function

    NASA Astrophysics Data System (ADS)

    Keeler, James D.

    1990-06-01

    First some previous theories of cerebellar function are reviewed, and deficiencies in how they map onto the neurophysiological structure are pointed out. I hypothesize that the cerebellar cortex builds an internal model, or prediction, of the dynamics of the animal. A class of algorithms for doing prediction based on local reconstruction of attractors are described, and it is shown how this class maps very well onto the structure of the cerebellar cortex. I hypothesize that the climbing fibers multiplex between different trajectories corresponding to different modes of operation. Then the vestibulo-ocular reflex is examined, and experiments to test the proposed model are suggested. The purpose of the presentation here is twofold: (1) To enlighten physiologists to the mathematics of a class of prediction algorithms that map well onto cerebellar architecture. (2) To enlighten dynamical system theorists to the physiological and anatomical details of the cerebellum.

  12. Prediction and Factor Extraction of Drug Function by Analyzing Medical Records in Developing Countries.

    PubMed

    Hu, Min; Nohara, Yasunobu; Nakamura, Masafumi; Nakashima, Naoki

    2017-01-01

    The World Health Organization has declared Bangladesh one of 58 countries facing acute Human Resources for Health (HRH) crisis. Artificial intelligence in healthcare has been shown to be successful for diagnostics. Using machine learning to predict pharmaceutical prescriptions may solve HRH crises. In this study, we investigate a predictive model by analyzing prescription data of 4,543 subjects in Bangladesh. We predict the function of prescribed drugs, comparing three machine-learning approaches. The approaches compare whether a subject shall be prescribed medicine from the 21 most frequently prescribed drug functions. Receiver Operating Characteristics (ROC) were selected as a way to evaluate and assess prediction models. The results show the drug function with the best prediction performance was oral hypoglycemic drugs, which has an average AUC of 0.962. To understand how the variables affect prediction, we conducted factor analysis based on tree-based algorithms and natural language processing techniques.

  13. Linear genetic programming application for successive-station monthly streamflow prediction

    NASA Astrophysics Data System (ADS)

    Danandeh Mehr, Ali; Kahya, Ercan; Yerdelen, Cahit

    2014-09-01

    In recent decades, artificial intelligence (AI) techniques have been pronounced as a branch of computer science to model wide range of hydrological phenomena. A number of researches have been still comparing these techniques in order to find more effective approaches in terms of accuracy and applicability. In this study, we examined the ability of linear genetic programming (LGP) technique to model successive-station monthly streamflow process, as an applied alternative for streamflow prediction. A comparative efficiency study between LGP and three different artificial neural network algorithms, namely feed forward back propagation (FFBP), generalized regression neural networks (GRNN), and radial basis function (RBF), has also been presented in this study. For this aim, firstly, we put forward six different successive-station monthly streamflow prediction scenarios subjected to training by LGP and FFBP using the field data recorded at two gauging stations on Çoruh River, Turkey. Based on Nash-Sutcliffe and root mean squared error measures, we then compared the efficiency of these techniques and selected the best prediction scenario. Eventually, GRNN and RBF algorithms were utilized to restructure the selected scenario and to compare with corresponding FFBP and LGP. Our results indicated the promising role of LGP for successive-station monthly streamflow prediction providing more accurate results than those of all the ANN algorithms. We found an explicit LGP-based expression evolved by only the basic arithmetic functions as the best prediction model for the river, which uses the records of the both target and upstream stations.

  14. Understanding the undelaying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein.

    PubMed

    Ebrahimi, Mansour; Aghagolzadeh, Parisa; Shamabadi, Narges; Tahmasebi, Ahmad; Alsharifi, Mohammed; Adelson, David L; Hemmatzadeh, Farhid; Ebrahimie, Esmaeil

    2014-01-01

    The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physic-chemical attributes which govern HA subtyping, we performed a large scale functional analysis of over 7000 sequences of 16 different HA subtypes. Large number (896) of physic-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine leaning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. Random Forest tree induction algorithm and RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represent a new avenue for understanding and predicting possible future structure of influenza pandemics.

  15. Understanding the Underlying Mechanism of HA-Subtyping in the Level of Physic-Chemical Characteristics of Protein

    PubMed Central

    Ebrahimi, Mansour; Aghagolzadeh, Parisa; Shamabadi, Narges; Tahmasebi, Ahmad; Alsharifi, Mohammed; Adelson, David L.

    2014-01-01

    The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physic-chemical attributes which govern HA subtyping, we performed a large scale functional analysis of over 7000 sequences of 16 different HA subtypes. Large number (896) of physic-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine leaning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. Random Forest tree induction algorithm and RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represent a new avenue for understanding and predicting possible future structure of influenza pandemics. PMID:24809455

  16. Robust and Adaptive Online Time Series Prediction with Long Short-Term Memory

    PubMed Central

    Tao, Qing

    2017-01-01

    Online time series prediction is the mainstream method in a wide range of fields, ranging from speech analysis and noise cancelation to stock market analysis. However, the data often contains many outliers with the increasing length of time series in real world. These outliers can mislead the learned model if treated as normal points in the process of prediction. To address this issue, in this paper, we propose a robust and adaptive online gradient learning method, RoAdam (Robust Adam), for long short-term memory (LSTM) to predict time series with outliers. This method tunes the learning rate of the stochastic gradient algorithm adaptively in the process of prediction, which reduces the adverse effect of outliers. It tracks the relative prediction error of the loss function with a weighted average through modifying Adam, a popular stochastic gradient method algorithm for training deep neural networks. In our algorithm, the large value of the relative prediction error corresponds to a small learning rate, and vice versa. The experiments on both synthetic data and real time series show that our method achieves better performance compared to the existing methods based on LSTM. PMID:29391864

  17. Robust and Adaptive Online Time Series Prediction with Long Short-Term Memory.

    PubMed

    Yang, Haimin; Pan, Zhisong; Tao, Qing

    2017-01-01

    Online time series prediction is the mainstream method in a wide range of fields, ranging from speech analysis and noise cancelation to stock market analysis. However, the data often contains many outliers with the increasing length of time series in real world. These outliers can mislead the learned model if treated as normal points in the process of prediction. To address this issue, in this paper, we propose a robust and adaptive online gradient learning method, RoAdam (Robust Adam), for long short-term memory (LSTM) to predict time series with outliers. This method tunes the learning rate of the stochastic gradient algorithm adaptively in the process of prediction, which reduces the adverse effect of outliers. It tracks the relative prediction error of the loss function with a weighted average through modifying Adam, a popular stochastic gradient method algorithm for training deep neural networks. In our algorithm, the large value of the relative prediction error corresponds to a small learning rate, and vice versa. The experiments on both synthetic data and real time series show that our method achieves better performance compared to the existing methods based on LSTM.

  18. MSPocket: an orientation-independent algorithm for the detection of ligand binding pockets.

    PubMed

    Zhu, Hongbo; Pisabarro, M Teresa

    2011-02-01

    Identification of ligand binding pockets on proteins is crucial for the characterization of protein functions. It provides valuable information for protein-ligand docking and rational engineering of small molecules that regulate protein functions. A major number of current prediction algorithms of ligand binding pockets are based on cubic grid representation of proteins and, thus, the results are often protein orientation dependent. We present the MSPocket program for detecting pockets on the solvent excluded surface of proteins. The core algorithm of the MSPocket approach does not use any cubic grid system to represent proteins and is therefore independent of protein orientations. We demonstrate that MSPocket is able to achieve an accuracy of 75% in predicting ligand binding pockets on a test dataset used for evaluating several existing methods. The accuracy is 92% if the top three predictions are considered. Comparison to one of the recently published best performing methods shows that MSPocket reaches similar performance with the additional feature of being protein orientation independent. Interestingly, some of the predictions are different, meaning that the two methods can be considered complementary and combined to achieve better prediction accuracy. MSPocket also provides a graphical user interface for interactive investigation of the predicted ligand binding pockets. In addition, we show that overlap criterion is a better strategy for the evaluation of predicted ligand binding pockets than the single point distance criterion. The MSPocket source code can be downloaded from http://appserver.biotec.tu-dresden.de/MSPocket/. MSPocket is also available as a PyMOL plugin with a graphical user interface.

  19. Protein complex prediction for large protein protein interaction networks with the Core&Peel method.

    PubMed

    Pellegrini, Marco; Baglioni, Miriam; Geraci, Filippo

    2016-11-08

    Biological networks play an increasingly important role in the exploration of functional modularity and cellular organization at a systemic level. Quite often the first tools used to analyze these networks are clustering algorithms. We concentrate here on the specific task of predicting protein complexes (PC) in large protein-protein interaction networks (PPIN). Currently, many state-of-the-art algorithms work well for networks of small or moderate size. However, their performance on much larger networks, which are becoming increasingly common in modern proteome-wise studies, needs to be re-assessed. We present a new fast algorithm for clustering large sparse networks: Core&Peel, which runs essentially in time and storage O(a(G)m+n) for a network G of n nodes and m arcs, where a(G) is the arboricity of G (which is roughly proportional to the maximum average degree of any induced subgraph in G). We evaluated Core&Peel on five PPI networks of large size and one of medium size from both yeast and homo sapiens, comparing its performance against those of ten state-of-the-art methods. We demonstrate that Core&Peel consistently outperforms the ten competitors in its ability to identify known protein complexes and in the functional coherence of its predictions. Our method is remarkably robust, being quite insensible to the injection of random interactions. Core&Peel is also empirically efficient attaining the second best running time over large networks among the tested algorithms. Our algorithm Core&Peel pushes forward the state-of the-art in PPIN clustering providing an algorithmic solution with polynomial running time that attains experimentally demonstrable good output quality and speed on challenging large real networks.

  20. RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model.

    PubMed

    Jabbari, Hosna; Wark, Ian; Montemagno, Carlo

    2018-01-01

    RNA is a biopolymer with various applications inside the cell and in biotechnology. Structure of an RNA molecule mainly determines its function and is essential to guide nanostructure design. Since experimental structure determination is time-consuming and expensive, accurate computational prediction of RNA structure is of great importance. Prediction of RNA secondary structure is relatively simpler than its tertiary structure and provides information about its tertiary structure, therefore, RNA secondary structure prediction has received attention in the past decades. Numerous methods with different folding approaches have been developed for RNA secondary structure prediction. While methods for prediction of RNA pseudoknot-free structure (structures with no crossing base pairs) have greatly improved in terms of their accuracy, methods for prediction of RNA pseudoknotted secondary structure (structures with crossing base pairs) still have room for improvement. A long-standing question for improving the prediction accuracy of RNA pseudoknotted secondary structure is whether to focus on the prediction algorithm or the underlying energy model, as there is a trade-off on computational cost of the prediction algorithm versus the generality of the method. The aim of this work is to argue when comparing different methods for RNA pseudoknotted structure prediction, the combination of algorithm and energy model should be considered and a method should not be considered superior or inferior to others if they do not use the same scoring model. We demonstrate that while the folding approach is important in structure prediction, it is not the only important factor in prediction accuracy of a given method as the underlying energy model is also as of great value. Therefore we encourage researchers to pay particular attention in comparing methods with different energy models.

  1. Connectome-based predictive modeling of attention: Comparing different functional connectivity features and prediction methods across datasets.

    PubMed

    Yoo, Kwangsun; Rosenberg, Monica D; Hsu, Wei-Ting; Zhang, Sheng; Li, Chiang-Shan R; Scheinost, Dustin; Constable, R Todd; Chun, Marvin M

    2018-02-15

    Connectome-based predictive modeling (CPM; Finn et al., 2015; Shen et al., 2017) was recently developed to predict individual differences in traits and behaviors, including fluid intelligence (Finn et al., 2015) and sustained attention (Rosenberg et al., 2016a), from functional brain connectivity (FC) measured with fMRI. Here, using the CPM framework, we compared the predictive power of three different measures of FC (Pearson's correlation, accordance, and discordance) and two different prediction algorithms (linear and partial least square [PLS] regression) for attention function. Accordance and discordance are recently proposed FC measures that respectively track in-phase synchronization and out-of-phase anti-correlation (Meskaldji et al., 2015). We defined connectome-based models using task-based or resting-state FC data, and tested the effects of (1) functional connectivity measure and (2) feature-selection/prediction algorithm on individualized attention predictions. Models were internally validated in a training dataset using leave-one-subject-out cross-validation, and externally validated with three independent datasets. The training dataset included fMRI data collected while participants performed a sustained attention task and rested (N = 25; Rosenberg et al., 2016a). The validation datasets included: 1) data collected during performance of a stop-signal task and at rest (N = 83, including 19 participants who were administered methylphenidate prior to scanning; Farr et al., 2014a; Rosenberg et al., 2016b), 2) data collected during Attention Network Task performance and rest (N = 41, Rosenberg et al., in press), and 3) resting-state data and ADHD symptom severity from the ADHD-200 Consortium (N = 113; Rosenberg et al., 2016a). Models defined using all combinations of functional connectivity measure (Pearson's correlation, accordance, and discordance) and prediction algorithm (linear and PLS regression) predicted attentional abilities, with correlations between predicted and observed measures of attention as high as 0.9 for internal validation, and 0.6 for external validation (all p's < 0.05). Models trained on task data outperformed models trained on rest data. Pearson's correlation and accordance features generally showed a small numerical advantage over discordance features, while PLS regression models were usually better than linear regression models. Overall, in addition to correlation features combined with linear models (Rosenberg et al., 2016a), it is useful to consider accordance features and PLS regression for CPM. Copyright © 2017 Elsevier Inc. All rights reserved.

  2. Uniform, optimal signal processing of mapped deep-sequencing data.

    PubMed

    Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam

    2013-07-01

    Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.

  3. Equilibrium Propagation: Bridging the Gap between Energy-Based Models and Backpropagation

    PubMed Central

    Scellier, Benjamin; Bengio, Yoshua

    2017-01-01

    We introduce Equilibrium Propagation, a learning framework for energy-based models. It involves only one kind of neural computation, performed in both the first phase (when the prediction is made) and the second phase of training (after the target or prediction error is revealed). Although this algorithm computes the gradient of an objective function just like Backpropagation, it does not need a special computation or circuit for the second phase, where errors are implicitly propagated. Equilibrium Propagation shares similarities with Contrastive Hebbian Learning and Contrastive Divergence while solving the theoretical issues of both algorithms: our algorithm computes the gradient of a well-defined objective function. Because the objective function is defined in terms of local perturbations, the second phase of Equilibrium Propagation corresponds to only nudging the prediction (fixed point or stationary distribution) toward a configuration that reduces prediction error. In the case of a recurrent multi-layer supervised network, the output units are slightly nudged toward their target in the second phase, and the perturbation introduced at the output layer propagates backward in the hidden layers. We show that the signal “back-propagated” during this second phase corresponds to the propagation of error derivatives and encodes the gradient of the objective function, when the synaptic update corresponds to a standard form of spike-timing dependent plasticity. This work makes it more plausible that a mechanism similar to Backpropagation could be implemented by brains, since leaky integrator neural computation performs both inference and error back-propagation in our model. The only local difference between the two phases is whether synaptic changes are allowed or not. We also show experimentally that multi-layer recurrently connected networks with 1, 2, and 3 hidden layers can be trained by Equilibrium Propagation on the permutation-invariant MNIST task. PMID:28522969

  4. Fundamental Algorithms of the Goddard Battery Model

    NASA Technical Reports Server (NTRS)

    Jagielski, J. M.

    1985-01-01

    The Goddard Space Flight Center (GSFC) is currently producing a computer model to predict Nickel Cadmium (NiCd) performance in a Low Earth Orbit (LEO) cycling regime. The model proper is currently still in development, but the inherent, fundamental algorithms (or methodologies) of the model are defined. At present, the model is closely dependent on empirical data and the data base currently used is of questionable accuracy. Even so, very good correlations have been determined between model predictions and actual cycling data. A more accurate and encompassing data base has been generated to serve dual functions: show the limitations of the current data base, and be inbred in the model properly for more accurate predictions. The fundamental algorithms of the model, and the present data base and its limitations, are described and a brief preliminary analysis of the new data base and its verification of the model's methodology are presented.

  5. A learning-based autonomous driver: emulate human driver's intelligence in low-speed car following

    NASA Astrophysics Data System (ADS)

    Wei, Junqing; Dolan, John M.; Litkouhi, Bakhtiar

    2010-04-01

    In this paper, an offline learning mechanism based on the genetic algorithm is proposed for autonomous vehicles to emulate human driver behaviors. The autonomous driving ability is implemented based on a Prediction- and Cost function-Based algorithm (PCB). PCB is designed to emulate a human driver's decision process, which is modeled as traffic scenario prediction and evaluation. This paper focuses on using a learning algorithm to optimize PCB with very limited training data, so that PCB can have the ability to predict and evaluate traffic scenarios similarly to human drivers. 80 seconds of human driving data was collected in low-speed (< 30miles/h) car-following scenarios. In the low-speed car-following tests, PCB was able to perform more human-like carfollowing after learning. A more general 120 kilometer-long simulation showed that PCB performs robustly even in scenarios that are not part of the training set.

  6. Applications of information theory, genetic algorithms, and neural models to predict oil flow

    NASA Astrophysics Data System (ADS)

    Ludwig, Oswaldo; Nunes, Urbano; Araújo, Rui; Schnitman, Leizer; Lepikson, Herman Augusto

    2009-07-01

    This work introduces a new information-theoretic methodology for choosing variables and their time lags in a prediction setting, particularly when neural networks are used in non-linear modeling. The first contribution of this work is the Cross Entropy Function (XEF) proposed to select input variables and their lags in order to compose the input vector of black-box prediction models. The proposed XEF method is more appropriate than the usually applied Cross Correlation Function (XCF) when the relationship among the input and output signals comes from a non-linear dynamic system. The second contribution is a method that minimizes the Joint Conditional Entropy (JCE) between the input and output variables by means of a Genetic Algorithm (GA). The aim is to take into account the dependence among the input variables when selecting the most appropriate set of inputs for a prediction problem. In short, theses methods can be used to assist the selection of input training data that have the necessary information to predict the target data. The proposed methods are applied to a petroleum engineering problem; predicting oil production. Experimental results obtained with a real-world dataset are presented demonstrating the feasibility and effectiveness of the method.

  7. Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML)

    PubMed Central

    Lechevalier, D.; Ak, R.; Ferguson, M.; Law, K. H.; Lee, Y.-T. T.; Rachuri, S.

    2017-01-01

    This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because of its capability of representing complex input and output relationships without predefining a set of basis functions, and predicting a target output with uncertainty quantification. GPR is being employed to various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid employment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set in the manufacturing domain. PMID:29202125

  8. Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML).

    PubMed

    Park, J; Lechevalier, D; Ak, R; Ferguson, M; Law, K H; Lee, Y-T T; Rachuri, S

    2017-01-01

    This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because of its capability of representing complex input and output relationships without predefining a set of basis functions, and predicting a target output with uncertainty quantification. GPR is being employed to various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid employment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set in the manufacturing domain.

  9. Transcriptional network inference from functional similarity and expression data: a global supervised approach.

    PubMed

    Ambroise, Jérôme; Robert, Annie; Macq, Benoit; Gala, Jean-Luc

    2012-01-06

    An important challenge in system biology is the inference of biological networks from postgenomic data. Among these biological networks, a gene transcriptional regulatory network focuses on interactions existing between transcription factors (TFs) and and their corresponding target genes. A large number of reverse engineering algorithms were proposed to infer such networks from gene expression profiles, but most current methods have relatively low predictive performances. In this paper, we introduce the novel TNIFSED method (Transcriptional Network Inference from Functional Similarity and Expression Data), that infers a transcriptional network from the integration of correlations and partial correlations of gene expression profiles and gene functional similarities through a supervised classifier. In the current work, TNIFSED was applied to predict the transcriptional network in Escherichia coli and in Saccharomyces cerevisiae, using datasets of 445 and 170 affymetrix arrays, respectively. Using the area under the curve of the receiver operating characteristics and the F-measure as indicators, we showed the predictive performance of TNIFSED to be better than unsupervised state-of-the-art methods. TNIFSED performed slightly worse than the supervised SIRENE algorithm for the target genes identification of the TF having a wide range of yet identified target genes but better for TF having only few identified target genes. Our results indicate that TNIFSED is complementary to the SIRENE algorithm, and particularly suitable to discover target genes of "orphan" TFs.

  10. Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements.

    PubMed

    Lan, Hui; Carson, Rachel; Provart, Nicholas J; Bonner, Anthony J

    2007-09-21

    Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress. Using in house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl. Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. Our method of combining classifiers can improve the accuracy of such predictions - in this case, predictions of genes involved in stress response in plants - and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.

  11. Solar-cell interconnect design for terrestrial photovoltaic modules

    NASA Technical Reports Server (NTRS)

    Mon, G. R.; Moore, D. M.; Ross, R. G., Jr.

    1984-01-01

    Useful solar cell interconnect reliability design and life prediction algorithms are presented, together with experimental data indicating that the classical strain cycle (fatigue) curve for the interconnect material does not account for the statistical scatter that is required in reliability predictions. This shortcoming is presently addressed by fitting a functional form to experimental cumulative interconnect failure rate data, which thereby yields statistical fatigue curves enabling not only the prediction of cumulative interconnect failures during the design life of an array field, but also the quantitative interpretation of data from accelerated thermal cycling tests. Optimal interconnect cost reliability design algorithms are also derived which may allow the minimization of energy cost over the design life of the array field.

  12. Solar-cell interconnect design for terrestrial photovoltaic modules

    NASA Astrophysics Data System (ADS)

    Mon, G. R.; Moore, D. M.; Ross, R. G., Jr.

    1984-11-01

    Useful solar cell interconnect reliability design and life prediction algorithms are presented, together with experimental data indicating that the classical strain cycle (fatigue) curve for the interconnect material does not account for the statistical scatter that is required in reliability predictions. This shortcoming is presently addressed by fitting a functional form to experimental cumulative interconnect failure rate data, which thereby yields statistical fatigue curves enabling not only the prediction of cumulative interconnect failures during the design life of an array field, but also the quantitative interpretation of data from accelerated thermal cycling tests. Optimal interconnect cost reliability design algorithms are also derived which may allow the minimization of energy cost over the design life of the array field.

  13. A novel neural-inspired learning algorithm with application to clinical risk prediction.

    PubMed

    Tay, Darwin; Poh, Chueh Loo; Kitney, Richard I

    2015-04-01

    Clinical risk prediction - the estimation of the likelihood an individual is at risk of a disease - is a coveted and exigent clinical task, and a cornerstone to the recommendation of life saving management strategies. This is especially important for individuals at risk of cardiovascular disease (CVD) given the fact that it is the leading causes of death in many developed counties. To this end, we introduce a novel learning algorithm - a key factor that influences the performance of machine learning-based prediction models - and utilities it to develop CVD risk prediction tool. This novel neural-inspired algorithm, called the Artificial Neural Cell System for classification (ANCSc), is inspired by mechanisms that develop the brain and empowering it with capabilities such as information processing/storage and recall, decision making and initiating actions on external environment. Specifically, we exploit on 3 natural neural mechanisms responsible for developing and enriching the brain - namely neurogenesis, neuroplasticity via nurturing and apoptosis - when implementing ANCSc algorithm. Benchmark testing was conducted using the Honolulu Heart Program (HHP) dataset and results are juxtaposed with 2 other algorithms - i.e. Support Vector Machine (SVM) and Evolutionary Data-Conscious Artificial Immune Recognition System (EDC-AIRS). Empirical experiments indicate that ANCSc algorithm (statistically) outperforms both SVM and EDC-AIRS algorithms. Key clinical markers identified by ANCSc algorithm include risk factors related to diet/lifestyle, pulmonary function, personal/family/medical history, blood data, blood pressure, and electrocardiography. These clinical markers, in general, are also found to be clinically significant - providing a promising avenue for identifying potential cardiovascular risk factors to be evaluated in clinical trials. Copyright © 2015 Elsevier Inc. All rights reserved.

  14. Defining probabilities of bowel resection in deep endometriosis of the rectum: Prediction with preoperative magnetic resonance imaging.

    PubMed

    Perandini, Alessio; Perandini, Simone; Montemezzi, Stefania; Bonin, Cecilia; Bellini, Gaia; Bergamini, Valentino

    2018-02-01

    Deep endometriosis of the rectum is a highly challenging disease, and a surgical approach is often needed to restore anatomy and function. Two kinds of surgeries may be performed: radical with segmental bowel resection or conservative without resection. Most patients undergo magnetic resonance imaging (MRI) before surgery, but there is currently no method to predict if conservative surgery is feasible or whether bowel resection is required. The aim of this study was to create an algorithm that could predict bowel resection using MRI images, that was easy to apply and could be useful in a clinical setting, in order to adequately discuss informed consent with the patient and plan the an appropriate and efficient surgical session. We collected medical records from 2010 to 2016 and reviewed the MRI results of 52 patients to detect any parameters that could predict bowel resection. Parameters that were reproducible and with a significant correlation to radical surgery were investigated by statistical regression and combined in an algorithm to give the best prediction of resection. The calculation of two parameters in MRI, impact angle and lesion size, and their use in a mathematical algorithm permit us to predict bowel resection with a positive predictive value of 87% and a negative predictive value of 83%. MRI could be of value in predicting the need for bowel resection in deep endometriosis of the rectum. Further research is required to assess the possibility of a wider application of this algorithm outside our single-center study. © 2017 Japan Society of Obstetrics and Gynecology.

  15. A large-scale evaluation of computational protein function prediction

    PubMed Central

    Radivojac, Predrag; Clark, Wyatt T; Ronnen Oron, Tal; Schnoes, Alexandra M; Wittkop, Tobias; Sokolov, Artem; Graim, Kiley; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa; Pandey, Gaurav; Yunes, Jeffrey M; Talwalkar, Ameet S; Repo, Susanna; Souza, Michael L; Piovesan, Damiano; Casadio, Rita; Wang, Zheng; Cheng, Jianlin; Fang, Hai; Gough, Julian; Koskinen, Patrik; Törönen, Petri; Nokso-Koivisto, Jussi; Holm, Liisa; Cozzetto, Domenico; Buchan, Daniel W A; Bryson, Kevin; Jones, David T; Limaye, Bhakti; Inamdar, Harshal; Datta, Avik; Manjari, Sunitha K; Joshi, Rajendra; Chitale, Meghana; Kihara, Daisuke; Lisewski, Andreas M; Erdin, Serkan; Venner, Eric; Lichtarge, Olivier; Rentzsch, Robert; Yang, Haixuan; Romero, Alfonso E; Bhat, Prajwal; Paccanaro, Alberto; Hamp, Tobias; Kassner, Rebecca; Seemayer, Stefan; Vicedo, Esmeralda; Schaefer, Christian; Achten, Dominik; Auer, Florian; Böhm, Ariane; Braun, Tatjana; Hecht, Maximilian; Heron, Mark; Hönigschmid, Peter; Hopf, Thomas; Kaufmann, Stefanie; Kiening, Michael; Krompass, Denis; Landerer, Cedric; Mahlich, Yannick; Roos, Manfred; Björne, Jari; Salakoski, Tapio; Wong, Andrew; Shatkay, Hagit; Gatzmann, Fanny; Sommer, Ingolf; Wass, Mark N; Sternberg, Michael J E; Škunca, Nives; Supek, Fran; Bošnjak, Matko; Panov, Panče; Džeroski, Sašo; Šmuc, Tomislav; Kourmpetis, Yiannis A I; van Dijk, Aalt D J; ter Braak, Cajo J F; Zhou, Yuanpeng; Gong, Qingtian; Dong, Xinran; Tian, Weidong; Falda, Marco; Fontana, Paolo; Lavezzo, Enrico; Di Camillo, Barbara; Toppo, Stefano; Lan, Liang; Djuric, Nemanja; Guo, Yuhong; Vucetic, Slobodan; Bairoch, Amos; Linial, Michal; Babbitt, Patricia C; Brenner, Steven E; Orengo, Christine; Rost, Burkhard; Mooney, Sean D; Friedberg, Iddo

    2013-01-01

    Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based Critical Assessment of protein Function Annotation (CAFA) experiment. Fifty-four methods representing the state-of-the-art for protein function prediction were evaluated on a target set of 866 proteins from eleven organisms. Two findings stand out: (i) today’s best protein function prediction algorithms significantly outperformed widely-used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is significant need for improvement of currently available tools. PMID:23353650

  16. Decoding the encoding of functional brain networks: An fMRI classification comparison of non-negative matrix factorization (NMF), independent component analysis (ICA), and sparse coding algorithms.

    PubMed

    Xie, Jianwen; Douglas, Pamela K; Wu, Ying Nian; Brody, Arthur L; Anderson, Ariana E

    2017-04-15

    Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet other mathematical constraints provide alternate biologically-plausible frameworks for generating brain networks. Non-negative matrix factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms (L1 Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking, where the total observed activity in a single voxel originates from a restricted number of possible brain networks. The assumptions of independence, positivity, and sparsity to encode task-related brain networks are compared; the resulting brain networks within scan for different constraints are used as basis functions to encode observed functional activity. These encodings are then decoded using machine learning, by using the time series weights to predict within scan whether a subject is viewing a video, listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects. The sparse coding algorithm of L1 Regularized Learning outperformed 4 variations of ICA (p<0.001) for predicting the task being performed within each scan using artifact-cleaned components. The NMF algorithms, which suppressed negative BOLD signal, had the poorest accuracy compared to the ICA and sparse coding algorithms. Holding constant the effect of the extraction algorithm, encodings using sparser spatial networks (containing more zero-valued voxels) had higher classification accuracy (p<0.001). Lower classification accuracy occurred when the extracted spatial maps contained more CSF regions (p<0.001). The success of sparse coding algorithms suggests that algorithms which enforce sparsity, discourage multitasking, and promote local specialization may capture better the underlying source processes than those which allow inexhaustible local processes such as ICA. Negative BOLD signal may capture task-related activations. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. High-order Newton-penalty algorithms

    NASA Astrophysics Data System (ADS)

    Dussault, Jean-Pierre

    2005-10-01

    Recent efforts in differentiable non-linear programming have been focused on interior point methods, akin to penalty and barrier algorithms. In this paper, we address the classical equality constrained program solved using the simple quadratic loss penalty function/algorithm. The suggestion to use extrapolations to track the differentiable trajectory associated with penalized subproblems goes back to the classic monograph of Fiacco & McCormick. This idea was further developed by Gould who obtained a two-steps quadratically convergent algorithm using prediction steps and Newton correction. Dussault interpreted the prediction step as a combined extrapolation with respect to the penalty parameter and the residual of the first order optimality conditions. Extrapolation with respect to the residual coincides with a Newton step.We explore here higher-order extrapolations, thus higher-order Newton-like methods. We first consider high-order variants of the Newton-Raphson method applied to non-linear systems of equations. Next, we obtain improved asymptotic convergence results for the quadratic loss penalty algorithm by using high-order extrapolation steps.

  18. Comparison of genetic algorithm and imperialist competitive algorithms in predicting bed load transport in clean pipe.

    PubMed

    Ebtehaj, Isa; Bonakdari, Hossein

    2014-01-01

    The existence of sediments in wastewater greatly affects the performance of the sewer and wastewater transmission systems. Increased sedimentation in wastewater collection systems causes problems such as reduced transmission capacity and early combined sewer overflow. The article reviews the performance of the genetic algorithm (GA) and imperialist competitive algorithm (ICA) in minimizing the target function (mean square error of observed and predicted Froude number). To study the impact of bed load transport parameters, using four non-dimensional groups, six different models have been presented. Moreover, the roulette wheel selection method is used to select the parents. The ICA with root mean square error (RMSE) = 0.007, mean absolute percentage error (MAPE) = 3.5% show better results than GA (RMSE = 0.007, MAPE = 5.6%) for the selected model. All six models return better results than the GA. Also, the results of these two algorithms were compared with multi-layer perceptron and existing equations.

  19. NegGOA: negative GO annotations selection using ontology structure.

    PubMed

    Fu, Guangyuan; Wang, Jun; Yang, Bo; Yu, Guoxian

    2016-10-01

    Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples-proteins that are known not carrying out a particular function. However, Gene Ontology (GO) almost always only provides the knowledge that proteins carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes more than tens of thousands GO terms and a protein is annotated with several (or dozens) of these terms. For these reasons, the negative examples of a protein can greatly help distinguishing true positive examples of the protein from such a large candidate GO space. In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, available annotations and potentiality of additional annotations of a protein to choose negative examples of the protein. We compare NegGOA with other negative examples selection algorithms and find that NegGOA produces much fewer false negatives than them. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also demonstrates improved accuracy than these comparing algorithms across various evaluation metrics. In addition, NegGOA is less suffered from incomplete annotations of proteins than these comparing methods. The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa gxyu@swu.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications.

    PubMed

    Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R; Taylor, Jeremy F; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart

    2016-01-01

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNP selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal.

  1. Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications

    PubMed Central

    Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R.; Taylor, Jeremy F.; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart

    2016-01-01

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNP selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal. PMID:27583971

  2. Predicting drug-target interactions by dual-network integrated logistic matrix factorization

    NASA Astrophysics Data System (ADS)

    Hao, Ming; Bryant, Stephen H.; Wang, Yanli

    2017-01-01

    In this work, we propose a dual-network integrated logistic matrix factorization (DNILMF) algorithm to predict potential drug-target interactions (DTI). The prediction procedure consists of four steps: (1) inferring new drug/target profiles and constructing profile kernel matrix; (2) diffusing drug profile kernel matrix with drug structure kernel matrix; (3) diffusing target profile kernel matrix with target sequence kernel matrix; and (4) building DNILMF model and smoothing new drug/target predictions based on their neighbors. We compare our algorithm with the state-of-the-art method based on the benchmark dataset. Results indicate that the DNILMF algorithm outperforms the previously reported approaches in terms of AUPR (area under precision-recall curve) and AUC (area under curve of receiver operating characteristic) based on the 5 trials of 10-fold cross-validation. We conclude that the performance improvement depends on not only the proposed objective function, but also the used nonlinear diffusion technique which is important but under studied in the DTI prediction field. In addition, we also compile a new DTI dataset for increasing the diversity of currently available benchmark datasets. The top prediction results for the new dataset are confirmed by experimental studies or supported by other computational research.

  3. City traffic flow breakdown prediction based on fuzzy rough set

    NASA Astrophysics Data System (ADS)

    Yang, Xu; Da-wei, Hu; Bing, Su; Duo-jia, Zhang

    2017-05-01

    In city traffic management, traffic breakdown is a very important issue, which is defined as a speed drop of a certain amount within a dense traffic situation. In order to predict city traffic flow breakdown accurately, in this paper, we propose a novel city traffic flow breakdown prediction algorithm based on fuzzy rough set. Firstly, we illustrate the city traffic flow breakdown problem, in which three definitions are given, that is, 1) Pre-breakdown flow rate, 2) Rate, density, and speed of the traffic flow breakdown, and 3) Duration of the traffic flow breakdown. Moreover, we define a hazard function to represent the probability of the breakdown ending at a given time point. Secondly, as there are many redundant and irrelevant attributes in city flow breakdown prediction, we propose an attribute reduction algorithm using the fuzzy rough set. Thirdly, we discuss how to predict the city traffic flow breakdown based on attribute reduction and SVM classifier. Finally, experiments are conducted by collecting data from I-405 Freeway, which is located at Irvine, California. Experimental results demonstrate that the proposed algorithm is able to achieve lower average error rate of city traffic flow breakdown prediction.

  4. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    PubMed

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  5. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks

    PubMed Central

    Li, Min; Li, Dongyan; Tang, Yu; Wang, Jianxin

    2017-01-01

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster. PMID:28858211

  6. A fast and accurate online sequential learning algorithm for feedforward networks.

    PubMed

    Liang, Nan-Ying; Huang, Guang-Bin; Saratchandran, P; Sundararajan, N

    2006-11-01

    In this paper, we develop an online sequential learning algorithm for single hidden layer feedforward networks (SLFNs) with additive or radial basis function (RBF) hidden nodes in a unified framework. The algorithm is referred to as online sequential extreme learning machine (OS-ELM) and can learn data one-by-one or chunk-by-chunk (a block of data) with fixed or varying chunk size. The activation functions for additive nodes in OS-ELM can be any bounded nonconstant piecewise continuous functions and the activation functions for RBF nodes can be any integrable piecewise continuous functions. In OS-ELM, the parameters of hidden nodes (the input weights and biases of additive nodes or the centers and impact factors of RBF nodes) are randomly selected and the output weights are analytically determined based on the sequentially arriving data. The algorithm uses the ideas of ELM of Huang et al. developed for batch learning which has been shown to be extremely fast with generalization performance better than other batch training methods. Apart from selecting the number of hidden nodes, no other control parameters have to be manually chosen. Detailed performance comparison of OS-ELM is done with other popular sequential learning algorithms on benchmark problems drawn from the regression, classification and time series prediction areas. The results show that the OS-ELM is faster than the other sequential algorithms and produces better generalization performance.

  7. TargetSpy: a supervised machine learning approach for microRNA target prediction.

    PubMed

    Sturm, Martin; Hackenberg, Michael; Langenberger, David; Frishman, Dmitrij

    2010-05-28

    Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences.In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on mouse and performs well in human and drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool http://www.targetspy.org.

  8. TargetSpy: a supervised machine learning approach for microRNA target prediction

    PubMed Central

    2010-01-01

    Background Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. Results We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Conclusion Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on mouse and performs well in human and drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool http://www.targetspy.org. PMID:20509939

  9. Optimizing the learning rate for adaptive estimation of neural encoding models

    PubMed Central

    2018-01-01

    Closed-loop neurotechnologies often need to adaptively learn an encoding model that relates the neural activity to the brain state, and is used for brain state decoding. The speed and accuracy of adaptive learning algorithms are critically affected by the learning rate, which dictates how fast model parameters are updated based on new observations. Despite the importance of the learning rate, currently an analytical approach for its selection is largely lacking and existing signal processing methods vastly tune it empirically or heuristically. Here, we develop a novel analytical calibration algorithm for optimal selection of the learning rate in adaptive Bayesian filters. We formulate the problem through a fundamental trade-off that learning rate introduces between the steady-state error and the convergence time of the estimated model parameters. We derive explicit functions that predict the effect of learning rate on error and convergence time. Using these functions, our calibration algorithm can keep the steady-state parameter error covariance smaller than a desired upper-bound while minimizing the convergence time, or keep the convergence time faster than a desired value while minimizing the error. We derive the algorithm both for discrete-valued spikes modeled as point processes nonlinearly dependent on the brain state, and for continuous-valued neural recordings modeled as Gaussian processes linearly dependent on the brain state. Using extensive closed-loop simulations, we show that the analytical solution of the calibration algorithm accurately predicts the effect of learning rate on parameter error and convergence time. Moreover, the calibration algorithm allows for fast and accurate learning of the encoding model and for fast convergence of decoding to accurate performance. Finally, larger learning rates result in inaccurate encoding models and decoders, and smaller learning rates delay their convergence. The calibration algorithm provides a novel analytical approach to predictably achieve a desired level of error and convergence time in adaptive learning, with application to closed-loop neurotechnologies and other signal processing domains. PMID:29813069

  10. Optimizing the learning rate for adaptive estimation of neural encoding models.

    PubMed

    Hsieh, Han-Lin; Shanechi, Maryam M

    2018-05-01

    Closed-loop neurotechnologies often need to adaptively learn an encoding model that relates the neural activity to the brain state, and is used for brain state decoding. The speed and accuracy of adaptive learning algorithms are critically affected by the learning rate, which dictates how fast model parameters are updated based on new observations. Despite the importance of the learning rate, currently an analytical approach for its selection is largely lacking and existing signal processing methods vastly tune it empirically or heuristically. Here, we develop a novel analytical calibration algorithm for optimal selection of the learning rate in adaptive Bayesian filters. We formulate the problem through a fundamental trade-off that learning rate introduces between the steady-state error and the convergence time of the estimated model parameters. We derive explicit functions that predict the effect of learning rate on error and convergence time. Using these functions, our calibration algorithm can keep the steady-state parameter error covariance smaller than a desired upper-bound while minimizing the convergence time, or keep the convergence time faster than a desired value while minimizing the error. We derive the algorithm both for discrete-valued spikes modeled as point processes nonlinearly dependent on the brain state, and for continuous-valued neural recordings modeled as Gaussian processes linearly dependent on the brain state. Using extensive closed-loop simulations, we show that the analytical solution of the calibration algorithm accurately predicts the effect of learning rate on parameter error and convergence time. Moreover, the calibration algorithm allows for fast and accurate learning of the encoding model and for fast convergence of decoding to accurate performance. Finally, larger learning rates result in inaccurate encoding models and decoders, and smaller learning rates delay their convergence. The calibration algorithm provides a novel analytical approach to predictably achieve a desired level of error and convergence time in adaptive learning, with application to closed-loop neurotechnologies and other signal processing domains.

  11. Systems Biological Approach of Molecular Descriptors Connectivity: Optimal Descriptors for Oral Bioavailability Prediction

    PubMed Central

    Ahmed, Shiek S. S. J.; Ramakrishnan, V.

    2012-01-01

    Background Poor oral bioavailability is an important parameter accounting for the failure of the drug candidates. Approximately, 50% of developing drugs fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physiochemical properties are highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low with a significant number of false positives. In this study, we present an oral bioavailability model based on systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physiochemical properties. Results The models were developed based on computationally derived 247 physicochemical descriptors from 2279 molecules, among which 969, 605 and 705 molecules were corresponds to oral bioavailability, intestinal absorption (HIA) and caco-2 permeability data set, respectively. The partial least squares discriminate analysis showed 49 descriptors of HIA and 50 descriptors of caco-2 are the major contributing descriptors in classifying into groups. Of these descriptors, 47 descriptors were commonly associated to HIA and caco-2, which suggests to play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physiochemical properties with the functional relevance labeled as (+bioavailability/−bioavailability) to indicate good-bioavailability/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation based feature selection (CFS) algorithm was implemented, which confirms that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction. Conclusion The logistic algorithm with 47 selected descriptors correctly predicted the oral bioavailability, with a predictive accuracy of more than 71%. Overall, the method captures the fundamental molecular descriptors, that can be used as an entity to facilitate prediction of oral bioavailability. PMID:22815781

  12. Systems biological approach of molecular descriptors connectivity: optimal descriptors for oral bioavailability prediction.

    PubMed

    Ahmed, Shiek S S J; Ramakrishnan, V

    2012-01-01

    Poor oral bioavailability is an important parameter accounting for the failure of the drug candidates. Approximately, 50% of developing drugs fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physiochemical properties are highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low with a significant number of false positives. In this study, we present an oral bioavailability model based on systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physiochemical properties. The models were developed based on computationally derived 247 physicochemical descriptors from 2279 molecules, among which 969, 605 and 705 molecules were corresponds to oral bioavailability, intestinal absorption (HIA) and caco-2 permeability data set, respectively. The partial least squares discriminate analysis showed 49 descriptors of HIA and 50 descriptors of caco-2 are the major contributing descriptors in classifying into groups. Of these descriptors, 47 descriptors were commonly associated to HIA and caco-2, which suggests to play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physiochemical properties with the functional relevance labeled as (+bioavailability/-bioavailability) to indicate good-bioavailability/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation based feature selection (CFS) algorithm was implemented, which confirms that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction. The logistic algorithm with 47 selected descriptors correctly predicted the oral bioavailability, with a predictive accuracy of more than 71%. Overall, the method captures the fundamental molecular descriptors, that can be used as an entity to facilitate prediction of oral bioavailability.

  13. GPS Modeling and Analysis. Summary of Research: GPS Satellite Axial Ratio Predictions

    NASA Technical Reports Server (NTRS)

    Axelrad, Penina; Reeh, Lisa

    2002-01-01

    This report outlines the algorithms developed at the Colorado Center for Astrodynamics Research to model yaw and predict the axial ratio as measured from a ground station. The algorithms are implemented in a collection of Matlab functions and scripts that read certain user input, such as ground station coordinates, the UTC time, and the desired GPS (Global Positioning System) satellites, and compute the above-mentioned parameters. The position information for the GPS satellites is obtained from Yuma almanac files corresponding to the prescribed date. The results are displayed graphically through time histories and azimuth-elevation plots.

  14. Key algorithms used in GR02: A computer simulation model for predicting tree and stand growth

    Treesearch

    Garrett A. Hughes; Paul E. Sendak; Paul E. Sendak

    1985-01-01

    GR02 is an individual tree, distance-independent simulation model for predicting tree and stand growth over time. It performs five major functions during each run: (1) updates diameter at breast height, (2) updates total height, (3) estimates mortality, (4) determines regeneration, and (5) updates crown class.

  15. Prediction of Spontaneous Protein Deamidation from Sequence-Derived Secondary Structure and Intrinsic Disorder.

    PubMed

    Lorenzo, J Ramiro; Alonso, Leonardo G; Sánchez, Ignacio E

    2015-01-01

    Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage "Protein and nucleic acid structure and sequence analysis".

  16. Numerical experience with a class of algorithms for nonlinear optimization using inexact function and gradient information

    NASA Technical Reports Server (NTRS)

    Carter, Richard G.

    1989-01-01

    For optimization problems associated with engineering design, parameter estimation, image reconstruction, and other optimization/simulation applications, low accuracy function and gradient values are frequently much less expensive to obtain than high accuracy values. Here, researchers investigate the computational performance of trust region methods for nonlinear optimization when high accuracy evaluations are unavailable or prohibitively expensive, and confirm earlier theoretical predictions when the algorithm is convergent even with relative gradient errors of 0.5 or more. The proper choice of the amount of accuracy to use in function and gradient evaluations can result in orders-of-magnitude savings in computational cost.

  17. Predictive factors for renal failure and a control and treatment algorithm

    PubMed Central

    Cerqueira, Denise de Paula; Tavares, José Roberto; Machado, Regimar Carla

    2014-01-01

    Objectives to evaluate the renal function of patients in an intensive care unit, to identify the predisposing factors for the development of renal failure, and to develop an algorithm to help in the control of the disease. Method exploratory, descriptive, prospective study with a quantitative approach. Results a total of 30 patients (75.0%) were diagnosed with kidney failure and the main factors associated with this disease were: advanced age, systemic arterial hypertension, diabetes mellitus, lung diseases, and antibiotic use. Of these, 23 patients (76.6%) showed a reduction in creatinine clearance in the first 24 hours of hospitalization. Conclusion a decline in renal function was observed in a significant number of subjects, therefore, an algorithm was developed with the aim of helping in the control of renal failure in a practical and functional way. PMID:26107827

  18. Collision detection for spacecraft proximity operations. Ph.D. Thesis - MIT

    NASA Technical Reports Server (NTRS)

    Vaughan, Robin M.

    1987-01-01

    The development of a new collision detection algorithm to be used when two spacecraft are operating in the same vicinity is described. The two spacecraft are modeled as unions of convex polyhedra, where the polyhedron resulting from the union may be either convex or nonconvex. The relative motion of the two spacecraft is assumed to be such that one vehicle is moving with constant linear and angular velocity with respect to the other. The algorithm determines if a collision is possible and, if so, predicts the time when the collision will take place. The theoretical basis for the new collision detection algorithm is the C-function formulation of the configuration space approach recently introduced by researchers in robotics. Three different types of C-functions are defined that model the contacts between the vertices, edges, and faces of the polyhedra representing the two spacecraft. The C-functions are shown to be transcendental functions of time for the assumed trajectory of the moving spacecraft. The capabilities of the new algorithm are demonstrated for several example cases.

  19. Sensitivity and Uncertainty Analysis for Streamflow Prediction Using Different Objective Functions and Optimization Algorithms: San Joaquin California

    NASA Astrophysics Data System (ADS)

    Paul, M.; Negahban-Azar, M.

    2017-12-01

    The hydrologic models usually need to be calibrated against observed streamflow at the outlet of a particular drainage area through a careful model calibration. However, a large number of parameters are required to fit in the model due to their unavailability of the field measurement. Therefore, it is difficult to calibrate the model for a large number of potential uncertain model parameters. This even becomes more challenging if the model is for a large watershed with multiple land uses and various geophysical characteristics. Sensitivity analysis (SA) can be used as a tool to identify most sensitive model parameters which affect the calibrated model performance. There are many different calibration and uncertainty analysis algorithms which can be performed with different objective functions. By incorporating sensitive parameters in streamflow simulation, effects of the suitable algorithm in improving model performance can be demonstrated by the Soil and Water Assessment Tool (SWAT) modeling. In this study, the SWAT was applied in the San Joaquin Watershed in California covering 19704 km2 to calibrate the daily streamflow. Recently, sever water stress escalating due to intensified climate variability, prolonged drought and depleting groundwater for agricultural irrigation in this watershed. Therefore it is important to perform a proper uncertainty analysis given the uncertainties inherent in hydrologic modeling to predict the spatial and temporal variation of the hydrologic process to evaluate the impacts of different hydrologic variables. The purpose of this study was to evaluate the sensitivity and uncertainty of the calibrated parameters for predicting streamflow. To evaluate the sensitivity of the calibrated parameters three different optimization algorithms (Sequential Uncertainty Fitting- SUFI-2, Generalized Likelihood Uncertainty Estimation- GLUE and Parameter Solution- ParaSol) were used with four different objective functions (coefficient of determination- r2, Nash-Sutcliffe efficiency- NSE, percent bias- PBIAS, and Kling-Gupta efficiency- KGE). The preliminary results showed that using the SUFI-2 algorithm with the objective function NSE and KGE has improved significantly the calibration (e.g. R2 and NSE is found 0.52 and 0.47 respectively for daily streamflow calibration).

  20. A comparative study of clonal selection algorithm for effluent removal forecasting in septic sludge treatment plant.

    PubMed

    Chun, Ting Sie; Malek, M A; Ismail, Amelia Ritahani

    2015-01-01

    The development of effluent removal prediction is crucial in providing a planning tool necessary for the future development and the construction of a septic sludge treatment plant (SSTP), especially in the developing countries. In order to investigate the expected functionality of the required standard, the prediction of the effluent quality, namely biological oxygen demand, chemical oxygen demand and total suspended solid of an SSTP was modelled using an artificial intelligence approach. In this paper, we adopt the clonal selection algorithm (CSA) to set up a prediction model, with a well-established method - namely the least-square support vector machine (LS-SVM) as a baseline model. The test results of the case study showed that the prediction of the CSA-based SSTP model worked well and provided model performance as satisfactory as the LS-SVM model. The CSA approach shows that fewer control and training parameters are required for model simulation as compared with the LS-SVM approach. The ability of a CSA approach in resolving limited data samples, non-linear sample function and multidimensional pattern recognition makes it a powerful tool in modelling the prediction of effluent removals in an SSTP.

  1. Computational Selection of Transcriptomics Experiments Improves Guilt-by-Association Analyses

    PubMed Central

    Bhat, Prajwal; Yang, Haixuan; Bögre, László; Devoto, Alessandra; Paccanaro, Alberto

    2012-01-01

    The Guilt-by-Association (GBA) principle, according to which genes with similar expression profiles are functionally associated, is widely applied for functional analyses using large heterogeneous collections of transcriptomics data. However, the use of such large collections could hamper GBA functional analysis for genes whose expression is condition specific. In these cases a smaller set of condition related experiments should instead be used, but identifying such functionally relevant experiments from large collections based on literature knowledge alone is an impractical task. We begin this paper by analyzing, both from a mathematical and a biological point of view, why only condition specific experiments should be used in GBA functional analysis. We are able to show that this phenomenon is independent of the functional categorization scheme and of the organisms being analyzed. We then present a semi-supervised algorithm that can select functionally relevant experiments from large collections of transcriptomics experiments. Our algorithm is able to select experiments relevant to a given GO term, MIPS FunCat term or even KEGG pathways. We extensively test our algorithm on large dataset collections for yeast and Arabidopsis. We demonstrate that: using the selected experiments there is a statistically significant improvement in correlation between genes in the functional category of interest; the selected experiments improve GBA-based gene function prediction; the effectiveness of the selected experiments increases with annotation specificity; our algorithm can be successfully applied to GBA-based pathway reconstruction. Importantly, the set of experiments selected by the algorithm reflects the existing literature knowledge about the experiments. [A MATLAB implementation of the algorithm and all the data used in this paper can be downloaded from the paper website: http://www.paccanarolab.org/papers/CorrGene/]. PMID:22879875

  2. Using support vector machine to predict beta- and gamma-turns in proteins.

    PubMed

    Hu, Xiuzhen; Li, Qianzhong

    2008-09-01

    By using the composite vector with increment of diversity, position conservation scoring function, and predictive secondary structures to express the information of sequence, a support vector machine (SVM) algorithm for predicting beta- and gamma-turns in the proteins is proposed. The 426 and 320 nonhomologous protein chains described by Guruprasad and Rajkumar (Guruprasad and Rajkumar J. Biosci 2000, 25,143) are used for training and testing the predictive model of the beta- and gamma-turns, respectively. The overall prediction accuracy and the Matthews correlation coefficient in 7-fold cross-validation are 79.8% and 0.47, respectively, for the beta-turns. The overall prediction accuracy in 5-fold cross-validation is 61.0% for the gamma-turns. These results are significantly higher than the other algorithms in the prediction of beta- and gamma-turns using the same datasets. In addition, the 547 and 823 nonhomologous protein chains described by Fuchs and Alix (Fuchs and Alix Proteins: Struct Funct Bioinform 2005, 59, 828) are used for training and testing the predictive model of the beta- and gamma-turns, and better results are obtained. This algorithm may be helpful to improve the performance of protein turns' prediction. To ensure the ability of the SVM method to correctly classify beta-turn and non-beta-turn (gamma-turn and non-gamma-turn), the receiver operating characteristic threshold independent measure curves are provided. (c) 2008 Wiley Periodicals, Inc.

  3. Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.

    2013-09-01

    This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the setmore » of predictors to around 8 factors that can be validated using reputable medical and public health resources.« less

  4. Predicting clinical response to anticancer drugs using an ex vivo platform that captures tumour heterogeneity.

    PubMed

    Majumder, Biswanath; Baraneedharan, Ulaganathan; Thiyagarajan, Saravanan; Radhakrishnan, Padhma; Narasimhan, Harikrishna; Dhandapani, Muthu; Brijwani, Nilesh; Pinto, Dency D; Prasath, Arun; Shanthappa, Basavaraja U; Thayakumar, Allen; Surendran, Rajagopalan; Babu, Govind K; Shenoy, Ashok M; Kuriakose, Moni A; Bergthold, Guillaume; Horowitz, Peleg; Loda, Massimo; Beroukhim, Rameen; Agarwal, Shivani; Sengupta, Shiladitya; Sundaram, Mallikarjun; Majumder, Pradip K

    2015-02-27

    Predicting clinical response to anticancer drugs remains a major challenge in cancer treatment. Emerging reports indicate that the tumour microenvironment and heterogeneity can limit the predictive power of current biomarker-guided strategies for chemotherapy. Here we report the engineering of personalized tumour ecosystems that contextually conserve the tumour heterogeneity, and phenocopy the tumour microenvironment using tumour explants maintained in defined tumour grade-matched matrix support and autologous patient serum. The functional response of tumour ecosystems, engineered from 109 patients, to anticancer drugs, together with the corresponding clinical outcomes, is used to train a machine learning algorithm; the learned model is then applied to predict the clinical response in an independent validation group of 55 patients, where we achieve 100% sensitivity in predictions while keeping specificity in a desired high range. The tumour ecosystem and algorithm, together termed the CANScript technology, can emerge as a powerful platform for enabling personalized medicine.

  5. Customer demand prediction of service-oriented manufacturing using the least square support vector machine optimized by particle swarm optimization algorithm

    NASA Astrophysics Data System (ADS)

    Cao, Jin; Jiang, Zhibin; Wang, Kangzhou

    2017-07-01

    Many nonlinear customer satisfaction-related factors significantly influence the future customer demand for service-oriented manufacturing (SOM). To address this issue and enhance the prediction accuracy, this article develops a novel customer demand prediction approach for SOM. The approach combines the phase space reconstruction (PSR) technique with the optimized least square support vector machine (LSSVM). First, the prediction sample space is reconstructed by the PSR to enrich the time-series dynamics of the limited data sample. Then, the generalization and learning ability of the LSSVM are improved by the hybrid polynomial and radial basis function kernel. Finally, the key parameters of the LSSVM are optimized by the particle swarm optimization algorithm. In a real case study, the customer demand prediction of an air conditioner compressor is implemented. Furthermore, the effectiveness and validity of the proposed approach are demonstrated by comparison with other classical predication approaches.

  6. Small hydropower spot prediction using SWAT and a diversion algorithm, case study: Upper Citarum Basin

    NASA Astrophysics Data System (ADS)

    Kardhana, Hadi; Arya, Doni Khaira; Hadihardaja, Iwan K.; Widyaningtyas, Riawan, Edi; Lubis, Atika

    2017-11-01

    Small-Scale Hydropower (SHP) had been important electric energy power source in Indonesia. Indonesia is vast countries, consists of more than 17.000 islands. It has large fresh water resource about 3 m of rainfall and 2 m of runoff. Much of its topography is mountainous, remote but abundant with potential energy. Millions of people do not have sufficient access to electricity, some live in the remote places. Recently, SHP development was encouraged for energy supply of the places. Development of global hydrology data provides opportunity to predict distribution of hydropower potential. In this paper, we demonstrate run-of-river type SHP spot prediction tool using SWAT and a river diversion algorithm. The use of Soil and Water Assessment Tool (SWAT) with input of CFSR (Climate Forecast System Re-analysis) of 10 years period had been implemented to predict spatially distributed flow cumulative distribution function (CDF). A simple algorithm to maximize potential head of a location by a river diversion expressing head race and penstock had been applied. Firm flow and power of the SHP were estimated from the CDF and the algorithm. The tool applied to Upper Citarum River Basin and three out of four existing hydropower locations had been well predicted. The result implies that this tool is able to support acceleration of SHP development at earlier phase.

  7. Predicting functional decline and survival in amyotrophic lateral sclerosis.

    PubMed

    Ong, Mei-Lyn; Tan, Pei Fang; Holbrook, Joanna D

    2017-01-01

    Better predictors of amyotrophic lateral sclerosis disease course could enable smaller and more targeted clinical trials. Partially to address this aim, the Prize for Life foundation collected de-identified records from amyotrophic lateral sclerosis sufferers who participated in clinical trials of investigational drugs and made them available to researchers in the PRO-ACT database. In this study, time series data from PRO-ACT subjects were fitted to exponential models. Binary classes for decline in the total score of amyotrophic lateral sclerosis functional rating scale revised (ALSFRS-R) (fast/slow progression) and survival (high/low death risk) were derived. Data was segregated into training and test sets via cross validation. Learning algorithms were applied to the demographic, clinical and laboratory parameters in the training set to predict ALSFRS-R decline and the derived fast/slow progression and high/low death risk categories. The performance of predictive models was assessed by cross-validation in the test set using Receiver Operator Curves and root mean squared errors. A model created using a boosting algorithm containing the decline in four parameters (weight, alkaline phosphatase, albumin and creatine kinase) post baseline, was able to predict functional decline class (fast or slow) with fair accuracy (AUC = 0.82). However similar approaches to build a predictive model for decline class by baseline subject characteristics were not successful. In contrast, baseline values of total bilirubin, gamma glutamyltransferase, urine specific gravity and ALSFRS-R item score-climbing stairs were sufficient to predict survival class. Using combinations of small numbers of variables it was possible to predict classes of functional decline and survival across the 1-2 year timeframe available in PRO-ACT. These findings may have utility for design of future ALS clinical trials.

  8. BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

    PubMed

    Kaur, Harpreet; Raghava, G P S

    2002-03-01

    beta-turns play an important role from a structural and functional point of view. beta-turns are the most common type of non-repetitive structures in proteins and comprise on average, 25% of the residues. In the past numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-TURNS in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows to predict different types of beta-TURNS e.g. type I, I', II, II', VI, VIII and non-specific. This server assists the users in predicting the consensus beta-TURNS in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/

  9. A Consensus Method for the Prediction of ‘Aggregation-Prone’ Peptides in Globular Proteins

    PubMed Central

    Tsolis, Antonios C.; Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Hamodrakas, Stavros J.

    2013-01-01

    The purpose of this work was to construct a consensus prediction algorithm of ‘aggregation-prone’ peptides in globular proteins, combining existing tools. This allows comparison of the different algorithms and the production of more objective and accurate results. Eleven (11) individual methods are combined and produce AMYLPRED2, a publicly, freely available web tool to academic users (http://biophysics.biol.uoa.gr/AMYLPRED2), for the consensus prediction of amyloidogenic determinants/‘aggregation-prone’ peptides in proteins, from sequence alone. The performance of AMYLPRED2 indicates that it functions better than individual aggregation-prediction algorithms, as perhaps expected. AMYLPRED2 is a useful tool for identifying amyloid-forming regions in proteins that are associated with several conformational diseases, called amyloidoses, such as Altzheimer's, Parkinson's, prion diseases and type II diabetes. It may also be useful for understanding the properties of protein folding and misfolding and for helping to the control of protein aggregation/solubility in biotechnology (recombinant proteins forming bacterial inclusion bodies) and biotherapeutics (monoclonal antibodies and biopharmaceutical proteins). PMID:23326595

  10. Neural imaging to track mental states while using an intelligent tutoring system.

    PubMed

    Anderson, John R; Betts, Shawn; Ferris, Jennifer L; Fincham, Jon M

    2010-04-13

    Hemodynamic measures of brain activity can be used to interpret a student's mental state when they are interacting with an intelligent tutoring system. Functional magnetic resonance imaging (fMRI) data were collected while students worked with a tutoring system that taught an algebra isomorph. A cognitive model predicted the distribution of solution times from measures of problem complexity. Separately, a linear discriminant analysis used fMRI data to predict whether or not students were engaged in problem solving. A hidden Markov algorithm merged these two sources of information to predict the mental states of students during problem-solving episodes. The algorithm was trained on data from 1 day of interaction and tested with data from a later day. In terms of predicting what state a student was in during a 2-s period, the algorithm achieved 87% accuracy on the training data and 83% accuracy on the test data. The results illustrate the importance of integrating the bottom-up information from imaging data with the top-down information from a cognitive model.

  11. Machine Learning Algorithm Predicts Cardiac Resynchronization Therapy Outcomes: Lessons From the COMPANION Trial.

    PubMed

    Kalscheur, Matthew M; Kipp, Ryan T; Tattersall, Matthew C; Mei, Chaoqun; Buhr, Kevin A; DeMets, David L; Field, Michael E; Eckhardt, Lee L; Page, C David

    2018-01-01

    Cardiac resynchronization therapy (CRT) reduces morbidity and mortality in heart failure patients with reduced left ventricular function and intraventricular conduction delay. However, individual outcomes vary significantly. This study sought to use a machine learning algorithm to develop a model to predict outcomes after CRT. Models were developed with machine learning algorithms to predict all-cause mortality or heart failure hospitalization at 12 months post-CRT in the COMPANION trial (Comparison of Medical Therapy, Pacing, and Defibrillation in Heart Failure). The best performing model was developed with the random forest algorithm. The ability of this model to predict all-cause mortality or heart failure hospitalization and all-cause mortality alone was compared with discrimination obtained using a combination of bundle branch block morphology and QRS duration. In the 595 patients with CRT-defibrillator in the COMPANION trial, 105 deaths occurred (median follow-up, 15.7 months). The survival difference across subgroups differentiated by bundle branch block morphology and QRS duration did not reach significance ( P =0.08). The random forest model produced quartiles of patients with an 8-fold difference in survival between those with the highest and lowest predicted probability for events (hazard ratio, 7.96; P <0.0001). The model also discriminated the risk of the composite end point of all-cause mortality or heart failure hospitalization better than subgroups based on bundle branch block morphology and QRS duration. In the COMPANION trial, a machine learning algorithm produced a model that predicted clinical outcomes after CRT. Applied before device implant, this model may better differentiate outcomes over current clinical discriminators and improve shared decision-making with patients. © 2018 American Heart Association, Inc.

  12. Predicting Recovery Potential for Individual Stroke Patients Increases Rehabilitation Efficiency.

    PubMed

    Stinear, Cathy M; Byblow, Winston D; Ackerley, Suzanne J; Barber, P Alan; Smith, Marie-Claire

    2017-04-01

    Several clinical measures and biomarkers are associated with motor recovery after stroke, but none are used to guide rehabilitation for individual patients. The objective of this study was to evaluate the implementation of upper limb predictions in stroke rehabilitation, by combining clinical measures and biomarkers using the Predict Recovery Potential (PREP) algorithm. Predictions were provided for patients in the implementation group (n=110) and withheld from the comparison group (n=82). Predictions guided rehabilitation therapy focus for patients in the implementation group. The effects of predictive information on clinical practice (length of stay, therapist confidence, therapy content, and dose) were evaluated. Clinical outcomes (upper limb function, impairment and use, independence, and quality of life) were measured 3 and 6 months poststroke. The primary clinical practice outcome was inpatient length of stay. The primary clinical outcome was Action Research Arm Test score 3 months poststroke. Length of stay was 1 week shorter for the implementation group (11 days; 95% confidence interval, 9-13 days) than the comparison group (17 days; 95% confidence interval, 14-21 days; P =0.001), controlling for upper limb impairment, age, sex, and comorbidities. Therapists were more confident ( P =0.004) and modified therapy content according to predictions for the implementation group ( P <0.05). The algorithm correctly predicted the primary clinical outcome for 80% of patients in both groups. There were no adverse effects of algorithm implementation on patient outcomes at 3 or 6 months poststroke. PREP algorithm predictions modify therapy content and increase rehabilitation efficiency after stroke without compromising clinical outcome. URL: http://anzctr.org.au. Unique identifier: ACTRN12611000755932. © 2017 American Heart Association, Inc.

  13. Profiling Arthritis Pain with a Decision Tree.

    PubMed

    Hung, Man; Bounsanga, Jerry; Liu, Fangzhou; Voss, Maren W

    2018-06-01

    Arthritis is the leading cause of work disability and contributes to lost productivity. Previous studies showed that various factors predict pain, but they were limited in sample size and scope from a data analytics perspective. The current study applied machine learning algorithms to identify predictors of pain associated with arthritis in a large national sample. Using data from the 2011 to 2012 Medical Expenditure Panel Survey, data mining was performed to develop algorithms to identify factors and patterns that contribute to risk of pain. The model incorporated over 200 variables within the algorithm development, including demographic data, medical claims, laboratory tests, patient-reported outcomes, and sociobehavioral characteristics. The developed algorithms to predict pain utilize variables readily available in patient medical records. Using the machine learning classification algorithm J48 with 50-fold cross-validations, we found that the model can significantly distinguish those with and without pain (c-statistics = 0.9108). The F measure was 0.856, accuracy rate was 85.68%, sensitivity was 0.862, specificity was 0.852, and precision was 0.849. Physical and mental function scores, the ability to climb stairs, and overall assessment of feeling were the most discriminative predictors from the 12 identified variables, predicting pain with 86% accuracy for individuals with arthritis. In this era of rapid expansion of big data application, the nature of healthcare research is moving from hypothesis-driven to data-driven solutions. The algorithms generated in this study offer new insights on individualized pain prediction, allowing the development of cost-effective care management programs for those experiencing arthritis pain. © 2017 World Institute of Pain.

  14. Establishing a Dynamic Self-Adaptation Learning Algorithm of the BP Neural Network and Its Applications

    NASA Astrophysics Data System (ADS)

    Li, Xiaofeng; Xiang, Suying; Zhu, Pengfei; Wu, Min

    2015-12-01

    In order to avoid the inherent deficiencies of the traditional BP neural network, such as slow convergence speed, that easily leading to local minima, poor generalization ability and difficulty in determining the network structure, the dynamic self-adaptive learning algorithm of the BP neural network is put forward to improve the function of the BP neural network. The new algorithm combines the merit of principal component analysis, particle swarm optimization, correlation analysis and self-adaptive model, hence can effectively solve the problems of selecting structural parameters, initial connection weights and thresholds and learning rates of the BP neural network. This new algorithm not only reduces the human intervention, optimizes the topological structures of BP neural networks and improves the network generalization ability, but also accelerates the convergence speed of a network, avoids trapping into local minima, and enhances network adaptation ability and prediction ability. The dynamic self-adaptive learning algorithm of the BP neural network is used to forecast the total retail sale of consumer goods of Sichuan Province, China. Empirical results indicate that the new algorithm is superior to the traditional BP network algorithm in predicting accuracy and time consumption, which shows the feasibility and effectiveness of the new algorithm.

  15. Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants.

    PubMed

    Gagliano, Sarah A; Ravji, Reena; Barnes, Michael R; Weale, Michael E; Knight, Jo

    2015-08-24

    Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data-analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets, however none offer more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.

  16. Solid harmonic wavelet scattering for predictions of molecule properties

    NASA Astrophysics Data System (ADS)

    Eickenberg, Michael; Exarchakis, Georgios; Hirn, Matthew; Mallat, Stéphane; Thiry, Louis

    2018-06-01

    We present a machine learning algorithm for the prediction of molecule properties inspired by ideas from density functional theory (DFT). Using Gaussian-type orbital functions, we create surrogate electronic densities of the molecule from which we compute invariant "solid harmonic scattering coefficients" that account for different types of interactions at different scales. Multilinear regressions of various physical properties of molecules are computed from these invariant coefficients. Numerical experiments show that these regressions have near state-of-the-art performance, even with relatively few training examples. Predictions over small sets of scattering coefficients can reach a DFT precision while being interpretable.

  17. PANDA: Protein function prediction using domain architecture and affinity propagation.

    PubMed

    Wang, Zheng; Zhao, Chenguang; Wang, Yiheng; Sun, Zheng; Wang, Nan

    2018-02-22

    We developed PANDA (Propagation of Affinity and Domain Architecture) to predict protein functions in the format of Gene Ontology (GO) terms. PANDA at first executes profile-profile alignment algorithm to search against PfamA, KOG, COG, and SwissProt databases, and then launches PSI-BLAST against UniProt for homologue search. PANDA integrates a domain architecture inference algorithm based on the Bayesian statistics that calculates the probability of having a GO term. All the candidate GO terms are pooled and filtered based on Z-score. After that, the remaining GO terms are clustered using an affinity propagation algorithm based on the GO directed acyclic graph, followed by a second round of filtering on the clusters of GO terms. We benchmarked the performance of all the baseline predictors PANDA integrates and also for every pooling and filtering step of PANDA. It can be found that PANDA achieves better performances in terms of area under the curve for precision and recall compared to the baseline predictors. PANDA can be accessed from http://dna.cs.miami.edu/PANDA/ .

  18. A Novel approach for predicting monthly water demand by combining singular spectrum analysis with neural networks

    NASA Astrophysics Data System (ADS)

    Zubaidi, Salah L.; Dooley, Jayne; Alkhaddar, Rafid M.; Abdellatif, Mawada; Al-Bugharbee, Hussein; Ortega-Martorell, Sandra

    2018-06-01

    Valid and dependable water demand prediction is a major element of the effective and sustainable expansion of municipal water infrastructures. This study provides a novel approach to quantifying water demand through the assessment of climatic factors, using a combination of a pretreatment signal technique, a hybrid particle swarm optimisation algorithm and an artificial neural network (PSO-ANN). The Singular Spectrum Analysis (SSA) technique was adopted to decompose and reconstruct water consumption in relation to six weather variables, to create a seasonal and stochastic time series. The results revealed that SSA is a powerful technique, capable of decomposing the original time series into many independent components including trend, oscillatory behaviours and noise. In addition, the PSO-ANN algorithm was shown to be a reliable prediction model, outperforming the hybrid Backtracking Search Algorithm BSA-ANN in terms of fitness function (RMSE). The findings of this study also support the view that water demand is driven by climatological variables.

  19. CpG islands: algorithms and applications in methylation studies.

    PubMed

    Zhao, Zhongming; Han, Leng

    2009-05-15

    Methylation occurs frequently at 5'-cytosine of the CpG dinucleotides in vertebrate genomes; however, this epigenetic feature is rarely observed in CpG islands (CGIs) or CpG clusters in the promoter regions of genes. Aberrant methylation of the promoter-associated CGIs might influence gene expression and cause carcinogenesis. Because of the functional importance, multiple algorithms have been available for identifying CGIs in a genome or a sequence. They can be categorized into the traditional algorithms (e.g., Gardiner-Garden and Frommer (1987), Takai and Jones (2002), and CpGPRoD (2002)) or statistical property based algorithms (CpGcluster (2006) and CG cluster (2007)). We reviewed the features of these algorithms and evaluated their performance on identifying functional CGIs using genome-wide methylation data. Moreover, identification of CGIs is an initial step in many recent studies for predicting methylation status as well as in the design of methylation detection platforms. We reviewed the benchmarks and features used in these studies.

  20. Capture, Learning, and Classification of Upper Extremity Movement Primitives in Healthy Controls and Stroke Patients

    PubMed Central

    Guerra, Jorge; Uddin, Jasim; Nilsen, Dawn; Mclnerney, James; Fadoo, Ammarah; Omofuma, Isirame B.; Hughes, Shatif; Agrawal, Sunil; Allen, Peter; Schambra, Heidi M.

    2017-01-01

    There currently exist no practical tools to identify functional movements in the upper extremities (UEs). This absence has limited the precise therapeutic dosing of patients recovering from stroke. In this proof-of-principle study, we aimed to develop an accurate approach for classifying UE functional movement primitives, which comprise functional movements. Data were generated from inertial measurement units (IMUs) placed on upper body segments of older healthy individuals and chronic stroke patients. Subjects performed activities commonly trained during rehabilitation after stroke. Data processing involved the use of a sliding window to obtain statistical descriptors, and resulting features were processed by a Hidden Markov Model (HMM). The likelihoods of the states, resulting from the HMM, were segmented by a second sliding window and their averages were calculated. The final predictions were mapped to human functional movement primitives using a Logistic Regression algorithm. Algorithm performance was assessed with a leave-one-out analysis, which determined its sensitivity, specificity, and positive and negative predictive values for all classified primitives. In healthy control and stroke participants, our approach identified functional movement primitives embedded in training activities with, on average, 80% precision. This approach may support functional movement dosing in stroke rehabilitation. PMID:28813877

  1. Operationalizing the Diagnostic Criteria for Mild Cognitive Impairment: The Salience of Objective Measures in Predicting Incident Dementia.

    PubMed

    Brodaty, Henry; Aerts, Liesbeth; Crawford, John D; Heffernan, Megan; Kochan, Nicole A; Reppermund, Simone; Kang, Kristan; Maston, Kate; Draper, Brian; Trollor, Julian N; Sachdev, Perminder S

    2017-05-01

    Mild cognitive impairment (MCI) is considered an intermediate stage between normal aging and dementia. It is diagnosed in the presence of subjective cognitive decline and objective cognitive impairment without significant functional impairment, although there are no standard operationalizations for each of these criteria. The objective of this study is to determine which operationalization of the MCI criteria is most accurate at predicting dementia. Six-year longitudinal study, part of the Sydney Memory and Ageing Study. Community-based. 873 community-dwelling dementia-free adults between 70 and 90 years of age. Persons from a non-English speaking background were excluded. Seven different operationalizations for subjective cognitive decline and eight measures of objective cognitive impairment (resulting in 56 different MCI operational algorithms) were applied. The accuracy of each algorithm to predict progression to dementia over 6 years was examined for 618 individuals. Baseline MCI prevalence varied between 0.4% and 30.2% and dementia conversion between 15.9% and 61.9% across different algorithms. The predictive accuracy for progression to dementia was poor. The highest accuracy was achieved based on objective cognitive impairment alone. Inclusion of subjective cognitive decline or mild functional impairment did not improve dementia prediction accuracy. Not MCI, but objective cognitive impairment alone, is the best predictor for progression to dementia in a community sample. Nevertheless, clinical assessment procedures need to be refined to improve the identification of pre-dementia individuals. Copyright © 2016 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.

  2. Noise normalization and windowing functions for VALIDAR in wind parameter estimation

    NASA Astrophysics Data System (ADS)

    Beyon, Jeffrey Y.; Koch, Grady J.; Li, Zhiwen

    2006-05-01

    The wind parameter estimates from a state-of-the-art 2-μm coherent lidar system located at NASA Langley, Virginia, named VALIDAR (validation lidar), were compared after normalizing the noise by its estimated power spectra via the periodogram and the linear predictive coding (LPC) scheme. The power spectra and the Doppler shift estimates were the main parameter estimates for comparison. Different types of windowing functions were implemented in VALIDAR data processing algorithm and their impact on the wind parameter estimates was observed. Time and frequency independent windowing functions such as Rectangular, Hanning, and Kaiser-Bessel and time and frequency dependent apodized windowing function were compared. The briefing of current nonlinear algorithm development for Doppler shift correction subsequently follows.

  3. Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI.

    PubMed

    Colas, Jaron T; Pauli, Wolfgang M; Larsen, Tobias; Tyszka, J Michael; O'Doherty, John P

    2017-10-01

    Prediction-error signals consistent with formal models of "reinforcement learning" (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models-namely, "actor/critic" models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning.

  4. Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: evidence from high-resolution fMRI

    PubMed Central

    Pauli, Wolfgang M.; Larsen, Tobias; Tyszka, J. Michael; O’Doherty, John P.

    2017-01-01

    Prediction-error signals consistent with formal models of “reinforcement learning” (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models—namely, “actor/critic” models and action-value-learning models (e.g., the Q-learning model). The state-value-prediction error (SVPE), which is independent of actions, is a hallmark of the actor/critic architecture, whereas the action-value-prediction error (AVPE) is the distinguishing feature of action-value-learning algorithms. To test for the presence of these prediction-error signals in the brain, we scanned human participants with a high-resolution functional magnetic-resonance imaging (fMRI) protocol optimized to enable measurement of neural activity in the dopaminergic midbrain as well as the striatal areas to which it projects. In keeping with the actor/critic model, the SVPE signal was detected in the substantia nigra. The SVPE was also clearly present in both the ventral striatum and the dorsal striatum. However, alongside these purely state-value-based computations we also found evidence for AVPE signals throughout the striatum. These high-resolution fMRI findings suggest that model-free aspects of reward learning in humans can be explained algorithmically with RL in terms of an actor/critic mechanism operating in parallel with a system for more direct action-value learning. PMID:29049406

  5. A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.

    PubMed

    Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-10-01

    The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.

  6. NEP: web server for epitope prediction based on antibody neutralization of viral strains with diverse sequences

    PubMed Central

    Chuang, Gwo-Yu; Liou, David; Kwong, Peter D.; Georgiev, Ivelin S.

    2014-01-01

    Delineation of the antigenic site, or epitope, recognized by an antibody can provide clues about functional vulnerabilities and resistance mechanisms, and can therefore guide antibody optimization and epitope-based vaccine design. Previously, we developed an algorithm for antibody-epitope prediction based on antibody neutralization of viral strains with diverse sequences and validated the algorithm on a set of broadly neutralizing HIV-1 antibodies. Here we describe the implementation of this algorithm, NEP (Neutralization-based Epitope Prediction), as a web-based server. The users must supply as input: (i) an alignment of antigen sequences of diverse viral strains; (ii) neutralization data for the antibody of interest against the same set of antigen sequences; and (iii) (optional) a structure of the unbound antigen, for enhanced prediction accuracy. The prediction results can be downloaded or viewed interactively on the antigen structure (if supplied) from the web browser using a JSmol applet. Since neutralization experiments are typically performed as one of the first steps in the characterization of an antibody to determine its breadth and potency, the NEP server can be used to predict antibody-epitope information at no additional experimental costs. NEP can be accessed on the internet at http://exon.niaid.nih.gov/nep. PMID:24782517

  7. Comparison of linear–stochastic and nonlinear–deterministic algorithms in the analysis of 15-minute clinical ECGs to predict risk of arrhythmic death

    PubMed Central

    Skinner, James E; Meyer, Michael; Nester, Brian A; Geary, Una; Taggart, Pamela; Mangione, Antoinette; Ramalanjaona, George; Terregino, Carol; Dalsey, William C

    2009-01-01

    Objective: Comparative algorithmic evaluation of heartbeat series in low-to-high risk cardiac patients for the prospective prediction of risk of arrhythmic death (AD). Background: Heartbeat variation reflects cardiac autonomic function and risk of AD. Indices based on linear stochastic models are independent risk factors for AD in post-myocardial infarction (post-MI) cohorts. Indices based on nonlinear deterministic models have superior predictability in retrospective data. Methods: Patients were enrolled (N = 397) in three emergency departments upon presenting with chest pain and were determined to be at low-to-high risk of acute MI (>7%). Brief ECGs were recorded (15 min) and R-R intervals assessed by three nonlinear algorithms (PD2i, DFA, and ApEn) and four conventional linear-stochastic measures (SDNN, MNN, 1/f-Slope, LF/HF). Out-of-hospital AD was determined by modified Hinkle–Thaler criteria. Results: All-cause mortality at one-year follow-up was 10.3%, with 7.7% adjudicated to be AD. The sensitivity and relative risk for predicting AD was highest at all time-points for the nonlinear PD2i algorithm (p ≤0.001). The sensitivity at 30 days was 100%, specificity 58%, and relative risk >100 (p ≤0.001); sensitivity at 360 days was 95%, specificity 58%, and relative risk >11.4 (p ≤0.001). Conclusions: Heartbeat analysis by the time-dependent nonlinear PD2i algorithm is comparatively the superior test. PMID:19707283

  8. A priori mesh grading for the numerical calculation of the head-related transfer functions

    PubMed Central

    Ziegelwanger, Harald; Kreuzer, Wolfgang; Majdak, Piotr

    2017-01-01

    Head-related transfer functions (HRTFs) describe the directional filtering of the incoming sound caused by the morphology of a listener’s head and pinnae. When an accurate model of a listener’s morphology exists, HRTFs can be calculated numerically with the boundary element method (BEM). However, the general recommendation to model the head and pinnae with at least six elements per wavelength renders the BEM as a time-consuming procedure when calculating HRTFs for the full audible frequency range. In this study, a mesh preprocessing algorithm is proposed, viz., a priori mesh grading, which reduces the computational costs in the HRTF calculation process significantly. The mesh grading algorithm deliberately violates the recommendation of at least six elements per wavelength in certain regions of the head and pinnae and varies the size of elements gradually according to an a priori defined grading function. The evaluation of the algorithm involved HRTFs calculated for various geometric objects including meshes of three human listeners and various grading functions. The numerical accuracy and the predicted sound-localization performance of calculated HRTFs were analyzed. A-priori mesh grading appeared to be suitable for the numerical calculation of HRTFs in the full audible frequency range and outperformed uniform meshes in terms of numerical errors, perception based predictions of sound-localization performance, and computational costs. PMID:28239186

  9. Research on adaptive optics image restoration algorithm based on improved joint maximum a posteriori method

    NASA Astrophysics Data System (ADS)

    Zhang, Lijuan; Li, Yang; Wang, Junnan; Liu, Ying

    2018-03-01

    In this paper, we propose a point spread function (PSF) reconstruction method and joint maximum a posteriori (JMAP) estimation method for the adaptive optics image restoration. Using the JMAP method as the basic principle, we establish the joint log likelihood function of multi-frame adaptive optics (AO) images based on the image Gaussian noise models. To begin with, combining the observed conditions and AO system characteristics, a predicted PSF model for the wavefront phase effect is developed; then, we build up iterative solution formulas of the AO image based on our proposed algorithm, addressing the implementation process of multi-frame AO images joint deconvolution method. We conduct a series of experiments on simulated and real degraded AO images to evaluate our proposed algorithm. Compared with the Wiener iterative blind deconvolution (Wiener-IBD) algorithm and Richardson-Lucy IBD algorithm, our algorithm has better restoration effects including higher peak signal-to-noise ratio ( PSNR) and Laplacian sum ( LS) value than the others. The research results have a certain application values for actual AO image restoration.

  10. Machine learning algorithm accurately detects fMRI signature of vulnerability to major depression.

    PubMed

    Sato, João R; Moll, Jorge; Green, Sophie; Deakin, John F W; Thomaz, Carlos E; Zahn, Roland

    2015-08-30

    Standard functional magnetic resonance imaging (fMRI) analyses cannot assess the potential of a neuroimaging signature as a biomarker to predict individual vulnerability to major depression (MD). Here, we use machine learning for the first time to address this question. Using a recently identified neural signature of guilt-selective functional disconnection, the classification algorithm was able to distinguish remitted MD from control participants with 78.3% accuracy. This demonstrates the high potential of our fMRI signature as a biomarker of MD vulnerability. Crown Copyright © 2015. Published by Elsevier Ireland Ltd. All rights reserved.

  11. Network propagation in the cytoscape cyberinfrastructure.

    PubMed

    Carlin, Daniel E; Demchak, Barry; Pratt, Dexter; Sage, Eric; Ideker, Trey

    2017-10-01

    Network propagation is an important and widely used algorithm in systems biology, with applications in protein function prediction, disease gene prioritization, and patient stratification. However, up to this point it has required significant expertise to run. Here we extend the popular network analysis program Cytoscape to perform network propagation as an integrated function. Such integration greatly increases the access to network propagation by putting it in the hands of biologists and linking it to the many other types of network analysis and visualization available through Cytoscape. We demonstrate the power and utility of the algorithm by identifying mutations conferring resistance to Vemurafenib.

  12. Bioinactivation: Software for modelling dynamic microbial inactivation.

    PubMed

    Garre, Alberto; Fernández, Pablo S; Lindqvist, Roland; Egea, Jose A

    2017-03-01

    This contribution presents the bioinactivation software, which implements functions for the modelling of isothermal and non-isothermal microbial inactivation. This software offers features such as user-friendliness, modelling of dynamic conditions, possibility to choose the fitting algorithm and generation of prediction intervals. The software is offered in two different formats: Bioinactivation core and Bioinactivation SE. Bioinactivation core is a package for the R programming language, which includes features for the generation of predictions and for the fitting of models to inactivation experiments using non-linear regression or a Markov Chain Monte Carlo algorithm (MCMC). The calculations are based on inactivation models common in academia and industry (Bigelow, Peleg, Mafart and Geeraerd). Bioinactivation SE supplies a user-friendly interface to selected functions of Bioinactivation core, namely the model fitting of non-isothermal experiments and the generation of prediction intervals. The capabilities of bioinactivation are presented in this paper through a case study, modelling the non-isothermal inactivation of Bacillus sporothermodurans. This study has provided a full characterization of the response of the bacteria to dynamic temperature conditions, including confidence intervals for the model parameters and a prediction interval of the survivor curve. We conclude that the MCMC algorithm produces a better characterization of the biological uncertainty and variability than non-linear regression. The bioinactivation software can be relevant to the food and pharmaceutical industry, as well as to regulatory agencies, as part of a (quantitative) microbial risk assessment. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Developing EHR-driven heart failure risk prediction models using CPXR(Log) with the probabilistic loss function.

    PubMed

    Taslimitehrani, Vahid; Dong, Guozhu; Pereira, Naveen L; Panahiazar, Maryam; Pathak, Jyotishman

    2016-04-01

    Computerized survival prediction in healthcare identifying the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) with the probabilistic loss function, to develop and validate prognostic risk models to predict 1, 2, and 5year survival in heart failure (HF) using data from electronic health records (EHRs) at Mayo Clinic. The CPXR(Log) constructs a pattern aided logistic regression model defined by several patterns and corresponding local logistic regression models. One of the models generated by CPXR(Log) achieved an AUC and accuracy of 0.94 and 0.91, respectively, and significantly outperformed prognostic models reported in prior studies. Data extracted from EHRs allowed incorporation of patient co-morbidities into our models which helped improve the performance of the CPXR(Log) models (15.9% AUC improvement), although did not improve the accuracy of the models built by other classifiers. We also propose a probabilistic loss function to determine the large error and small error instances. The new loss function used in the algorithm outperforms other functions used in the previous studies by 1% improvement in the AUC. This study revealed that using EHR data to build prediction models can be very challenging using existing classification methods due to the high dimensionality and complexity of EHR data. The risk models developed by CPXR(Log) also reveal that HF is a highly heterogeneous disease, i.e., different subgroups of HF patients require different types of considerations with their diagnosis and treatment. Our risk models provided two valuable insights for application of predictive modeling techniques in biomedicine: Logistic risk models often make systematic prediction errors, and it is prudent to use subgroup based prediction models such as those given by CPXR(Log) when investigating heterogeneous diseases. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. Prediction of forced expiratory volume in pulmonary function test using radial basis neural networks and k-means clustering.

    PubMed

    Manoharan, Sujatha C; Ramakrishnan, Swaminathan

    2009-10-01

    In this work, prediction of forced expiratory volume in pulmonary function test, carried out using spirometry and neural networks is presented. The pulmonary function data were recorded from volunteers using commercial available flow volume spirometer in standard acquisition protocol. The Radial Basis Function neural networks were used to predict forced expiratory volume in 1 s (FEV1) from the recorded flow volume curves. The optimal centres of the hidden layer of radial basis function were determined by k-means clustering algorithm. The performance of the neural network model was evaluated by computing their prediction error statistics of average value, standard deviation, root mean square and their correlation with the true data for normal, restrictive and obstructive cases. Results show that the adopted neural networks are capable of predicting FEV1 in both normal and abnormal cases. Prediction accuracy was more in obstructive abnormality when compared to restrictive cases. It appears that this method of assessment is useful in diagnosing the pulmonary abnormalities with incomplete data and data with poor recording.

  15. BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design.

    PubMed

    Jou, Jonathan D; Jain, Swati; Georgiev, Ivelin S; Donald, Bruce R

    2016-06-01

    Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.

  16. Application of multivariate Gaussian detection theory to known non-Gaussian probability density functions

    NASA Astrophysics Data System (ADS)

    Schwartz, Craig R.; Thelen, Brian J.; Kenton, Arthur C.

    1995-06-01

    A statistical parametric multispectral sensor performance model was developed by ERIM to support mine field detection studies, multispectral sensor design/performance trade-off studies, and target detection algorithm development. The model assumes target detection algorithms and their performance models which are based on data assumed to obey multivariate Gaussian probability distribution functions (PDFs). The applicability of these algorithms and performance models can be generalized to data having non-Gaussian PDFs through the use of transforms which convert non-Gaussian data to Gaussian (or near-Gaussian) data. An example of one such transform is the Box-Cox power law transform. In practice, such a transform can be applied to non-Gaussian data prior to the introduction of a detection algorithm that is formally based on the assumption of multivariate Gaussian data. This paper presents an extension of these techniques to the case where the joint multivariate probability density function of the non-Gaussian input data is known, and where the joint estimate of the multivariate Gaussian statistics, under the Box-Cox transform, is desired. The jointly estimated multivariate Gaussian statistics can then be used to predict the performance of a target detection algorithm which has an associated Gaussian performance model.

  17. Diametrical clustering for identifying anti-correlated gene clusters.

    PubMed

    Dhillon, Inderjit S; Marcotte, Edward M; Roshan, Usman

    2003-09-01

    Clustering genes based upon their expression patterns allows us to predict gene function. Most existing clustering algorithms cluster genes together when their expression patterns show high positive correlation. However, it has been observed that genes whose expression patterns are strongly anti-correlated can also be functionally similar. Biologically, this is not unintuitive-genes responding to the same stimuli, regardless of the nature of the response, are more likely to operate in the same pathways. We present a new diametrical clustering algorithm that explicitly identifies anti-correlated clusters of genes. Our algorithm proceeds by iteratively (i). re-partitioning the genes and (ii). computing the dominant singular vector of each gene cluster; each singular vector serving as the prototype of a 'diametric' cluster. We empirically show the effectiveness of the algorithm in identifying diametrical or anti-correlated clusters. Testing the algorithm on yeast cell cycle data, fibroblast gene expression data, and DNA microarray data from yeast mutants reveals that opposed cellular pathways can be discovered with this method. We present systems whose mRNA expression patterns, and likely their functions, oppose the yeast ribosome and proteosome, along with evidence for the inverse transcriptional regulation of a number of cellular systems.

  18. Binding ligand prediction for proteins using partial matching of local surface patches.

    PubMed

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.

  19. Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches

    PubMed Central

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group. PMID:21614188

  20. Reconstructing genome-wide regulatory network of E. coli using transcriptome data and predicted transcription factor activities

    PubMed Central

    2011-01-01

    Background Gene regulatory networks play essential roles in living organisms to control growth, keep internal metabolism running and respond to external environmental changes. Understanding the connections and the activity levels of regulators is important for the research of gene regulatory networks. While relevance score based algorithms that reconstruct gene regulatory networks from transcriptome data can infer genome-wide gene regulatory networks, they are unfortunately prone to false positive results. Transcription factor activities (TFAs) quantitatively reflect the ability of the transcription factor to regulate target genes. However, classic relevance score based gene regulatory network reconstruction algorithms use models do not include the TFA layer, thus missing a key regulatory element. Results This work integrates TFA prediction algorithms with relevance score based network reconstruction algorithms to reconstruct gene regulatory networks with improved accuracy over classic relevance score based algorithms. This method is called Gene expression and Transcription factor activity based Relevance Network (GTRNetwork). Different combinations of TFA prediction algorithms and relevance score functions have been applied to find the most efficient combination. When the integrated GTRNetwork method was applied to E. coli data, the reconstructed genome-wide gene regulatory network predicted 381 new regulatory links. This reconstructed gene regulatory network including the predicted new regulatory links show promising biological significances. Many of the new links are verified by known TF binding site information, and many other links can be verified from the literature and databases such as EcoCyc. The reconstructed gene regulatory network is applied to a recent transcriptome analysis of E. coli during isobutanol stress. In addition to the 16 significantly changed TFAs detected in the original paper, another 7 significantly changed TFAs have been detected by using our reconstructed network. Conclusions The GTRNetwork algorithm introduces the hidden layer TFA into classic relevance score-based gene regulatory network reconstruction processes. Integrating the TFA biological information with regulatory network reconstruction algorithms significantly improves both detection of new links and reduces that rate of false positives. The application of GTRNetwork on E. coli gene transcriptome data gives a set of potential regulatory links with promising biological significance for isobutanol stress and other conditions. PMID:21668997

  1. On the suitability of different representations of solid catalysts for combinatorial library design by genetic algorithms.

    PubMed

    Gobin, Oliver C; Schüth, Ferdi

    2008-01-01

    Genetic algorithms are widely used to solve and optimize combinatorial problems and are more often applied for library design in combinatorial chemistry. Because of their flexibility, however, their implementation can be challenging. In this study, the influence of the representation of solid catalysts on the performance of genetic algorithms was systematically investigated on the basis of a new, constrained, multiobjective, combinatorial test problem with properties common to problems in combinatorial materials science. Constraints were satisfied by penalty functions, repair algorithms, or special representations. The tests were performed using three state-of-the-art evolutionary multiobjective algorithms by performing 100 optimization runs for each algorithm and test case. Experimental data obtained during the optimization of a noble metal-free solid catalyst system active in the selective catalytic reduction of nitric oxide with propene was used to build up a predictive model to validate the results of the theoretical test problem. A significant influence of the representation on the optimization performance was observed. Binary encodings were found to be the preferred encoding in most of the cases, and depending on the experimental test unit, repair algorithms or penalty functions performed best.

  2. Predictive Rate-Distortion for Infinite-Order Markov Processes

    NASA Astrophysics Data System (ADS)

    Marzen, Sarah E.; Crutchfield, James P.

    2016-06-01

    Predictive rate-distortion analysis suffers from the curse of dimensionality: clustering arbitrarily long pasts to retain information about arbitrarily long futures requires resources that typically grow exponentially with length. The challenge is compounded for infinite-order Markov processes, since conditioning on finite sequences cannot capture all of their past dependencies. Spectral arguments confirm a popular intuition: algorithms that cluster finite-length sequences fail dramatically when the underlying process has long-range temporal correlations and can fail even for processes generated by finite-memory hidden Markov models. We circumvent the curse of dimensionality in rate-distortion analysis of finite- and infinite-order processes by casting predictive rate-distortion objective functions in terms of the forward- and reverse-time causal states of computational mechanics. Examples demonstrate that the resulting algorithms yield substantial improvements.

  3. Simulation of an expanding plasma using the Boris algorithm

    NASA Astrophysics Data System (ADS)

    Neal, Luke; Aguirre, Evan; Steinberger, Thomas; Good, Timothy; Scime, Earl

    2017-10-01

    We present a Boris algorithm simulation in a cylindrical geometry of charged particle motion in a helicon plasma confined by a diverging magnetic field. Laboratory measurements of ion velocity distribution functions (ivdfs) provide evidence for acceleration of ions into the divergent field region in the center of the discharge. The increase in ion velocity is inconsistent with expectations for simple magnetic moment conservation given the magnetic field mirror ratio and is therefore attributed to the presence of a double layer in the literature. Using measured electric fields and ivdfs (at different radial locations across the entire plasma column) upstream and downstream of the divergent magnetic field region, we compare predictions for the downstream ivdfs to measurements. We also present predictions for the evolution of the electron velocity distribution function downstream of the divergent magnetic field. This work was supported by U.S. National Science Foundation Grant No. PHY-1360278.

  4. Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots.

    PubMed

    Hajdin, Christine E; Bellaousov, Stanislav; Huggins, Wayne; Leonard, Christopher W; Mathews, David H; Weeks, Kevin M

    2013-04-02

    A pseudoknot forms in an RNA when nucleotides in a loop pair with a region outside the helices that close the loop. Pseudoknots occur relatively rarely in RNA but are highly overrepresented in functionally critical motifs in large catalytic RNAs, in riboswitches, and in regulatory elements of viruses. Pseudoknots are usually excluded from RNA structure prediction algorithms. When included, these pairings are difficult to model accurately, especially in large RNAs, because allowing this structure dramatically increases the number of possible incorrect folds and because it is difficult to search the fold space for an optimal structure. We have developed a concise secondary structure modeling approach that combines SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) experimental chemical probing information and a simple, but robust, energy model for the entropic cost of single pseudoknot formation. Structures are predicted with iterative refinement, using a dynamic programming algorithm. This melded experimental and thermodynamic energy function predicted the secondary structures and the pseudoknots for a set of 21 challenging RNAs of known structure ranging in size from 34 to 530 nt. On average, 93% of known base pairs were predicted, and all pseudoknots in well-folded RNAs were identified.

  5. Critical Features of Fragment Libraries for Protein Structure Prediction

    PubMed Central

    dos Santos, Karina Baptista

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction. PMID:28085928

  6. Critical Features of Fragment Libraries for Protein Structure Prediction.

    PubMed

    Trevizani, Raphael; Custódio, Fábio Lima; Dos Santos, Karina Baptista; Dardenne, Laurent Emmanuel

    2017-01-01

    The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. Particularly, we analyze the effect of using secondary structure prediction guiding fragments selection, different fragments sizes and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than the longer. Libraries composed of multiple fragment lengths generate even better structures, where longer fragments show to be more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared to predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can be critical guidelines for the use of fragment libraries in protein structure prediction.

  7. regSNPs-splicing: a tool for prioritizing synonymous single-nucleotide substitution.

    PubMed

    Zhang, Xinjun; Li, Meng; Lin, Hai; Rao, Xi; Feng, Weixing; Yang, Yuedong; Mort, Matthew; Cooper, David N; Wang, Yue; Wang, Yadong; Wells, Clark; Zhou, Yaoqi; Liu, Yunlong

    2017-09-01

    While synonymous single-nucleotide variants (sSNVs) have largely been unstudied, since they do not alter protein sequence, mounting evidence suggests that they may affect RNA conformation, splicing, and the stability of nascent-mRNAs to promote various diseases. Accurately prioritizing deleterious sSNVs from a pool of neutral ones can significantly improve our ability of selecting functional genetic variants identified from various genome-sequencing projects, and, therefore, advance our understanding of disease etiology. In this study, we develop a computational algorithm to prioritize sSNVs based on their impact on mRNA splicing and protein function. In addition to genomic features that potentially affect splicing regulation, our proposed algorithm also includes dozens structural features that characterize the functions of alternatively spliced exons on protein function. Our systematical evaluation on thousands of sSNVs suggests that several structural features, including intrinsic disorder protein scores, solvent accessible surface areas, protein secondary structures, and known and predicted protein family domains, show significant differences between disease-causing and neutral sSNVs. Our result suggests that the protein structure features offer an added dimension of information while distinguishing disease-causing and neutral synonymous variants. The inclusion of structural features increases the predictive accuracy for functional sSNV prioritization.

  8. A variable structure fuzzy neural network model of squamous dysplasia and esophageal squamous cell carcinoma based on a global chaotic optimization algorithm.

    PubMed

    Moghtadaei, Motahareh; Hashemi Golpayegani, Mohammad Reza; Malekzadeh, Reza

    2013-02-07

    Identification of squamous dysplasia and esophageal squamous cell carcinoma (ESCC) is of great importance in prevention of cancer incidence. Computer aided algorithms can be very useful for identification of people with higher risks of squamous dysplasia, and ESCC. Such method can limit the clinical screenings to people with higher risks. Different regression methods have been used to predict ESCC and dysplasia. In this paper, a Fuzzy Neural Network (FNN) model is selected for ESCC and dysplasia prediction. The inputs to the classifier are the risk factors. Since the relation between risk factors in the tumor system has a complex nonlinear behavior, in comparison to most of ordinary data, the cost function of its model can have more local optimums. Thus the need for global optimization methods is more highlighted. The proposed method in this paper is a Chaotic Optimization Algorithm (COA) proceeding by the common Error Back Propagation (EBP) local method. Since the model has many parameters, we use a strategy to reduce the dependency among parameters caused by the chaotic series generator. This dependency was not considered in the previous COA methods. The algorithm is compared with logistic regression model as the latest successful methods of ESCC and dysplasia prediction. The results represent a more precise prediction with less mean and variance of error. Copyright © 2012 Elsevier Ltd. All rights reserved.

  9. New algorithm for tensor contractions on multi-core CPUs, GPUs, and accelerators enables CCSD and EOM-CCSD calculations with over 1000 basis functions on a single compute node.

    PubMed

    Kaliman, Ilya A; Krylov, Anna I

    2017-04-30

    A new hardware-agnostic contraction algorithm for tensors of arbitrary symmetry and sparsity is presented. The algorithm is implemented as a stand-alone open-source code libxm. This code is also integrated with general tensor library libtensor and with the Q-Chem quantum-chemistry package. An overview of the algorithm, its implementation, and benchmarks are presented. Similarly to other tensor software, the algorithm exploits efficient matrix multiplication libraries and assumes that tensors are stored in a block-tensor form. The distinguishing features of the algorithm are: (i) efficient repackaging of the individual blocks into large matrices and back, which affords efficient graphics processing unit (GPU)-enabled calculations without modifications of higher-level codes; (ii) fully asynchronous data transfer between disk storage and fast memory. The algorithm enables canonical all-electron coupled-cluster and equation-of-motion coupled-cluster calculations with single and double substitutions (CCSD and EOM-CCSD) with over 1000 basis functions on a single quad-GPU machine. We show that the algorithm exhibits predicted theoretical scaling for canonical CCSD calculations, O(N 6 ), irrespective of the data size on disk. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  10. Using the brain's fight-or-flight response for predicting mental illness on the human space flight program

    NASA Astrophysics Data System (ADS)

    Losik, L.

    A predictive medicine program allows disease and illness including mental illness to be predicted using tools created to identify the presence of accelerated aging (a.k.a. disease) in electrical and mechanical equipment. When illness and disease can be predicted, actions can be taken so that the illness and disease can be prevented and eliminated. A predictive medicine program uses the same tools and practices from a prognostic and health management program to process biological and engineering diagnostic data provided in analog telemetry during prelaunch readiness and space exploration missions. The biological and engineering diagnostic data necessary to predict illness and disease is collected from the pre-launch spaceflight readiness activities and during space flight for the ground crew to perform a prognostic analysis on the results from a diagnostic analysis. The diagnostic, biological data provided in telemetry is converted to prognostic (predictive) data using the predictive algorithms. Predictive algorithms demodulate telemetry behavior. They illustrate the presence of accelerated aging/disease in normal appearing systems that function normally. Mental illness can predicted using biological diagnostic measurements provided in CCSDS telemetry from a spacecraft such as the ISS or from a manned spacecraft in deep space. The measurements used to predict mental illness include biological and engineering data from an astronaut's circadian and ultranian rhythms. This data originates deep in the brain that is also damaged from the long-term exposure to cortisol and adrenaline anytime the body's fight or flight response is activated. This paper defines the brain's FOFR; the diagnostic, biological and engineering measurements needed to predict mental illness, identifies the predictive algorithms necessary to process the behavior in CCSDS analog telemetry to predict and thus prevent mental illness from occurring on human spaceflight missions.

  11. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

    PubMed Central

    Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

    2016-01-01

    Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

  12. sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luo, Heng; Ye, Hao; Ng, Hui Wen

    Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. Furthermore, this algorithmmore » can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.« less

  13. sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides

    DOE PAGES

    Luo, Heng; Ye, Hao; Ng, Hui Wen; ...

    2016-08-25

    Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. Furthermore, this algorithmmore » can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.« less

  14. Predicting the survival of diabetes using neural network

    NASA Astrophysics Data System (ADS)

    Mamuda, Mamman; Sathasivam, Saratha

    2017-08-01

    Data mining techniques at the present time are used in predicting diseases of health care industries. Neural Network is one among the prevailing method in data mining techniques of an intelligent field for predicting diseases in health care industries. This paper presents a study on the prediction of the survival of diabetes diseases using different learning algorithms from the supervised learning algorithms of neural network. Three learning algorithms are considered in this study: (i) The levenberg-marquardt learning algorithm (ii) The Bayesian regulation learning algorithm and (iii) The scaled conjugate gradient learning algorithm. The network is trained using the Pima Indian Diabetes Dataset with the help of MATLAB R2014(a) software. The performance of each algorithm is further discussed through regression analysis. The prediction accuracy of the best algorithm is further computed to validate the accurate prediction

  15. Analysis and Prediction of Myristoylation Sites Using the mRMR Method, the IFS Method and an Extreme Learning Machine Algorithm.

    PubMed

    Wang, ShaoPeng; Zhang, Yu-Hang; Huang, GuoHua; Chen, Lei; Cai, Yu-Dong

    2017-01-01

    Myristoylation is an important hydrophobic post-translational modification that is covalently bound to the amino group of Gly residues on the N-terminus of proteins. The many diverse functions of myristoylation on proteins, such as membrane targeting, signal pathway regulation and apoptosis, are largely due to the lipid modification, whereas abnormal or irregular myristoylation on proteins can lead to several pathological changes in the cell. To better understand the function of myristoylated sites and to correctly identify them in protein sequences, this study conducted a novel computational investigation on identifying myristoylation sites in protein sequences. A training dataset with 196 positive and 84 negative peptide segments were obtained. Four types of features derived from the peptide segments following the myristoylation sites were used to specify myristoylatedand non-myristoylated sites. Then, feature selection methods including maximum relevance and minimum redundancy (mRMR), incremental feature selection (IFS), and a machine learning algorithm (extreme learning machine method) were adopted to extract optimal features for the algorithm to identify myristoylation sites in protein sequences, thereby building an optimal prediction model. As a result, 41 key features were extracted and used to build an optimal prediction model. The effectiveness of the optimal prediction model was further validated by its performance on a test dataset. Furthermore, detailed analyses were also performed on the extracted 41 features to gain insight into the mechanism of myristoylation modification. This study provided a new computational method for identifying myristoylation sites in protein sequences. We believe that it can be a useful tool to predict myristoylation sites from protein sequences. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  16. Robust functional regression model for marginal mean and subject-specific inferences.

    PubMed

    Cao, Chunzheng; Shi, Jian Qing; Lee, Youngjo

    2017-01-01

    We introduce flexible robust functional regression models, using various heavy-tailed processes, including a Student t-process. We propose efficient algorithms in estimating parameters for the marginal mean inferences and in predicting conditional means as well as interpolation and extrapolation for the subject-specific inferences. We develop bootstrap prediction intervals (PIs) for conditional mean curves. Numerical studies show that the proposed model provides a robust approach against data contamination or distribution misspecification, and the proposed PIs maintain the nominal confidence levels. A real data application is presented as an illustrative example.

  17. Sparse Zero-Sum Games as Stable Functional Feature Selection

    PubMed Central

    Sokolovska, Nataliya; Teytaud, Olivier; Rizkalla, Salwa; Clément, Karine; Zucker, Jean-Daniel

    2015-01-01

    In large-scale systems biology applications, features are structured in hidden functional categories whose predictive power is identical. Feature selection, therefore, can lead not only to a problem with a reduced dimensionality, but also reveal some knowledge on functional classes of variables. In this contribution, we propose a framework based on a sparse zero-sum game which performs a stable functional feature selection. In particular, the approach is based on feature subsets ranking by a thresholding stochastic bandit. We provide a theoretical analysis of the introduced algorithm. We illustrate by experiments on both synthetic and real complex data that the proposed method is competitive from the predictive and stability viewpoints. PMID:26325268

  18. Navier-Stokes Dynamics by a Discrete Boltzmann Model

    NASA Technical Reports Server (NTRS)

    Rubinstein, Robet

    2010-01-01

    This work investigates the possibility of particle-based algorithms for the Navier-Stokes equations and higher order continuum approximations of the Boltzmann equation; such algorithms would generalize the well-known Pullin scheme for the Euler equations. One such method is proposed in the context of a discrete velocity model of the Boltzmann equation. Preliminary results on shock structure are consistent with the expectation that the shock should be much broader than the near discontinuity predicted by the Pullin scheme, yet narrower than the prediction of the Boltzmann equation. We discuss the extension of this essentially deterministic method to a stochastic particle method that, like DSMC, samples the distribution function rather than resolving it completely.

  19. Testing block subdivision algorithms on block designs

    NASA Astrophysics Data System (ADS)

    Wiseman, Natalie; Patterson, Zachary

    2016-01-01

    Integrated land use-transportation models predict future transportation demand taking into account how households and firms arrange themselves partly as a function of the transportation system. Recent integrated models require parcels as inputs and produce household and employment predictions at the parcel scale. Block subdivision algorithms automatically generate parcel patterns within blocks. Evaluating block subdivision algorithms is done by way of generating parcels and comparing them to those in a parcel database. Three block subdivision algorithms are evaluated on how closely they reproduce parcels of different block types found in a parcel database from Montreal, Canada. While the authors who developed each of the algorithms have evaluated them, they have used their own metrics and block types to evaluate their own algorithms. This makes it difficult to compare their strengths and weaknesses. The contribution of this paper is in resolving this difficulty with the aim of finding a better algorithm suited to subdividing each block type. The proposed hypothesis is that given the different approaches that block subdivision algorithms take, it's likely that different algorithms are better adapted to subdividing different block types. To test this, a standardized block type classification is used that consists of mutually exclusive and comprehensive categories. A statistical method is used for finding a better algorithm and the probability it will perform well for a given block type. Results suggest the oriented bounding box algorithm performs better for warped non-uniform sites, as well as gridiron and fragmented uniform sites. It also produces more similar parcel areas and widths. The Generalized Parcel Divider 1 algorithm performs better for gridiron non-uniform sites. The Straight Skeleton algorithm performs better for loop and lollipop networks as well as fragmented non-uniform and warped uniform sites. It also produces more similar parcel shapes and patterns.

  20. Efficient Green's Function Reaction Dynamics (GFRD) simulations for diffusion-limited, reversible reactions

    NASA Astrophysics Data System (ADS)

    Bashardanesh, Zahedeh; Lötstedt, Per

    2018-03-01

    In diffusion controlled reversible bimolecular reactions in three dimensions, a dissociation step is typically followed by multiple, rapid re-association steps slowing down the simulations of such systems. In order to improve the efficiency, we first derive an exact Green's function describing the rate at which an isolated pair of particles undergoing reversible bimolecular reactions and unimolecular decay separates beyond an arbitrarily chosen distance. Then the Green's function is used in an algorithm for particle-based stochastic reaction-diffusion simulations for prediction of the dynamics of biochemical networks. The accuracy and efficiency of the algorithm are evaluated using a reversible reaction and a push-pull chemical network. The computational work is independent of the rates of the re-associations.

  1. Estimating gene function with least squares nonnegative matrix factorization.

    PubMed

    Wang, Guoli; Ochs, Michael F

    2007-01-01

    Nonnegative matrix factorization is a machine learning algorithm that has extracted information from data in a number of fields, including imaging and spectral analysis, text mining, and microarray data analysis. One limitation with the method for linking genes through microarray data in order to estimate gene function is the high variance observed in transcription levels between different genes. Least squares nonnegative matrix factorization uses estimates of the uncertainties on the mRNA levels for each gene in each condition, to guide the algorithm to a local minimum in normalized chi2, rather than a Euclidean distance or divergence between the reconstructed data and the data itself. Herein, application of this method to microarray data is demonstrated in order to predict gene function.

  2. Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites.

    PubMed

    Betel, Doron; Koppal, Anjali; Agius, Phaedra; Sander, Chris; Leslie, Christina

    2010-01-01

    mirSVR is a new machine learning method for ranking microRNA target sites by a down-regulation score. The algorithm trains a regression model on sequence and contextual features extracted from miRanda-predicted target sites. In a large-scale evaluation, miRanda-mirSVR is competitive with other target prediction methods in identifying target genes and predicting the extent of their downregulation at the mRNA or protein levels. Importantly, the method identifies a significant number of experimentally determined non-canonical and non-conserved sites.

  3. FPGA accelerator for protein secondary structure prediction based on the GOR algorithm

    PubMed Central

    2011-01-01

    Background Protein is an important molecule that performs a wide range of functions in biological systems. Recently, the protein folding attracts much more attention since the function of protein can be generally derived from its molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from protein sequence. However, the execution time is still intolerable with the steep growth in protein database. Recently, FPGA chips have emerged as one promising application accelerator to accelerate bioinformatics algorithms by exploiting fine-grained custom design. Results In this paper, we propose a complete fine-grained parallel hardware implementation on FPGA to accelerate the GOR-IV package for 2D protein structure prediction. To improve computing efficiency, we partition the parameter table into small segments and access them in parallel. We aggressively exploit data reuse schemes to minimize the need for loading data from external memory. The whole computation structure is carefully pipelined to overlap the sequence loading, computing and back-writing operations as much as possible. We implemented a complete GOR desktop system based on an FPGA chip XC5VLX330. Conclusions The experimental results show a speedup factor of more than 430x over the original GOR-IV version and 110x speedup over the optimized version with multi-thread SIMD implementation running on a PC platform with AMD Phenom 9650 Quad CPU for 2D protein structure prediction. However, the power consumption is only about 30% of that of current general-propose CPUs. PMID:21342582

  4. Testing an earthquake prediction algorithm

    USGS Publications Warehouse

    Kossobokov, V.G.; Healy, J.H.; Dewey, J.W.

    1997-01-01

    A test to evaluate earthquake prediction algorithms is being applied to a Russian algorithm known as M8. The M8 algorithm makes intermediate term predictions for earthquakes to occur in a large circle, based on integral counts of transient seismicity in the circle. In a retroactive prediction for the period January 1, 1985 to July 1, 1991 the algorithm as configured for the forward test would have predicted eight of ten strong earthquakes in the test area. A null hypothesis, based on random assignment of predictions, predicts eight earthquakes in 2.87% of the trials. The forward test began July 1, 1991 and will run through December 31, 1997. As of July 1, 1995, the algorithm had forward predicted five out of nine earthquakes in the test area, which success ratio would have been achieved in 53% of random trials with the null hypothesis.

  5. ClusterViz: A Cytoscape APP for Cluster Analysis of Biological Network.

    PubMed

    Wang, Jianxin; Zhong, Jiancheng; Chen, Gang; Li, Min; Wu, Fang-xiang; Pan, Yi

    2015-01-01

    Cluster analysis of biological networks is one of the most important approaches for identifying functional modules and predicting protein functions. Furthermore, visualization of clustering results is crucial to uncover the structure of biological networks. In this paper, ClusterViz, an APP of Cytoscape 3 for cluster analysis and visualization, has been developed. In order to reduce complexity and enable extendibility for ClusterViz, we designed the architecture of ClusterViz based on the framework of Open Services Gateway Initiative. According to the architecture, the implementation of ClusterViz is partitioned into three modules including interface of ClusterViz, clustering algorithms and visualization and export. ClusterViz fascinates the comparison of the results of different algorithms to do further related analysis. Three commonly used clustering algorithms, FAG-EC, EAGLE and MCODE, are included in the current version. Due to adopting the abstract interface of algorithms in module of the clustering algorithms, more clustering algorithms can be included for the future use. To illustrate usability of ClusterViz, we provided three examples with detailed steps from the important scientific articles, which show that our tool has helped several research teams do their research work on the mechanism of the biological networks.

  6. Applied Distributed Model Predictive Control for Energy Efficient Buildings and Ramp Metering

    NASA Astrophysics Data System (ADS)

    Koehler, Sarah Muraoka

    Industrial large-scale control problems present an interesting algorithmic design challenge. A number of controllers must cooperate in real-time on a network of embedded hardware with limited computing power in order to maximize system efficiency while respecting constraints and despite communication delays. Model predictive control (MPC) can automatically synthesize a centralized controller which optimizes an objective function subject to a system model, constraints, and predictions of disturbance. Unfortunately, the computations required by model predictive controllers for large-scale systems often limit its industrial implementation only to medium-scale slow processes. Distributed model predictive control (DMPC) enters the picture as a way to decentralize a large-scale model predictive control problem. The main idea of DMPC is to split the computations required by the MPC problem amongst distributed processors that can compute in parallel and communicate iteratively to find a solution. Some popularly proposed solutions are distributed optimization algorithms such as dual decomposition and the alternating direction method of multipliers (ADMM). However, these algorithms ignore two practical challenges: substantial communication delays present in control systems and also problem non-convexity. This thesis presents two novel and practically effective DMPC algorithms. The first DMPC algorithm is based on a primal-dual active-set method which achieves fast convergence, making it suitable for large-scale control applications which have a large communication delay across its communication network. In particular, this algorithm is suited for MPC problems with a quadratic cost, linear dynamics, forecasted demand, and box constraints. We measure the performance of this algorithm and show that it significantly outperforms both dual decomposition and ADMM in the presence of communication delay. The second DMPC algorithm is based on an inexact interior point method which is suited for nonlinear optimization problems. The parallel computation of the algorithm exploits iterative linear algebra methods for the main linear algebra computations in the algorithm. We show that the splitting of the algorithm is flexible and can thus be applied to various distributed platform configurations. The two proposed algorithms are applied to two main energy and transportation control problems. The first application is energy efficient building control. Buildings represent 40% of energy consumption in the United States. Thus, it is significant to improve the energy efficiency of buildings. The goal is to minimize energy consumption subject to the physics of the building (e.g. heat transfer laws), the constraints of the actuators as well as the desired operating constraints (thermal comfort of the occupants), and heat load on the system. In this thesis, we describe the control systems of forced air building systems in practice. We discuss the "Trim and Respond" algorithm which is a distributed control algorithm that is used in practice, and show that it performs similarly to a one-step explicit DMPC algorithm. Then, we apply the novel distributed primal-dual active-set method and provide extensive numerical results for the building MPC problem. The second main application is the control of ramp metering signals to optimize traffic flow through a freeway system. This application is particularly important since urban congestion has more than doubled in the past few decades. The ramp metering problem is to maximize freeway throughput subject to freeway dynamics (derived from mass conservation), actuation constraints, freeway capacity constraints, and predicted traffic demand. In this thesis, we develop a hybrid model predictive controller for ramp metering that is guaranteed to be persistently feasible and stable. This contrasts to previous work on MPC for ramp metering where such guarantees are absent. We apply a smoothing method to the hybrid model predictive controller and apply the inexact interior point method to this nonlinear non-convex ramp metering problem.

  7. The knowledge instinct, cognitive algorithms, modeling of language and cultural evolution

    NASA Astrophysics Data System (ADS)

    Perlovsky, Leonid I.

    2008-04-01

    The talk discusses mechanisms of the mind and their engineering applications. The past attempts at designing "intelligent systems" encountered mathematical difficulties related to algorithmic complexity. The culprit turned out to be logic, which in one way or another was used not only in logic rule systems, but also in statistical, neural, and fuzzy systems. Algorithmic complexity is related to Godel's theory, a most fundamental mathematical result. These difficulties were overcome by replacing logic with a dynamic process "from vague to crisp," dynamic logic. It leads to algorithms overcoming combinatorial complexity, and resulting in orders of magnitude improvement in classical problems of detection, tracking, fusion, and prediction in noise. I present engineering applications to pattern recognition, detection, tracking, fusion, financial predictions, and Internet search engines. Mathematical and engineering efficiency of dynamic logic can also be understood as cognitive algorithm, which describes fundamental property of the mind, the knowledge instinct responsible for all our higher cognitive functions: concepts, perception, cognition, instincts, imaginations, intuitions, emotions, including emotions of the beautiful. I present our latest results in modeling evolution of languages and cultures, their interactions in these processes, and role of music in cultural evolution. Experimental data is presented that support the theory. Future directions are outlined.

  8. Computational intelligence techniques for biological data mining: An overview

    NASA Astrophysics Data System (ADS)

    Faye, Ibrahima; Iqbal, Muhammad Javed; Said, Abas Md; Samir, Brahim Belhaouari

    2014-10-01

    Computational techniques have been successfully utilized for a highly accurate analysis and modeling of multifaceted and raw biological data gathered from various genome sequencing projects. These techniques are proving much more effective to overcome the limitations of the traditional in-vitro experiments on the constantly increasing sequence data. However, most critical problems that caught the attention of the researchers may include, but not limited to these: accurate structure and function prediction of unknown proteins, protein subcellular localization prediction, finding protein-protein interactions, protein fold recognition, analysis of microarray gene expression data, etc. To solve these problems, various classification and clustering techniques using machine learning have been extensively used in the published literature. These techniques include neural network algorithms, genetic algorithms, fuzzy ARTMAP, K-Means, K-NN, SVM, Rough set classifiers, decision tree and HMM based algorithms. Major difficulties in applying the above algorithms include the limitations found in the previous feature encoding and selection methods while extracting the best features, increasing classification accuracy and decreasing the running time overheads of the learning algorithms. The application of this research would be potentially useful in the drug design and in the diagnosis of some diseases. This paper presents a concise overview of the well-known protein classification techniques.

  9. Evaluation of several state-of-charge algorithms

    NASA Astrophysics Data System (ADS)

    Espinosa, J. M.; Martin, M. E.; Burke, A. F.

    1988-09-01

    One of the important needs in marketing an electric vehicle is a device which reliably indicates battery state-of-charge for all types of driving. The purpose of the state-of-charge indicator is analogous to a gas gauge in an internal combustion engine powered vehicle. Many different approaches have been tried to accurately predict battery state-of-charge. This report evaluates several of these approaches. Four different algorithms were implemented into software on an IBM PC and tested using a battery test database for ALCO 2200 lead-acid batteries generated at the INEL. The database was obtained under controlled conditions which compare with the battery response in real EV use. Each algorithm is described in detail as to theory and operational functionality. Also discussed is the hardware and data requirements particular to implementing the individual algorithms. The algorithms were evaluated for accuracy using constant power, stepped power, and simulated vehicle (SFUDS79) discharge profiles. Attempts were made to explain the cause of differences between the predicted and actual state-of-charge and to provide possible remedies to correct them. Recommendations for future work on battery state-of-charge indicators are presented that utilize the hardware and software now in place in the INEL Battery Laboratory.

  10. Algorithms for a Closed-Loop Artificial Pancreas: The Case for Model Predictive Control

    PubMed Central

    Bequette, B. Wayne

    2013-01-01

    The relative merits of model predictive control (MPC) and proportional-integral-derivative (PID) control are discussed, with the end goal of a closed-loop artificial pancreas (AP). It is stressed that neither MPC nor PID are single algorithms, but rather are approaches or strategies that may be implemented very differently by different engineers. The primary advantages to MPC are that (i) constraints on the insulin delivery rate (and/or insulin on board) can be explicitly included in the control calculation; (ii) it is a general framework that makes it relatively easy to include the effect of meals, exercise, and other events that are a function of the time of day; and (iii) it is flexible enough to include many different objectives, from set-point tracking (target) to zone (control to range). In the end, however, it is recognized that the control algorithm, while important, represents only a portion of the effort required to develop a closed-loop AP. Thus, any number of algorithms/approaches can be successful—the engineers involved in the design must have experience with the particular technique, including the important experience of implementing the algorithm in human studies and not simply through simulation studies. PMID:24351190

  11. Prediction of pKa Values for Neutral and Basic Drugs based on Hybrid Artificial Intelligence Methods.

    PubMed

    Li, Mengshan; Zhang, Huaijing; Chen, Bingsheng; Wu, Yan; Guan, Lixin

    2018-03-05

    The pKa value of drugs is an important parameter in drug design and pharmacology. In this paper, an improved particle swarm optimization (PSO) algorithm was proposed based on the population entropy diversity. In the improved algorithm, when the population entropy was higher than the set maximum threshold, the convergence strategy was adopted; when the population entropy was lower than the set minimum threshold the divergence strategy was adopted; when the population entropy was between the maximum and minimum threshold, the self-adaptive adjustment strategy was maintained. The improved PSO algorithm was applied in the training of radial basis function artificial neural network (RBF ANN) model and the selection of molecular descriptors. A quantitative structure-activity relationship model based on RBF ANN trained by the improved PSO algorithm was proposed to predict the pKa values of 74 kinds of neutral and basic drugs and then validated by another database containing 20 molecules. The validation results showed that the model had a good prediction performance. The absolute average relative error, root mean square error, and squared correlation coefficient were 0.3105, 0.0411, and 0.9685, respectively. The model can be used as a reference for exploring other quantitative structure-activity relationships.

  12. Adaptive reference update (ARU) algorithm. A stochastic search algorithm for efficient optimization of multi-drug cocktails

    PubMed Central

    2012-01-01

    Background Multi-target therapeutics has been shown to be effective for treating complex diseases, and currently, it is a common practice to combine multiple drugs to treat such diseases to optimize the therapeutic outcomes. However, considering the huge number of possible ways to mix multiple drugs at different concentrations, it is practically difficult to identify the optimal drug combination through exhaustive testing. Results In this paper, we propose a novel stochastic search algorithm, called the adaptive reference update (ARU) algorithm, that can provide an efficient and systematic way for optimizing multi-drug cocktails. The ARU algorithm iteratively updates the drug combination to improve its response, where the update is made by comparing the response of the current combination with that of a reference combination, based on which the beneficial update direction is predicted. The reference combination is continuously updated based on the drug response values observed in the past, thereby adapting to the underlying drug response function. To demonstrate the effectiveness of the proposed algorithm, we evaluated its performance based on various multi-dimensional drug functions and compared it with existing algorithms. Conclusions Simulation results show that the ARU algorithm significantly outperforms existing stochastic search algorithms, including the Gur Game algorithm. In fact, the ARU algorithm can more effectively identify potent drug combinations and it typically spends fewer iterations for finding effective combinations. Furthermore, the ARU algorithm is robust to random fluctuations and noise in the measured drug response, which makes the algorithm well-suited for practical drug optimization applications. PMID:23134742

  13. Comparison of the accuracy of three algorithms in predicting accessory pathways among adult Wolff-Parkinson-White syndrome patients.

    PubMed

    Maden, Orhan; Balci, Kevser Gülcihan; Selcuk, Mehmet Timur; Balci, Mustafa Mücahit; Açar, Burak; Unal, Sefa; Kara, Meryem; Selcuk, Hatice

    2015-12-01

    The aim of this study was to investigate the accuracy of three algorithms in predicting accessory pathway locations in adult patients with Wolff-Parkinson-White syndrome in Turkish population. A total of 207 adult patients with Wolff-Parkinson-White syndrome were retrospectively analyzed. The most preexcited 12-lead electrocardiogram in sinus rhythm was used for analysis. Two investigators blinded to the patient data used three algorithms for prediction of accessory pathway location. Among all locations, 48.5% were left-sided, 44% were right-sided, and 7.5% were located in the midseptum or anteroseptum. When only exact locations were accepted as match, predictive accuracy for Chiang was 71.5%, 72.4% for d'Avila, and 71.5% for Arruda. The percentage of predictive accuracy of all algorithms did not differ between the algorithms (p = 1.000; p = 0.875; p = 0.885, respectively). The best algorithm for prediction of right-sided, left-sided, and anteroseptal and midseptal accessory pathways was Arruda (p < 0.001). Arruda was significantly better than d'Avila in predicting adjacent sites (p = 0.035) and the percent of the contralateral site prediction was higher with d'Avila than Arruda (p = 0.013). All algorithms were similar in predicting accessory pathway location and the predicted accuracy was lower than previously reported by their authors. However, according to the accessory pathway site, the algorithm designed by Arruda et al. showed better predictions than the other algorithms and using this algorithm may provide advantages before a planned ablation.

  14. Hill-Climbing search and diversification within an evolutionary approach to protein structure prediction.

    PubMed

    Chira, Camelia; Horvath, Dragos; Dumitrescu, D

    2011-07-30

    Proteins are complex structures made of amino acids having a fundamental role in the correct functioning of living cells. The structure of a protein is the result of the protein folding process. However, the general principles that govern the folding of natural proteins into a native structure are unknown. The problem of predicting a protein structure with minimum-energy starting from the unfolded amino acid sequence is a highly complex and important task in molecular and computational biology. Protein structure prediction has important applications in fields such as drug design and disease prediction. The protein structure prediction problem is NP-hard even in simplified lattice protein models. An evolutionary model based on hill-climbing genetic operators is proposed for protein structure prediction in the hydrophobic - polar (HP) model. Problem-specific search operators are implemented and applied using a steepest-ascent hill-climbing approach. Furthermore, the proposed model enforces an explicit diversification stage during the evolution in order to avoid local optimum. The main features of the resulting evolutionary algorithm - hill-climbing mechanism and diversification strategy - are evaluated in a set of numerical experiments for the protein structure prediction problem to assess their impact to the efficiency of the search process. Furthermore, the emerging consolidated model is compared to relevant algorithms from the literature for a set of difficult bidimensional instances from lattice protein models. The results obtained by the proposed algorithm are promising and competitive with those of related methods.

  15. Use of a machine learning framework to predict substance use disorder treatment success

    PubMed Central

    Kelmansky, Diana; van der Laan, Mark; Sahker, Ethan; Jones, DeShauna; Arndt, Stephan

    2017-01-01

    There are several methods for building prediction models. The wealth of currently available modeling techniques usually forces the researcher to judge, a priori, what will likely be the best method. Super learning (SL) is a methodology that facilitates this decision by combining all identified prediction algorithms pertinent for a particular prediction problem. SL generates a final model that is at least as good as any of the other models considered for predicting the outcome. The overarching aim of this work is to introduce SL to analysts and practitioners. This work compares the performance of logistic regression, penalized regression, random forests, deep learning neural networks, and SL to predict successful substance use disorders (SUD) treatment. A nationwide database including 99,013 SUD treatment patients was used. All algorithms were evaluated using the area under the receiver operating characteristic curve (AUC) in a test sample that was not included in the training sample used to fit the prediction models. AUC for the models ranged between 0.793 and 0.820. SL was superior to all but one of the algorithms compared. An explanation of SL steps is provided. SL is the first step in targeted learning, an analytic framework that yields double robust effect estimation and inference with fewer assumptions than the usual parametric methods. Different aspects of SL depending on the context, its function within the targeted learning framework, and the benefits of this methodology in the addiction field are discussed. PMID:28394905

  16. Use of a machine learning framework to predict substance use disorder treatment success.

    PubMed

    Acion, Laura; Kelmansky, Diana; van der Laan, Mark; Sahker, Ethan; Jones, DeShauna; Arndt, Stephan

    2017-01-01

    There are several methods for building prediction models. The wealth of currently available modeling techniques usually forces the researcher to judge, a priori, what will likely be the best method. Super learning (SL) is a methodology that facilitates this decision by combining all identified prediction algorithms pertinent for a particular prediction problem. SL generates a final model that is at least as good as any of the other models considered for predicting the outcome. The overarching aim of this work is to introduce SL to analysts and practitioners. This work compares the performance of logistic regression, penalized regression, random forests, deep learning neural networks, and SL to predict successful substance use disorders (SUD) treatment. A nationwide database including 99,013 SUD treatment patients was used. All algorithms were evaluated using the area under the receiver operating characteristic curve (AUC) in a test sample that was not included in the training sample used to fit the prediction models. AUC for the models ranged between 0.793 and 0.820. SL was superior to all but one of the algorithms compared. An explanation of SL steps is provided. SL is the first step in targeted learning, an analytic framework that yields double robust effect estimation and inference with fewer assumptions than the usual parametric methods. Different aspects of SL depending on the context, its function within the targeted learning framework, and the benefits of this methodology in the addiction field are discussed.

  17. A novel computer algorithm improves antibody epitope prediction using affinity-selected mimotopes: a case study using monoclonal antibodies against the West Nile virus E protein.

    PubMed

    Denisova, Galina F; Denisov, Dimitri A; Yeung, Jeffrey; Loeb, Mark B; Diamond, Michael S; Bramson, Jonathan L

    2008-11-01

    Understanding antibody function is often enhanced by knowledge of the specific binding epitope. Here, we describe a computer algorithm that permits epitope prediction based on a collection of random peptide epitopes (mimotopes) isolated by antibody affinity purification. We applied this methodology to the prediction of epitopes for five monoclonal antibodies against the West Nile virus (WNV) E protein, two of which exhibit therapeutic activity in vivo. This strategy was validated by comparison of our results with existing F(ab)-E protein crystal structures and mutational analysis by yeast surface display. We demonstrate that by combining the results of the mimotope method with our data from mutational analysis, epitopes could be predicted with greater certainty. The two methods displayed great complementarity as the mutational analysis facilitated epitope prediction when the results with the mimotope method were equivocal and the mimotope method revealed a broader number of residues within the epitope than the mutational analysis. Our results demonstrate that the combination of these two prediction strategies provides a robust platform for epitope characterization.

  18. Robust prediction of consensus secondary structures using averaged base pairing probability matrices.

    PubMed

    Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi

    2007-02-15

    Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.

  19. Sequential reconstruction of driving-forces from nonlinear nonstationary dynamics

    NASA Astrophysics Data System (ADS)

    Güntürkün, Ulaş

    2010-07-01

    This paper describes a functional analysis-based method for the estimation of driving-forces from nonlinear dynamic systems. The driving-forces account for the perturbation inputs induced by the external environment or the secular variations in the internal variables of the system. The proposed algorithm is applicable to the problems for which there is too little or no prior knowledge to build a rigorous mathematical model of the unknown dynamics. We derive the estimator conditioned on the differentiability of the unknown system’s mapping, and smoothness of the driving-force. The proposed algorithm is an adaptive sequential realization of the blind prediction error method, where the basic idea is to predict the observables, and retrieve the driving-force from the prediction error. Our realization of this idea is embodied by predicting the observables one-step into the future using a bank of echo state networks (ESN) in an online fashion, and then extracting the raw estimates from the prediction error and smoothing these estimates in two adaptive filtering stages. The adaptive nature of the algorithm enables to retrieve both slowly and rapidly varying driving-forces accurately, which are illustrated by simulations. Logistic and Moran-Ricker maps are studied in controlled experiments, exemplifying chaotic state and stochastic measurement models. The algorithm is also applied to the estimation of a driving-force from another nonlinear dynamic system that is stochastic in both state and measurement equations. The results are judged by the posterior Cramer-Rao lower bounds. The method is finally put into test on a real-world application; extracting sun’s magnetic flux from the sunspot time series.

  20. Cost Function Network-based Design of Protein-Protein Interactions: predicting changes in binding affinity.

    PubMed

    Viricel, Clément; de Givry, Simon; Schiex, Thomas; Barbe, Sophie

    2018-02-20

    Accurate and economic methods to predict change in protein binding free energy upon mutation are imperative to accelerate the design of proteins for a wide range of applications. Free energy is defined by enthalpic and entropic contributions. Following the recent progresses of Artificial Intelligence-based algorithms for guaranteed NP-hard energy optimization and partition function computation, it becomes possible to quickly compute minimum energy conformations and to reliably estimate the entropic contribution of side-chains in the change of free energy of large protein interfaces. Using guaranteed Cost Function Network algorithms, Rosetta energy functions and Dunbrack's rotamer library, we developed and assessed EasyE and JayZ, two methods for binding affinity estimation that ignore or include conformational entropic contributions on a large benchmark of binding affinity experimental measures. If both approaches outperform most established tools, we observe that side-chain conformational entropy brings little or no improvement on most systems but becomes crucial in some rare cases. as open-source Python/C ++ code at sourcesup.renater.fr/projects/easy-jayz. thomas.schiex@inra.fr and sophie.barbe@insa-toulouse.fr. Supplementary data are available at Bioinformatics online.

  1. Methodology for the inference of gene function from phenotype data.

    PubMed

    Ascensao, Joao A; Dolan, Mary E; Hill, David P; Blake, Judith A

    2014-12-12

    Biomedical ontologies are increasingly instrumental in the advancement of biological research primarily through their use to efficiently consolidate large amounts of data into structured, accessible sets. However, ontology development and usage can be hampered by the segregation of knowledge by domain that occurs due to independent development and use of the ontologies. The ability to infer data associated with one ontology to data associated with another ontology would prove useful in expanding information content and scope. We here focus on relating two ontologies: the Gene Ontology (GO), which encodes canonical gene function, and the Mammalian Phenotype Ontology (MP), which describes non-canonical phenotypes, using statistical methods to suggest GO functional annotations from existing MP phenotype annotations. This work is in contrast to previous studies that have focused on inferring gene function from phenotype primarily through lexical or semantic similarity measures. We have designed and tested a set of algorithms that represents a novel methodology to define rules for predicting gene function by examining the emergent structure and relationships between the gene functions and phenotypes rather than inspecting the terms semantically. The algorithms inspect relationships among multiple phenotype terms to deduce if there are cases where they all arise from a single gene function. We apply this methodology to data about genes in the laboratory mouse that are formally represented in the Mouse Genome Informatics (MGI) resource. From the data, 7444 rule instances were generated from five generalized rules, resulting in 4818 unique GO functional predictions for 1796 genes. We show that our method is capable of inferring high-quality functional annotations from curated phenotype data. As well as creating inferred annotations, our method has the potential to allow for the elucidation of unforeseen, biologically significant associations between gene function and phenotypes that would be overlooked by a semantics-based approach. Future work will include the implementation of the described algorithms for a variety of other model organism databases, taking full advantage of the abundance of available high quality curated data.

  2. Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs

    PubMed Central

    2017-01-01

    Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package. PMID:29107980

  3. Conformational Space Annealing explained: A general optimization algorithm, with diverse applications

    NASA Astrophysics Data System (ADS)

    Joung, InSuk; Kim, Jong Yun; Gross, Steven P.; Joo, Keehyoung; Lee, Jooyoung

    2018-02-01

    Many problems in science and engineering can be formulated as optimization problems. One way to solve these problems is to develop tailored problem-specific approaches. As such development is challenging, an alternative is to develop good generally-applicable algorithms. Such algorithms are easy to apply, typically function robustly, and reduce development time. Here we provide a description for one such algorithm called Conformational Space Annealing (CSA) along with its python version, PyCSA. We previously applied it to many optimization problems including protein structure prediction and graph community detection. To demonstrate its utility, we have applied PyCSA to two continuous test functions, namely Ackley and Eggholder functions. In addition, in order to provide complete generality of PyCSA to any types of an objective function, we demonstrate the way PyCSA can be applied to a discrete objective function, namely a parameter optimization problem. Based on the benchmarking results of the three problems, the performance of CSA is shown to be better than or similar to the most popular optimization method, simulated annealing. For continuous objective functions, we found that, L-BFGS-B was the best performing local optimization method, while for a discrete objective function Nelder-Mead was the best. The current version of PyCSA can be run in parallel at the coarse grained level by calculating multiple independent local optimizations separately. The source code of PyCSA is available from http://lee.kias.re.kr.

  4. Predicting protein functions from redundancies in large-scale protein interaction networks

    NASA Technical Reports Server (NTRS)

    Samanta, Manoj Pratim; Liang, Shoudan

    2003-01-01

    Interpreting data from large-scale protein interaction experiments has been a challenging task because of the widespread presence of random false positives. Here, we present a network-based statistical algorithm that overcomes this difficulty and allows us to derive functions of unannotated proteins from large-scale interaction data. Our algorithm uses the insight that if two proteins share significantly larger number of common interaction partners than random, they have close functional associations. Analysis of publicly available data from Saccharomyces cerevisiae reveals >2,800 reliable functional associations, 29% of which involve at least one unannotated protein. By further analyzing these associations, we derive tentative functions for 81 unannotated proteins with high certainty. Our method is not overly sensitive to the false positives present in the data. Even after adding 50% randomly generated interactions to the measured data set, we are able to recover almost all (approximately 89%) of the original associations.

  5. System identification and model reduction using modulating function techniques

    NASA Technical Reports Server (NTRS)

    Shen, Yan

    1993-01-01

    Weighted least squares (WLS) and adaptive weighted least squares (AWLS) algorithms are initiated for continuous-time system identification using Fourier type modulating function techniques. Two stochastic signal models are examined using the mean square properties of the stochastic calculus: an equation error signal model with white noise residuals, and a more realistic white measurement noise signal model. The covariance matrices in each model are shown to be banded and sparse, and a joint likelihood cost function is developed which links the real and imaginary parts of the modulated quantities. The superior performance of above algorithms is demonstrated by comparing them with the LS/MFT and popular predicting error method (PEM) through 200 Monte Carlo simulations. A model reduction problem is formulated with the AWLS/MFT algorithm, and comparisons are made via six examples with a variety of model reduction techniques, including the well-known balanced realization method. Here the AWLS/MFT algorithm manifests higher accuracy in almost all cases, and exhibits its unique flexibility and versatility. Armed with this model reduction, the AWLS/MFT algorithm is extended into MIMO transfer function system identification problems. The impact due to the discrepancy in bandwidths and gains among subsystem is explored through five examples. Finally, as a comprehensive application, the stability derivatives of the longitudinal and lateral dynamics of an F-18 aircraft are identified using physical flight data provided by NASA. A pole-constrained SIMO and MIMO AWLS/MFT algorithm is devised and analyzed. Monte Carlo simulations illustrate its high-noise rejecting properties. Utilizing the flight data, comparisons among different MFT algorithms are tabulated and the AWLS is found to be strongly favored in almost all facets.

  6. A model of proto-object based saliency

    PubMed Central

    Russell, Alexander F.; Mihalaş, Stefan; von der Heydt, Rudiger; Niebur, Ernst; Etienne-Cummings, Ralph

    2013-01-01

    Organisms use the process of selective attention to optimally allocate their computational resources to the instantaneously most relevant subsets of a visual scene, ensuring that they can parse the scene in real time. Many models of bottom-up attentional selection assume that elementary image features, like intensity, color and orientation, attract attention. Gestalt psychologists, how-ever, argue that humans perceive whole objects before they analyze individual features. This is supported by recent psychophysical studies that show that objects predict eye-fixations better than features. In this report we present a neurally inspired algorithm of object based, bottom-up attention. The model rivals the performance of state of the art non-biologically plausible feature based algorithms (and outperforms biologically plausible feature based algorithms) in its ability to predict perceptual saliency (eye fixations and subjective interest points) in natural scenes. The model achieves this by computing saliency as a function of proto-objects that establish the perceptual organization of the scene. All computational mechanisms of the algorithm have direct neural correlates, and our results provide evidence for the interface theory of attention. PMID:24184601

  7. Simulation analysis of adaptive cruise prediction control

    NASA Astrophysics Data System (ADS)

    Zhang, Li; Cui, Sheng Min

    2017-09-01

    Predictive control is suitable for multi-variable and multi-constraint system control.In order to discuss the effect of predictive control on the vehicle longitudinal motion, this paper establishes the expected spacing model by combining variable pitch spacing and the of safety distance strategy. The model predictive control theory and the optimization method based on secondary planning are designed to obtain and track the best expected acceleration trajectory quickly. Simulation models are established including predictive and adaptive fuzzy control. Simulation results show that predictive control can realize the basic function of the system while ensuring the safety. The application of predictive and fuzzy adaptive algorithm in cruise condition indicates that the predictive control effect is better.

  8. A novel method for identifying disease associated protein complexes based on functional similarity protein complex networks.

    PubMed

    Le, Duc-Hau

    2015-01-01

    Protein complexes formed by non-covalent interaction among proteins play important roles in cellular functions. Computational and purification methods have been used to identify many protein complexes and their cellular functions. However, their roles in terms of causing disease have not been well discovered yet. There exist only a few studies for the identification of disease-associated protein complexes. However, they mostly utilize complicated heterogeneous networks which are constructed based on an out-of-date database of phenotype similarity network collected from literature. In addition, they only apply for diseases for which tissue-specific data exist. In this study, we propose a method to identify novel disease-protein complex associations. First, we introduce a framework to construct functional similarity protein complex networks where two protein complexes are functionally connected by either shared protein elements, shared annotating GO terms or based on protein interactions between elements in each protein complex. Second, we propose a simple but effective neighborhood-based algorithm, which yields a local similarity measure, to rank disease candidate protein complexes. Comparing the predictive performance of our proposed algorithm with that of two state-of-the-art network propagation algorithms including one we used in our previous study, we found that it performed statistically significantly better than that of these two algorithms for all the constructed functional similarity protein complex networks. In addition, it ran about 32 times faster than these two algorithms. Moreover, our proposed method always achieved high performance in terms of AUC values irrespective of the ways to construct the functional similarity protein complex networks and the used algorithms. The performance of our method was also higher than that reported in some existing methods which were based on complicated heterogeneous networks. Finally, we also tested our method with prostate cancer and selected the top 100 highly ranked candidate protein complexes. Interestingly, 69 of them were evidenced since at least one of their protein elements are known to be associated with prostate cancer. Our proposed method, including the framework to construct functional similarity protein complex networks and the neighborhood-based algorithm on these networks, could be used for identification of novel disease-protein complex associations.

  9. Radial basis function neural networks applied to NASA SSME data

    NASA Technical Reports Server (NTRS)

    Wheeler, Kevin R.; Dhawan, Atam P.

    1993-01-01

    This paper presents a brief report on the application of Radial Basis Function Neural Networks (RBFNN) to the prediction of sensor values for fault detection and diagnosis of the Space Shuttle's Main Engines (SSME). The location of the Radial Basis Function (RBF) node centers was determined with a K-means clustering algorithm. A neighborhood operation about these center points was used to determine the variances of the individual processing notes.

  10. CRIMEtoYHU: a new web tool to develop yeast-based functional assays for characterizing cancer-associated missense variants.

    PubMed

    Mercatanti, Alberto; Lodovichi, Samuele; Cervelli, Tiziana; Galli, Alvaro

    2017-12-01

    Evaluation of the functional impact of cancer-associated missense variants is more difficult than for protein-truncating mutations and consequently standard guidelines for the interpretation of sequence variants have been recently proposed. A number of algorithms and software products were developed to predict the impact of cancer-associated missense mutations on protein structure and function. Importantly, direct assessment of the variants using high-throughput functional assays using simple genetic systems can help in speeding up the functional evaluation of newly identified cancer-associated variants. We developed the web tool CRIMEtoYHU (CTY) to help geneticists in the evaluation of the functional impact of cancer-associated missense variants. Humans and the yeast Saccharomyces cerevisiae share thousands of protein-coding genes although they have diverged for a billion years. Therefore, yeast humanization can be helpful in deciphering the functional consequences of human genetic variants found in cancer and give information on the pathogenicity of missense variants. To humanize specific positions within yeast genes, human and yeast genes have to share functional homology. If a mutation in a specific residue is associated with a particular phenotype in humans, a similar substitution in the yeast counterpart may reveal its effect at the organism level. CTY simultaneously finds yeast homologous genes, identifies the corresponding variants and determines the transferability of human variants to yeast counterparts by assigning a reliability score (RS) that may be predictive for the validity of a functional assay. CTY analyzes newly identified mutations or retrieves mutations reported in the COSMIC database, provides information about the functional conservation between yeast and human and shows the mutation distribution in human genes. CTY analyzes also newly found mutations and aborts when no yeast homologue is found. Then, on the basis of the protein domain localization and functional conservation between yeast and human, the selected variants are ranked by the RS. The RS is assigned by an algorithm that computes functional data, type of mutation, chemistry of amino acid substitution and the degree of mutation transferability between human and yeast protein. Mutations giving a positive RS are highly transferable to yeast and, therefore, yeast functional assays will be more predictable. To validate the web application, we have analyzed 8078 cancer-associated variants located in 31 genes that have a yeast homologue. More than 50% of variants are transferable to yeast. Incidentally, 88% of all transferable mutations have a reliability score >0. Moreover, we analyzed by CTY 72 functionally validated missense variants located in yeast genes at positions corresponding to the human cancer-associated variants. All these variants gave a positive RS. To further validate CTY, we analyzed 3949 protein variants (with positive RS) by the predictive algorithm PROVEAN. This analysis shows that yeast-based functional assays will be more predictable for the variants with positive RS. We believe that CTY could be an important resource for the cancer research community by providing information concerning the functional impact of specific mutations, as well as for the design of functional assays useful for decision support in precision medicine. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  11. NARMAX model identification of a palm oil biodiesel engine using multi-objective optimization differential evolution

    NASA Astrophysics Data System (ADS)

    Mansor, Zakwan; Zakaria, Mohd Zakimi; Nor, Azuwir Mohd; Saad, Mohd Sazli; Ahmad, Robiah; Jamaluddin, Hishamuddin

    2017-09-01

    This paper presents the black-box modelling of palm oil biodiesel engine (POB) using multi-objective optimization differential evolution (MOODE) algorithm. Two objective functions are considered in the algorithm for optimization; minimizing the number of term of a model structure and minimizing the mean square error between actual and predicted outputs. The mathematical model used in this study to represent the POB system is nonlinear auto-regressive moving average with exogenous input (NARMAX) model. Finally, model validity tests are applied in order to validate the possible models that was obtained from MOODE algorithm and lead to select an optimal model.

  12. A study of hydrogen diffusion flames using PDF turbulence model

    NASA Technical Reports Server (NTRS)

    Hsu, Andrew T.

    1991-01-01

    The application of probability density function (pdf) turbulence models is addressed. For the purpose of accurate prediction of turbulent combustion, an algorithm that combines a conventional computational fluid dynamic (CFD) flow solver with the Monte Carlo simulation of the pdf evolution equation was developed. The algorithm was validated using experimental data for a heated turbulent plane jet. The study of H2-F2 diffusion flames was carried out using this algorithm. Numerical results compared favorably with experimental data. The computations show that the flame center shifts as the equivalence ratio changes, and that for the same equivalence ratio, similarity solutions for flames exist.

  13. A study of hydrogen diffusion flames using PDF turbulence model

    NASA Technical Reports Server (NTRS)

    Hsu, Andrew T.

    1991-01-01

    The application of probability density function (pdf) turbulence models is addressed in this work. For the purpose of accurate prediction of turbulent combustion, an algorithm that combines a conventional CFD flow solver with the Monte Carlo simulation of the pdf evolution equation has been developed. The algorithm has been validated using experimental data for a heated turbulent plane jet. The study of H2-F2 diffusion flames has been carried out using this algorithm. Numerical results compared favorably with experimental data. The computuations show that the flame center shifts as the equivalence ratio changes, and that for the same equivalence ratio, similarity solutions for flames exist.

  14. Predicting Protein-Protein Interaction Sites with a Novel Membership Based Fuzzy SVM Classifier.

    PubMed

    Sriwastava, Brijesh K; Basu, Subhadip; Maulik, Ujjwal

    2015-01-01

    Predicting residues that participate in protein-protein interactions (PPI) helps to identify, which amino acids are located at the interface. In this paper, we show that the performance of the classical support vector machine (SVM) algorithm can further be improved with the use of a custom-designed fuzzy membership function, for the partner-specific PPI interface prediction problem. We evaluated the performances of both classical SVM and fuzzy SVM (F-SVM) on the PPI databases of three different model proteomes of Homo sapiens, Escherichia coli and Saccharomyces Cerevisiae and calculated the statistical significance of the developed F-SVM over classical SVM algorithm. We also compared our performance with the available state-of-the-art fuzzy methods in this domain and observed significant performance improvements. To predict interaction sites in protein complexes, local composition of amino acids together with their physico-chemical characteristics are used, where the F-SVM based prediction method exploits the membership function for each pair of sequence fragments. The average F-SVM performance (area under ROC curve) on the test samples in 10-fold cross validation experiment are measured as 77.07, 78.39, and 74.91 percent for the aforementioned organisms respectively. Performances on independent test sets are obtained as 72.09, 73.24 and 82.74 percent respectively. The software is available for free download from http://code.google.com/p/cmater-bioinfo.

  15. Automated and fast building of three-dimensional RNA structures.

    PubMed

    Zhao, Yunjie; Huang, Yangyu; Gong, Zhou; Wang, Yanjie; Man, Jianfen; Xiao, Yi

    2012-01-01

    Building tertiary structures of non-coding RNA is required to understand their functions and design new molecules. Current algorithms of RNA tertiary structure prediction give satisfactory accuracy only for small size and simple topology and many of them need manual manipulation. Here, we present an automated and fast program, 3dRNA, for RNA tertiary structure prediction with reasonable accuracy for RNAs of larger size and complex topology.

  16. Prediction system of hydroponic plant growth and development using algorithm Fuzzy Mamdani method

    NASA Astrophysics Data System (ADS)

    Sudana, I. Made; Purnawirawan, Okta; Arief, Ulfa Mediaty

    2017-03-01

    Hydroponics is a method of farming without soil. One of the Hydroponic plants is Watercress (Nasturtium Officinale). The development and growth process of hydroponic Watercress was influenced by levels of nutrients, acidity and temperature. The independent variables can be used as input variable system to predict the value level of plants growth and development. The prediction system is using Fuzzy Algorithm Mamdani method. This system was built to implement the function of Fuzzy Inference System (Fuzzy Inference System/FIS) as a part of the Fuzzy Logic Toolbox (FLT) by using MATLAB R2007b. FIS is a computing system that works on the principle of fuzzy reasoning which is similar to humans' reasoning. Basically FIS consists of four units which are fuzzification unit, fuzzy logic reasoning unit, base knowledge unit and defuzzification unit. In addition to know the effect of independent variables on the plants growth and development that can be visualized with the function diagram of FIS output surface that is shaped three-dimensional, and statistical tests based on the data from the prediction system using multiple linear regression method, which includes multiple linear regression analysis, T test, F test, the coefficient of determination and donations predictor that are calculated using SPSS (Statistical Product and Service Solutions) software applications.

  17. Algorithm for predicting death among older adults in the home care setting: study protocol for the Risk Evaluation for Support: Predictions for Elder-life in the Community Tool (RESPECT)

    PubMed Central

    Manuel, Douglas G; Taljaard, Monica; Chalifoux, Mathieu; Bennett, Carol; Costa, Andrew P; Bronskill, Susan; Kobewka, Daniel; Tanuseputro, Peter

    2016-01-01

    Introduction Older adults living in the community often have multiple, chronic conditions and functional impairments. A challenge for healthcare providers working in the community is the lack of a predictive tool that can be applied to the broad spectrum of mortality risks observed and may be used to inform care planning. Objective To predict survival time for older adults in the home care setting. The final mortality risk algorithm will be implemented as a web-based calculator that can be used by older adults needing care and by their caregivers. Design Open cohort study using the Resident Assessment Instrument for Home Care (RAI-HC) data in Ontario, Canada, from 1 January 2007 to 31 December 2013. Participants The derivation cohort will consist of ∼437 000 older adults who had an RAI-HC assessment between 1 January 2007 and 31 December 2012. A split sample validation cohort will include ∼122 000 older adults with an RAI-HC assessment between 1 January and 31 December 2013. Main outcome measures Predicted survival from the time of an RAI-HC assessment. All deaths (n≈245 000) will be ascertained through linkage to a population-based registry that is maintained by the Ministry of Health in Ontario. Statistical analysis Proportional hazards regression will be estimated after assessment of assumptions. Predictors will include sociodemographic factors, social support, health conditions, functional status, cognition, symptoms of decline and prior healthcare use. Model performance will be evaluated for 6-month and 12-month predicted risks, including measures of calibration (eg, calibration plots) and discrimination (eg, c-statistics). The final algorithm will use combined development and validation data. Ethics and dissemination Research ethics approval has been granted by the Sunnybrook Health Sciences Centre Review Board. Findings will be disseminated through presentations at conferences and in peer-reviewed journals. Trial registration number NCT02779309, Pre-results. PMID:27909039

  18. Quantitative thickness prediction of tectonically deformed coal using Extreme Learning Machine and Principal Component Analysis: a case study

    NASA Astrophysics Data System (ADS)

    Wang, Xin; Li, Yan; Chen, Tongjun; Yan, Qiuyan; Ma, Li

    2017-04-01

    The thickness of tectonically deformed coal (TDC) has positive correlation associations with gas outbursts. In order to predict the TDC thickness of coal beds, we propose a new quantitative predicting method using an extreme learning machine (ELM) algorithm, a principal component analysis (PCA) algorithm, and seismic attributes. At first, we build an ELM prediction model using the PCA attributes of a synthetic seismic section. The results suggest that the ELM model can produce a reliable and accurate prediction of the TDC thickness for synthetic data, preferring Sigmoid activation function and 20 hidden nodes. Then, we analyze the applicability of the ELM model on the thickness prediction of the TDC with real application data. Through the cross validation of near-well traces, the results suggest that the ELM model can produce a reliable and accurate prediction of the TDC. After that, we use 250 near-well traces from 10 wells to build an ELM predicting model and use the model to forecast the TDC thickness of the No. 15 coal in the study area using the PCA attributes as the inputs. Comparing the predicted results, it is noted that the trained ELM model with two selected PCA attributes yields better predication results than those from the other combinations of the attributes. Finally, the trained ELM model with real seismic data have a different number of hidden nodes (10) than the trained ELM model with synthetic seismic data. In summary, it is feasible to use an ELM model to predict the TDC thickness using the calculated PCA attributes as the inputs. However, the input attributes, the activation function and the number of hidden nodes in the ELM model should be selected and tested carefully based on individual application.

  19. From Statistical Significance to Clinical Relevance: A Simple Algorithm to Integrate BNP and the Seattle Heart Failure Model for Risk Stratification in Heart Failure

    PubMed Central

    AbouEzzeddine, Omar F.; French, Benjamin; Mirzoyev, Sultan A.; Jaffe, Allan S; Levy, Wayne C.; Fang, James C.; Sweitzer, Nancy K.; Cappola, Thomas P.; Redfield, Margaret M.

    2016-01-01

    Background Heart failure (HF) guidelines recommend brain natriuretic peptide (BNP) and multivariable risk-scores such as the Seattle HF Model (SHFM) to predict risk in HF with reduced ejection fraction (HFrEF). A practical way to integrate information from these two prognostic tools is lacking. We sought to establish a SHFM+BNP risk-stratification algorithm. Methods The retrospective derivation cohort included consecutive patients with HFrEF at Mayo. One-year outcome (death, transplantation or ventricular assist device) was assessed. The SHFM+BNP algorithm was derived by stratifying patients within SHFM-predicted risk categories (≤2.5%, 2.6–≤10%, >10%) according to BNP above or below 700 pg/mL and comparing SHFM-predicted and observed event rates within each SHFM+BNP category. The algorithm was validated in a prospective, multicenter HFrEF registry (Penn HF Study). Results Derivation (n=441; one-year event rate 17%) and validation (n=1513; one-year event rate 12%) cohorts differed with the former being older and more likely ischemic with worse symptoms, lower EF, worse renal function, higher BNP and SHFM scores. In both cohorts, across the three SHFM-predicted risk strata, a BNP>700 pg/ml consistently identified patients with approximately three-fold the risk that the SHFM would have otherwise estimated regardless stage of HF, intensity and duration of HF-therapy, and comorbidities. Conversely, the SHFM was appropriately calibrated in patients with a BNP<700 pg/ml. Conclusion The simple SHFM+BNP algorithm displays stable performance across diverse HFrEF cohorts and may enhance risk stratification to enable appropriate decisions regarding HF therapeutic or palliative strategies. PMID:27021278

  20. Conservation of hot regions in protein-protein interaction in evolution.

    PubMed

    Hu, Jing; Li, Jiarui; Chen, Nansheng; Zhang, Xiaolong

    2016-11-01

    The hot regions of protein-protein interactions refer to the active area which formed by those most important residues to protein combination process. With the research development on protein interactions, lots of predicted hot regions can be discovered efficiently by intelligent computing methods, while performing biology experiments to verify each every prediction is hardly to be done due to the time-cost and the complexity of the experiment. This study based on the research of hot spot residue conservations, the proposed method is used to verify authenticity of predicted hot regions that using machine learning algorithm combined with protein's biological features and sequence conservation, though multiple sequence alignment, module substitute matrix and sequence similarity to create conservation scoring algorithm, and then using threshold module to verify the conservation tendency of hot regions in evolution. This research work gives an effective method to verify predicted hot regions in protein-protein interactions, which also provides a useful way to deeply investigate the functional activities of protein hot regions. Copyright © 2016. Published by Elsevier Inc.

  1. Comparison of multiobjective evolutionary algorithms: empirical results.

    PubMed

    Zitzler, E; Deb, K; Thiele, L

    2000-01-01

    In this paper, we provide a systematic comparison of various evolutionary approaches to multiobjective optimization using six carefully chosen test functions. Each test function involves a particular feature that is known to cause difficulty in the evolutionary optimization process, mainly in converging to the Pareto-optimal front (e.g., multimodality and deception). By investigating these different problem features separately, it is possible to predict the kind of problems to which a certain technique is or is not well suited. However, in contrast to what was suspected beforehand, the experimental results indicate a hierarchy of the algorithms under consideration. Furthermore, the emerging effects are evidence that the suggested test functions provide sufficient complexity to compare multiobjective optimizers. Finally, elitism is shown to be an important factor for improving evolutionary multiobjective search.

  2. Self-Tuning of Design Variables for Generalized Predictive Control

    NASA Technical Reports Server (NTRS)

    Lin, Chaung; Juang, Jer-Nan

    2000-01-01

    Three techniques are introduced to determine the order and control weighting for the design of a generalized predictive controller. These techniques are based on the application of fuzzy logic, genetic algorithms, and simulated annealing to conduct an optimal search on specific performance indexes or objective functions. Fuzzy logic is found to be feasible for real-time and on-line implementation due to its smooth and quick convergence. On the other hand, genetic algorithms and simulated annealing are applicable for initial estimation of the model order and control weighting, and final fine-tuning within a small region of the solution space, Several numerical simulations for a multiple-input and multiple-output system are given to illustrate the techniques developed in this paper.

  3. Time domain simulation of the response of geometrically nonlinear panels subjected to random loading

    NASA Technical Reports Server (NTRS)

    Moyer, E. Thomas, Jr.

    1988-01-01

    The response of composite panels subjected to random pressure loads large enough to cause geometrically nonlinear responses is studied. A time domain simulation is employed to solve the equations of motion. An adaptive time stepping algorithm is employed to minimize intermittent transients. A modified algorithm for the prediction of response spectral density is presented which predicts smooth spectral peaks for discrete time histories. Results are presented for a number of input pressure levels and damping coefficients. Response distributions are calculated and compared with the analytical solution of the Fokker-Planck equations. RMS response is reported as a function of input pressure level and damping coefficient. Spectral densities are calculated for a number of examples.

  4. Improved packing of protein side chains with parallel ant colonies.

    PubMed

    Quan, Lijun; Lü, Qiang; Li, Haiou; Xia, Xiaoyan; Wu, Hongjie

    2014-01-01

    The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many of existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamer by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive to state-of-the-art solutions for packing side chains. This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a frame-work for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms.

  5. High performance transcription factor-DNA docking with GPU computing

    PubMed Central

    2012-01-01

    Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575

  6. Model-on-Demand Predictive Control for Nonlinear Hybrid Systems With Application to Adaptive Behavioral Interventions

    PubMed Central

    Nandola, Naresh N.; Rivera, Daniel E.

    2011-01-01

    This paper presents a data-centric modeling and predictive control approach for nonlinear hybrid systems. System identification of hybrid systems represents a challenging problem because model parameters depend on the mode or operating point of the system. The proposed algorithm applies Model-on-Demand (MoD) estimation to generate a local linear approximation of the nonlinear hybrid system at each time step, using a small subset of data selected by an adaptive bandwidth selector. The appeal of the MoD approach lies in the fact that model parameters are estimated based on a current operating point; hence estimation of locations or modes governed by autonomous discrete events is achieved automatically. The local MoD model is then converted into a mixed logical dynamical (MLD) system representation which can be used directly in a model predictive control (MPC) law for hybrid systems using multiple-degree-of-freedom tuning. The effectiveness of the proposed MoD predictive control algorithm for nonlinear hybrid systems is demonstrated on a hypothetical adaptive behavioral intervention problem inspired by Fast Track, a real-life preventive intervention for improving parental function and reducing conduct disorder in at-risk children. Simulation results demonstrate that the proposed algorithm can be useful for adaptive intervention problems exhibiting both nonlinear and hybrid character. PMID:21874087

  7. Efficient genetic algorithms using discretization scheduling.

    PubMed

    McLay, Laura A; Goldberg, David E

    2005-01-01

    In many applications of genetic algorithms, there is a tradeoff between speed and accuracy in fitness evaluations when evaluations use numerical methods with varying discretization. In these types of applications, the cost and accuracy vary from discretization errors when implicit or explicit quadrature is used to estimate the function evaluations. This paper examines discretization scheduling, or how to vary the discretization within the genetic algorithm in order to use the least amount of computation time for a solution of a desired quality. The effectiveness of discretization scheduling can be determined by comparing its computation time to the computation time of a GA using a constant discretization. There are three ingredients for the discretization scheduling: population sizing, estimated time for each function evaluation and predicted convergence time analysis. Idealized one- and two-dimensional experiments and an inverse groundwater application illustrate the computational savings to be achieved from using discretization scheduling.

  8. Self Improving Methods for Materials and Process Design

    DTIC Science & Technology

    1998-08-31

    using inductive coupling techniques. The first phase of the work focuses on developing an artificial neural network learning for function approximation...developing an artificial neural network learning algorithm for time-series prediction. The third phase of the work focuses on model selection. We have

  9. High-confidence assessment of functional impact of human mitochondrial non-synonymous genome variations by APOGEE.

    PubMed

    Castellana, Stefano; Fusilli, Caterina; Mazzoccoli, Gianluigi; Biagini, Tommaso; Capocefalo, Daniele; Carella, Massimo; Vescovi, Angelo Luigi; Mazza, Tommaso

    2017-06-01

    24,189 are all the possible non-synonymous amino acid changes potentially affecting the human mitochondrial DNA. Only a tiny subset was functionally evaluated with certainty so far, while the pathogenicity of the vast majority was only assessed in-silico by software predictors. Since these tools proved to be rather incongruent, we have designed and implemented APOGEE, a machine-learning algorithm that outperforms all existing prediction methods in estimating the harmfulness of mitochondrial non-synonymous genome variations. We provide a detailed description of the underlying algorithm, of the selected and manually curated training and test sets of variants, as well as of its classification ability.

  10. A label distance maximum-based classifier for multi-label learning.

    PubMed

    Liu, Xiaoli; Bao, Hang; Zhao, Dazhe; Cao, Peng

    2015-01-01

    Multi-label classification is useful in many bioinformatics tasks such as gene function prediction and protein site localization. This paper presents an improved neural network algorithm, Max Label Distance Back Propagation Algorithm for Multi-Label Classification. The method was formulated by modifying the total error function of the standard BP by adding a penalty term, which was realized by maximizing the distance between the positive and negative labels. Extensive experiments were conducted to compare this method against state-of-the-art multi-label methods on three popular bioinformatic benchmark datasets. The results illustrated that this proposed method is more effective for bioinformatic multi-label classification compared to commonly used techniques.

  11. NEP: web server for epitope prediction based on antibody neutralization of viral strains with diverse sequences.

    PubMed

    Chuang, Gwo-Yu; Liou, David; Kwong, Peter D; Georgiev, Ivelin S

    2014-07-01

    Delineation of the antigenic site, or epitope, recognized by an antibody can provide clues about functional vulnerabilities and resistance mechanisms, and can therefore guide antibody optimization and epitope-based vaccine design. Previously, we developed an algorithm for antibody-epitope prediction based on antibody neutralization of viral strains with diverse sequences and validated the algorithm on a set of broadly neutralizing HIV-1 antibodies. Here we describe the implementation of this algorithm, NEP (Neutralization-based Epitope Prediction), as a web-based server. The users must supply as input: (i) an alignment of antigen sequences of diverse viral strains; (ii) neutralization data for the antibody of interest against the same set of antigen sequences; and (iii) (optional) a structure of the unbound antigen, for enhanced prediction accuracy. The prediction results can be downloaded or viewed interactively on the antigen structure (if supplied) from the web browser using a JSmol applet. Since neutralization experiments are typically performed as one of the first steps in the characterization of an antibody to determine its breadth and potency, the NEP server can be used to predict antibody-epitope information at no additional experimental costs. NEP can be accessed on the internet at http://exon.niaid.nih.gov/nep. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  12. Ionosphere monitoring and forecast activities within the IAG working group "Ionosphere Prediction"

    NASA Astrophysics Data System (ADS)

    Hoque, Mainul; Garcia-Rigo, Alberto; Erdogan, Eren; Cueto Santamaría, Marta; Jakowski, Norbert; Berdermann, Jens; Hernandez-Pajares, Manuel; Schmidt, Michael; Wilken, Volker

    2017-04-01

    Ionospheric disturbances can affect technologies in space and on Earth disrupting satellite and airline operations, communications networks, navigation systems. As the world becomes ever more dependent on these technologies, ionospheric disturbances as part of space weather pose an increasing risk to the economic vitality and national security. Therefore, having the knowledge of ionospheric state in advance during space weather events is becoming more and more important. To promote scientific cooperation we recently formed a Working Group (WG) called "Ionosphere Predictions" within the International Association of Geodesy (IAG) under Sub-Commission 4.3 "Atmosphere Remote Sensing" of the Commission 4 "Positioning and Applications". The general objective of the WG is to promote the development of ionosphere prediction algorithm/models based on the dependence of ionospheric characteristics on solar and magnetic conditions combining data from different sensors to improve the spatial and temporal resolution and sensitivity taking advantage of different sounding geometries and latency. Our presented work enables the possibility to compare total electron content (TEC) prediction approaches/results from different centers contributing to this WG such as German Aerospace Center (DLR), Universitat Politècnica de Catalunya (UPC), Technische Universität München (TUM) and GMV. DLR developed a model-assisted TEC forecast algorithm taking benefit from actual trends of the TEC behavior at each grid point. Since during perturbations, characterized by large TEC fluctuations or ionization fronts, this approach may fail, the trend information is merged with the current background model which provides a stable climatological TEC behavior. The presented solution is a first step to regularly provide forecasted TEC services via SWACI/IMPC by DLR. UPC forecast model is based on applying linear regression to a temporal window of TEC maps in the Discrete Cosine Transform (DCT) domain. Performance tests are being conducted at the moment in order to improve UPC predicted products for 1-, 2-days ahead. In addition, UPC is working to enable short-term predictions based on UPC real-time GIMs (labelled URTG) and implementing an improved prediction approach. TUM developed a forecast method based on a time series analysis of TEC products which are either B-spline coefficients estimated by a Kalman filter or TEC grid maps derived from the B-spline coefficients. The forecast method uses a Fourier series expansion to extract the trend functions from the estimated TEC product. Then the trend functions are carried out to provide predicted TEC products. The forecast algorithm developed by GMV is based on the ionospheric delay estimation from previous epochs using GNSS data and the main dependence of ionospheric delays on solar and magnetic conditions. Since the ionospheric behavior is highly dependent on the region of the Earth, different region-based algorithmic modifications have been implemented in GMV's magicSBAS ionospheric algorithms to be able to estimate and forecast ionospheric delays worldwide. Different TEC prediction approaches outlined here will certainly help to learn about forecasting ionospheric ionization.

  13. Small angle X-ray scattering and cross-linking for data assisted protein structure prediction in CASP 12 with prospects for improved accuracy.

    PubMed

    Ogorzalek, Tadeusz L; Hura, Greg L; Belsom, Adam; Burnett, Kathryn H; Kryshtafovych, Andriy; Tainer, John A; Rappsilber, Juri; Tsutakawa, Susan E; Fidelis, Krzysztof

    2018-03-01

    Experimental data offers empowering constraints for structure prediction. These constraints can be used to filter equivalently scored models or more powerfully within optimization functions toward prediction. In CASP12, Small Angle X-ray Scattering (SAXS) and Cross-Linking Mass Spectrometry (CLMS) data, measured on an exemplary set of novel fold targets, were provided to the CASP community of protein structure predictors. As solution-based techniques, SAXS and CLMS can efficiently measure states of the full-length sequence in its native solution conformation and assembly. However, this experimental data did not substantially improve prediction accuracy judged by fits to crystallographic models. One issue, beyond intrinsic limitations of the algorithms, was a disconnect between crystal structures and solution-based measurements. Our analyses show that many targets had substantial percentages of disordered regions (up to 40%) or were multimeric or both. Thus, solution measurements of flexibility and assembly support variations that may confound prediction algorithms trained on crystallographic data and expecting globular fully-folded monomeric proteins. Here, we consider the CLMS and SAXS data collected, the information in these solution measurements, and the challenges in incorporating them into computational prediction. As improvement opportunities were only partly realized in CASP12, we provide guidance on how data from the full-length biological unit and the solution state can better aid prediction of the folded monomer or subunit. We furthermore describe strategic integrations of solution measurements with computational prediction programs with the aim of substantially improving foundational knowledge and the accuracy of computational algorithms for biologically-relevant structure predictions for proteins in solution. © 2018 Wiley Periodicals, Inc.

  14. The inclusion of N-terminal pro-brain natriuretic peptide in a sensitive screening strategy for systemic sclerosis-related pulmonary arterial hypertension: a cohort study

    PubMed Central

    2013-01-01

    Introduction Pulmonary arterial hypertension (PAH) is a major cause of mortality in systemic sclerosis (SSc). Screening guidelines for PAH recommend multiple investigations, including annual echocardiography, which together have low specificity and may not be cost-effective. We sought to evaluate the predictive accuracy of serum N-terminal pro-brain natriuretic peptide (NT-proBNP) in combination with pulmonary function tests (PFT) (‘proposed’ algorithm) in a screening algorithm for SSc-PAH. Methods We evaluated our proposed algorithm (PFT with NT-proBNP) on 49 consecutive SSc patients with suspected pulmonary hypertension undergoing right heart catherisation (RHC). The predictive accuracy of the proposed algorithm was compared with existing screening recommendations, and is presented as sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). Results Overall, 27 patients were found to have pulmonary hypertension (PH) at RHC, while 22 had no PH. The sensitivity, specificity, PPV and NPV of the proposed algorithm for PAH was 94.1%, 54.5%, 61.5% and 92.3%, respectively; current European Society of Cardiology (ESC)/European Respiratory Society (ERS) guidelines achieved a sensitivity, specificity, PPV and NPV of 94.1%, 31.8%, 51.6% and 87.5%, respectively. In an alternate case scenario analysis, estimating a PAH prevalence of 10%, the proposed algorithm achieved a sensitivity, specificity, PPV and NPV for PAH of 94.1%, 54.5%, 18.7% and 98.8%, respectively. Conclusions The combination of NT-proBNP with PFT is a sensitive, yet simple and non-invasive, screening strategy for SSc-PAH. Patients with a positive screening result can be referred for echocardiography, and further confirmatory testing for PAH. In this way, it may be possible to shift the burden of routine screening away from echocardiography. The findings of this study should be confirmed in larger studies. PMID:24246100

  15. Adaptive non-linear control for cancer therapy through a Fokker-Planck observer.

    PubMed

    Shakeri, Ehsan; Latif-Shabgahi, Gholamreza; Esmaeili Abharian, Amir

    2018-04-01

    In recent years, many efforts have been made to present optimal strategies for cancer therapy through the mathematical modelling of tumour-cell population dynamics and optimal control theory. In many cases, therapy effect is included in the drift term of the stochastic Gompertz model. By fitting the model with empirical data, the parameters of therapy function are estimated. The reported research works have not presented any algorithm to determine the optimal parameters of therapy function. In this study, a logarithmic therapy function is entered in the drift term of the Gompertz model. Using the proposed control algorithm, the therapy function parameters are predicted and adaptively adjusted. To control the growth of tumour-cell population, its moments must be manipulated. This study employs the probability density function (PDF) control approach because of its ability to control all the process moments. A Fokker-Planck-based non-linear stochastic observer will be used to determine the PDF of the process. A cost function based on the difference between a predefined desired PDF and PDF of tumour-cell population is defined. Using the proposed algorithm, the therapy function parameters are adjusted in such a manner that the cost function is minimised. The existence of an optimal therapy function is also proved. The numerical results are finally given to demonstrate the effectiveness of the proposed method.

  16. Dinucleotide controlled null models for comparative RNA gene prediction.

    PubMed

    Gesell, Tanja; Washietl, Stefan

    2008-05-27

    Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.

  17. Validation of Coevolving Residue Algorithms via Pipeline Sensitivity Analysis: ELSC and OMES and ZNMI, Oh My!

    PubMed Central

    Brown, Christopher A.; Brown, Kevin S.

    2010-01-01

    Correlated amino acid substitution algorithms attempt to discover groups of residues that co-fluctuate due to either structural or functional constraints. Although these algorithms could inform both ab initio protein folding calculations and evolutionary studies, their utility for these purposes has been hindered by a lack of confidence in their predictions due to hard to control sources of error. To complicate matters further, naive users are confronted with a multitude of methods to choose from, in addition to the mechanics of assembling and pruning a dataset. We first introduce a new pair scoring method, called ZNMI (Z-scored-product Normalized Mutual Information), which drastically improves the performance of mutual information for co-fluctuating residue prediction. Second and more important, we recast the process of finding coevolving residues in proteins as a data-processing pipeline inspired by the medical imaging literature. We construct an ensemble of alignment partitions that can be used in a cross-validation scheme to assess the effects of choices made during the procedure on the resulting predictions. This pipeline sensitivity study gives a measure of reproducibility (how similar are the predictions given perturbations to the pipeline?) and accuracy (are residue pairs with large couplings on average close in tertiary structure?). We choose a handful of published methods, along with ZNMI, and compare their reproducibility and accuracy on three diverse protein families. We find that (i) of the algorithms tested, while none appear to be both highly reproducible and accurate, ZNMI is one of the most accurate by far and (ii) while users should be wary of predictions drawn from a single alignment, considering an ensemble of sub-alignments can help to determine both highly accurate and reproducible couplings. Our cross-validation approach should be of interest both to developers and end users of algorithms that try to detect correlated amino acid substitutions. PMID:20531955

  18. Detecting uber-operons in prokaryotic genomes.

    PubMed

    Che, Dongsheng; Li, Guojun; Mao, Fenglou; Wu, Hongwei; Xu, Ying

    2006-01-01

    We present a study on computational identification of uber-operons in a prokaryotic genome, each of which represents a group of operons that are evolutionarily or functionally associated through operons in other (reference) genomes. Uber-operons represent a rich set of footprints of operon evolution, whose full utilization could lead to new and more powerful tools for elucidation of biological pathways and networks than what operons have provided, and a better understanding of prokaryotic genome structures and evolution. Our prediction algorithm predicts uber-operons through identifying groups of functionally or transcriptionally related operons, whose gene sets are conserved across the target and multiple reference genomes. Using this algorithm, we have predicted uber-operons for each of a group of 91 genomes, using the other 90 genomes as references. In particular, we predicted 158 uber-operons in Escherichia coli K12 covering 1830 genes, and found that many of the uber-operons correspond to parts of known regulons or biological pathways or are involved in highly related biological processes based on their Gene Ontology (GO) assignments. For some of the predicted uber-operons that are not parts of known regulons or pathways, our analyses indicate that their genes are highly likely to work together in the same biological processes, suggesting the possibility of new regulons and pathways. We believe that our uber-operon prediction provides a highly useful capability and a rich information source for elucidation of complex biological processes, such as pathways in microbes. All the prediction results are available at our Uber-Operon Database: http://csbl.bmb.uga.edu/uber, the first of its kind.

  19. Detecting uber-operons in prokaryotic genomes

    PubMed Central

    Che, Dongsheng; Li, Guojun; Mao, Fenglou; Wu, Hongwei; Xu, Ying

    2006-01-01

    We present a study on computational identification of uber-operons in a prokaryotic genome, each of which represents a group of operons that are evolutionarily or functionally associated through operons in other (reference) genomes. Uber-operons represent a rich set of footprints of operon evolution, whose full utilization could lead to new and more powerful tools for elucidation of biological pathways and networks than what operons have provided, and a better understanding of prokaryotic genome structures and evolution. Our prediction algorithm predicts uber-operons through identifying groups of functionally or transcriptionally related operons, whose gene sets are conserved across the target and multiple reference genomes. Using this algorithm, we have predicted uber-operons for each of a group of 91 genomes, using the other 90 genomes as references. In particular, we predicted 158 uber-operons in Escherichia coli K12 covering 1830 genes, and found that many of the uber-operons correspond to parts of known regulons or biological pathways or are involved in highly related biological processes based on their Gene Ontology (GO) assignments. For some of the predicted uber-operons that are not parts of known regulons or pathways, our analyses indicate that their genes are highly likely to work together in the same biological processes, suggesting the possibility of new regulons and pathways. We believe that our uber-operon prediction provides a highly useful capability and a rich information source for elucidation of complex biological processes, such as pathways in microbes. All the prediction results are available at our Uber-Operon Database: , the first of its kind. PMID:16682449

  20. Application of XGBoost algorithm in hourly PM2.5 concentration prediction

    NASA Astrophysics Data System (ADS)

    Pan, Bingyue

    2018-02-01

    In view of prediction techniques of hourly PM2.5 concentration in China, this paper applied the XGBoost(Extreme Gradient Boosting) algorithm to predict hourly PM2.5 concentration. The monitoring data of air quality in Tianjin city was analyzed by using XGBoost algorithm. The prediction performance of the XGBoost method is evaluated by comparing observed and predicted PM2.5 concentration using three measures of forecast accuracy. The XGBoost method is also compared with the random forest algorithm, multiple linear regression, decision tree regression and support vector machines for regression models using computational results. The results demonstrate that the XGBoost algorithm outperforms other data mining methods.

  1. Performance of Geno-Fuzzy Model on rainfall-runoff predictions in claypan watersheds

    USDA-ARS?s Scientific Manuscript database

    Fuzzy logic provides a relatively simple approach to simulate complex hydrological systems while accounting for the uncertainty of environmental variables. The objective of this study was to develop a fuzzy inference system (FIS) with genetic algorithm (GA) optimization for membership functions (MF...

  2. Clinical algorithms for the diagnosis and prognosis of interstitial lung disease in systemic sclerosis.

    PubMed

    Hax, Vanessa; Bredemeier, Markus; Didonet Moro, Ana Laura; Pavan, Thaís Rohde; Vieira, Marcelo Vasconcellos; Pitrez, Eduardo Hennemann; da Silva Chakr, Rafael Mendonça; Xavier, Ricardo Machado

    2017-10-01

    Interstitial lung disease (ILD) is currently the primary cause of death in systemic sclerosis (SSc). Thoracic high-resolution computed tomography (HRCT) is considered the gold standard for diagnosis. Recent studies have proposed several clinical algorithms to predict the diagnosis and prognosis of SSc-ILD. To test the clinical algorithms to predict the presence and prognosis of SSc-ILD and to evaluate the association of extent of ILD with mortality in a cohort of SSc patients. Retrospective cohort study, including 177 SSc patients assessed by clinical evaluation, laboratory tests, pulmonary function tests, and HRCT. Three clinical algorithms, combining lung auscultation, chest radiography, and percentage predicted forced vital capacity (FVC), were applied for the diagnosis of different extents of ILD on HRCT. Univariate and multivariate Cox proportional models were used to analyze the association of algorithms and the extent of ILD on HRCT with the risk of death using hazard ratios (HR). The prevalence of ILD on HRCT was 57.1% and 79 patients died (44.6%) in a median follow-up of 11.1 years. For identification of ILD with extent ≥10% and ≥20% on HRCT, all algorithms presented a high sensitivity (>89%) and a very low negative likelihood ratio (<0.16). For prognosis, survival was decreased for all algorithms, especially the algorithm C (HR = 3.47, 95% CI: 1.62-7.42), which identified the presence of ILD based on crackles on lung auscultation, findings on chest X-ray, or FVC <80%. Extensive disease as proposed by Goh et al. (extent of ILD > 20% on HRCT or, in indeterminate cases, FVC < 70%) had a significantly higher risk of death (HR = 3.42, 95% CI: 2.12-5.52). Survival was not different between patients with extent of 10% or 20% of ILD on HRCT, and analysis of 10-year mortality suggested that a threshold of 10% may also have a good predictive value for mortality. However, there is no clear cutoff above which mortality is sharply increased. Clinical algorithms had a good diagnostic performance for extents of SSc-ILD on HRCT with clinical and prognostic relevance (≥10% and ≥20%), and were also strongly related to mortality. Non-HRCT-based algorithms could be useful when HRCT is not available. This is the first study to replicate the prognostic algorithm proposed by Goh et al. in a developing country. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. RBind: computational network method to predict RNA binding sites.

    PubMed

    Wang, Kaili; Jian, Yiren; Wang, Huiwen; Zeng, Chen; Zhao, Yunjie

    2018-04-26

    Non-coding RNA molecules play essential roles by interacting with other molecules to perform various biological functions. However, it is difficult to determine RNA structures due to their flexibility. At present, the number of experimentally solved RNA-ligand and RNA-protein structures is still insufficient. Therefore, binding sites prediction of non-coding RNA is required to understand their functions. Current RNA binding site prediction algorithms produce many false positive nucleotides that are distance away from the binding sites. Here, we present a network approach, RBind, to predict the RNA binding sites. We benchmarked RBind in RNA-ligand and RNA-protein datasets. The average accuracy of 0.82 in RNA-ligand and 0.63 in RNA-protein testing showed that this network strategy has a reliable accuracy for binding sites prediction. The codes and datasets are available at https://zhaolab.com.cn/RBind. yjzhaowh@mail.ccnu.edu.cn. Supplementary data are available at Bioinformatics online.

  4. Learning Instance-Specific Predictive Models

    PubMed Central

    Visweswaran, Shyam; Cooper, Gregory F.

    2013-01-01

    This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including nave Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325

  5. Support Vector Machines Trained with Evolutionary Algorithms Employing Kernel Adatron for Large Scale Classification of Protein Structures.

    PubMed

    Arana-Daniel, Nancy; Gallegos, Alberto A; López-Franco, Carlos; Alanís, Alma Y; Morales, Jacob; López-Franco, Adriana

    2016-01-01

    With the increasing power of computers, the amount of data that can be processed in small periods of time has grown exponentially, as has the importance of classifying large-scale data efficiently. Support vector machines have shown good results classifying large amounts of high-dimensional data, such as data generated by protein structure prediction, spam recognition, medical diagnosis, optical character recognition and text classification, etc. Most state of the art approaches for large-scale learning use traditional optimization methods, such as quadratic programming or gradient descent, which makes the use of evolutionary algorithms for training support vector machines an area to be explored. The present paper proposes an approach that is simple to implement based on evolutionary algorithms and Kernel-Adatron for solving large-scale classification problems, focusing on protein structure prediction. The functional properties of proteins depend upon their three-dimensional structures. Knowing the structures of proteins is crucial for biology and can lead to improvements in areas such as medicine, agriculture and biofuels.

  6. A high order accurate finite element algorithm for high Reynolds number flow prediction

    NASA Technical Reports Server (NTRS)

    Baker, A. J.

    1978-01-01

    A Galerkin-weighted residuals formulation is employed to establish an implicit finite element solution algorithm for generally nonlinear initial-boundary value problems. Solution accuracy, and convergence rate with discretization refinement, are quantized in several error norms, by a systematic study of numerical solutions to several nonlinear parabolic and a hyperbolic partial differential equation characteristic of the equations governing fluid flows. Solutions are generated using selective linear, quadratic and cubic basis functions. Richardson extrapolation is employed to generate a higher-order accurate solution to facilitate isolation of truncation error in all norms. Extension of the mathematical theory underlying accuracy and convergence concepts for linear elliptic equations is predicted for equations characteristic of laminar and turbulent fluid flows at nonmodest Reynolds number. The nondiagonal initial-value matrix structure introduced by the finite element theory is determined intrinsic to improved solution accuracy and convergence. A factored Jacobian iteration algorithm is derived and evaluated to yield a consequential reduction in both computer storage and execution CPU requirements while retaining solution accuracy.

  7. How Are Mate Preferences Linked with Actual Mate Selection? Tests of Mate Preference Integration Algorithms Using Computer Simulations and Actual Mating Couples

    PubMed Central

    Conroy-Beam, Daniel; Buss, David M.

    2016-01-01

    Prior mate preference research has focused on the content of mate preferences. Yet in real life, people must select mates among potentials who vary along myriad dimensions. How do people incorporate information on many different mate preferences in order to choose which partner to pursue? Here, in Study 1, we compare seven candidate algorithms for integrating multiple mate preferences in a competitive agent-based model of human mate choice evolution. This model shows that a Euclidean algorithm is the most evolvable solution to the problem of selecting fitness-beneficial mates. Next, across three studies of actual couples (Study 2: n = 214; Study 3: n = 259; Study 4: n = 294) we apply the Euclidean algorithm toward predicting mate preference fulfillment overall and preference fulfillment as a function of mate value. Consistent with the hypothesis that mate preferences are integrated according to a Euclidean algorithm, we find that actual mates lie close in multidimensional preference space to the preferences of their partners. Moreover, this Euclidean preference fulfillment is greater for people who are higher in mate value, highlighting theoretically-predictable individual differences in who gets what they want. These new Euclidean tools have important implications for understanding real-world dynamics of mate selection. PMID:27276030

  8. How Are Mate Preferences Linked with Actual Mate Selection? Tests of Mate Preference Integration Algorithms Using Computer Simulations and Actual Mating Couples.

    PubMed

    Conroy-Beam, Daniel; Buss, David M

    2016-01-01

    Prior mate preference research has focused on the content of mate preferences. Yet in real life, people must select mates among potentials who vary along myriad dimensions. How do people incorporate information on many different mate preferences in order to choose which partner to pursue? Here, in Study 1, we compare seven candidate algorithms for integrating multiple mate preferences in a competitive agent-based model of human mate choice evolution. This model shows that a Euclidean algorithm is the most evolvable solution to the problem of selecting fitness-beneficial mates. Next, across three studies of actual couples (Study 2: n = 214; Study 3: n = 259; Study 4: n = 294) we apply the Euclidean algorithm toward predicting mate preference fulfillment overall and preference fulfillment as a function of mate value. Consistent with the hypothesis that mate preferences are integrated according to a Euclidean algorithm, we find that actual mates lie close in multidimensional preference space to the preferences of their partners. Moreover, this Euclidean preference fulfillment is greater for people who are higher in mate value, highlighting theoretically-predictable individual differences in who gets what they want. These new Euclidean tools have important implications for understanding real-world dynamics of mate selection.

  9. A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm

    PubMed Central

    Chen, Jui-Le; Yang, Chu-Sing

    2013-01-01

    The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although the computer science technologies can be used to reduce the costs of the pharmaceutical research, the computation time of the structure-based protein-ligand docking prediction is still unsatisfied until now. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate the docking prediction algorithm. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864

  10. Prediction of microRNA target genes using an efficient genetic algorithm-based decision tree.

    PubMed

    Rabiee-Ghahfarrokhi, Behzad; Rafiei, Fariba; Niknafs, Ali Akbar; Zamani, Behzad

    2015-01-01

    MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression in almost all plants and animals. They play an important role in key processes, such as proliferation, apoptosis, and pathogen-host interactions. Nevertheless, the mechanisms by which miRNAs act are not fully understood. The first step toward unraveling the function of a particular miRNA is the identification of its direct targets. This step has shown to be quite challenging in animals primarily because of incomplete complementarities between miRNA and target mRNAs. In recent years, the use of machine-learning techniques has greatly increased the prediction of miRNA targets, avoiding the need for costly and time-consuming experiments to achieve miRNA targets experimentally. Among the most important machine-learning algorithms are decision trees, which classify data based on extracted rules. In the present work, we used a genetic algorithm in combination with C4.5 decision tree for prediction of miRNA targets. We applied our proposed method to a validated human datasets. We nearly achieved 93.9% accuracy of classification, which could be related to the selection of best rules.

  11. Simulating polarized light scattering in terrestrial snow based on bicontinuous random medium and Monte Carlo ray tracing

    NASA Astrophysics Data System (ADS)

    Xiong, Chuan; Shi, Jiancheng

    2014-01-01

    To date, the light scattering models of snow consider very little about the real snow microstructures. The ideal spherical or other single shaped particle assumptions in previous snow light scattering models can cause error in light scattering modeling of snow and further cause errors in remote sensing inversion algorithms. This paper tries to build up a snow polarized reflectance model based on bicontinuous medium, with which the real snow microstructure is considered. The accurate specific surface area of bicontinuous medium can be analytically derived. The polarized Monte Carlo ray tracing technique is applied to the computer generated bicontinuous medium. With proper algorithms, the snow surface albedo, bidirectional reflectance distribution function (BRDF) and polarized BRDF can be simulated. The validation of model predicted spectral albedo and bidirectional reflectance factor (BRF) using experiment data shows good results. The relationship between snow surface albedo and snow specific surface area (SSA) were predicted, and this relationship can be used for future improvement of snow specific surface area (SSA) inversion algorithms. The model predicted polarized reflectance is validated and proved accurate, which can be further applied in polarized remote sensing.

  12. Prediction of microRNA target genes using an efficient genetic algorithm-based decision tree

    PubMed Central

    Rabiee-Ghahfarrokhi, Behzad; Rafiei, Fariba; Niknafs, Ali Akbar; Zamani, Behzad

    2015-01-01

    MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression in almost all plants and animals. They play an important role in key processes, such as proliferation, apoptosis, and pathogen–host interactions. Nevertheless, the mechanisms by which miRNAs act are not fully understood. The first step toward unraveling the function of a particular miRNA is the identification of its direct targets. This step has shown to be quite challenging in animals primarily because of incomplete complementarities between miRNA and target mRNAs. In recent years, the use of machine-learning techniques has greatly increased the prediction of miRNA targets, avoiding the need for costly and time-consuming experiments to achieve miRNA targets experimentally. Among the most important machine-learning algorithms are decision trees, which classify data based on extracted rules. In the present work, we used a genetic algorithm in combination with C4.5 decision tree for prediction of miRNA targets. We applied our proposed method to a validated human datasets. We nearly achieved 93.9% accuracy of classification, which could be related to the selection of best rules. PMID:26649272

  13. A test to evaluate the earthquake prediction algorithm, M8

    USGS Publications Warehouse

    Healy, John H.; Kossobokov, Vladimir G.; Dewey, James W.

    1992-01-01

    A test of the algorithm M8 is described. The test is constructed to meet four rules, which we propose to be applicable to the test of any method for earthquake prediction:  1. An earthquake prediction technique should be presented as a well documented, logical algorithm that can be used by  investigators without restrictions. 2. The algorithm should be coded in a common programming language and implementable on widely available computer systems. 3. A test of the earthquake prediction technique should involve future predictions with a black box version of the algorithm in which potentially adjustable parameters are fixed in advance. The source of the input data must be defined and ambiguities in these data must be resolved automatically by the algorithm. 4. At least one reasonable null hypothesis should be stated in advance of testing the earthquake prediction method, and it should be stated how this null hypothesis will be used to estimate the statistical significance of the earthquake predictions. The M8 algorithm has successfully predicted several destructive earthquakes, in the sense that the earthquakes occurred inside regions with linear dimensions from 384 to 854 km that the algorithm had identified as being in times of increased probability for strong earthquakes. In addition, M8 has successfully "post predicted" high percentages of strong earthquakes in regions to which it has been applied in retroactive studies. The statistical significance of previous predictions has not been established, however, and post-prediction studies in general are notoriously subject to success-enhancement through hindsight. Nor has it been determined how much more precise an M8 prediction might be than forecasts and probability-of-occurrence estimates made by other techniques. We view our test of M8 both as a means to better determine the effectiveness of M8 and as an experimental structure within which to make observations that might lead to improvements in the algorithm or conceivably lead to a radically different approach to earthquake prediction.

  14. Disease Prediction based on Functional Connectomes using a Scalable and Spatially-Informed Support Vector Machine

    PubMed Central

    Watanabe, Takanori; Kessler, Daniel; Scott, Clayton; Angstadt, Michael; Sripada, Chandra

    2014-01-01

    Substantial evidence indicates that major psychiatric disorders are associated with distributed neural dysconnectivity, leading to strong interest in using neuroimaging methods to accurately predict disorder status. In this work, we are specifically interested in a multivariate approach that uses features derived from whole-brain resting state functional connectomes. However, functional connectomes reside in a high dimensional space, which complicates model interpretation and introduces numerous statistical and computational challenges. Traditional feature selection techniques are used to reduce data dimensionality, but are blind to the spatial structure of the connectomes. We propose a regularization framework where the 6-D structure of the functional connectome (defined by pairs of points in 3-D space) is explicitly taken into account via the fused Lasso or the GraphNet regularizer. Our method only restricts the loss function to be convex and margin-based, allowing non-differentiable loss functions such as the hinge-loss to be used. Using the fused Lasso or GraphNet regularizer with the hinge-loss leads to a structured sparse support vector machine (SVM) with embedded feature selection. We introduce a novel efficient optimization algorithm based on the augmented Lagrangian and the classical alternating direction method, which can solve both fused Lasso and GraphNet regularized SVM with very little modification. We also demonstrate that the inner subproblems of the algorithm can be solved efficiently in analytic form by coupling the variable splitting strategy with a data augmentation scheme. Experiments on simulated data and resting state scans from a large schizophrenia dataset show that our proposed approach can identify predictive regions that are spatially contiguous in the 6-D “connectome space,” offering an additional layer of interpretability that could provide new insights about various disease processes. PMID:24704268

  15. Optimal design of minimum mean-square error noise reduction algorithms using the simulated annealing technique.

    PubMed

    Bai, Mingsian R; Hsieh, Ping-Ju; Hur, Kur-Nan

    2009-02-01

    The performance of the minimum mean-square error noise reduction (MMSE-NR) algorithm in conjunction with time-recursive averaging (TRA) for noise estimation is found to be very sensitive to the choice of two recursion parameters. To address this problem in a more systematic manner, this paper proposes an optimization method to efficiently search the optimal parameters of the MMSE-TRA-NR algorithms. The objective function is based on a regression model, whereas the optimization process is carried out with the simulated annealing algorithm that is well suited for problems with many local optima. Another NR algorithm proposed in the paper employs linear prediction coding as a preprocessor for extracting the correlated portion of human speech. Objective and subjective tests were undertaken to compare the optimized MMSE-TRA-NR algorithm with several conventional NR algorithms. The results of subjective tests were processed by using analysis of variance to justify the statistic significance. A post hoc test, Tukey's Honestly Significant Difference, was conducted to further assess the pairwise difference between the NR algorithms.

  16. Self-consistency test reveals systematic bias in programs for prediction change of stability upon mutation.

    PubMed

    Usmanova, Dinara R; Bogatyreva, Natalya S; Ariño Bernad, Joan; Eremina, Aleksandra A; Gorshkova, Anastasiya A; Kanevskiy, German M; Lonishin, Lyubov R; Meister, Alexander V; Yakupova, Alisa G; Kondrashov, Fyodor A; Ivankov, Dmitry N

    2018-05-02

    Computational prediction of the effect of mutations on protein stability is used by researchers in many fields. The utility of the prediction methods is affected by their accuracy and bias. Bias, a systematic shift of the predicted change of stability, has been noted as an issue for several methods, but has not been investigated systematically. Presence of the bias may lead to misleading results especially when exploring the effects of combination of different mutations. Here we use a protocol to measure the bias as a function of the number of introduced mutations. It is based on a self-consistency test of the reciprocity the effect of a mutation. An advantage of the used approach is that it relies solely on crystal structures without experimentally measured stability values. We applied the protocol to four popular algorithms predicting change of protein stability upon mutation, FoldX, Eris, Rosetta, and I-Mutant, and found an inherent bias. For one program, FoldX, we manage to substantially reduce the bias using additional relaxation by Modeller. Authors using algorithms for predicting effects of mutations should be aware of the bias described here. ivankov13@gmail.com. Supplementary data are available at Bioinformatics online.

  17. Molecular beacon sequence design algorithm.

    PubMed

    Monroe, W Todd; Haselton, Frederick R

    2003-01-01

    A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.

  18. External validation of the international risk prediction algorithm for major depressive episode in the US general population: the PredictD-US study.

    PubMed

    Nigatu, Yeshambel T; Liu, Yan; Wang, JianLi

    2016-07-22

    Multivariable risk prediction algorithms are useful for making clinical decisions and for health planning. While prediction algorithms for new onset of major depression in the primary care attendees in Europe and elsewhere have been developed, the performance of these algorithms in different populations is not known. The objective of this study was to validate the PredictD algorithm for new onset of major depressive episode (MDE) in the US general population. Longitudinal study design was conducted with approximate 3-year follow-up data from a nationally representative sample of the US general population. A total of 29,621 individuals who participated in Wave 1 and 2 of the US National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) and who did not have an MDE in the past year at Wave 1 were included. The PredictD algorithm was directly applied to the selected participants. MDE was assessed by the Alcohol Use Disorder and Associated Disabilities Interview Schedule, based on the DSM-IV criteria. Among the participants, 8 % developed an MDE over three years. The PredictD algorithm had acceptable discriminative power (C-statistics = 0.708, 95 % CI: 0.696, 0.720), but poor calibration (p < 0.001) with the NESARC data. In the European primary care attendees, the algorithm had a C-statistics of 0.790 (95 % CI: 0.767, 0.813) with a perfect calibration. The PredictD algorithm has acceptable discrimination, but the calibration capacity was poor in the US general population despite of re-calibration. Therefore, based on the results, at current stage, the use of PredictD in the US general population for predicting individual risk of MDE is not encouraged. More independent validation research is needed.

  19. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    PubMed

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    ABSTRACT The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps-sets of genes coordinately involved in key biological processes-with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.

  20. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta.

    PubMed

    Chaudhury, Sidhartha; Lyskov, Sergey; Gray, Jeffrey J

    2010-03-01

    PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using iPython and (ii) script-based, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners while script-mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been implemented for algorithms demonstrating protein docking, protein folding, loop modeling and design. PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site.

  1. Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

    PubMed Central

    Fu, QiMing

    2016-01-01

    To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ 2-regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the policy, are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both the local model and the global model are applied to generate samples for planning; the former is used only if the state-prediction error does not surpass the threshold at each time step, while the latter is utilized at the end of each episode. The purpose of taking both models is to improve the sample efficiency and accelerate the convergence rate of the whole algorithm through fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two Reinforcement Learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency. PMID:27795704

  2. PyRosetta: a script-based interface for implementing molecular modeling algorithms using Rosetta

    PubMed Central

    Chaudhury, Sidhartha; Lyskov, Sergey; Gray, Jeffrey J.

    2010-01-01

    Summary: PyRosetta is a stand-alone Python-based implementation of the Rosetta molecular modeling package that allows users to write custom structure prediction and design algorithms using the major Rosetta sampling and scoring functions. PyRosetta contains Python bindings to libraries that define Rosetta functions including those for accessing and manipulating protein structure, calculating energies and running Monte Carlo-based simulations. PyRosetta can be used in two ways: (i) interactively, using iPython and (ii) script-based, using Python scripting. Interactive mode contains a number of help features and is ideal for beginners while script-mode is best suited for algorithm development. PyRosetta has similar computational performance to Rosetta, can be easily scaled up for cluster applications and has been implemented for algorithms demonstrating protein docking, protein folding, loop modeling and design. Availability: PyRosetta is a stand-alone package available at http://www.pyrosetta.org under the Rosetta license which is free for academic and non-profit users. A tutorial, user's manual and sample scripts demonstrating usage are also available on the web site. Contact: pyrosetta@graylab.jhu.edu PMID:20061306

  3. Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning.

    PubMed

    Zhong, Shan; Liu, Quan; Fu, QiMing

    2016-01-01

    To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ 2 -regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the policy, are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both the local model and the global model are applied to generate samples for planning; the former is used only if the state-prediction error does not surpass the threshold at each time step, while the latter is utilized at the end of each episode. The purpose of taking both models is to improve the sample efficiency and accelerate the convergence rate of the whole algorithm through fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two Reinforcement Learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency.

  4. Putative synaptic genes defined from a Drosophila whole body developmental transcriptome by a machine learning approach.

    PubMed

    Pazos Obregón, Flavio; Papalardo, Cecilia; Castro, Sebastián; Guerberoff, Gustavo; Cantera, Rafael

    2015-09-15

    Assembly and function of neuronal synapses require the coordinated expression of a yet undetermined set of genes. Although roughly a thousand genes are expected to be important for this function in Drosophila melanogaster, just a few hundreds of them are known so far. In this work we trained three learning algorithms to predict a "synaptic function" for genes of Drosophila using data from a whole-body developmental transcriptome published by others. Using statistical and biological criteria to analyze and combine the predictions, we obtained a gene catalogue that is highly enriched in genes of relevance for Drosophila synapse assembly and function but still not recognized as such. The utility of our approach is that it reduces the number of genes to be tested through hypothesis-driven experimentation.

  5. Gene function prediction with gene interaction networks: a context graph kernel approach.

    PubMed

    Li, Xin; Chen, Hsinchun; Li, Jiexun; Zhang, Zhu

    2010-01-01

    Predicting gene functions is a challenge for biologists in the postgenomic era. Interactions among genes and their products compose networks that can be used to infer gene functions. Most previous studies adopt a linkage assumption, i.e., they assume that gene interactions indicate functional similarities between connected genes. In this study, we propose to use a gene's context graph, i.e., the gene interaction network associated with the focal gene, to infer its functions. In a kernel-based machine-learning framework, we design a context graph kernel to capture the information in context graphs. Our experimental study on a testbed of p53-related genes demonstrates the advantage of using indirect gene interactions and shows the empirical superiority of the proposed approach over linkage-assumption-based methods, such as the algorithm to minimize inconsistent connected genes and diffusion kernels.

  6. REGIONAL SOIL WATER RETENTION IN THE CONTIGUOUS US: SOURCES OF VARIABILITY AND VOLCANIC SOIL EFFECTS

    EPA Science Inventory

    Water retention of mineral soil is often well predicted using algorithms (pedotransfer functions) with basic soil properties but the spatial variability of these properties has not been well characterized. A further source of uncertainty is that water retention by volcanic soils...

  7. Mono-isotope Prediction for Mass Spectra Using Bayes Network.

    PubMed

    Li, Hui; Liu, Chunmei; Rwebangira, Mugizi Robert; Burge, Legand

    2014-12-01

    Mass spectrometry is one of the widely utilized important methods to study protein functions and components. The challenge of mono-isotope pattern recognition from large scale protein mass spectral data needs computational algorithms and tools to speed up the analysis and improve the analytic results. We utilized naïve Bayes network as the classifier with the assumption that the selected features are independent to predict mono-isotope pattern from mass spectrometry. Mono-isotopes detected from validated theoretical spectra were used as prior information in the Bayes method. Three main features extracted from the dataset were employed as independent variables in our model. The application of the proposed algorithm to publicMo dataset demonstrates that our naïve Bayes classifier is advantageous over existing methods in both accuracy and sensitivity.

  8. Using multicriteria decision analysis during drug development to predict reimbursement decisions.

    PubMed

    Williams, Paul; Mauskopf, Josephine; Lebiecki, Jake; Kilburg, Anne

    2014-01-01

    Pharmaceutical companies design clinical development programs to generate the data that they believe will support reimbursement for the experimental compound. The objective of the study was to present a process for using multicriteria decision analysis (MCDA) by a pharmaceutical company to estimate the probability of a positive recommendation for reimbursement for a new drug given drug and environmental attributes. The MCDA process included 1) selection of decisions makers who were representative of those making reimbursement decisions in a specific country; 2) two pre-workshop questionnaires to identify the most important attributes and their relative importance for a positive recommendation for a new drug; 3) a 1-day workshop during which participants undertook three tasks: i) they agreed on a final list of decision attributes and their importance weights, ii) they developed level descriptions for these attributes and mapped each attribute level to a value function, and iii) they developed profiles for hypothetical products 'just likely to be reimbursed'; and 4) use of the data from the workshop to develop a prediction algorithm based on a logistic regression analysis. The MCDA process is illustrated using case studies for three countries, the United Kingdom, Germany, and Spain. The extent to which the prediction algorithms for each country captured the decision processes for the workshop participants in our case studies was tested using a post-meeting questionnaire that asked the participants to make recommendations for a set of hypothetical products. The data collected in the case study workshops resulted in a prediction algorithm: 1) for the United Kingdom, the probability of a positive recommendation for different ranges of cost-effectiveness ratios; 2) for Spain, the probability of a positive recommendation at the national and regional levels; and 3) for Germany, the probability of a determination of clinical benefit. The results from the post-meeting questionnaire revealed a high predictive value for the algorithm developed using MCDA. Prediction algorithms developed using MCDA could be used by pharmaceutical companies when designing their clinical development programs to estimate the likelihood of a favourable reimbursement recommendation for different product profiles and for different positions in the treatment pathway.

  9. Using multicriteria decision analysis during drug development to predict reimbursement decisions

    PubMed Central

    Williams, Paul; Mauskopf, Josephine; Lebiecki, Jake; Kilburg, Anne

    2014-01-01

    Background Pharmaceutical companies design clinical development programs to generate the data that they believe will support reimbursement for the experimental compound. Objective The objective of the study was to present a process for using multicriteria decision analysis (MCDA) by a pharmaceutical company to estimate the probability of a positive recommendation for reimbursement for a new drug given drug and environmental attributes. Methods The MCDA process included 1) selection of decisions makers who were representative of those making reimbursement decisions in a specific country; 2) two pre-workshop questionnaires to identify the most important attributes and their relative importance for a positive recommendation for a new drug; 3) a 1-day workshop during which participants undertook three tasks: i) they agreed on a final list of decision attributes and their importance weights, ii) they developed level descriptions for these attributes and mapped each attribute level to a value function, and iii) they developed profiles for hypothetical products ‘just likely to be reimbursed’; and 4) use of the data from the workshop to develop a prediction algorithm based on a logistic regression analysis. The MCDA process is illustrated using case studies for three countries, the United Kingdom, Germany, and Spain. The extent to which the prediction algorithms for each country captured the decision processes for the workshop participants in our case studies was tested using a post-meeting questionnaire that asked the participants to make recommendations for a set of hypothetical products. Results The data collected in the case study workshops resulted in a prediction algorithm: 1) for the United Kingdom, the probability of a positive recommendation for different ranges of cost-effectiveness ratios; 2) for Spain, the probability of a positive recommendation at the national and regional levels; and 3) for Germany, the probability of a determination of clinical benefit. The results from the post-meeting questionnaire revealed a high predictive value for the algorithm developed using MCDA. Conclusions Prediction algorithms developed using MCDA could be used by pharmaceutical companies when designing their clinical development programs to estimate the likelihood of a favourable reimbursement recommendation for different product profiles and for different positions in the treatment pathway.

  10. Fast Demand Forecast of Electric Vehicle Charging Stations for Cell Phone Application

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Majidpour, Mostafa; Qiu, Charlie; Chung, Ching-Yen

    This paper describes the core cellphone application algorithm which has been implemented for the prediction of energy consumption at Electric Vehicle (EV) Charging Stations at UCLA. For this interactive user application, the total time of accessing database, processing the data and making the prediction, needs to be within a few seconds. We analyze four relatively fast Machine Learning based time series prediction algorithms for our prediction engine: Historical Average, kNearest Neighbor, Weighted k-Nearest Neighbor, and Lazy Learning. The Nearest Neighbor algorithm (k Nearest Neighbor with k=1) shows better performance and is selected to be the prediction algorithm implemented for themore » cellphone application. Two applications have been designed on top of the prediction algorithm: one predicts the expected available energy at the station and the other one predicts the expected charging finishing time. The total time, including accessing the database, data processing, and prediction is about one second for both applications.« less

  11. Linear Scaling Density Functional Calculations with Gaussian Orbitals

    NASA Technical Reports Server (NTRS)

    Scuseria, Gustavo E.

    1999-01-01

    Recent advances in linear scaling algorithms that circumvent the computational bottlenecks of large-scale electronic structure simulations make it possible to carry out density functional calculations with Gaussian orbitals on molecules containing more than 1000 atoms and 15000 basis functions using current workstations and personal computers. This paper discusses the recent theoretical developments that have led to these advances and demonstrates in a series of benchmark calculations the present capabilities of state-of-the-art computational quantum chemistry programs for the prediction of molecular structure and properties.

  12. A density distribution algorithm for bone incorporating local orthotropy, modal analysis and theories of cellular solids.

    PubMed

    Impelluso, Thomas J

    2003-06-01

    An algorithm for bone remodeling is presented which allows for both a redistribution of density and a continuous change of principal material directions for the orthotropic material properties of bone. It employs a modal analysis to add density for growth and a local effective strain based analysis to redistribute density. General re-distribution functions are presented. The model utilizes theories of cellular solids to relate density and strength. The code predicts the same general density distributions and local orthotropy as observed in reality.

  13. Feature-based classification of amino acid substitutions outside conserved functional protein domains.

    PubMed

    Gemovic, Branislava; Perovic, Vladimir; Glisic, Sanja; Veljkovic, Nevena

    2013-01-01

    There are more than 500 amino acid substitutions in each human genome, and bioinformatics tools irreplaceably contribute to determination of their functional effects. We have developed feature-based algorithm for the detection of mutations outside conserved functional domains (CFDs) and compared its classification efficacy with the most commonly used phylogeny-based tools, PolyPhen-2 and SIFT. The new algorithm is based on the informational spectrum method (ISM), a feature-based technique, and statistical analysis. Our dataset contained neutral polymorphisms and mutations associated with myeloid malignancies from epigenetic regulators ASXL1, DNMT3A, EZH2, and TET2. PolyPhen-2 and SIFT had significantly lower accuracies in predicting the effects of amino acid substitutions outside CFDs than expected, with especially low sensitivity. On the other hand, only ISM algorithm showed statistically significant classification of these sequences. It outperformed PolyPhen-2 and SIFT by 15% and 13%, respectively. These results suggest that feature-based methods, like ISM, are more suitable for the classification of amino acid substitutions outside CFDs than phylogeny-based tools.

  14. Estimating the size of the solution space of metabolic networks

    PubMed Central

    Braunstein, Alfredo; Mulet, Roberto; Pagnani, Andrea

    2008-01-01

    Background Cellular metabolism is one of the most investigated system of biological interactions. While the topological nature of individual reactions and pathways in the network is quite well understood there is still a lack of comprehension regarding the global functional behavior of the system. In the last few years flux-balance analysis (FBA) has been the most successful and widely used technique for studying metabolism at system level. This method strongly relies on the hypothesis that the organism maximizes an objective function. However only under very specific biological conditions (e.g. maximization of biomass for E. coli in reach nutrient medium) the cell seems to obey such optimization law. A more refined analysis not assuming extremization remains an elusive task for large metabolic systems due to algorithmic limitations. Results In this work we propose a novel algorithmic strategy that provides an efficient characterization of the whole set of stable fluxes compatible with the metabolic constraints. Using a technique derived from the fields of statistical physics and information theory we designed a message-passing algorithm to estimate the size of the affine space containing all possible steady-state flux distributions of metabolic networks. The algorithm, based on the well known Bethe approximation, can be used to approximately compute the volume of a non full-dimensional convex polytope in high dimensions. We first compare the accuracy of the predictions with an exact algorithm on small random metabolic networks. We also verify that the predictions of the algorithm match closely those of Monte Carlo based methods in the case of the Red Blood Cell metabolic network. Then we test the effect of gene knock-outs on the size of the solution space in the case of E. coli central metabolism. Finally we analyze the statistical properties of the average fluxes of the reactions in the E. coli metabolic network. Conclusion We propose a novel efficient distributed algorithmic strategy to estimate the size and shape of the affine space of a non full-dimensional convex polytope in high dimensions. The method is shown to obtain, quantitatively and qualitatively compatible results with the ones of standard algorithms (where this comparison is possible) being still efficient on the analysis of large biological systems, where exact deterministic methods experience an explosion in algorithmic time. The algorithm we propose can be considered as an alternative to Monte Carlo sampling methods. PMID:18489757

  15. Heuristics and biases: interactions among numeracy, ability, and reflectiveness predict normative responding

    PubMed Central

    Klaczynski, Paul A.

    2014-01-01

    In Stanovich's (2009a, 2011) dual-process theory, analytic processing occurs in the algorithmic and reflective minds. Thinking dispositions, indexes of reflective mind functioning, are believed to regulate operations at the algorithmic level, indexed by general cognitive ability. General limitations at the algorithmic level impose constraints on, and affect the adequacy of, specific strategies and abilities (e.g., numeracy). In a study of 216 undergraduates, the hypothesis that thinking dispositions and general ability moderate the relationship between numeracy (understanding of mathematical concepts and attention to numerical information) and normative responses on probabilistic heuristics and biases (HB) problems was tested. Although all three individual difference measures predicted normative responses, the numeracy-normative response association depended on thinking dispositions and general ability. Specifically, numeracy directly affected normative responding only at relatively high levels of thinking dispositions and general ability. At low levels of thinking dispositions, neither general ability nor numeric skills related to normative responses. Discussion focuses on the consistency of these findings with the hypothesis that the implementation of specific skills is constrained by limitations at both the reflective level and the algorithmic level, methodological limitations that prohibit definitive conclusions, and alternative explanations. PMID:25071639

  16. Heuristics and biases: interactions among numeracy, ability, and reflectiveness predict normative responding.

    PubMed

    Klaczynski, Paul A

    2014-01-01

    In Stanovich's (2009a, 2011) dual-process theory, analytic processing occurs in the algorithmic and reflective minds. Thinking dispositions, indexes of reflective mind functioning, are believed to regulate operations at the algorithmic level, indexed by general cognitive ability. General limitations at the algorithmic level impose constraints on, and affect the adequacy of, specific strategies and abilities (e.g., numeracy). In a study of 216 undergraduates, the hypothesis that thinking dispositions and general ability moderate the relationship between numeracy (understanding of mathematical concepts and attention to numerical information) and normative responses on probabilistic heuristics and biases (HB) problems was tested. Although all three individual difference measures predicted normative responses, the numeracy-normative response association depended on thinking dispositions and general ability. Specifically, numeracy directly affected normative responding only at relatively high levels of thinking dispositions and general ability. At low levels of thinking dispositions, neither general ability nor numeric skills related to normative responses. Discussion focuses on the consistency of these findings with the hypothesis that the implementation of specific skills is constrained by limitations at both the reflective level and the algorithmic level, methodological limitations that prohibit definitive conclusions, and alternative explanations.

  17. [The therapeutic drug monitoring network server of tacrolimus for Chinese renal transplant patients].

    PubMed

    Deng, Chen-Hui; Zhang, Guan-Min; Bi, Shan-Shan; Zhou, Tian-Yan; Lu, Wei

    2011-07-01

    This study is to develop a therapeutic drug monitoring (TDM) network server of tacrolimus for Chinese renal transplant patients, which can facilitate doctor to manage patients' information and provide three levels of predictions. Database management system MySQL was employed to build and manage the database of patients and doctors' information, and hypertext mark-up language (HTML) and Java server pages (JSP) technology were employed to construct network server for database management. Based on the population pharmacokinetic model of tacrolimus for Chinese renal transplant patients, above program languages were used to construct the population prediction and subpopulation prediction modules. Based on Bayesian principle and maximization of the posterior probability function, an objective function was established, and minimized by an optimization algorithm to estimate patient's individual pharmacokinetic parameters. It is proved that the network server has the basic functions for database management and three levels of prediction to aid doctor to optimize the regimen of tacrolimus for Chinese renal transplant patients.

  18. Optimization on Paddy Crops in Central Java (with Solver, SVD on Least Square and ACO (Ant Colony Algorithm))

    NASA Astrophysics Data System (ADS)

    Parhusip, H. A.; Trihandaru, S.; Susanto, B.; Prasetyo, S. Y. J.; Agus, Y. H.; Simanjuntak, B. H.

    2017-03-01

    Several algorithms and objective functions on paddy crops have been studied to get optimal paddy crops in Central Java based on the data given from Surakarta and Boyolali. The algorithms are linear solver, least square and Ant Colony Algorithms (ACO) to develop optimization procedures on paddy crops modelled with Modified GSTAR (Generalized Space-Time Autoregressive) and nonlinear models where the nonlinear models are quadratic and power functions. The studied data contain paddy crops from Surakarta and Boyolali determining the best period of planting in the year 1992-2012 for Surakarta where 3 periods for planting are known and the optimal amount of paddy crops in Boyolali in the year 2008-2013. Having these analyses may guide the local agriculture government to give a decision on rice sustainability in its region. The best period for planting in Surakarta is observed, i.e. the best period is in September-December based on the data 1992-2012 by considering the planting area, the cropping area, and the paddy crops are the most important factors to be taken into account. As a result, we can refer the paddy crops in this best period (about 60.4 thousand tons per year) as the optimal results in 1992-2012 where the used objective function is quadratic. According to the research, the optimal paddy crops in Boyolali about 280 thousand tons per year where the studied factors are the amount of rainfalls, the harvested area and the paddy crops in 2008-2013. In this case, linear and power functions are studied to be the objective functions. Compared to all studied algorithms, the linear solver is still recommended to be an optimization tool for a local agriculture government to predict paddy crops in future.

  19. Detection of functionally important regions in "hypothetical proteins" of known structure.

    PubMed

    Nimrod, Guy; Schushan, Maya; Steinberg, David M; Ben-Tal, Nir

    2008-12-10

    Structural genomics initiatives provide ample structures of "hypothetical proteins" (i.e., proteins of unknown function) at an ever increasing rate. However, without function annotation, this structural goldmine is of little use to biologists who are interested in particular molecular systems. To this end, we used (an improved version of) the PatchFinder algorithm for the detection of functional regions on the protein surface, which could mediate its interactions with, e.g., substrates, ligands, and other proteins. Examination, using a data set of annotated proteins, showed that PatchFinder outperforms similar methods. We collected 757 structures of hypothetical proteins and their predicted functional regions in the N-Func database. Inspection of several of these regions demonstrated that they are useful for function prediction. For example, we suggested an interprotein interface and a putative nucleotide-binding site. A web-server implementation of PatchFinder and the N-Func database are available at http://patchfinder.tau.ac.il/.

  20. Combining neural networks and genetic algorithms for hydrological flow forecasting

    NASA Astrophysics Data System (ADS)

    Neruda, Roman; Srejber, Jan; Neruda, Martin; Pascenko, Petr

    2010-05-01

    We present a neural network approach to rainfall-runoff modeling for small size river basins based on several time series of hourly measured data. Different neural networks are considered for short time runoff predictions (from one to six hours lead time) based on runoff and rainfall data observed in previous time steps. Correlation analysis shows that runoff data, short time rainfall history, and aggregated API values are the most significant data for the prediction. Neural models of multilayer perceptron and radial basis function networks with different numbers of units are used and compared with more traditional linear time series predictors. Out of possible 48 hours of relevant history of all the input variables, the most important ones are selected by means of input filters created by a genetic algorithm. The genetic algorithm works with population of binary encoded vectors defining input selection patterns. Standard genetic operators of two-point crossover, random bit-flipping mutation, and tournament selection were used. The evaluation of objective function of each individual consists of several rounds of building and testing a particular neural network model. The whole procedure is rather computational exacting (taking hours to days on a desktop PC), thus a high-performance mainframe computer has been used for our experiments. Results based on two years worth data from the Ploucnice river in Northern Bohemia suggest that main problems connected with this approach to modeling are ovetraining that can lead to poor generalization, and relatively small number of extreme events which makes it difficult for a model to predict the amplitude of the event. Thus, experiments with both absolute and relative runoff predictions were carried out. In general it can be concluded that the neural models show about 5 per cent improvement in terms of efficiency coefficient over liner models. Multilayer perceptrons with one hidden layer trained by back propagation algorithm and predicting relative runoff show the best behavior so far. Utilizing the genetically evolved input filter improves the performance of yet another 5 per cent. In the future we would like to continue with experiments in on-line prediction using real-time data from Smeda River with 6 hours lead time forecast. Following the operational reality we will focus on classification of the runoffs into flood alert levels, and reformulation of the time series prediction task as a classification problem. The main goal of all this work is to improve flood warning system operated by the Czech Hydrometeorological Institute.

  1. Impact of mutations on the allosteric conformational equilibrium

    PubMed Central

    Weinkam, Patrick; Chen, Yao Chi; Pons, Jaume; Sali, Andrej

    2012-01-01

    Allostery in a protein involves effector binding at an allosteric site that changes the structure and/or dynamics at a distant, functional site. In addition to the chemical equilibrium of ligand binding, allostery involves a conformational equilibrium between one protein substate that binds the effector and a second substate that less strongly binds the effector. We run molecular dynamics simulations using simple, smooth energy landscapes to sample specific ligand-induced conformational transitions, as defined by the effector-bound and unbound protein structures. These simulations can be performed using our web server: http://salilab.org/allosmod/. We then develop a set of features to analyze the simulations and capture the relevant thermodynamic properties of the allosteric conformational equilibrium. These features are based on molecular mechanics energy functions, stereochemical effects, and structural/dynamic coupling between sites. Using a machine-learning algorithm on a dataset of 10 proteins and 179 mutations, we predict both the magnitude and sign of the allosteric conformational equilibrium shift by the mutation; the impact of a large identifiable fraction of the mutations can be predicted with an average unsigned error of 1 kBT. With similar accuracy, we predict the mutation effects for an 11th protein that was omitted from the initial training and testing of the machine-learning algorithm. We also assess which calculated thermodynamic properties contribute most to the accuracy of the prediction. PMID:23228330

  2. Prediction of 5-year overall survival in cervical cancer patients treated with radical hysterectomy using computational intelligence methods.

    PubMed

    Obrzut, Bogdan; Kusy, Maciej; Semczuk, Andrzej; Obrzut, Marzanna; Kluska, Jacek

    2017-12-12

    Computational intelligence methods, including non-linear classification algorithms, can be used in medical research and practice as a decision making tool. This study aimed to evaluate the usefulness of artificial intelligence models for 5-year overall survival prediction in patients with cervical cancer treated by radical hysterectomy. The data set was collected from 102 patients with cervical cancer FIGO stage IA2-IIB, that underwent primary surgical treatment. Twenty-three demographic, tumor-related parameters and selected perioperative data of each patient were collected. The simulations involved six computational intelligence methods: the probabilistic neural network (PNN), multilayer perceptron network, gene expression programming classifier, support vector machines algorithm, radial basis function neural network and k-Means algorithm. The prediction ability of the models was determined based on the accuracy, sensitivity, specificity, as well as the area under the receiver operating characteristic curve. The results of the computational intelligence methods were compared with the results of linear regression analysis as a reference model. The best results were obtained by the PNN model. This neural network provided very high prediction ability with an accuracy of 0.892 and sensitivity of 0.975. The area under the receiver operating characteristics curve of PNN was also high, 0.818. The outcomes obtained by other classifiers were markedly worse. The PNN model is an effective tool for predicting 5-year overall survival in cervical cancer patients treated with radical hysterectomy.

  3. Examining speed versus selection in connectivity models using elk migration as an example

    USGS Publications Warehouse

    Brennan, Angela; Hanks, EM; Merkle, JA; Cole, EK; Dewey, SR; Courtemanch, AB; Cross, Paul C.

    2018-01-01

    Context: Landscape resistance is vital to connectivity modeling and frequently derived from resource selection functions (RSFs). RSFs estimate relative probability of use and tend to focus on understanding habitat preferences during slow, routine animal movements (e.g., foraging). Dispersal and migration, however, can produce rarer, faster movements, in which case models of movement speed rather than resource selection may be more realistic for identifying habitats that facilitate connectivity. Objective: To compare two connectivity modeling approaches applied to resistance estimated from models of movement rate and resource selection. Methods: Using movement data from migrating elk, we evaluated continuous time Markov chain (CTMC) and movement-based RSF models (i.e., step selection functions [SSFs]). We applied circuit theory and shortest random path (SRP) algorithms to CTMC, SSF and null (i.e., flat) resistance surfaces to predict corridors between elk seasonal ranges. We evaluated prediction accuracy by comparing model predictions to empirical elk movements. Results: All models predicted elk movements well, but models applied to CTMC resistance were more accurate than models applied to SSF and null resistance. Circuit theory models were more accurate on average than SRP algorithms. Conclusions: CTMC can be more realistic than SSFs for estimating resistance for fast movements, though SSFs may demonstrate some predictive ability when animals also move slowly through corridors (e.g., stopover use during migration). High null model accuracy suggests seasonal range data may also be critical for predicting direct migration routes. For animals that migrate or disperse across large landscapes, we recommend incorporating CTMC into the connectivity modeling toolkit.

  4. Time-domain parameter identification of aeroelastic loads by forced-vibration method for response of flexible structures subject to transient wind

    NASA Astrophysics Data System (ADS)

    Cao, Bochao

    Slender structures representing civil, mechanical and aerospace systems such as long-span bridges, high-rise buildings, stay cables, power-line cables, high light mast poles, crane-booms and aircraft wings could experience vortex-induced and buffeting excitations below their design wind speeds and divergent self-excited oscillations (flutter) beyond a critical wind speed because these are flexible. Traditional linear aerodynamic theories that are routinely applied for their response prediction are not valid in the galloping, or near-flutter regime, where large-amplitude vibrations could occur and during non-stationary and transient wind excitations that occur, for example, during hurricanes, thunderstorms and gust fronts. The linear aerodynamic load formulation for lift, drag and moment are expressed in terms of aerodynamic functions in frequency domain that are valid for straight-line winds which are stationary or weakly-stationary. Application of the frequency domain formulation is restricted from use in the nonlinear and transient domain because these are valid for linear models and stationary wind. The time-domain aerodynamic force formulations are suitable for finite element modeling, feedback-dependent structural control mechanism, fatigue-life prediction, and above all modeling of transient structural behavior during non-stationary wind phenomena. This has motivated the developing of time-domain models of aerodynamic loads that are in parallel to the existing frequency-dependent models. Parameters defining these time-domain models can be now extracted from wind tunnel tests, for example, the Rational Function Coefficients defining the self-excited wind loads can be extracted using section model tests using the free vibration technique. However, the free vibration method has some limitations because it is difficult to apply at high wind speeds, in turbulent wind environment, or on unstable cross sections with negative aerodynamic damping. In the current research, new algorithms were developed based on forced vibration technique for direct extraction of the Rational Functions. The first of the two algorithms developed uses the two angular phase lag values between the measured vertical or torsional displacement and the measured aerodynamic lift and moment produced on the section model subject to forced vibration to identify the Rational Functions. This algorithm uses two separate one-degree-of-freedom tests (vertical or torsional) to identify all the four Rational Functions or corresponding Rational Function Coefficients for a two degrees-of-freedom (DOF) vertical-torsional vibration model. It was applied to a streamlined section model and the results compared well with those obtained from earlier free vibration experiment. The second algorithm that was developed is based on direct least squares method. It uses all the data points of displacements and aerodynamic lift and moment instead of phase lag values for more accurate estimates. This algorithm can be used for one-, two- and three-degree-of-freedom motions. A two-degree-of-freedom forced vibration system was developed and the algorithm was shown to work well for both streamlined and bluff section models. The uniqueness of the second algorithms lies in the fact that it requires testing the model at only two wind speeds for extraction of all four Rational Functions. The Rational Function Coefficients that were extracted for a streamlined section model using the two-DOF Least Squares algorithm were validated in a separate wind tunnel by testing a larger scaled model subject to straight-line, gusty and boundary-layer wind.

  5. A Robustly Stabilizing Model Predictive Control Algorithm

    NASA Technical Reports Server (NTRS)

    Ackmece, A. Behcet; Carson, John M., III

    2007-01-01

    A model predictive control (MPC) algorithm that differs from prior MPC algorithms has been developed for controlling an uncertain nonlinear system. This algorithm guarantees the resolvability of an associated finite-horizon optimal-control problem in a receding-horizon implementation.

  6. iDBPs: a web server for the identification of DNA binding proteins.

    PubMed

    Nimrod, Guy; Schushan, Maya; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2010-03-01

    The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good markers of functional regions. Next, various characteristics of the predicted functional region as well as global features of the protein are calculated, such as the average surface electrostatic potential, the dipole moment and cluster-based amino acid conservation patterns. Finally, a random forests classifier is used to predict whether the query protein is likely to bind DNA and to estimate the prediction confidence. We have trained and tested the classifier on various datasets and shown that it outperformed related methods. On a dataset that reflects the fraction of DNA binding proteins (DBPs) in a proteome, the area under the ROC curve was 0.90. The application of the server to an updated version of the N-Func database, which contains proteins of unknown function with solved 3D-structure, suggested new putative DBPs for experimental studies. http://idbps.tau.ac.il/

  7. Life prediction and mechanical reliability of NT551 silicon nitride

    NASA Astrophysics Data System (ADS)

    Andrews, Mark Jay

    The inert strength and fatigue performance of a diesel engine exhaust valve made from silicon nitride (Si3N4) ceramic were assessed. The Si3N4 characterized in this study was manufactured by Saint Gobain/Norton Industrial Ceramics and was designated as NT551. The evaluation was made utilizing a probabilistic life prediction algorithm that combined censored test specimen strength data with a Weibull distribution function and the stress field of the ceramic valve obtained from finite element analysis. The major assumptions of the life prediction algorithm are that the bulk ceramic material is isotropic and homogeneous and that the strength-limiting flaws are uniformly distributed. The results from mechanical testing indicated that NT551 was not a homogeneous ceramic and that its strength were functions of temperature, loading rate, and machining orientation. Fractographic analysis identified four different failure modes; 2 were identified as inhomogeneities that were located throughout the bulk of NT551 and were due to processing operations. The fractographic analysis concluded that the strength degradation of NT551 observed from the temperature and loading rate test parameters was due to a change of state that occurred in its secondary phase. Pristine and engine-tested valves made from NT551 were loaded to failure and the inert strengths were obtained. Fractographic analysis of the valves identified the same four failure mechanisms as found with the test specimens. The fatigue performance and the inert strength of the Si3N 4 valves were assessed from censored and uncensored test specimen strength data, respectively. The inert strength failure probability predictions were compared to the inert strength of the Si3N4 valves. The inert strength failure probability predictions were more conservative than the strength of the valves. The lack of correlation between predicted and actual valve strength was due to the nonuniform distribution of inhomogeneities present in NT551. For the same reasons, the predicted and actual fatigue performance did not correlate well. The results of this study should not be considered a limitation of the life prediction algorithm but emphasize the requirement that ceramics be homogeneous and strength-limiting flaws uniformly distributed as a perquisite for accurate life prediction and reliability analyses.

  8. Efficient and accurate Greedy Search Methods for mining functional modules in protein interaction networks.

    PubMed

    He, Jieyue; Li, Chaojun; Ye, Baoliu; Zhong, Wei

    2012-06-25

    Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain Core/attachment structures. In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches, however it is computationally expensive. Many module detection approaches are based on the traditional hierarchical methods, which is also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge weight based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into the suitable set of modules. The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate as compared to other competing algorithms. Based on the new edge weight definition, the proposed algorithm takes advantages of the greedy search procedure to separate the network into the suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. The algorithm can reduce the computational time significantly while keeping high prediction accuracy.

  9. Monthly prediction of air temperature in Australia and New Zealand with machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Salcedo-Sanz, S.; Deo, R. C.; Carro-Calvo, L.; Saavedra-Moreno, B.

    2016-07-01

    Long-term air temperature prediction is of major importance in a large number of applications, including climate-related studies, energy, agricultural, or medical. This paper examines the performance of two Machine Learning algorithms (Support Vector Regression (SVR) and Multi-layer Perceptron (MLP)) in a problem of monthly mean air temperature prediction, from the previous measured values in observational stations of Australia and New Zealand, and climate indices of importance in the region. The performance of the two considered algorithms is discussed in the paper and compared to alternative approaches. The results indicate that the SVR algorithm is able to obtain the best prediction performance among all the algorithms compared in the paper. Moreover, the results obtained have shown that the mean absolute error made by the two algorithms considered is significantly larger for the last 20 years than in the previous decades, in what can be interpreted as a change in the relationship among the prediction variables involved in the training of the algorithms.

  10. Comparison between different direct search optimization algorithms in the calibration of a distributed hydrological model

    NASA Astrophysics Data System (ADS)

    Campo, Lorenzo; Castelli, Fabio; Caparrini, Francesca

    2010-05-01

    The modern distributed hydrological models allow the representation of the different surface and subsurface phenomena with great accuracy and high spatial and temporal resolution. Such complexity requires, in general, an equally accurate parametrization. A number of approaches have been followed in this respect, from simple local search method (like Nelder-Mead algorithm), that minimize a cost function representing some distance between model's output and available measures, to more complex approaches like dynamic filters (such as the Ensemble Kalman Filter) that carry on an assimilation of the observations. In this work the first approach was followed in order to compare the performances of three different direct search algorithms on the calibration of a distributed hydrological balance model. The direct search family can be defined as that category of algorithms that make no use of derivatives of the cost function (that is, in general, a black box) and comprehend a large number of possible approaches. The main benefit of this class of methods is that they don't require changes in the implementation of the numerical codes to be calibrated. The first algorithm is the classical Nelder-Mead, often used in many applications and utilized as reference. The second algorithm is a GSS (Generating Set Search) algorithm, built in order to guarantee the conditions of global convergence and suitable for a parallel and multi-start implementation, here presented. The third one is the EGO algorithm (Efficient Global Optimization), that is particularly suitable to calibrate black box cost functions that require expensive computational resource (like an hydrological simulation). EGO minimizes the number of evaluations of the cost function balancing the need to minimize a response surface that approximates the problem and the need to improve the approximation sampling where prediction error may be high. The hydrological model to be calibrated was MOBIDIC, a complete balance distributed model developed at the Department of Civil and Environmental Engineering of the University of Florence. Discussion on the comparisons between the effectiveness of the different algorithms on different cases of study on Central Italy basins is provided.

  11. Deadbeat Predictive Controllers

    NASA Technical Reports Server (NTRS)

    Juang, Jer-Nan; Phan, Minh

    1997-01-01

    Several new computational algorithms are presented to compute the deadbeat predictive control law. The first algorithm makes use of a multi-step-ahead output prediction to compute the control law without explicitly calculating the controllability matrix. The system identification must be performed first and then the predictive control law is designed. The second algorithm uses the input and output data directly to compute the feedback law. It combines the system identification and the predictive control law into one formulation. The third algorithm uses an observable-canonical form realization to design the predictive controller. The relationship between all three algorithms is established through the use of the state-space representation. All algorithms are applicable to multi-input, multi-output systems with disturbance inputs. In addition to the feedback terms, feed forward terms may also be added for disturbance inputs if they are measurable. Although the feedforward terms do not influence the stability of the closed-loop feedback law, they enhance the performance of the controlled system.

  12. Evaluating High-Degree-and-Order Gravitational Harmonics and its Application to the State Predictions of a Lunar Orbiting Satellite

    NASA Astrophysics Data System (ADS)

    Song, Young-Joo; Kim, Bang-Yeop

    2015-09-01

    In this work, an efficient method with which to evaluate the high-degree-and-order gravitational harmonics of the nonsphericity of a central body is described and applied to state predictions of a lunar orbiter. Unlike the work of Song et al. (2010), which used a conventional computation method to process gravitational harmonic coefficients, the current work adapted a well-known recursion formula that directly uses fully normalized associated Legendre functions to compute the acceleration due to the non-sphericity of the moon. With the formulated algorithms, the states of a lunar orbiting satellite are predicted and its performance is validated in comparisons with solutions obtained from STK/Astrogator. The predicted differences in the orbital states between STK/Astrogator and the current work all remain at a position of less than 1 m with velocity accuracy levels of less than 1 mm/s, even with different orbital inclinations. The effectiveness of the current algorithm, in terms of both the computation time and the degree of accuracy degradation, is also shown in comparisons with results obtained from earlier work. It is expected that the proposed algorithm can be used as a foundation for the development of an operational flight dynamics subsystem for future lunar exploration missions by Korea. It can also be used to analyze missions which require very close operations to the moon.

  13. A Comparison of Supervised Machine Learning Algorithms and Feature Vectors for MS Lesion Segmentation Using Multimodal Structural MRI

    PubMed Central

    Sweeney, Elizabeth M.; Vogelstein, Joshua T.; Cuzzocreo, Jennifer L.; Calabresi, Peter A.; Reich, Daniel S.; Crainiceanu, Ciprian M.; Shinohara, Russell T.

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance. PMID:24781953

  14. A comparison of supervised machine learning algorithms and feature vectors for MS lesion segmentation using multimodal structural MRI.

    PubMed

    Sweeney, Elizabeth M; Vogelstein, Joshua T; Cuzzocreo, Jennifer L; Calabresi, Peter A; Reich, Daniel S; Crainiceanu, Ciprian M; Shinohara, Russell T

    2014-01-01

    Machine learning is a popular method for mining and analyzing large collections of medical data. We focus on a particular problem from medical research, supervised multiple sclerosis (MS) lesion segmentation in structural magnetic resonance imaging (MRI). We examine the extent to which the choice of machine learning or classification algorithm and feature extraction function impacts the performance of lesion segmentation methods. As quantitative measures derived from structural MRI are important clinical tools for research into the pathophysiology and natural history of MS, the development of automated lesion segmentation methods is an active research field. Yet, little is known about what drives performance of these methods. We evaluate the performance of automated MS lesion segmentation methods, which consist of a supervised classification algorithm composed with a feature extraction function. These feature extraction functions act on the observed T1-weighted (T1-w), T2-weighted (T2-w) and fluid-attenuated inversion recovery (FLAIR) MRI voxel intensities. Each MRI study has a manual lesion segmentation that we use to train and validate the supervised classification algorithms. Our main finding is that the differences in predictive performance are due more to differences in the feature vectors, rather than the machine learning or classification algorithms. Features that incorporate information from neighboring voxels in the brain were found to increase performance substantially. For lesion segmentation, we conclude that it is better to use simple, interpretable, and fast algorithms, such as logistic regression, linear discriminant analysis, and quadratic discriminant analysis, and to develop the features to improve performance.

  15. Gaussian processes with optimal kernel construction for neuro-degenerative clinical onset prediction

    NASA Astrophysics Data System (ADS)

    Canas, Liane S.; Yvernault, Benjamin; Cash, David M.; Molteni, Erika; Veale, Tom; Benzinger, Tammie; Ourselin, Sébastien; Mead, Simon; Modat, Marc

    2018-02-01

    Gaussian Processes (GP) are a powerful tool to capture the complex time-variations of a dataset. In the context of medical imaging analysis, they allow a robust modelling even in case of highly uncertain or incomplete datasets. Predictions from GP are dependent of the covariance kernel function selected to explain the data variance. To overcome this limitation, we propose a framework to identify the optimal covariance kernel function to model the data.The optimal kernel is defined as a composition of base kernel functions used to identify correlation patterns between data points. Our approach includes a modified version of the Compositional Kernel Learning (CKL) algorithm, in which we score the kernel families using a new energy function that depends both the Bayesian Information Criterion (BIC) and the explained variance score. We applied the proposed framework to model the progression of neurodegenerative diseases over time, in particular the progression of autosomal dominantly-inherited Alzheimer's disease, and use it to predict the time to clinical onset of subjects carrying genetic mutation.

  16. Can human experts predict solubility better than computers?

    PubMed

    Boobier, Samuel; Osbourn, Anne; Mitchell, John B O

    2017-12-13

    In this study, we design and carry out a survey, asking human experts to predict the aqueous solubility of druglike organic compounds. We investigate whether these experts, drawn largely from the pharmaceutical industry and academia, can match or exceed the predictive power of algorithms. Alongside this, we implement 10 typical machine learning algorithms on the same dataset. The best algorithm, a variety of neural network known as a multi-layer perceptron, gave an RMSE of 0.985 log S units and an R 2 of 0.706. We would not have predicted the relative success of this particular algorithm in advance. We found that the best individual human predictor generated an almost identical prediction quality with an RMSE of 0.942 log S units and an R 2 of 0.723. The collection of algorithms contained a higher proportion of reasonably good predictors, nine out of ten compared with around half of the humans. We found that, for either humans or algorithms, combining individual predictions into a consensus predictor by taking their median generated excellent predictivity. While our consensus human predictor achieved very slightly better headline figures on various statistical measures, the difference between it and the consensus machine learning predictor was both small and statistically insignificant. We conclude that human experts can predict the aqueous solubility of druglike molecules essentially equally well as machine learning algorithms. We find that, for either humans or algorithms, combining individual predictions into a consensus predictor by taking their median is a powerful way of benefitting from the wisdom of crowds.

  17. Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.

    PubMed

    Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu

    2012-12-01

    Membrane proteins are encoded by ~ 30% in the genome and function importantly in the living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, it is highly desired to predict membrane protein's subcellular location from the primary sequence considering the extreme difficulties of membrane protein wet-lab studies. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such simple combination of features will simultaneously increase the information redundancy that could, in turn, deteriorate the final prediction accuracy. That's why it was often found that prediction success rates in the serial super space were even lower than those in a single-view space. The purpose of this paper is investigation of a proper method for fusing multiple multi-view protein sequential features for subcellular location predictions. Instead of serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that will represent protein samples in complex spaces. We also proposed generalized principle component analysis (GPCA) for feature reduction purpose in the complex geometry. All the experimental results through different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset indicating the parallel technique is flexible to suite for other computational biology problems. The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.

  18. Poster - 51: A tumor motion-compensating system with tracking and prediction – a proof-of-concept study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guo, Kaiming; Teo, Peng; Kawalec, Philip

    2016-08-15

    Purpose: This work reports on the development of a mechanical slider system for the counter-steering of tumor motion in adaptive Radiation Therapy (RT). The tumor motion was tracked using a weighted optical flow algorithm and its position is being predicted with a neural network (NN). Methods: The components of the proposed mechanical counter-steering system includes: (1) an actuator which provides the tumor motion, (2) the motion detection using an optical flow algorithm, (3) motion prediction using a neural network, (4) a control module and (5) a mechanical slider to counter-steer the anticipated motion of the tumor phantom. An asymmetrical cosinemore » function and five patient traces (P1–P5) were used to evaluate the tracking of a 3D printed lung tumor. In the proposed mechanical counter-steering system, both actuator (Zaber NA14D60) and slider (Zaber A-BLQ0070-E01) were programed to move independently with LabVIEW and their positions were recorded by 2 potentiometers (ETI LCP12S-25). The accuracy of this counter-steering system is given by the difference between the two potentiometers. Results: The inherent accuracy of the system, measured using the cosine function, is −0.15 ± 0.06 mm. While the errors when tracking and prediction were included, is (0.04 ± 0.71) mm. Conclusion: A prototype tumor motion counter-steering system with tracking and prediction was implemented. The inherent errors are small in comparison to the tracking and prediction errors, which in turn are small in comparison to the magnitude of tumor motion. The results show that this system is suited for evaluating RT tracking and prediction.« less

  19. Sparse regressions for predicting and interpreting subcellular localization of multi-label proteins.

    PubMed

    Wan, Shibiao; Mak, Man-Wai; Kung, Sun-Yuan

    2016-02-24

    Predicting protein subcellular localization is indispensable for inferring protein functions. Recent studies have been focusing on predicting not only single-location proteins, but also multi-location proteins. Almost all of the high performing predictors proposed recently use gene ontology (GO) terms to construct feature vectors for classification. Despite their high performance, their prediction decisions are difficult to interpret because of the large number of GO terms involved. This paper proposes using sparse regressions to exploit GO information for both predicting and interpreting subcellular localization of single- and multi-location proteins. Specifically, we compared two multi-label sparse regression algorithms, namely multi-label LASSO (mLASSO) and multi-label elastic net (mEN), for large-scale predictions of protein subcellular localization. Both algorithms can yield sparse and interpretable solutions. By using the one-vs-rest strategy, mLASSO and mEN identified 87 and 429 out of more than 8,000 GO terms, respectively, which play essential roles in determining subcellular localization. More interestingly, many of the GO terms selected by mEN are from the biological process and molecular function categories, suggesting that the GO terms of these categories also play vital roles in the prediction. With these essential GO terms, not only where a protein locates can be decided, but also why it resides there can be revealed. Experimental results show that the output of both mEN and mLASSO are interpretable and they perform significantly better than existing state-of-the-art predictors. Moreover, mEN selects more features and performs better than mLASSO on a stringent human benchmark dataset. For readers' convenience, an online server called SpaPredictor for both mLASSO and mEN is available at http://bioinfo.eie.polyu.edu.hk/SpaPredictorServer/.

  20. Tracking children's mental states while solving algebra equations.

    PubMed

    Anderson, John R; Betts, Shawn; Ferris, Jennifer L; Fincham, Jon M

    2012-11-01

    Behavioral and function magnetic resonance imagery (fMRI) data were combined to infer the mental states of students as they interacted with an intelligent tutoring system. Sixteen children interacted with a computer tutor for solving linear equations over a six-day period (days 0-5), with days 1 and 5 occurring in an fMRI scanner. Hidden Markov model algorithms combined a model of student behavior with multi-voxel imaging pattern data to predict the mental states of students. We separately assessed the algorithms' ability to predict which step in a problem-solving sequence was performed and whether the step was performed correctly. For day 1, the data patterns of other students were used to predict the mental states of a target student. These predictions were improved on day 5 by adding information about the target student's behavioral and imaging data from day 1. Successful tracking of mental states depended on using the combination of a behavioral model and multi-voxel pattern analysis, illustrating the effectiveness of an integrated approach to tracking the cognition of individuals in real time as they perform complex tasks. Copyright © 2011 Wiley Periodicals, Inc.

  1. A Semi-Supervised Learning Algorithm for Predicting Four Types MiRNA-Disease Associations by Mutual Information in a Heterogeneous Network.

    PubMed

    Zhang, Xiaotian; Yin, Jian; Zhang, Xu

    2018-03-02

    Increasing evidence suggests that dysregulation of microRNAs (miRNAs) may lead to a variety of diseases. Therefore, identifying disease-related miRNAs is a crucial problem. Currently, many computational approaches have been proposed to predict binary miRNA-disease associations. In this study, in order to predict underlying miRNA-disease association types, a semi-supervised model called the network-based label propagation algorithm is proposed to infer multiple types of miRNA-disease associations (NLPMMDA) by mutual information derived from the heterogeneous network. The NLPMMDA method integrates disease semantic similarity, miRNA functional similarity, and Gaussian interaction profile kernel similarity information of miRNAs and diseases to construct a heterogeneous network. NLPMMDA is a semi-supervised model which does not require verified negative samples. Leave-one-out cross validation (LOOCV) was implemented for four known types of miRNA-disease associations and demonstrated the reliable performance of our method. Moreover, case studies of lung cancer and breast cancer confirmed effective performance of NLPMMDA to predict novel miRNA-disease associations and their association types.

  2. Discrete sequence prediction and its applications

    NASA Technical Reports Server (NTRS)

    Laird, Philip

    1992-01-01

    Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.

  3. A prediction algorithm for first onset of major depression in the general population: development and validation.

    PubMed

    Wang, JianLi; Sareen, Jitender; Patten, Scott; Bolton, James; Schmitz, Norbert; Birney, Arden

    2014-05-01

    Prediction algorithms are useful for making clinical decisions and for population health planning. However, such prediction algorithms for first onset of major depression do not exist. The objective of this study was to develop and validate a prediction algorithm for first onset of major depression in the general population. Longitudinal study design with approximate 3-year follow-up. The study was based on data from a nationally representative sample of the US general population. A total of 28 059 individuals who participated in Waves 1 and 2 of the US National Epidemiologic Survey on Alcohol and Related Conditions and who had not had major depression at Wave 1 were included. The prediction algorithm was developed using logistic regression modelling in 21 813 participants from three census regions. The algorithm was validated in participants from the 4th census region (n=6246). Major depression occurred since Wave 1 of the National Epidemiologic Survey on Alcohol and Related Conditions, assessed by the Alcohol Use Disorder and Associated Disabilities Interview Schedule-diagnostic and statistical manual for mental disorders IV. A prediction algorithm containing 17 unique risk factors was developed. The algorithm had good discriminative power (C statistics=0.7538, 95% CI 0.7378 to 0.7699) and excellent calibration (F-adjusted test=1.00, p=0.448) with the weighted data. In the validation sample, the algorithm had a C statistic of 0.7259 and excellent calibration (Hosmer-Lemeshow χ(2)=3.41, p=0.906). The developed prediction algorithm has good discrimination and calibration capacity. It can be used by clinicians, mental health policy-makers and service planners and the general public to predict future risk of having major depression. The application of the algorithm may lead to increased personalisation of treatment, better clinical decisions and more optimal mental health service planning.

  4. When the lowest energy does not induce native structures: parallel minimization of multi-energy values by hybridizing searching intelligences.

    PubMed

    Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

    2012-01-01

    Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to obtain. A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed during the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving PSP problem. This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise.

  5. When the Lowest Energy Does Not Induce Native Structures: Parallel Minimization of Multi-Energy Values by Hybridizing Searching Intelligences

    PubMed Central

    Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou

    2012-01-01

    Background Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy has been luckily found by the searching procedure, the correct protein structures are not guaranteed to obtain. Results A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed during the searching threads are running in parallel, while each thread is searching the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving PSP problem. Conclusions This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise. PMID:23028708

  6. Prediction of pork quality parameters by applying fractals and data mining on MRI.

    PubMed

    Caballero, Daniel; Pérez-Palacios, Trinidad; Caro, Andrés; Amigo, José Manuel; Dahl, Anders B; ErsbØll, Bjarne K; Antequera, Teresa

    2017-09-01

    This work firstly investigates the use of MRI, fractal algorithms and data mining techniques to determine pork quality parameters non-destructively. The main objective was to evaluate the capability of fractal algorithms (Classical Fractal algorithm, CFA; Fractal Texture Algorithm, FTA and One Point Fractal Texture Algorithm, OPFTA) to analyse MRI in order to predict quality parameters of loin. In addition, the effect of the sequence acquisition of MRI (Gradient echo, GE; Spin echo, SE and Turbo 3D, T3D) and the predictive technique of data mining (Isotonic regression, IR and Multiple linear regression, MLR) were analysed. Both fractal algorithm, FTA and OPFTA are appropriate to analyse MRI of loins. The sequence acquisition, the fractal algorithm and the data mining technique seems to influence on the prediction results. For most physico-chemical parameters, prediction equations with moderate to excellent correlation coefficients were achieved by using the following combinations of acquisition sequences of MRI, fractal algorithms and data mining techniques: SE-FTA-MLR, SE-OPFTA-IR, GE-OPFTA-MLR, SE-OPFTA-MLR, with the last one offering the best prediction results. Thus, SE-OPFTA-MLR could be proposed as an alternative technique to determine physico-chemical traits of fresh and dry-cured loins in a non-destructive way with high accuracy. Copyright © 2017. Published by Elsevier Ltd.

  7. Protein-protein docking using region-based 3D Zernike descriptors

    PubMed Central

    2009-01-01

    Background Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. Results We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface C-αRMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. Conclusion We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods. PMID:20003235

  8. Protein-protein docking using region-based 3D Zernike descriptors.

    PubMed

    Venkatraman, Vishwesh; Yang, Yifeng D; Sael, Lee; Kihara, Daisuke

    2009-12-09

    Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur. We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface C-alphaRMSD < or = 2.5 A) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases. We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.

  9. Predicting protein-protein interactions from protein domains using a set cover approach.

    PubMed

    Huang, Chengbang; Morcos, Faruck; Kanaan, Simon P; Wuchty, Stefan; Chen, Danny Z; Izaguirre, Jesús A

    2007-01-01

    One goal of contemporary proteome research is the elucidation of cellular protein interactions. Based on currently available protein-protein interaction and domain data, we introduce a novel method, Maximum Specificity Set Cover (MSSC), for the prediction of protein-protein interactions. In our approach, we map the relationship between interactions of proteins and their corresponding domain architectures to a generalized weighted set cover problem. The application of a greedy algorithm provides sets of domain interactions which explain the presence of protein interactions to the largest degree of specificity. Utilizing domain and protein interaction data of S. cerevisiae, MSSC enables prediction of previously unknown protein interactions, links that are well supported by a high tendency of coexpression and functional homogeneity of the corresponding proteins. Focusing on concrete examples, we show that MSSC reliably predicts protein interactions in well-studied molecular systems, such as the 26S proteasome and RNA polymerase II of S. cerevisiae. We also show that the quality of the predictions is comparable to the Maximum Likelihood Estimation while MSSC is faster. This new algorithm and all data sets used are accessible through a Web portal at http://ppi.cse.nd.edu.

  10. A Performance Weighted Collaborative Filtering algorithm for personalized radiology education.

    PubMed

    Lin, Hongli; Yang, Xuedong; Wang, Weisheng; Luo, Jiawei

    2014-10-01

    Devising an accurate prediction algorithm that can predict the difficulty level of cases for individuals and then selects suitable cases for them is essential to the development of a personalized training system. In this paper, we propose a novel approach, called Performance Weighted Collaborative Filtering (PWCF), to predict the difficulty level of each case for individuals. The main idea of PWCF is to assign an optimal weight to each rating used for predicting the difficulty level of a target case for a trainee, rather than using an equal weight for all ratings as in traditional collaborative filtering methods. The assigned weight is a function of the performance level of the trainee at which the rating was made. The PWCF method and the traditional method are compared using two datasets. The experimental data are then evaluated by means of the MAE metric. Our experimental results show that PWCF outperforms the traditional methods by 8.12% and 17.05%, respectively, over the two datasets, in terms of prediction precision. This suggests that PWCF is a viable method for the development of personalized training systems in radiology education. Copyright © 2014. Published by Elsevier Inc.

  11. Topology of membrane proteins-predictions, limitations and variations.

    PubMed

    Tsirigos, Konstantinos D; Govindarajan, Sudha; Bassot, Claudio; Västermark, Åke; Lamb, John; Shu, Nanjiang; Elofsson, Arne

    2017-10-26

    Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of the cells. Membrane proteins are built up by transmembrane segments that span the lipid bilayer. The segments can either be in the form of hydrophobic alpha-helices or beta-sheets which create a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their position in the protein sequence and their orientation in the membrane. Along these lines, many predictive algorithms for the prediction of the topology of alpha-helical and beta-barrel transmembrane proteins exist. The newest algorithms obtain an accuracy close to 80% both for alpha-helical and beta-barrel transmembrane proteins. However, lately it has been shown that the simplified picture presented when describing a protein family by its topology is limited. To demonstrate this, we highlight examples where the topology is either not conserved in a protein superfamily or where the structure cannot be described solely by the topology of a protein. The prediction of these non-standard features from sequence alone was not successful until the recent revolutionary progress in 3D-structure prediction of proteins. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Prediction of interface residue based on the features of residue interaction network.

    PubMed

    Jiao, Xiong; Ranganathan, Shoba

    2017-11-07

    Protein-protein interaction plays a crucial role in the cellular biological processes. Interface prediction can improve our understanding of the molecular mechanisms of the related processes and functions. In this work, we propose a classification method to recognize the interface residue based on the features of a weighted residue interaction network. The random forest algorithm is used for the prediction and 16 network parameters and the B-factor are acting as the element of the input feature vector. Compared with other similar work, the method is feasible and effective. The relative importance of these features also be analyzed to identify the key feature for the prediction. Some biological meaning of the important feature is explained. The results of this work can be used for the related work about the structure-function relationship analysis via a residue interaction network model. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction.

    PubMed

    Bossi, Flavia; Fan, Jue; Xiao, Jun; Chandra, Lilyana; Shen, Max; Dorone, Yanniv; Wagner, Doris; Rhee, Seung Y

    2017-06-26

    The molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. To identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.

  14. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    PubMed Central

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  15. Benchmarking Procedures for High-Throughput Context Specific Reconstruction Algorithms

    PubMed Central

    Pacheco, Maria P.; Pfau, Thomas; Sauter, Thomas

    2016-01-01

    Recent progress in high-throughput data acquisition has shifted the focus from data generation to processing and understanding of how to integrate collected information. Context specific reconstruction based on generic genome scale models like ReconX or HMR has the potential to become a diagnostic and treatment tool tailored to the analysis of specific individuals. The respective computational algorithms require a high level of predictive power, robustness and sensitivity. Although multiple context specific reconstruction algorithms were published in the last 10 years, only a fraction of them is suitable for model building based on human high-throughput data. Beside other reasons, this might be due to problems arising from the limitation to only one metabolic target function or arbitrary thresholding. This review describes and analyses common validation methods used for testing model building algorithms. Two major methods can be distinguished: consistency testing and comparison based testing. The first is concerned with robustness against noise, e.g., missing data due to the impossibility to distinguish between the signal and the background of non-specific binding of probes in a microarray experiment, and whether distinct sets of input expressed genes corresponding to i.e., different tissues yield distinct models. The latter covers methods comparing sets of functionalities, comparison with existing networks or additional databases. We test those methods on several available algorithms and deduce properties of these algorithms that can be compared with future developments. The set of tests performed, can therefore serve as a benchmarking procedure for future algorithms. PMID:26834640

  16. CPHmodels-3.0--remote homology modeling using structure-guided sequence profiles.

    PubMed

    Nielsen, Morten; Lundegaard, Claus; Lund, Ole; Petersen, Thomas Nordahl

    2010-07-01

    CPHmodels-3.0 is a web server predicting protein 3D structure by use of single template homology modeling. The server employs a hybrid of the scoring functions of CPHmodels-2.0 and a novel remote homology-modeling algorithm. A query sequence is first attempted modeled using the fast CPHmodels-2.0 profile-profile scoring function suitable for close homology modeling. The new computational costly remote homology-modeling algorithm is only engaged provided that no suitable PDB template is identified in the initial search. CPHmodels-3.0 was benchmarked in the CASP8 competition and produced models for 94% of the targets (117 out of 128), 74% were predicted as high reliability models (87 out of 117). These achieved an average RMSD of 4.6 A when superimposed to the 3D structure. The remaining 26% low reliably models (30 out of 117) could superimpose to the true 3D structure with an average RMSD of 9.3 A. These performance values place the CPHmodels-3.0 method in the group of high performing 3D prediction tools. Beside its accuracy, one of the important features of the method is its speed. For most queries, the response time of the server is <20 min. The web server is available at http://www.cbs.dtu.dk/services/CPHmodels/.

  17. BFLCRM: A BAYESIAN FUNCTIONAL LINEAR COX REGRESSION MODEL FOR PREDICTING TIME TO CONVERSION TO ALZHEIMER’S DISEASE*

    PubMed Central

    Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G

    2015-01-01

    The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer’s disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM. PMID:26900412

  18. A Prize-Collecting Steiner Tree Approach for Transduction Network Inference

    NASA Astrophysics Data System (ADS)

    Bailly-Bechet, Marc; Braunstein, Alfredo; Zecchina, Riccardo

    Into the cell, information from the environment is mainly propagated via signaling pathways which form a transduction network. Here we propose a new algorithm to infer transduction networks from heterogeneous data, using both the protein interaction network and expression datasets. We formulate the inference problem as an optimization task, and develop a message-passing, probabilistic and distributed formalism to solve it. We apply our algorithm to the pheromone response in the baker’s yeast S. cerevisiae. We are able to find the backbone of the known structure of the MAPK cascade of pheromone response, validating our algorithm. More importantly, we make biological predictions about some proteins whose role could be at the interface between pheromone response and other cellular functions.

  19. Limited distortion in LSB steganography

    NASA Astrophysics Data System (ADS)

    Kim, Younhee; Duric, Zoran; Richards, Dana

    2006-02-01

    It is well known that all information hiding methods that modify the least significant bits introduce distortions into the cover objects. Those distortions have been utilized by steganalysis algorithms to detect that the objects had been modified. It has been proposed that only coefficients whose modification does not introduce large distortions should be used for embedding. In this paper we propose an effcient algorithm for information hiding in the LSBs of JPEG coefficients. Our algorithm uses parity coding to choose the coefficients whose modifications introduce minimal additional distortion. We derive the expected value of the additional distortion as a function of the message length and the probability distribution of the JPEG quantization errors of cover images. Our experiments show close agreement between the theoretical prediction and the actual additional distortion.

  20. Bio-Inspired Neural Model for Learning Dynamic Models

    NASA Technical Reports Server (NTRS)

    Duong, Tuan; Duong, Vu; Suri, Ronald

    2009-01-01

    A neural-network mathematical model that, relative to prior such models, places greater emphasis on some of the temporal aspects of real neural physical processes, has been proposed as a basis for massively parallel, distributed algorithms that learn dynamic models of possibly complex external processes by means of learning rules that are local in space and time. The algorithms could be made to perform such functions as recognition and prediction of words in speech and of objects depicted in video images. The approach embodied in this model is said to be "hardware-friendly" in the following sense: The algorithms would be amenable to execution by special-purpose computers implemented as very-large-scale integrated (VLSI) circuits that would operate at relatively high speeds and low power demands.

  1. Strategy Execution in Cognitive Skill Learning: An Item-Level Test of Candidate Models

    ERIC Educational Resources Information Center

    Rickard, Timothy C.

    2004-01-01

    This article investigates the transition to memory-based performance that commonly occurs with practice on tasks that initially require use of a multistep algorithm. In an alphabet arithmetic task, item response times exhibited pronounced step-function decreases after moderate practice that were uniquely predicted by T. C. Rickard's (1997)…

  2. A rapid learning and dynamic stepwise updating algorithm for flat neural networks and the application to time-series prediction.

    PubMed

    Chen, C P; Wan, J Z

    1999-01-01

    A fast learning algorithm is proposed to find an optimal weights of the flat neural networks (especially, the functional-link network). Although the flat networks are used for nonlinear function approximation, they can be formulated as linear systems. Thus, the weights of the networks can be solved easily using a linear least-square method. This formulation makes it easier to update the weights instantly for both a new added pattern and a new added enhancement node. A dynamic stepwise updating algorithm is proposed to update the weights of the system on-the-fly. The model is tested on several time-series data including an infrared laser data set, a chaotic time-series, a monthly flour price data set, and a nonlinear system identification problem. The simulation results are compared to existing models in which more complex architectures and more costly training are needed. The results indicate that the proposed model is very attractive to real-time processes.

  3. Using machine learning algorithms to guide rehabilitation planning for home care clients.

    PubMed

    Zhu, Mu; Zhang, Zhanyang; Hirdes, John P; Stolee, Paul

    2007-12-20

    Targeting older clients for rehabilitation is a clinical challenge and a research priority. We investigate the potential of machine learning algorithms - Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) - to guide rehabilitation planning for home care clients. This study is a secondary analysis of data on 24,724 longer-term clients from eight home care programs in Ontario. Data were collected with the RAI-HC assessment system, in which the Activities of Daily Living Clinical Assessment Protocol (ADLCAP) is used to identify clients with rehabilitation potential. For study purposes, a client is defined as having rehabilitation potential if there was: i) improvement in ADL functioning, or ii) discharge home. SVM and KNN results are compared with those obtained using the ADLCAP. For comparison, the machine learning algorithms use the same functional and health status indicators as the ADLCAP. The KNN and SVM algorithms achieved similar substantially improved performance over the ADLCAP, although false positive and false negative rates were still fairly high (FP > .18, FN > .34 versus FP > .29, FN. > .58 for ADLCAP). Results are used to suggest potential revisions to the ADLCAP. Machine learning algorithms achieved superior predictions than the current protocol. Machine learning results are less readily interpretable, but can also be used to guide development of improved clinical protocols.

  4. New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation.

    PubMed

    Antczak, Maciej; Popenda, Mariusz; Zok, Tomasz; Zurkowski, Michal; Adamiak, Ryszard W; Szachniuk, Marta

    2018-04-15

    Understanding the formation, architecture and roles of pseudoknots in RNA structures are one of the most difficult challenges in RNA computational biology and structural bioinformatics. Methods predicting pseudoknots typically perform this with poor accuracy, often despite experimental data incorporation. Existing bioinformatic approaches differ in terms of pseudoknots' recognition and revealing their nature. A few ways of pseudoknot classification exist, most common ones refer to a genus or order. Following the latter one, we propose new algorithms that identify pseudoknots in RNA structure provided in BPSEQ format, determine their order and encode in dot-bracket-letter notation. The proposed encoding aims to illustrate the hierarchy of RNA folding. New algorithms are based on dynamic programming and hybrid (combining exhaustive search and random walk) approaches. They evolved from elementary algorithm implemented within the workflow of RNA FRABASE 1.0, our database of RNA structure fragments. They use different scoring functions to rank dissimilar dot-bracket representations of RNA structure. Computational experiments show an advantage of new methods over the others, especially for large RNA structures. Presented algorithms have been implemented as new functionality of RNApdbee webserver and are ready to use at http://rnapdbee.cs.put.poznan.pl. mszachniuk@cs.put.poznan.pl. Supplementary data are available at Bioinformatics online.

  5. Performance index and meta-optimization of a direct search optimization method

    NASA Astrophysics Data System (ADS)

    Krus, P.; Ölvander, J.

    2013-10-01

    Design optimization is becoming an increasingly important tool for design, often using simulation as part of the evaluation of the objective function. A measure of the efficiency of an optimization algorithm is of great importance when comparing methods. The main contribution of this article is the introduction of a singular performance criterion, the entropy rate index based on Shannon's information theory, taking both reliability and rate of convergence into account. It can also be used to characterize the difficulty of different optimization problems. Such a performance criterion can also be used for optimization of the optimization algorithms itself. In this article the Complex-RF optimization method is described and its performance evaluated and optimized using the established performance criterion. Finally, in order to be able to predict the resources needed for optimization an objective function temperament factor is defined that indicates the degree of difficulty of the objective function.

  6. The Importance of Ligand Conformational Energies in Carbohydrate Docking: Sorting the Wheat from the Chaff

    PubMed Central

    Nivedha, Anita K.; Makeneni, Spandana; Foley, B. Lachele; Tessier, Matthew B.; Woods, Robert J.

    2014-01-01

    Docking algorithms that aim to be applicable to a broad range of ligands suffer reduced accuracy because they are unable to incorporate ligand-specific conformational energies. Here, we develop internal energy functions, Carbohydrate Intrinsic (CHI), to account for the rotational preferences of the glycosidic torsion angles in carbohydrates. The relative energies predicted by the CHI energy functions mirror the conformational distributions of glycosidic linkages determined from a survey of oligosaccharide-protein complexes in the Protein Data Bank. Addition of CHI energies to the standard docking scores in Autodock 3, 4.2, and Vina consistently improves pose ranking of oligosaccharides docked to a set of anti-carbohydrate antibodies. The CHI energy functions are also independent of docking algorithm, and with minor modifications, may be incorporated into both theoretical modeling methods, and experimental NMR or X-ray structure refinement programs. PMID:24375430

  7. Functional Classification of Immune Regulatory Proteins

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rubinstein, Rotem; Ramagopal, Udupi A.; Nathenson, Stanley G.

    2013-05-01

    Members of the immunoglobulin superfamily (IgSF) control innate and adaptive immunity and are prime targets for the treatment of autoimmune diseases, infectious diseases, and malignancies. We describe a computational method, termed the Brotherhood algorithm, which utilizes intermediate sequence information to classify proteins into functionally related families. This approach identifies functional relationships within the IgSF and predicts additional receptor-ligand interactions. As a specific example, we examine the nectin/nectin-like family of cell adhesion and signaling proteins and propose receptor-ligand interactions within this family. We were guided by the Brotherhood approach and present the high-resolution structural characterization of a homophilic interaction involving themore » class-I MHC-restricted T-cell-associated molecule, which we now classify as a nectin-like family member. The Brotherhood algorithm is likely to have a significant impact on structural immunology by identifying those proteins and complexes for which structural characterization will be particularly informative.« less

  8. PRESS-based EFOR algorithm for the dynamic parametrical modeling of nonlinear MDOF systems

    NASA Astrophysics Data System (ADS)

    Liu, Haopeng; Zhu, Yunpeng; Luo, Zhong; Han, Qingkai

    2017-09-01

    In response to the identification problem concerning multi-degree of freedom (MDOF) nonlinear systems, this study presents the extended forward orthogonal regression (EFOR) based on predicted residual sums of squares (PRESS) to construct a nonlinear dynamic parametrical model. The proposed parametrical model is based on the non-linear autoregressive with exogenous inputs (NARX) model and aims to explicitly reveal the physical design parameters of the system. The PRESS-based EFOR algorithm is proposed to identify such a model for MDOF systems. By using the algorithm, we built a common-structured model based on the fundamental concept of evaluating its generalization capability through cross-validation. The resulting model aims to prevent over-fitting with poor generalization performance caused by the average error reduction ratio (AERR)-based EFOR algorithm. Then, a functional relationship is established between the coefficients of the terms and the design parameters of the unified model. Moreover, a 5-DOF nonlinear system is taken as a case to illustrate the modeling of the proposed algorithm. Finally, a dynamic parametrical model of a cantilever beam is constructed from experimental data. Results indicate that the dynamic parametrical model of nonlinear systems, which depends on the PRESS-based EFOR, can accurately predict the output response, thus providing a theoretical basis for the optimal design of modeling methods for MDOF nonlinear systems.

  9. In Silico Prediction and Validation of Gfap as an miR-3099 Target in Mouse Brain.

    PubMed

    Abidin, Shahidee Zainal; Leong, Jia-Wen; Mahmoudi, Marzieh; Nordin, Norshariza; Abdullah, Syahril; Cheah, Pike-See; Ling, King-Hwa

    2017-08-01

    MicroRNAs are small non-coding RNAs that play crucial roles in the regulation of gene expression and protein synthesis during brain development. MiR-3099 is highly expressed throughout embryogenesis, especially in the developing central nervous system. Moreover, miR-3099 is also expressed at a higher level in differentiating neurons in vitro, suggesting that it is a potential regulator during neuronal cell development. This study aimed to predict the target genes of miR-3099 via in-silico analysis using four independent prediction algorithms (miRDB, miRanda, TargetScan, and DIANA-micro-T-CDS) with emphasis on target genes related to brain development and function. Based on the analysis, a total of 3,174 miR-3099 target genes were predicted. Those predicted by at least three algorithms (324 genes) were subjected to DAVID bioinformatics analysis to understand their overall functional themes and representation. The analysis revealed that nearly 70% of the target genes were expressed in the nervous system and a significant proportion were associated with transcriptional regulation and protein ubiquitination mechanisms. Comparison of in situ hybridization (ISH) expression patterns of miR-3099 in both published and in-house-generated ISH sections with the ISH sections of target genes from the Allen Brain Atlas identified 7 target genes (Dnmt3a, Gabpa, Gfap, Itga4, Lxn, Smad7, and Tbx18) having expression patterns complementary to miR-3099 in the developing and adult mouse brain samples. Of these, we validated Gfap as a direct downstream target of miR-3099 using the luciferase reporter gene system. In conclusion, we report the successful prediction and validation of Gfap as an miR-3099 target gene using a combination of bioinformatics resources with enrichment of annotations based on functional ontologies and a spatio-temporal expression dataset.

  10. SU-C-204-01: A Fast Analytical Approach for Prompt Gamma and PET Predictions in a TPS for Proton Range Verification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kroniger, K; Herzog, M; Landry, G

    2015-06-15

    Purpose: We describe and demonstrate a fast analytical tool for prompt-gamma emission prediction based on filter functions applied on the depth dose profile. We present the implementation in a treatment planning system (TPS) of the same algorithm for positron emitter distributions. Methods: The prediction of the desired observable is based on the convolution of filter functions with the depth dose profile. For both prompt-gammas and positron emitters, the results of Monte Carlo simulations (MC) are compared with those of the analytical tool. For prompt-gamma emission from inelastic proton-induced reactions, homogeneous and inhomogeneous phantoms alongside with patient data are used asmore » irradiation targets of mono-energetic proton pencil beams. The accuracy of the tool is assessed in terms of the shape of the analytically calculated depth profiles and their absolute yields, compared to MC. For the positron emitters, the method is implemented in a research RayStation TPS and compared to MC predictions. Digital phantoms and patient data are used and positron emitter spatial density distributions are analyzed. Results: Calculated prompt-gamma profiles agree with MC within 3 % in terms of absolute yield and reproduce the correct shape. Based on an arbitrary reference material and by means of 6 filter functions (one per chemical element), profiles in any other material composed of those elements can be predicted. The TPS implemented algorithm is accurate enough to enable, via the analytically calculated positron emitters profiles, detection of range differences between the TPS and MC with errors of the order of 1–2 mm. Conclusion: The proposed analytical method predicts prompt-gamma and positron emitter profiles which generally agree with the distributions obtained by a full MC. The implementation of the tool in a TPS shows that reliable profiles can be obtained directly from the dose calculated by the TPS, without the need of full MC simulation.« less

  11. Particle Swarm Optimization for Programming Deep Brain Stimulation Arrays

    PubMed Central

    Peña, Edgar; Zhang, Simeng; Deyo, Steve; Xiao, YiZi; Johnson, Matthew D.

    2017-01-01

    Objective Deep brain stimulation (DBS) therapy relies on both precise neurosurgical targeting and systematic optimization of stimulation settings to achieve beneficial clinical outcomes. One recent advance to improve targeting is the development of DBS arrays (DBSAs) with electrodes segmented both along and around the DBS lead. However, increasing the number of independent electrodes creates the logistical challenge of optimizing stimulation parameters efficiently. Approach Solving such complex problems with multiple solutions and objectives is well known to occur in biology, in which complex collective behaviors emerge out of swarms of individual organisms engaged in learning through social interactions. Here, we developed a particle swarm optimization (PSO) algorithm to program DBSAs using a swarm of individual particles representing electrode configurations and stimulation amplitudes. Using a finite element model of motor thalamic DBS, we demonstrate how the PSO algorithm can efficiently optimize a multi-objective function that maximizes predictions of axonal activation in regions of interest (ROI, cerebellar-receiving area of motor thalamus), minimizes predictions of axonal activation in regions of avoidance (ROA, somatosensory thalamus), and minimizes power consumption. Main Results The algorithm solved the multi-objective problem by producing a Pareto front. ROI and ROA activation predictions were consistent across swarms (<1% median discrepancy in axon activation). The algorithm was able to accommodate for (1) lead displacement (1 mm) with relatively small ROI (≤9.2%) and ROA (≤1%) activation changes, irrespective of shift direction; (2) reduction in maximum per-electrode current (by 50% and 80%) with ROI activation decreasing by 5.6% and 16%, respectively; and (3) disabling electrodes (n=3 and 12) with ROI activation reduction by 1.8% and 14%, respectively. Additionally, comparison between PSO predictions and multi-compartment axon model simulations showed discrepancies of <1% between approaches. Significance The PSO algorithm provides a computationally efficient way to program DBS systems especially those with higher electrode counts. PMID:28068291

  12. Particle swarm optimization for programming deep brain stimulation arrays

    NASA Astrophysics Data System (ADS)

    Peña, Edgar; Zhang, Simeng; Deyo, Steve; Xiao, YiZi; Johnson, Matthew D.

    2017-02-01

    Objective. Deep brain stimulation (DBS) therapy relies on both precise neurosurgical targeting and systematic optimization of stimulation settings to achieve beneficial clinical outcomes. One recent advance to improve targeting is the development of DBS arrays (DBSAs) with electrodes segmented both along and around the DBS lead. However, increasing the number of independent electrodes creates the logistical challenge of optimizing stimulation parameters efficiently. Approach. Solving such complex problems with multiple solutions and objectives is well known to occur in biology, in which complex collective behaviors emerge out of swarms of individual organisms engaged in learning through social interactions. Here, we developed a particle swarm optimization (PSO) algorithm to program DBSAs using a swarm of individual particles representing electrode configurations and stimulation amplitudes. Using a finite element model of motor thalamic DBS, we demonstrate how the PSO algorithm can efficiently optimize a multi-objective function that maximizes predictions of axonal activation in regions of interest (ROI, cerebellar-receiving area of motor thalamus), minimizes predictions of axonal activation in regions of avoidance (ROA, somatosensory thalamus), and minimizes power consumption. Main results. The algorithm solved the multi-objective problem by producing a Pareto front. ROI and ROA activation predictions were consistent across swarms (<1% median discrepancy in axon activation). The algorithm was able to accommodate for (1) lead displacement (1 mm) with relatively small ROI (⩽9.2%) and ROA (⩽1%) activation changes, irrespective of shift direction; (2) reduction in maximum per-electrode current (by 50% and 80%) with ROI activation decreasing by 5.6% and 16%, respectively; and (3) disabling electrodes (n  =  3 and 12) with ROI activation reduction by 1.8% and 14%, respectively. Additionally, comparison between PSO predictions and multi-compartment axon model simulations showed discrepancies of  <1% between approaches. Significance. The PSO algorithm provides a computationally efficient way to program DBS systems especially those with higher electrode counts.

  13. Prediction of miRNA-mRNA associations in Alzheimer's disease mice using network topology.

    PubMed

    Noh, Haneul; Park, Charny; Park, Soojun; Lee, Young Seek; Cho, Soo Young; Seo, Hyemyung

    2014-08-03

    Little is known about the relationship between miRNA and mRNA expression in Alzheimer's disease (AD) at early- or late-symptomatic stages. Sequence-based target prediction algorithms and anti-correlation profiles have been applied to predict miRNA targets using omics data, but this approach often leads to false positive predictions. Here, we applied the joint profiling analysis of mRNA and miRNA expression levels to Tg6799 AD model mice at 4 and 8 months of age using a network topology-based method. We constructed gene regulatory networks and used the PageRank algorithm to predict significant interactions between miRNA and mRNA. In total, 8 cluster modules were predicted by the transcriptome data for co-expression networks of AD pathology. In total, 54 miRNAs were identified as being differentially expressed in AD. Among these, 50 significant miRNA-mRNA interactions were predicted by integrating sequence target prediction, expression analysis, and the PageRank algorithm. We identified a set of miRNA-mRNA interactions that were changed in the hippocampus of Tg6799 AD model mice. We determined the expression levels of several candidate genes and miRNA. For functional validation in primary cultured neurons from Tg6799 mice (MT) and littermate (LM) controls, the overexpression of ARRDC3 enhanced PPP1R3C expression. ARRDC3 overexpression showed the tendency to decrease the expression of miR139-5p and miR3470a in both LM and MT primary cells. Pathological environment created by Aβ treatment increased the gene expression of PPP1R3C and Sfpq but did not significantly alter the expression of miR139-5p or miR3470a. Aβ treatment increased the promoter activity of ARRDC3 gene in LM primary cells but not in MT primary cells. Our results demonstrate AD-specific changes in the miRNA regulatory system as well as the relationship between the expression levels of miRNAs and their targets in the hippocampus of Tg6799 mice. These data help further our understanding of the function and mechanism of various miRNAs and their target genes in the molecular pathology of AD.

  14. Peak-Seeking Optimization of Trim for Reduced Fuel Consumption: Architecture and Performance Predictions

    NASA Technical Reports Server (NTRS)

    Schaefer, Jacob; Brown, Nelson

    2013-01-01

    A peak-seeking control approach for real-time trim configuration optimization for reduced fuel consumption has been developed by researchers at the National Aeronautics and Space Administration (NASA) Dryden Flight Research Center to address the goals of the NASA Environmentally Responsible Aviation project to reduce fuel burn and emissions. The peak-seeking control approach is based on a steepest-descent algorithm using a time-varying Kalman filter to estimate the gradient of a performance function of fuel flow versus control surface positions. In real-time operation, deflections of symmetric ailerons, trailing-edge flaps, and leading-edge flaps of an FA-18 airplane (McDonnell Douglas, now The Boeing Company, Chicago, Illinois) are controlled for optimization of fuel flow. This presentation presents the design and integration of this peak-seeking controller on a modified NASA FA-18 airplane with research flight control computers. A research flight was performed to collect data to build a realistic model of the performance function and characterize measurement noise. This model was then implemented into a nonlinear six-degree-of-freedom FA-18 simulation along with the peak-seeking control algorithm. With the goal of eventual flight tests, the algorithm was first evaluated in the improved simulation environment. Results from the simulation predict good convergence on minimum fuel flow with a 2.5-percent reduction in fuel flow relative to the baseline trim of the aircraft.

  15. Peak-Seeking Optimization of Trim for Reduced Fuel Consumption: Architecture and Performance Predictions

    NASA Technical Reports Server (NTRS)

    Schaefer, Jacob; Brown, Nelson A.

    2013-01-01

    A peak-seeking control approach for real-time trim configuration optimization for reduced fuel consumption has been developed by researchers at the National Aeronautics and Space Administration (NASA) Dryden Flight Research Center to address the goals of the NASA Environmentally Responsible Aviation project to reduce fuel burn and emissions. The peak-seeking control approach is based on a steepest-descent algorithm using a time-varying Kalman filter to estimate the gradient of a performance function of fuel flow versus control surface positions. In real-time operation, deflections of symmetric ailerons, trailing-edge flaps, and leading-edge flaps of an F/A-18 airplane (McDonnell Douglas, now The Boeing Company, Chicago, Illinois) are controlled for optimization of fuel flow. This paper presents the design and integration of this peak-seeking controller on a modified NASA F/A-18 airplane with research flight control computers. A research flight was performed to collect data to build a realistic model of the performance function and characterize measurement noise. This model was then implemented into a nonlinear six-degree-of-freedom F/A-18 simulation along with the peak-seeking control algorithm. With the goal of eventual flight tests, the algorithm was first evaluated in the improved simulation environment. Results from the simulation predict good convergence on minimum fuel flow with a 2.5-percent reduction in fuel flow relative to the baseline trim of the aircraft.

  16. Making predictions in a changing world-inference, uncertainty, and learning.

    PubMed

    O'Reilly, Jill X

    2013-01-01

    To function effectively, brains need to make predictions about their environment based on past experience, i.e., they need to learn about their environment. The algorithms by which learning occurs are of interest to neuroscientists, both in their own right (because they exist in the brain) and as a tool to model participants' incomplete knowledge of task parameters and hence, to better understand their behavior. This review focusses on a particular challenge for learning algorithms-how to match the rate at which they learn to the rate of change in the environment, so that they use as much observed data as possible whilst disregarding irrelevant, old observations. To do this algorithms must evaluate whether the environment is changing. We discuss the concepts of likelihood, priors and transition functions, and how these relate to change detection. We review expected and estimation uncertainty, and how these relate to change detection and learning rate. Finally, we consider the neural correlates of uncertainty and learning. We argue that the neural correlates of uncertainty bear a resemblance to neural systems that are active when agents actively explore their environments, suggesting that the mechanisms by which the rate of learning is set may be subject to top down control (in circumstances when agents actively seek new information) as well as bottom up control (by observations that imply change in the environment).

  17. Ensemble-based prediction of RNA secondary structures.

    PubMed

    Aghaeepour, Nima; Hoos, Holger H

    2013-04-24

    Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that as been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.

  18. Abschätzung des Einflusses von Parameterunsicherheiten bei der Planung und Auswertung von Tracertests unter Verwendung von Ensembleprognosen

    NASA Astrophysics Data System (ADS)

    Klotzsch, Stephan; Binder, Martin; Händel, Falk

    2017-06-01

    While planning tracer tests, uncertainties in geohydraulic parameters should be considered as an important factor. Neglecting these uncertainties can lead to missing the tracer breakthrough, for example. One way to consider uncertainties during tracer test design is the so called ensemble forecast. The applicability of this method to geohydrological problems is demonstrated by coupling the method with two analytical solute transport models. The algorithm presented in this article is suitable for prediction as well as parameter estimation. The parameter estimation function can be used in a tracer test for reducing the uncertainties in the measured data which can improve the initial prediction. The algorithm was implemented into a software tool which is freely downloadable from the website of the Institute for Groundwater Management at TU Dresden, Germany.

  19. Predicting nucleic acid binding interfaces from structural models of proteins

    PubMed Central

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2011-01-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared to patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. PMID:22086767

  20. RNA-TVcurve: a Web server for RNA secondary structure comparison based on a multi-scale similarity of its triple vector curve representation.

    PubMed

    Li, Ying; Shi, Xiaohu; Liang, Yanchun; Xie, Juan; Zhang, Yu; Ma, Qin

    2017-01-21

    RNAs have been found to carry diverse functionalities in nature. Inferring the similarity between two given RNAs is a fundamental step to understand and interpret their functional relationship. The majority of functional RNAs show conserved secondary structures, rather than sequence conservation. Those algorithms relying on sequence-based features usually have limitations in their prediction performance. Hence, integrating RNA structure features is very critical for RNA analysis. Existing algorithms mainly fall into two categories: alignment-based and alignment-free. The alignment-free algorithms of RNA comparison usually have lower time complexity than alignment-based algorithms. An alignment-free RNA comparison algorithm was proposed, in which novel numerical representations RNA-TVcurve (triple vector curve representation) of RNA sequence and corresponding secondary structure features are provided. Then a multi-scale similarity score of two given RNAs was designed based on wavelet decomposition of their numerical representation. In support of RNA mutation and phylogenetic analysis, a web server (RNA-TVcurve) was designed based on this alignment-free RNA comparison algorithm. It provides three functional modules: 1) visualization of numerical representation of RNA secondary structure; 2) detection of single-point mutation based on secondary structure; and 3) comparison of pairwise and multiple RNA secondary structures. The inputs of the web server require RNA primary sequences, while corresponding secondary structures are optional. For the primary sequences alone, the web server can compute the secondary structures using free energy minimization algorithm in terms of RNAfold tool from Vienna RNA package. RNA-TVcurve is the first integrated web server, based on an alignment-free method, to deliver a suite of RNA analysis functions, including visualization, mutation analysis and multiple RNAs structure comparison. The comparison results with two popular RNA comparison tools, RNApdist and RNAdistance, showcased that RNA-TVcurve can efficiently capture subtle relationships among RNAs for mutation detection and non-coding RNA classification. All the relevant results were shown in an intuitive graphical manner, and can be freely downloaded from this server. RNA-TVcurve, along with test examples and detailed documents, are available at: http://ml.jlu.edu.cn/tvcurve/ .

  1. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning.

    PubMed

    Park, Chihyun; Ahn, Jaegyoon; Kim, Hyunjin; Park, Sanghyun

    2014-01-01

    The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence. In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes. The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.

  2. Mechanobiological simulations of peri-acetabular bone ingrowth: a comparative analysis of cell-phenotype specific and phenomenological algorithms.

    PubMed

    Mukherjee, Kaushik; Gupta, Sanjay

    2017-03-01

    Several mechanobiology algorithms have been employed to simulate bone ingrowth around porous coated implants. However, there is a scarcity of quantitative comparison between the efficacies of commonly used mechanoregulatory algorithms. The objectives of this study are: (1) to predict peri-acetabular bone ingrowth using cell-phenotype specific algorithm and to compare these predictions with those obtained using phenomenological algorithm and (2) to investigate the influences of cellular parameters on bone ingrowth. The variation in host bone material property and interfacial micromotion of the implanted pelvis were mapped onto the microscale model of implant-bone interface. An overall variation of 17-88 % in peri-acetabular bone ingrowth was observed. Despite differences in predicted tissue differentiation patterns during the initial period, both the algorithms predicted similar spatial distribution of neo-tissue layer, after attainment of equilibrium. Results indicated that phenomenological algorithm, being computationally faster than the cell-phenotype specific algorithm, might be used to predict peri-prosthetic bone ingrowth. The cell-phenotype specific algorithm, however, was found to be useful in numerically investigating the influence of alterations in cellular activities on bone ingrowth, owing to biologically related factors. Amongst the host of cellular activities, matrix production rate of bone tissue was found to have predominant influence on peri-acetabular bone ingrowth.

  3. Developing a dengue forecast model using machine learning: A case study in China.

    PubMed

    Guo, Pi; Liu, Tao; Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun

    2017-10-01

    In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011-2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics.

  4. Improved packing of protein side chains with parallel ant colonies

    PubMed Central

    2014-01-01

    Introduction The accurate packing of protein side chains is important for many computational biology problems, such as ab initio protein structure prediction, homology modelling, and protein design and ligand docking applications. Many of existing solutions are modelled as a computational optimisation problem. As well as the design of search algorithms, most solutions suffer from an inaccurate energy function for judging whether a prediction is good or bad. Even if the search has found the lowest energy, there is no certainty of obtaining the protein structures with correct side chains. Methods We present a side-chain modelling method, pacoPacker, which uses a parallel ant colony optimisation strategy based on sharing a single pheromone matrix. This parallel approach combines different sources of energy functions and generates protein side-chain conformations with the lowest energies jointly determined by the various energy functions. We further optimised the selected rotamers to construct subrotamer by rotamer minimisation, which reasonably improved the discreteness of the rotamer library. Results We focused on improving the accuracy of side-chain conformation prediction. For a testing set of 442 proteins, 87.19% of X1 and 77.11% of X12 angles were predicted correctly within 40° of the X-ray positions. We compared the accuracy of pacoPacker with state-of-the-art methods, such as CIS-RR and SCWRL4. We analysed the results from different perspectives, in terms of protein chain and individual residues. In this comprehensive benchmark testing, 51.5% of proteins within a length of 400 amino acids predicted by pacoPacker were superior to the results of CIS-RR and SCWRL4 simultaneously. Finally, we also showed the advantage of using the subrotamers strategy. All results confirmed that our parallel approach is competitive to state-of-the-art solutions for packing side chains. Conclusions This parallel approach combines various sources of searching intelligence and energy functions to pack protein side chains. It provides a frame-work for combining different inaccuracy/usefulness objective functions by designing parallel heuristic search algorithms. PMID:25474164

  5. Predicting overlapping protein complexes from weighted protein interaction graphs by gradually expanding dense neighborhoods.

    PubMed

    Dimitrakopoulos, Christos; Theofilatos, Konstantinos; Pegkas, Andreas; Likothanassis, Spiros; Mavroudi, Seferina

    2016-07-01

    Proteins are vital biological molecules driving many fundamental cellular processes. They rarely act alone, but form interacting groups called protein complexes. The study of protein complexes is a key goal in systems biology. Recently, large protein-protein interaction (PPI) datasets have been published and a plethora of computational methods that provide new ideas for the prediction of protein complexes have been implemented. However, most of the methods suffer from two major limitations: First, they do not account for proteins participating in multiple functions and second, they are unable to handle weighted PPI graphs. Moreover, the problem remains open as existing algorithms and tools are insufficient in terms of predictive metrics. In the present paper, we propose gradually expanding neighborhoods with adjustment (GENA), a new algorithm that gradually expands neighborhoods in a graph starting from highly informative "seed" nodes. GENA considers proteins as multifunctional molecules allowing them to participate in more than one protein complex. In addition, GENA accepts weighted PPI graphs by using a weighted evaluation function for each cluster. In experiments with datasets from Saccharomyces cerevisiae and human, GENA outperformed Markov clustering, restricted neighborhood search and clustering with overlapping neighborhood expansion, three state-of-the-art methods for computationally predicting protein complexes. Seven PPI networks and seven evaluation datasets were used in total. GENA outperformed existing methods in 16 out of 18 experiments achieving an average improvement of 5.5% when the maximum matching ratio metric was used. Our method was able to discover functionally homogeneous protein clusters and uncover important network modules in a Parkinson expression dataset. When used on the human networks, around 47% of the detected clusters were enriched in gene ontology (GO) terms with depth higher than five in the GO hierarchy. In the present manuscript, we introduce a new method for the computational prediction of protein complexes by making the realistic assumption that proteins participate in multiple protein complexes and cellular functions. Our method can detect accurate and functionally homogeneous clusters. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. The Role of Multimodel Combination in Improving Streamflow Prediction

    NASA Astrophysics Data System (ADS)

    Arumugam, S.; Li, W.

    2008-12-01

    Model errors are the inevitable part in any prediction exercise. One approach that is currently gaining attention to reduce model errors is by optimally combining multiple models to develop improved predictions. The rationale behind this approach primarily lies on the premise that optimal weights could be derived for each model so that the developed multimodel predictions will result in improved predictability. In this study, we present a new approach to combine multiple hydrological models by evaluating their predictability contingent on the predictor state. We combine two hydrological models, 'abcd' model and Variable Infiltration Capacity (VIC) model, with each model's parameter being estimated by two different objective functions to develop multimodel streamflow predictions. The performance of multimodel predictions is compared with individual model predictions using correlation, root mean square error and Nash-Sutcliffe coefficient. To quantify precisely under what conditions the multimodel predictions result in improved predictions, we evaluate the proposed algorithm by testing it against streamflow generated from a known model ('abcd' model or VIC model) with errors being homoscedastic or heteroscedastic. Results from the study show that streamflow simulated from individual models performed better than multimodels under almost no model error. Under increased model error, the multimodel consistently performed better than the single model prediction in terms of all performance measures. The study also evaluates the proposed algorithm for streamflow predictions in two humid river basins from NC as well as in two arid basins from Arizona. Through detailed validation in these four sites, the study shows that multimodel approach better predicts the observed streamflow in comparison to the single model predictions.

  7. Ab-initio conformational epitope structure prediction using genetic algorithm and SVM for vaccine design.

    PubMed

    Moghram, Basem Ameen; Nabil, Emad; Badr, Amr

    2018-01-01

    T-cell epitope structure identification is a significant challenging immunoinformatic problem within epitope-based vaccine design. Epitopes or antigenic peptides are a set of amino acids that bind with the Major Histocompatibility Complex (MHC) molecules. The aim of this process is presented by Antigen Presenting Cells to be inspected by T-cells. MHC-molecule-binding epitopes are responsible for triggering the immune response to antigens. The epitope's three-dimensional (3D) molecular structure (i.e., tertiary structure) reflects its proper function. Therefore, the identification of MHC class-II epitopes structure is a significant step towards epitope-based vaccine design and understanding of the immune system. In this paper, we propose a new technique using a Genetic Algorithm for Predicting the Epitope Structure (GAPES), to predict the structure of MHC class-II epitopes based on their sequence. The proposed Elitist-based genetic algorithm for predicting the epitope's tertiary structure is based on Ab-Initio Empirical Conformational Energy Program for Peptides (ECEPP) Force Field Model. The developed secondary structure prediction technique relies on Ramachandran Plot. We used two alignment algorithms: the ROSS alignment and TM-Score alignment. We applied four different alignment approaches to calculate the similarity scores of the dataset under test. We utilized the support vector machine (SVM) classifier as an evaluation of the prediction performance. The prediction accuracy and the Area Under Receiver Operating Characteristic (ROC) Curve (AUC) were calculated as measures of performance. The calculations are performed on twelve similarity-reduced datasets of the Immune Epitope Data Base (IEDB) and a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES was reliable and very accurate. We achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 in the IEDB dataset. Also, we achieved an accuracy of 95.125% and an AUC of 0.987 on the HLA-DRB1*0101 allele of the Wang benchmark dataset. The results indicate that the proposed prediction technique "GAPES" is a promising technique that will help researchers and scientists to predict the protein structure and it will assist them in the intelligent design of new epitope-based vaccines. Copyright © 2017 Elsevier B.V. All rights reserved.

  8. Improving personalized link prediction by hybrid diffusion

    NASA Astrophysics Data System (ADS)

    Liu, Jin-Hu; Zhu, Yu-Xiao; Zhou, Tao

    2016-04-01

    Inspired by traditional link prediction and to solve the problem of recommending friends in social networks, we introduce the personalized link prediction in this paper, in which each individual will get equal number of diversiform predictions. While the performances of many classical algorithms are not satisfactory under this framework, thus new algorithms are in urgent need. Motivated by previous researches in other fields, we generalize heat conduction process to the framework of personalized link prediction and find that this method outperforms many classical similarity-based algorithms, especially in the performance of diversity. In addition, we demonstrate that adding one ground node that is supposed to connect all the nodes in the system will greatly benefit the performance of heat conduction. Finally, better hybrid algorithms composed of local random walk and heat conduction have been proposed. Numerical results show that the hybrid algorithms can outperform other algorithms simultaneously in all four adopted metrics: AUC, precision, recall and hamming distance. In a word, this work may shed some light on the in-depth understanding of the effect of physical processes in personalized link prediction.

  9. Evolutionary Dynamic Multiobjective Optimization Via Kalman Filter Prediction.

    PubMed

    Muruganantham, Arrchana; Tan, Kay Chen; Vadakkepat, Prahlad

    2016-12-01

    Evolutionary algorithms are effective in solving static multiobjective optimization problems resulting in the emergence of a number of state-of-the-art multiobjective evolutionary algorithms (MOEAs). Nevertheless, the interest in applying them to solve dynamic multiobjective optimization problems has only been tepid. Benchmark problems, appropriate performance metrics, as well as efficient algorithms are required to further the research in this field. One or more objectives may change with time in dynamic optimization problems. The optimization algorithm must be able to track the moving optima efficiently. A prediction model can learn the patterns from past experience and predict future changes. In this paper, a new dynamic MOEA using Kalman filter (KF) predictions in decision space is proposed to solve the aforementioned problems. The predictions help to guide the search toward the changed optima, thereby accelerating convergence. A scoring scheme is devised to hybridize the KF prediction with a random reinitialization method. Experimental results and performance comparisons with other state-of-the-art algorithms demonstrate that the proposed algorithm is capable of significantly improving the dynamic optimization performance.

  10. Predicting Vasovagal Syncope from Heart Rate and Blood Pressure: A Prospective Study in 140 Subjects.

    PubMed

    Virag, Nathalie; Erickson, Mark; Taraborrelli, Patricia; Vetter, Rolf; Lim, Phang Boon; Sutton, Richard

    2018-04-28

    We developed a vasovagal syncope (VVS) prediction algorithm for use during head-up tilt with simultaneous analysis of heart rate (HR) and systolic blood pressure (SBP). We previously tested this algorithm retrospectively in 1155 subjects, showing sensitivity 95%, specificity 93% and median prediction time of 59s. This study was prospective, single center, on 140 subjects to evaluate this VVS prediction algorithm and assess if retrospective results were reproduced and clinically relevant. Primary endpoint was VVS prediction: sensitivity and specificity >80%. In subjects, referred for 60° head-up tilt (Italian protocol), non-invasive HR and SBP were supplied to the VVS prediction algorithm: simultaneous analysis of RR intervals, SBP trends and their variability represented by low-frequency power generated cumulative risk which was compared with a predetermined VVS risk threshold. When cumulative risk exceeded threshold, an alert was generated. Prediction time was duration between first alert and syncope. Of 140 subjects enrolled, data was usable for 134. Of 83 tilt+ve (61.9%), 81 VVS events were correctly predicted and of 51 tilt-ve subjects (38.1%), 45 were correctly identified as negative by the algorithm. Resulting algorithm performance was sensitivity 97.6%, specificity 88.2%, meeting primary endpoint. Mean VVS prediction time was 2min 26s±3min16s with median 1min 25s. Using only HR and HR variability (without SBP) the mean prediction time reduced to 1min34s±1min45s with median 1min13s. The VVS prediction algorithm, is clinically-relevant tool and could offer applications including providing a patient alarm, shortening tilt-test time, or triggering pacing intervention in implantable devices. Copyright © 2018. Published by Elsevier Inc.

  11. Velocity and stress autocorrelation decay in isothermal dissipative particle dynamics

    NASA Astrophysics Data System (ADS)

    Chaudhri, Anuj; Lukes, Jennifer R.

    2010-02-01

    The velocity and stress autocorrelation decay in a dissipative particle dynamics ideal fluid model is analyzed in this paper. The autocorrelation functions are calculated at three different friction parameters and three different time steps using the well-known Groot/Warren algorithm and newer algorithms including self-consistent leap-frog, self-consistent velocity Verlet and Shardlow first and second order integrators. At low friction values, the velocity autocorrelation function decays exponentially at short times, shows slower-than exponential decay at intermediate times, and approaches zero at long times for all five integrators. As friction value increases, the deviation from exponential behavior occurs earlier and is more pronounced. At small time steps, all the integrators give identical decay profiles. As time step increases, there are qualitative and quantitative differences between the integrators. The stress correlation behavior is markedly different for the algorithms. The self-consistent velocity Verlet and the Shardlow algorithms show very similar stress autocorrelation decay with change in friction parameter, whereas the Groot/Warren and leap-frog schemes show variations at higher friction factors. Diffusion coefficients and shear viscosities are calculated using Green-Kubo integration of the velocity and stress autocorrelation functions. The diffusion coefficients match well-known theoretical results at low friction limits. Although the stress autocorrelation function is different for each integrator, fluctuates rapidly, and gives poor statistics for most of the cases, the calculated shear viscosities still fall within range of theoretical predictions and nonequilibrium studies.

  12. Global Bathymetry: Machine Learning for Data Editing

    NASA Astrophysics Data System (ADS)

    Sandwell, D. T.; Tea, B.; Freund, Y.

    2017-12-01

    The accuracy of global bathymetry depends primarily on the coverage and accuracy of the sounding data and secondarily on the depth predicted from gravity. A main focus of our research is to add newly-available data to the global compilation. Most data sources have 1-12% of erroneous soundings caused by a wide array of blunders and measurement errors. Over the years we have hand-edited this data using undergraduate employees at UCSD (440 million soundings at 500 m resolution). We are developing a machine learning approach to refine the flagging of the older soundings and provide automated editing of newly-acquired soundings. The approach has three main steps: 1) Combine the sounding data with additional information that may inform the machine learning algorithm. The additional parameters include: depth predicted from gravity; distance to the nearest sounding from other cruises; seafloor age; spreading rate; sediment thickness; and vertical gravity gradient. 2) Use available edit decisions as training data sets for a boosted tree algorithm with a binary logistic objective function and L2 regularization. Initial results with poor quality single beam soundings show that the automated algorithm matches the hand-edited data 89% of the time. The results show that most of the information for detecting outliers comes from predicted depth with secondary contributions from distance to the nearest sounding and longitude. A similar analysis using very high quality multibeam data shows that the automated algorithm matches the hand-edited data 93% of the time. Again, most of the information for detecting outliers comes from predicted depth secondary contributions from distance to the nearest sounding and longitude. 3) The third step in the process is to use the machine learning parameters, derived from the training data, to edit 12 million newly acquired single beam sounding data provided by the National Geospatial-Intelligence Agency. The output of the learning algorithm will be confidence ratedindicating which edits the algorithm is confident on and which it is not confident. We expect the majority ( 90%) of edits to be confident and not require human intervention. Human intervention will be required only on the 10% unconfident decisions, thus reducing the amount of human work by a factor of 10 or more.

  13. Application of the Artificial Neural Network model for prediction of monthly Standardized Precipitation and Evapotranspiration Index using hydrometeorological parameters and climate indices in eastern Australia

    NASA Astrophysics Data System (ADS)

    Deo, Ravinesh C.; Şahin, Mehmet

    2015-07-01

    The forecasting of drought based on cumulative influence of rainfall, temperature and evaporation is greatly beneficial for mitigating adverse consequences on water-sensitive sectors such as agriculture, ecosystems, wildlife, tourism, recreation, crop health and hydrologic engineering. Predictive models of drought indices help in assessing water scarcity situations, drought identification and severity characterization. In this paper, we tested the feasibility of the Artificial Neural Network (ANN) as a data-driven model for predicting the monthly Standardized Precipitation and Evapotranspiration Index (SPEI) for eight candidate stations in eastern Australia using predictive variable data from 1915 to 2005 (training) and simulated data for the period 2006-2012. The predictive variables were: monthly rainfall totals, mean temperature, minimum temperature, maximum temperature and evapotranspiration, which were supplemented by large-scale climate indices (Southern Oscillation Index, Pacific Decadal Oscillation, Southern Annular Mode and Indian Ocean Dipole) and the Sea Surface Temperatures (Nino 3.0, 3.4 and 4.0). A total of 30 ANN models were developed with 3-layer ANN networks. To determine the best combination of learning algorithms, hidden transfer and output functions of the optimum model, the Levenberg-Marquardt and Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton backpropagation algorithms were utilized to train the network, tangent and logarithmic sigmoid equations used as the activation functions and the linear, logarithmic and tangent sigmoid equations used as the output function. The best ANN architecture had 18 input neurons, 43 hidden neurons and 1 output neuron, trained using the Levenberg-Marquardt learning algorithm using tangent sigmoid equation as the activation and output functions. An evaluation of the model performance based on statistical rules yielded time-averaged Coefficient of Determination, Root Mean Squared Error and the Mean Absolute Error ranging from 0.9945-0.9990, 0.0466-0.1117, and 0.0013-0.0130, respectively for individual stations. Also, the Willmott's Index of Agreement and the Nash-Sutcliffe Coefficient of Efficiency were between 0.932-0.959 and 0.977-0.998, respectively. When checked for the severity (S), duration (D) and peak intensity (I) of drought events determined from the simulated and observed SPEI, differences in drought parameters ranged from - 1.41-0.64%, - 2.17-1.92% and - 3.21-1.21%, respectively. Based on performance evaluation measures, we aver that the Artificial Neural Network model is a useful data-driven tool for forecasting monthly SPEI and its drought-related properties in the region of study.

  14. Motion prediction in MRI-guided radiotherapy based on interleaved orthogonal cine-MRI

    NASA Astrophysics Data System (ADS)

    Seregni, M.; Paganelli, C.; Lee, D.; Greer, P. B.; Baroni, G.; Keall, P. J.; Riboldi, M.

    2016-01-01

    In-room cine-MRI guidance can provide non-invasive target localization during radiotherapy treatment. However, in order to cope with finite imaging frequency and system latencies between target localization and dose delivery, tumour motion prediction is required. This work proposes a framework for motion prediction dedicated to cine-MRI guidance, aiming at quantifying the geometric uncertainties introduced by this process for both tumour tracking and beam gating. The tumour position, identified through scale invariant features detected in cine-MRI slices, is estimated at high-frequency (25 Hz) using three independent predictors, one for each anatomical coordinate. Linear extrapolation, auto-regressive and support vector machine algorithms are compared against systems that use no prediction or surrogate-based motion estimation. Geometric uncertainties are reported as a function of image acquisition period and system latency. Average results show that the tracking error RMS can be decreased down to a [0.2; 1.2] mm range, for acquisition periods between 250 and 750 ms and system latencies between 50 and 300 ms. Except for the linear extrapolator, tracking and gating prediction errors were, on average, lower than those measured for surrogate-based motion estimation. This finding suggests that cine-MRI guidance, combined with appropriate prediction algorithms, could relevantly decrease geometric uncertainties in motion compensated treatments.

  15. Prediction of cardiovascular risk in rheumatoid arthritis: performance of original and adapted SCORE algorithms.

    PubMed

    Arts, E E A; Popa, C D; Den Broeder, A A; Donders, R; Sandoo, A; Toms, T; Rollefstad, S; Ikdahl, E; Semb, A G; Kitas, G D; Van Riel, P L C M; Fransen, J

    2016-04-01

    Predictive performance of cardiovascular disease (CVD) risk calculators appears suboptimal in rheumatoid arthritis (RA). A disease-specific CVD risk algorithm may improve CVD risk prediction in RA. The objectives of this study are to adapt the Systematic COronary Risk Evaluation (SCORE) algorithm with determinants of CVD risk in RA and to assess the accuracy of CVD risk prediction calculated with the adapted SCORE algorithm. Data from the Nijmegen early RA inception cohort were used. The primary outcome was first CVD events. The SCORE algorithm was recalibrated by reweighing included traditional CVD risk factors and adapted by adding other potential predictors of CVD. Predictive performance of the recalibrated and adapted SCORE algorithms was assessed and the adapted SCORE was externally validated. Of the 1016 included patients with RA, 103 patients experienced a CVD event. Discriminatory ability was comparable across the original, recalibrated and adapted SCORE algorithms. The Hosmer-Lemeshow test results indicated that all three algorithms provided poor model fit (p<0.05) for the Nijmegen and external validation cohort. The adapted SCORE algorithm mainly improves CVD risk estimation in non-event cases and does not show a clear advantage in reclassifying patients with RA who develop CVD (event cases) into more appropriate risk groups. This study demonstrates for the first time that adaptations of the SCORE algorithm do not provide sufficient improvement in risk prediction of future CVD in RA to serve as an appropriate alternative to the original SCORE. Risk assessment using the original SCORE algorithm may underestimate CVD risk in patients with RA. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  16. Non-equilibrium Green's functions method: Non-trivial and disordered leads

    NASA Astrophysics Data System (ADS)

    He, Yu; Wang, Yu; Klimeck, Gerhard; Kubis, Tillmann

    2014-11-01

    The non-equilibrium Green's function algorithm requires contact self-energies to model charge injection and extraction. All existing approaches assume infinitely periodic leads attached to a possibly quite complex device. This contradicts today's realistic devices in which contacts are spatially inhomogeneous, chemically disordered, and impacting the overall device characteristics. This work extends the complex absorbing potentials method for arbitrary, ideal, or non-ideal leads in atomistic tight binding representation. The algorithm is demonstrated on a Si nanowire with periodic leads, a graphene nanoribbon with trumpet shape leads, and devices with leads of randomly alloyed Si0.5Ge0.5. It is found that alloy randomness in the leads can reduce the predicted ON-state current of Si0.5Ge0.5 transistors by 45% compared to conventional lead methods.

  17. Software tool for data mining and its applications

    NASA Astrophysics Data System (ADS)

    Yang, Jie; Ye, Chenzhou; Chen, Nianyi

    2002-03-01

    A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, visualization. The principle and knowledge representation of some function models of data mining are described. The software tool of data mining is realized by Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool of data mining has satisfactorily applied in the prediction of regularities of the formation of ternary intermetallic compounds in alloy systems, and diagnosis of brain glioma.

  18. Influenza detection and prediction algorithms: comparative accuracy trial in Östergötland county, Sweden, 2008-2012.

    PubMed

    Spreco, A; Eriksson, O; Dahlström, Ö; Timpka, T

    2017-07-01

    Methods for the detection of influenza epidemics and prediction of their progress have seldom been comparatively evaluated using prospective designs. This study aimed to perform a prospective comparative trial of algorithms for the detection and prediction of increased local influenza activity. Data on clinical influenza diagnoses recorded by physicians and syndromic data from a telenursing service were used. Five detection and three prediction algorithms previously evaluated in public health settings were calibrated and then evaluated over 3 years. When applied on diagnostic data, only detection using the Serfling regression method and prediction using the non-adaptive log-linear regression method showed acceptable performances during winter influenza seasons. For the syndromic data, none of the detection algorithms displayed a satisfactory performance, while non-adaptive log-linear regression was the best performing prediction method. We conclude that evidence was found for that available algorithms for influenza detection and prediction display satisfactory performance when applied on local diagnostic data during winter influenza seasons. When applied on local syndromic data, the evaluated algorithms did not display consistent performance. Further evaluations and research on combination of methods of these types in public health information infrastructures for 'nowcasting' (integrated detection and prediction) of influenza activity are warranted.

  19. An improved predictive functional control method with application to PMSM systems

    NASA Astrophysics Data System (ADS)

    Li, Shihua; Liu, Huixian; Fu, Wenshu

    2017-01-01

    In common design of prediction model-based control method, usually disturbances are not considered in the prediction model as well as the control design. For the control systems with large amplitude or strong disturbances, it is difficult to precisely predict the future outputs according to the conventional prediction model, and thus the desired optimal closed-loop performance will be degraded to some extent. To this end, an improved predictive functional control (PFC) method is developed in this paper by embedding disturbance information into the system model. Here, a composite prediction model is thus obtained by embedding the estimated value of disturbances, where disturbance observer (DOB) is employed to estimate the lumped disturbances. So the influence of disturbances on system is taken into account in optimisation procedure. Finally, considering the speed control problem for permanent magnet synchronous motor (PMSM) servo system, a control scheme based on the improved PFC method is designed to ensure an optimal closed-loop performance even in the presence of disturbances. Simulation and experimental results based on a hardware platform are provided to confirm the effectiveness of the proposed algorithm.

  20. Early Benchmarks of Product Generation Capabilities of the GOES-R Ground System for Operational Weather Prediction

    NASA Astrophysics Data System (ADS)

    Kalluri, S. N.; Haman, B.; Vititoe, D.

    2014-12-01

    The ground system under development for Geostationary Operational Environmental Satellite-R (GOES-R) series of weather satellite has completed a key milestone in implementing the science algorithms that process raw sensor data to higher level products in preparation for launch. Real time observations from GOES-R are expected to make significant contributions to Earth and space weather prediction, and there are stringent requirements to product weather products at very low latency to meet NOAA's operational needs. Simulated test data from all the six GOES-R sensors are being processed by the system to test and verify performance of the fielded system. Early results show that the system development is on track to meet functional and performance requirements to process science data. Comparison of science products generated by the ground system from simulated data with those generated by the algorithm developers show close agreement among data sets which demonstrates that the algorithms are implemented correctly. Successful delivery of products to AWIPS and the Product Distribution and Access (PDA) system from the core system demonstrate that the external interfaces are working.

  1. Ternary alloy material prediction using genetic algorithm and cluster expansion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Chong

    2015-12-01

    This thesis summarizes our study on the crystal structures prediction of Fe-V-Si system using genetic algorithm and cluster expansion. Our goal is to explore and look for new stable compounds. We started from the current ten known experimental phases, and calculated formation energies of those compounds using density functional theory (DFT) package, namely, VASP. The convex hull was generated based on the DFT calculations of the experimental known phases. Then we did random search on some metal rich (Fe and V) compositions and found that the lowest energy structures were body centered cube (bcc) underlying lattice, under which we didmore » our computational systematic searches using genetic algorithm and cluster expansion. Among hundreds of the searched compositions, thirteen were selected and DFT formation energies were obtained by VASP. The stability checking of those thirteen compounds was done in reference to the experimental convex hull. We found that the composition, 24-8-16, i.e., Fe 3VSi 2 is a new stable phase and it can be very inspiring to the future experiments.« less

  2. A dynamic feedforward neural network based on gaussian particle swarm optimization and its application for predictive control.

    PubMed

    Han, Min; Fan, Jianchao; Wang, Jun

    2011-09-01

    A dynamic feedforward neural network (DFNN) is proposed for predictive control, whose adaptive parameters are adjusted by using Gaussian particle swarm optimization (GPSO) in the training process. Adaptive time-delay operators are added in the DFNN to improve its generalization for poorly known nonlinear dynamic systems with long time delays. Furthermore, GPSO adopts a chaotic map with Gaussian function to balance the exploration and exploitation capabilities of particles, which improves the computational efficiency without compromising the performance of the DFNN. The stability of the particle dynamics is analyzed, based on the robust stability theory, without any restrictive assumption. A stability condition for the GPSO+DFNN model is derived, which ensures a satisfactory global search and quick convergence, without the need for gradients. The particle velocity ranges could change adaptively during the optimization process. The results of a comparative study show that the performance of the proposed algorithm can compete with selected algorithms on benchmark problems. Additional simulation results demonstrate the effectiveness and accuracy of the proposed combination algorithm in identifying and controlling nonlinear systems with long time delays.

  3. Multi-task feature selection in microarray data by binary integer programming.

    PubMed

    Lan, Liang; Vucetic, Slobodan

    2013-12-20

    A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.

  4. Verification studies of Seasat-A satellite scatterometer /SASS/ measurements

    NASA Technical Reports Server (NTRS)

    Halberstam, I.

    1981-01-01

    Two comparisons between Seasat-A satellite scatterometer (SASS) data and surface truth, obtained from the Gulf of Alaska Seasat Experiment and the Joint Air-Sea Interaction program, have been made to determine the behavior of SASS and its algorithms. The performance of SASS was first evaluated irrespective of the algorithms employed to convert the SASS data to geophysical parameters, which was done by separating the backscatter measurements into small bins of incidence and azimuth angles and polarity and regression against wind speed measurements. The algorithms were then tested by comparing their predicted slopes and y intercepts with those derived from the regressions, and by comparing each SASS backscatter measurement with the backscatter derived from the algorithms, and the given wind velocity from the observations. It was shown that SASS was insensitive to winds at high incidence angles for horizontal polarizations. Fairly high correlations were found between backscatter and wind speeds. The algorithms functioned well at mid-ranges of incidence angle and backscattering coefficient.

  5. Real-Time Adaptive Control of Flow-Induced Cavity Tones

    NASA Technical Reports Server (NTRS)

    Kegerise, Michael A.; Cabell, Randolph H.; Cattafesta, Louis N.

    2004-01-01

    An adaptive generalized predictive control (GPC) algorithm was formulated and applied to the cavity flow-tone problem. The algorithm employs gradient descent to update the GPC coefficients at each time step. The adaptive control algorithm demonstrated multiple Rossiter mode suppression at fixed Mach numbers ranging from 0.275 to 0.38. The algorithm was also able t o maintain suppression of multiple cavity tones as the freestream Mach number was varied over a modest range (0.275 to 0.29). Controller performance was evaluated with a measure of output disturbance rejection and an input sensitivity transfer function. The results suggest that disturbances entering the cavity flow are colocated with the control input at the cavity leading edge. In that case, only tonal components of the cavity wall-pressure fluctuations can be suppressed and arbitrary broadband pressure reduction is not possible. In the control-algorithm development, the cavity dynamics are treated as linear and time invariant (LTI) for a fixed Mach number. The experimental results lend support this treatment.

  6. A statistical framework for evaluating neural networks to predict recurrent events in breast cancer

    NASA Astrophysics Data System (ADS)

    Gorunescu, Florin; Gorunescu, Marina; El-Darzi, Elia; Gorunescu, Smaranda

    2010-07-01

    Breast cancer is the second leading cause of cancer deaths in women today. Sometimes, breast cancer can return after primary treatment. A medical diagnosis of recurrent cancer is often a more challenging task than the initial one. In this paper, we investigate the potential contribution of neural networks (NNs) to support health professionals in diagnosing such events. The NN algorithms are tested and applied to two different datasets. An extensive statistical analysis has been performed to verify our experiments. The results show that a simple network structure for both the multi-layer perceptron and radial basis function can produce equally good results, not all attributes are needed to train these algorithms and, finally, the classification performances of all algorithms are statistically robust. Moreover, we have shown that the best performing algorithm will strongly depend on the features of the datasets, and hence, there is not necessarily a single best classifier.

  7. NWRA AVOSS Wake Vortex Prediction Algorithm. 3.1.1

    NASA Technical Reports Server (NTRS)

    Robins, R. E.; Delisi, D. P.; Hinton, David (Technical Monitor)

    2002-01-01

    This report provides a detailed description of the wake vortex prediction algorithm used in the Demonstration Version of NASA's Aircraft Vortex Spacing System (AVOSS). The report includes all equations used in the algorithm, an explanation of how to run the algorithm, and a discussion of how the source code for the algorithm is organized. Several appendices contain important supplementary information, including suggestions for enhancing the algorithm and results from test cases.

  8. Curvelet-domain multiple matching method combined with cubic B-spline function

    NASA Astrophysics Data System (ADS)

    Wang, Tong; Wang, Deli; Tian, Mi; Hu, Bin; Liu, Chengming

    2018-05-01

    Since the large amount of surface-related multiple existed in the marine data would influence the results of data processing and interpretation seriously, many researchers had attempted to develop effective methods to remove them. The most successful surface-related multiple elimination method was proposed based on data-driven theory. However, the elimination effect was unsatisfactory due to the existence of amplitude and phase errors. Although the subsequent curvelet-domain multiple-primary separation method achieved better results, poor computational efficiency prevented its application. In this paper, we adopt the cubic B-spline function to improve the traditional curvelet multiple matching method. First, select a little number of unknowns as the basis points of the matching coefficient; second, apply the cubic B-spline function on these basis points to reconstruct the matching array; third, build constraint solving equation based on the relationships of predicted multiple, matching coefficients, and actual data; finally, use the BFGS algorithm to iterate and realize the fast-solving sparse constraint of multiple matching algorithm. Moreover, the soft-threshold method is used to make the method perform better. With the cubic B-spline function, the differences between predicted multiple and original data diminish, which results in less processing time to obtain optimal solutions and fewer iterative loops in the solving procedure based on the L1 norm constraint. The applications to synthetic and field-derived data both validate the practicability and validity of the method.

  9. Comparison of two optimization algorithms for fuzzy finite element model updating for damage detection in a wind turbine blade

    NASA Astrophysics Data System (ADS)

    Turnbull, Heather; Omenzetter, Piotr

    2018-03-01

    vDifficulties associated with current health monitoring and inspection practices combined with harsh, often remote, operational environments of wind turbines highlight the requirement for a non-destructive evaluation system capable of remotely monitoring the current structural state of turbine blades. This research adopted a physics based structural health monitoring methodology through calibration of a finite element model using inverse techniques. A 2.36m blade from a 5kW turbine was used as an experimental specimen, with operational modal analysis techniques utilised to realize the modal properties of the system. Modelling the experimental responses as fuzzy numbers using the sub-level technique, uncertainty in the response parameters was propagated back through the model and into the updating parameters. Initially, experimental responses of the blade were obtained, with a numerical model of the blade created and updated. Deterministic updating was carried out through formulation and minimisation of a deterministic objective function using both firefly algorithm and virus optimisation algorithm. Uncertainty in experimental responses were modelled using triangular membership functions, allowing membership functions of updating parameters (Young's modulus and shear modulus) to be obtained. Firefly algorithm and virus optimisation algorithm were again utilised, however, this time in the solution of fuzzy objective functions. This enabled uncertainty associated with updating parameters to be quantified. Varying damage location and severity was simulated experimentally through addition of small masses to the structure intended to cause a structural alteration. A damaged model was created, modelling four variable magnitude nonstructural masses at predefined points and updated to provide a deterministic damage prediction and information in relation to the parameters uncertainty via fuzzy updating.

  10. Safe Exploration Algorithms for Reinforcement Learning Controllers.

    PubMed

    Mannucci, Tommaso; van Kampen, Erik-Jan; de Visser, Cornelis; Chu, Qiping

    2018-04-01

    Self-learning approaches, such as reinforcement learning, offer new possibilities for autonomous control of uncertain or time-varying systems. However, exploring an unknown environment under limited prediction capabilities is a challenge for a learning agent. If the environment is dangerous, free exploration can result in physical damage or in an otherwise unacceptable behavior. With respect to existing methods, the main contribution of this paper is the definition of a new approach that does not require global safety functions, nor specific formulations of the dynamics or of the environment, but relies on interval estimation of the dynamics of the agent during the exploration phase, assuming a limited capability of the agent to perceive the presence of incoming fatal states. Two algorithms are presented with this approach. The first is the Safety Handling Exploration with Risk Perception Algorithm (SHERPA), which provides safety by individuating temporary safety functions, called backups. SHERPA is shown in a simulated, simplified quadrotor task, for which dangerous states are avoided. The second algorithm, denominated OptiSHERPA, can safely handle more dynamically complex systems for which SHERPA is not sufficient through the use of safety metrics. An application of OptiSHERPA is simulated on an aircraft altitude control task.

  11. An Algorithm for Protein Helix Assignment Using Helix Geometry

    PubMed Central

    Cao, Chen; Xu, Shutan; Wang, Lincong

    2015-01-01

    Helices are one of the most common and were among the earliest recognized secondary structure elements in proteins. The assignment of helices in a protein underlies the analysis of its structure and function. Though the mathematical expression for a helical curve is simple, no previous assignment programs have used a genuine helical curve as a model for helix assignment. In this paper we present a two-step assignment algorithm. The first step searches for a series of bona fide helical curves each one best fits the coordinates of four successive backbone Cα atoms. The second step uses the best fit helical curves as input to make helix assignment. The application to the protein structures in the PDB (protein data bank) proves that the algorithm is able to assign accurately not only regular α-helix but also 310 and π helices as well as their left-handed versions. One salient feature of the algorithm is that the assigned helices are structurally more uniform than those by the previous programs. The structural uniformity should be useful for protein structure classification and prediction while the accurate assignment of a helix to a particular type underlies structure-function relationship in proteins. PMID:26132394

  12. Predicting Loss-of-Control Boundaries Toward a Piloting Aid

    NASA Technical Reports Server (NTRS)

    Barlow, Jonathan; Stepanyan, Vahram; Krishnakumar, Kalmanje

    2012-01-01

    This work presents an approach to predicting loss-of-control with the goal of providing the pilot a decision aid focused on maintaining the pilot's control action within predicted loss-of-control boundaries. The predictive architecture combines quantitative loss-of-control boundaries, a data-based predictive control boundary estimation algorithm and an adaptive prediction method to estimate Markov model parameters in real-time. The data-based loss-of-control boundary estimation algorithm estimates the boundary of a safe set of control inputs that will keep the aircraft within the loss-of-control boundaries for a specified time horizon. The adaptive prediction model generates estimates of the system Markov Parameters, which are used by the data-based loss-of-control boundary estimation algorithm. The combined algorithm is applied to a nonlinear generic transport aircraft to illustrate the features of the architecture.

  13. A Framework for Integrating Multiple Biological Networks to Predict MicroRNA-Disease Associations.

    PubMed

    Peng, Wei; Lan, Wei; Yu, Zeng; Wang, Jianxin; Pan, Yi

    2017-03-01

    MicroRNAs have close relationship with human diseases. Therefore, identifying disease related MicroRNAs plays an important role in disease diagnosis, prognosis and therapy. However, designing an effective computational method which can make good use of various biological resources and correctly predict the associations between MicroRNA and disease is still a big challenge. Previous researchers have pointed out that there are complex relationships among microRNAs, diseases and environment factors. There are inter-relationships between microRNAs, diseases or environment factors based on their functional similarity or phenotype similarity or chemical structure similarity and so on. There are also intra-relationships between microRNAs and diseases, microRNAs and environment factors, diseases and environment factors. Moreover, functionally similar microRNAs tend to associate with common diseases and common environment factors. The diseases with similar phenotypes are likely caused by common microRNAs and common environment factors. In this work, we propose a framework namely ThrRWMDE which can integrate these complex relationships to predict microRNA-disease associations. In this framework, microRNA similarity network (MFN), disease similarity network (DSN) and environmental factor similarity network (ESN) are constructed according to certain biological properties. Then, an unbalanced three random walking algorithm is implemented on the three networks so as to obtain information from neighbors in corresponding networks. This algorithm not only can flexibly infer information from different levels of neighbors with respect to the topological and structural differences of the three networks, but also in the course of working the functional information will be transferred from one network to another according to the associations between the nodes in different networks. The results of experiment show that our method achieves better prediction performance than other state-of-the-art methods.

  14. Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm.

    PubMed

    Lee, Jae-Hong; Kim, Do-Hyung; Jeong, Seong-Nyum; Choi, Seong-Ho

    2018-04-01

    The aim of the current study was to develop a computer-assisted detection system based on a deep convolutional neural network (CNN) algorithm and to evaluate the potential usefulness and accuracy of this system for the diagnosis and prediction of periodontally compromised teeth (PCT). Combining pretrained deep CNN architecture and a self-trained network, periapical radiographic images were used to determine the optimal CNN algorithm and weights. The diagnostic and predictive accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curve, area under the ROC curve, confusion matrix, and 95% confidence intervals (CIs) were calculated using our deep CNN algorithm, based on a Keras framework in Python. The periapical radiographic dataset was split into training (n=1,044), validation (n=348), and test (n=348) datasets. With the deep learning algorithm, the diagnostic accuracy for PCT was 81.0% for premolars and 76.7% for molars. Using 64 premolars and 64 molars that were clinically diagnosed as severe PCT, the accuracy of predicting extraction was 82.8% (95% CI, 70.1%-91.2%) for premolars and 73.4% (95% CI, 59.9%-84.0%) for molars. We demonstrated that the deep CNN algorithm was useful for assessing the diagnosis and predictability of PCT. Therefore, with further optimization of the PCT dataset and improvements in the algorithm, a computer-aided detection system can be expected to become an effective and efficient method of diagnosing and predicting PCT.

  15. A systematic investigation of computation models for predicting Adverse Drug Reactions (ADRs).

    PubMed

    Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong

    2014-01-01

    Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computation models to obtain general conclusions that can provide useful guidance to construct more effective computational models to predict ADRs. In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that have been earlier used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, final formulas of these algorithms were all converted to linear model in form, based on this finding we propose a new algorithm called the general weighted profile method and it yielded the best overall performance among the algorithms investigated in this paper. Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms.

  16. Data-directed RNA secondary structure prediction using probabilistic modeling

    PubMed Central

    Deng, Fei; Ledda, Mirko; Vaziri, Sana; Aviran, Sharon

    2016-01-01

    Structure dictates the function of many RNAs, but secondary RNA structure analysis is either labor intensive and costly or relies on computational predictions that are often inaccurate. These limitations are alleviated by integration of structure probing data into prediction algorithms. However, existing algorithms are optimized for a specific type of probing data. Recently, new chemistries combined with advances in sequencing have facilitated structure probing at unprecedented scale and sensitivity. These novel technologies and anticipated wealth of data highlight a need for algorithms that readily accommodate more complex and diverse input sources. We implemented and investigated a recently outlined probabilistic framework for RNA secondary structure prediction and extended it to accommodate further refinement of structural information. This framework utilizes direct likelihood-based calculations of pseudo-energy terms per considered structural context and can readily accommodate diverse data types and complex data dependencies. We use real data in conjunction with simulations to evaluate performances of several implementations and to show that proper integration of structural contexts can lead to improvements. Our tests also reveal discrepancies between real data and simulations, which we show can be alleviated by refined modeling. We then propose statistical preprocessing approaches to standardize data interpretation and integration into such a generic framework. We further systematically quantify the information content of data subsets, demonstrating that high reactivities are major drivers of SHAPE-directed predictions and that better understanding of less informative reactivities is key to further improvements. Finally, we provide evidence for the adaptive capability of our framework using mock probe simulations. PMID:27251549

  17. HUMAN DECISIONS AND MACHINE PREDICTIONS.

    PubMed

    Kleinberg, Jon; Lakkaraju, Himabindu; Leskovec, Jure; Ludwig, Jens; Mullainathan, Sendhil

    2018-02-01

    Can machine learning improve human decision making? Bail decisions provide a good test case. Millions of times each year, judges make jail-or-release decisions that hinge on a prediction of what a defendant would do if released. The concreteness of the prediction task combined with the volume of data available makes this a promising machine-learning application. Yet comparing the algorithm to judges proves complicated. First, the available data are generated by prior judge decisions. We only observe crime outcomes for released defendants, not for those judges detained. This makes it hard to evaluate counterfactual decision rules based on algorithmic predictions. Second, judges may have a broader set of preferences than the variable the algorithm predicts; for instance, judges may care specifically about violent crimes or about racial inequities. We deal with these problems using different econometric strategies, such as quasi-random assignment of cases to judges. Even accounting for these concerns, our results suggest potentially large welfare gains: one policy simulation shows crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates. Moreover, all categories of crime, including violent crimes, show reductions; and these gains can be achieved while simultaneously reducing racial disparities. These results suggest that while machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals. JEL Codes: C10 (Econometric and statistical methods and methodology), C55 (Large datasets: Modeling and analysis), K40 (Legal procedure, the legal system, and illegal behavior).

  18. HUMAN DECISIONS AND MACHINE PREDICTIONS*

    PubMed Central

    Kleinberg, Jon; Lakkaraju, Himabindu; Leskovec, Jure; Ludwig, Jens; Mullainathan, Sendhil

    2018-01-01

    Can machine learning improve human decision making? Bail decisions provide a good test case. Millions of times each year, judges make jail-or-release decisions that hinge on a prediction of what a defendant would do if released. The concreteness of the prediction task combined with the volume of data available makes this a promising machine-learning application. Yet comparing the algorithm to judges proves complicated. First, the available data are generated by prior judge decisions. We only observe crime outcomes for released defendants, not for those judges detained. This makes it hard to evaluate counterfactual decision rules based on algorithmic predictions. Second, judges may have a broader set of preferences than the variable the algorithm predicts; for instance, judges may care specifically about violent crimes or about racial inequities. We deal with these problems using different econometric strategies, such as quasi-random assignment of cases to judges. Even accounting for these concerns, our results suggest potentially large welfare gains: one policy simulation shows crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates. Moreover, all categories of crime, including violent crimes, show reductions; and these gains can be achieved while simultaneously reducing racial disparities. These results suggest that while machine learning can be valuable, realizing this value requires integrating these tools into an economic framework: being clear about the link between predictions and decisions; specifying the scope of payoff functions; and constructing unbiased decision counterfactuals. JEL Codes: C10 (Econometric and statistical methods and methodology), C55 (Large datasets: Modeling and analysis), K40 (Legal procedure, the legal system, and illegal behavior) PMID:29755141

  19. Using Claims Data to Predict Dependency in Activities of Daily Living as a Proxy for Frailty

    PubMed Central

    Faurot, Keturah R.; Funk, Michele Jonsson; Pate, Virginia; Brookhart, M. Alan; Patrick, Amanda; Hanson, Laura C.; Castillo, Wendy Camelo; Stürmer, Til

    2014-01-01

    Purpose Estimating drug effectiveness and safety among older adults in population-based studies using administrative healthcare claims can be hampered by unmeasured confounding due to frailty. A claims-based algorithm that identifies patients likely to be dependent, a proxy for frailty, may improve confounding control. Our objective was to develop an algorithm to predict dependency in activities of daily living (ADL) in a sample of Medicare beneficiaries. Methods Community-dwelling respondents to the 2006 Medicare Current Beneficiary Survey, >65 years old, with Medicare Part A, B, home health, and hospice claims were included. ADL dependency was defined as needing help with bathing, eating, walking, dressing, toileting, or transferring. Potential predictors were demographics, ICD-9 diagnosis/procedure and durable medical equipment codes for frailty-associated conditions. Multivariable logistic regression was to predict ADL dependency. Cox models estimated hazard ratios for death as a function of observed and predicted ADL dependency. Results Of 6391 respondents, 57% were female, 88% white, and 38% were ≥80. The prevalence of ADL dependency was 9.5%. Strong predictors of ADL dependency were charges for a home hospital bed (OR=5.44, 95% CI=3.28–9.03) and wheelchair (OR=3.91, 95% CI=2.78–5.51). The c-statistic of the final model was 0.845. Model-predicted ADL dependency of 20% or greater was associated with a hazard ratio for death of 3.19 (95% CI: 2.78, 3.68). Conclusions An algorithm for predicting ADL dependency using healthcare claims was developed to measure some aspects of frailty. Accounting for variation in frailty among older adults could lead to more valid conclusions about treatment use, safety, and effectiveness. PMID:25335470

  20. Prediction of gene-phenotype associations in humans, mice, and plants using phenologs.

    PubMed

    Woods, John O; Singh-Blom, Ulf Martin; Laurent, Jon M; McGary, Kriston L; Marcotte, Edward M

    2013-06-21

    Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of underlying genes. Such "orthologous phenotypes," or "phenologs," are examples of deep homology, and may be used to predict additional candidate disease genes. In this work, we develop an unsupervised algorithm for ranking phenolog-based candidate disease genes through the integration of predictions from the k nearest neighbor phenologs, comparing classifiers and weighting functions by cross-validation. We also improve upon the original method by extending the theory to paralogous phenotypes. Our algorithm makes use of additional phenotype data--from chicken, zebrafish, and E. coli, as well as new datasets for C. elegans--establishing that several types of annotations may be treated as phenotypes. We demonstrate the use of our algorithm to predict novel candidate genes for human atrial fibrillation (such as HRH2, ATP4A, ATP4B, and HOPX) and epilepsy (e.g., PAX6 and NKX2-1). We suggest gene candidates for pharmacologically-induced seizures in mouse, solely based on orthologous phenotypes from E. coli. We also explore the prediction of plant gene-phenotype associations, as for the Arabidopsis response to vernalization phenotype. We are able to rank gene predictions for a significant portion of the diseases in the Online Mendelian Inheritance in Man database. Additionally, our method suggests candidate genes for mammalian seizures based only on bacterial phenotypes and gene orthology. We demonstrate that phenotype information may come from diverse sources, including drug sensitivities, gene ontology biological processes, and in situ hybridization annotations. Finally, we offer testable candidates for a variety of human diseases, plant traits, and other classes of phenotypes across a wide array of species.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu Xiaoying; Ho, Shirley; Trac, Hy

    We investigate machine learning (ML) techniques for predicting the number of galaxies (N{sub gal}) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here distinguish themselves from traditional halo occupation distribution (HOD) modeling as they do not assume a prescribed relationship between halo properties and N{sub gal}. In addition, our ML approaches are only dependent on parent halo properties (like HOD methods), which are advantageous over subhalo-based approaches as identifying subhalos correctly is difficult. We test two algorithms: supportmore » vector machines (SVM) and k-nearest-neighbor (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N{sub gal} by training our algorithms on the following six halo properties: number of particles, M{sub 200}, {sigma}{sub v}, v{sub max}, half-mass radius, and spin. For Millennium, our predicted N{sub gal} values have a mean-squared error (MSE) of {approx}0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to {approx}5%-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N{sub gal}. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g., blue, red, high M{sub star}, low M{sub star}). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, ML offers an interesting alternative for creating mock catalogs.« less

  2. An improved reversible data hiding algorithm based on modification of prediction errors

    NASA Astrophysics Data System (ADS)

    Jafar, Iyad F.; Hiary, Sawsan A.; Darabkh, Khalid A.

    2014-04-01

    Reversible data hiding algorithms are concerned with the ability of hiding data and recovering the original digital image upon extraction. This issue is of interest in medical and military imaging applications. One particular class of such algorithms relies on the idea of histogram shifting of prediction errors. In this paper, we propose an improvement over one popular algorithm in this class. The improvement is achieved by employing a different predictor, the use of more bins in the prediction error histogram in addition to multilevel embedding. The proposed extension shows significant improvement over the original algorithm and its variations.

  3. An accelerated non-Gaussianity based multichannel predictive deconvolution method with the limited supporting region of filters

    NASA Astrophysics Data System (ADS)

    Li, Zhong-xiao; Li, Zhen-chun

    2016-09-01

    The multichannel predictive deconvolution can be conducted in overlapping temporal and spatial data windows to solve the 2D predictive filter for multiple removal. Generally, the 2D predictive filter can better remove multiples at the cost of more computation time compared with the 1D predictive filter. In this paper we first use the cross-correlation strategy to determine the limited supporting region of filters where the coefficients play a major role for multiple removal in the filter coefficient space. To solve the 2D predictive filter the traditional multichannel predictive deconvolution uses the least squares (LS) algorithm, which requires primaries and multiples are orthogonal. To relax the orthogonality assumption the iterative reweighted least squares (IRLS) algorithm and the fast iterative shrinkage thresholding (FIST) algorithm have been used to solve the 2D predictive filter in the multichannel predictive deconvolution with the non-Gaussian maximization (L1 norm minimization) constraint of primaries. The FIST algorithm has been demonstrated as a faster alternative to the IRLS algorithm. In this paper we introduce the FIST algorithm to solve the filter coefficients in the limited supporting region of filters. Compared with the FIST based multichannel predictive deconvolution without the limited supporting region of filters the proposed method can reduce the computation burden effectively while achieving a similar accuracy. Additionally, the proposed method can better balance multiple removal and primary preservation than the traditional LS based multichannel predictive deconvolution and FIST based single channel predictive deconvolution. Synthetic and field data sets demonstrate the effectiveness of the proposed method.

  4. Stock price change rate prediction by utilizing social network activities.

    PubMed

    Deng, Shangkun; Mitsubuchi, Takashi; Sakurai, Akito

    2014-01-01

    Predicting stock price change rates for providing valuable information to investors is a challenging task. Individual participants may express their opinions in social network service (SNS) before or after their transactions in the market; we hypothesize that stock price change rate is better predicted by a function of social network service activities and technical indicators than by a function of just stock market activities. The hypothesis is tested by accuracy of predictions as well as performance of simulated trading because success or failure of prediction is better measured by profits or losses the investors gain or suffer. In this paper, we propose a hybrid model that combines multiple kernel learning (MKL) and genetic algorithm (GA). MKL is adopted to optimize the stock price change rate prediction models that are expressed in a multiple kernel linear function of different types of features extracted from different sources. GA is used to optimize the trading rules used in the simulated trading by fusing the return predictions and values of three well-known overbought and oversold technical indicators. Accumulated return and Sharpe ratio were used to test the goodness of performance of the simulated trading. Experimental results show that our proposed model performed better than other models including ones using state of the art techniques.

  5. Stock Price Change Rate Prediction by Utilizing Social Network Activities

    PubMed Central

    Mitsubuchi, Takashi; Sakurai, Akito

    2014-01-01

    Predicting stock price change rates for providing valuable information to investors is a challenging task. Individual participants may express their opinions in social network service (SNS) before or after their transactions in the market; we hypothesize that stock price change rate is better predicted by a function of social network service activities and technical indicators than by a function of just stock market activities. The hypothesis is tested by accuracy of predictions as well as performance of simulated trading because success or failure of prediction is better measured by profits or losses the investors gain or suffer. In this paper, we propose a hybrid model that combines multiple kernel learning (MKL) and genetic algorithm (GA). MKL is adopted to optimize the stock price change rate prediction models that are expressed in a multiple kernel linear function of different types of features extracted from different sources. GA is used to optimize the trading rules used in the simulated trading by fusing the return predictions and values of three well-known overbought and oversold technical indicators. Accumulated return and Sharpe ratio were used to test the goodness of performance of the simulated trading. Experimental results show that our proposed model performed better than other models including ones using state of the art techniques. PMID:24790586

  6. Transient and asymptotic behaviour of the binary breakage problem

    NASA Astrophysics Data System (ADS)

    Mantzaris, Nikos V.

    2005-06-01

    The general binary breakage problem with power-law breakage functions and two families of symmetric and asymmetric breakage kernels is studied in this work. A useful transformation leads to an equation that predicts self-similar solutions in its asymptotic limit and offers explicit knowledge of the mean size and particle density at each point in dimensionless time. A novel moving boundary algorithm in the transformed coordinate system is developed, allowing the accurate prediction of the full transient behaviour of the system from the initial condition up to the point where self-similarity is achieved, and beyond if necessary. The numerical algorithm is very rapid and its results are in excellent agreement with known analytical solutions. In the case of the symmetric breakage kernels only unimodal, self-similar number density functions are obtained asymptotically for all parameter values and independent of the initial conditions, while in the case of asymmetric breakage kernels, bimodality appears for high degrees of asymmetry and sharp breakage functions. For symmetric and discrete breakage kernels, self-similarity is not achieved. The solution exhibits sustained oscillations with amplitude that depends on the initial condition and the sharpness of the breakage mechanism, while the period is always fixed and equal to ln 2 with respect to dimensionless time.

  7. An Integrated Framework Advancing Membrane Protein Modeling and Design

    PubMed Central

    Weitzner, Brian D.; Duran, Amanda M.; Tilley, Drew C.; Elazar, Assaf; Gray, Jeffrey J.

    2015-01-01

    Membrane proteins are critical functional molecules in the human body, constituting more than 30% of open reading frames in the human genome. Unfortunately, a myriad of difficulties in overexpression and reconstitution into membrane mimetics severely limit our ability to determine their structures. Computational tools are therefore instrumental to membrane protein structure prediction, consequently increasing our understanding of membrane protein function and their role in disease. Here, we describe a general framework facilitating membrane protein modeling and design that combines the scientific principles for membrane protein modeling with the flexible software architecture of Rosetta3. This new framework, called RosettaMP, provides a general membrane representation that interfaces with scoring, conformational sampling, and mutation routines that can be easily combined to create new protocols. To demonstrate the capabilities of this implementation, we developed four proof-of-concept applications for (1) prediction of free energy changes upon mutation; (2) high-resolution structural refinement; (3) protein-protein docking; and (4) assembly of symmetric protein complexes, all in the membrane environment. Preliminary data show that these algorithms can produce meaningful scores and structures. The data also suggest needed improvements to both sampling routines and score functions. Importantly, the applications collectively demonstrate the potential of combining the flexible nature of RosettaMP with the power of Rosetta algorithms to facilitate membrane protein modeling and design. PMID:26325167

  8. Improved hybrid optimization algorithm for 3D protein structure prediction.

    PubMed

    Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

    2014-07-01

    A new improved hybrid optimization algorithm - PGATS algorithm, which is based on toy off-lattice model, is presented for dealing with three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. Otherwise, we also take some different improved strategies. The factor of stochastic disturbance is joined in the particle swarm optimization to improve the search ability; the operations of crossover and mutation that are in the genetic algorithm are changed to a kind of random liner method; at last tabu search algorithm is improved by appending a mutation operator. Through the combination of a variety of strategies and algorithms, the protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is an NP-hard problem, but the problem can be attributed to a global optimization problem of multi-extremum and multi-parameters. This is the theoretical principle of the hybrid optimization algorithm that is proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcoming of a single algorithm, giving full play to the advantage of each algorithm. In the current universal standard sequences, Fibonacci sequences and real protein sequences are certified. Experiments show that the proposed new method outperforms single algorithms on the accuracy of calculating the protein sequence energy value, which is proved to be an effective way to predict the structure of proteins.

  9. With or without you: predictive coding and Bayesian inference in the brain

    PubMed Central

    Aitchison, Laurence; Lengyel, Máté

    2018-01-01

    Two theoretical ideas have emerged recently with the ambition to provide a unifying functional explanation of neural population coding and dynamics: predictive coding and Bayesian inference. Here, we describe the two theories and their combination into a single framework: Bayesian predictive coding. We clarify how the two theories can be distinguished, despite sharing core computational concepts and addressing an overlapping set of empirical phenomena. We argue that predictive coding is an algorithmic / representational motif that can serve several different computational goals of which Bayesian inference is but one. Conversely, while Bayesian inference can utilize predictive coding, it can also be realized by a variety of other representations. We critically evaluate the experimental evidence supporting Bayesian predictive coding and discuss how to test it more directly. PMID:28942084

  10. Three-dimensional spatial analysis of missense variants in RTEL1 identifies pathogenic variants in patients with Familial Interstitial Pneumonia.

    PubMed

    Sivley, R Michael; Sheehan, Jonathan H; Kropski, Jonathan A; Cogan, Joy; Blackwell, Timothy S; Phillips, John A; Bush, William S; Meiler, Jens; Capra, John A

    2018-01-23

    Next-generation sequencing of individuals with genetic diseases often detects candidate rare variants in numerous genes, but determining which are causal remains challenging. We hypothesized that the spatial distribution of missense variants in protein structures contains information about function and pathogenicity that can help prioritize variants of unknown significance (VUS) and elucidate the structural mechanisms leading to disease. To illustrate this approach in a clinical application, we analyzed 13 candidate missense variants in regulator of telomere elongation helicase 1 (RTEL1) identified in patients with Familial Interstitial Pneumonia (FIP). We curated pathogenic and neutral RTEL1 variants from the literature and public databases. We then used homology modeling to construct a 3D structural model of RTEL1 and mapped known variants into this structure. We next developed a pathogenicity prediction algorithm based on proximity to known disease causing and neutral variants and evaluated its performance with leave-one-out cross-validation. We further validated our predictions with segregation analyses, telomere lengths, and mutagenesis data from the homologous XPD protein. Our algorithm for classifying RTEL1 VUS based on spatial proximity to pathogenic and neutral variation accurately distinguished 7 known pathogenic from 29 neutral variants (ROC AUC = 0.85) in the N-terminal domains of RTEL1. Pathogenic proximity scores were also significantly correlated with effects on ATPase activity (Pearson r = -0.65, p = 0.0004) in XPD, a related helicase. Applying the algorithm to 13 VUS identified from sequencing of RTEL1 from patients predicted five out of six disease-segregating VUS to be pathogenic. We provide structural hypotheses regarding how these mutations may disrupt RTEL1 ATPase and helicase function. Spatial analysis of missense variation accurately classified candidate VUS in RTEL1 and suggests how such variants cause disease. Incorporating spatial proximity analyses into other pathogenicity prediction tools may improve accuracy for other genes and genetic diseases.

  11. Premorbid functional development and conversion to psychosis in clinical high-risk youths

    PubMed Central

    Tarbox, Sarah I.; Addington, Jean; Cadenhead, Kristin S.; Cannon, Tyrone D.; Cornblatt, Barbara A.; Perkins, Diana O.; Seidman, Larry J.; Tsuang, Ming T.; Walker, Elaine F.; Heinssen, Robert; Mcglashan, Thomas H.; Woods, Scott W.

    2014-01-01

    Deterioration in premorbid functioning is a common feature of schizophrenia, but sensitivity to psychosis conversion among clinical high-risk samples has not been examined. This study evaluates premorbid functioning as a predictor of psychosis conversion among a clinical high-risk sample, controlling for effects of prior developmental periods. Participants were 270 clinical high-risk individuals in the North American Prodrome Longitudinal Study—I, 78 of whom converted to psychosis over the next 2.5 years. Social, academic, and total maladjustment in childhood, early adolescence, and late adolescence were rated using the Cannon–Spoor Premorbid Adjustment Scale. Early adolescent social dysfunction significantly predicted conversion to psychosis (hazard ratio = 1.30, p = .014), independently of childhood social maladjustment and independently of severity of most baseline positive and negative prodromal symptoms. Baseline prodromal symptoms of disorganized communication, social anhedonia, suspiciousness, and diminished ideational richness mediated this association. Early adolescent social maladjustment and baseline suspiciousness together demonstrated moderate positive predictive power (59%) and high specificity (92.1%) in predicting conversion. Deterioration of academic and total functioning, although observed, did not predict conversion to psychosis. Results indicate early adolescent social dysfunction to be an important early predictor of conversion. As such, it may be a good candidate for inclusion in prediction algorithms and could represent an advantageous target for early intervention. PMID:24229556

  12. Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression.

    PubMed

    Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M

    2014-12-01

    Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.

  13. iDBPs: a web server for the identification of DNA binding proteins

    PubMed Central

    Nimrod, Guy; Schushan, Maya; Szilágyi, András; Leslie, Christina; Ben-Tal, Nir

    2010-01-01

    Summary: The iDBPs server uses the three-dimensional (3D) structure of a query protein to predict whether it binds DNA. First, the algorithm predicts the functional region of the protein based on its evolutionary profile; the assumption is that large clusters of conserved residues are good markers of functional regions. Next, various characteristics of the predicted functional region as well as global features of the protein are calculated, such as the average surface electrostatic potential, the dipole moment and cluster-based amino acid conservation patterns. Finally, a random forests classifier is used to predict whether the query protein is likely to bind DNA and to estimate the prediction confidence. We have trained and tested the classifier on various datasets and shown that it outperformed related methods. On a dataset that reflects the fraction of DNA binding proteins (DBPs) in a proteome, the area under the ROC curve was 0.90. The application of the server to an updated version of the N-Func database, which contains proteins of unknown function with solved 3D-structure, suggested new putative DBPs for experimental studies. Availability: http://idbps.tau.ac.il/ Contact: NirB@tauex.tau.ac.il Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20089514

  14. Modeling of mass transfer of Phospholipids in separation process with supercritical CO2 fluid by RBF artificial neural networks

    USDA-ARS?s Scientific Manuscript database

    An artificial Radial Basis Function (RBF) neural network model was developed for the prediction of mass transfer of the phospholipids from canola meal in supercritical CO2 fluid. The RBF kind of artificial neural networks (ANN) with orthogonal least squares (OLS) learning algorithm were used for mod...

  15. Modelling soil water retention using support vector machines with genetic algorithm optimisation.

    PubMed

    Lamorski, Krzysztof; Sławiński, Cezary; Moreno, Felix; Barna, Gyöngyi; Skierucha, Wojciech; Arrue, José L

    2014-01-01

    This work presents point pedotransfer function (PTF) models of the soil water retention curve. The developed models allowed for estimation of the soil water content for the specified soil water potentials: -0.98, -3.10, -9.81, -31.02, -491.66, and -1554.78 kPa, based on the following soil characteristics: soil granulometric composition, total porosity, and bulk density. Support Vector Machines (SVM) methodology was used for model development. A new methodology for elaboration of retention function models is proposed. Alternative to previous attempts known from literature, the ν-SVM method was used for model development and the results were compared with the formerly used the C-SVM method. For the purpose of models' parameters search, genetic algorithms were used as an optimisation framework. A new form of the aim function used for models parameters search is proposed which allowed for development of models with better prediction capabilities. This new aim function avoids overestimation of models which is typically encountered when root mean squared error is used as an aim function. Elaborated models showed good agreement with measured soil water retention data. Achieved coefficients of determination values were in the range 0.67-0.92. Studies demonstrated usability of ν-SVM methodology together with genetic algorithm optimisation for retention modelling which gave better performing models than other tested approaches.

  16. Penalized likelihood and multi-objective spatial scans for the detection and inference of irregular clusters

    PubMed Central

    2010-01-01

    Background Irregularly shaped spatial clusters are difficult to delineate. A cluster found by an algorithm often spreads through large portions of the map, impacting its geographical meaning. Penalized likelihood methods for Kulldorff's spatial scan statistics have been used to control the excessive freedom of the shape of clusters. Penalty functions based on cluster geometry and non-connectivity have been proposed recently. Another approach involves the use of a multi-objective algorithm to maximize two objectives: the spatial scan statistics and the geometric penalty function. Results & Discussion We present a novel scan statistic algorithm employing a function based on the graph topology to penalize the presence of under-populated disconnection nodes in candidate clusters, the disconnection nodes cohesion function. A disconnection node is defined as a region within a cluster, such that its removal disconnects the cluster. By applying this function, the most geographically meaningful clusters are sifted through the immense set of possible irregularly shaped candidate cluster solutions. To evaluate the statistical significance of solutions for multi-objective scans, a statistical approach based on the concept of attainment function is used. In this paper we compared different penalized likelihoods employing the geometric and non-connectivity regularity functions and the novel disconnection nodes cohesion function. We also build multi-objective scans using those three functions and compare them with the previous penalized likelihood scans. An application is presented using comprehensive state-wide data for Chagas' disease in puerperal women in Minas Gerais state, Brazil. Conclusions We show that, compared to the other single-objective algorithms, multi-objective scans present better performance, regarding power, sensitivity and positive predicted value. The multi-objective non-connectivity scan is faster and better suited for the detection of moderately irregularly shaped clusters. The multi-objective cohesion scan is most effective for the detection of highly irregularly shaped clusters. PMID:21034451

  17. WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning

    PubMed Central

    Sutphin, George L.; Mahoney, J. Matthew; Sheppard, Keith; Walton, David O.; Korstanje, Ron

    2016-01-01

    The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species—humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide-spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/. PMID:27812085

  18. WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning.

    PubMed

    Sutphin, George L; Mahoney, J Matthew; Sheppard, Keith; Walton, David O; Korstanje, Ron

    2016-11-01

    The rapid advancement of technology in genomics and targeted genetic manipulation has made comparative biology an increasingly prominent strategy to model human disease processes. Predicting orthology relationships between species is a vital component of comparative biology. Dozens of strategies for predicting orthologs have been developed using combinations of gene and protein sequence, phylogenetic history, and functional interaction with progressively increasing accuracy. A relatively new class of orthology prediction strategies combines aspects of multiple methods into meta-tools, resulting in improved prediction performance. Here we present WORMHOLE, a novel ortholog prediction meta-tool that applies machine learning to integrate 17 distinct ortholog prediction algorithms to identify novel least diverged orthologs (LDOs) between 6 eukaryotic species-humans, mice, zebrafish, fruit flies, nematodes, and budding yeast. Machine learning allows WORMHOLE to intelligently incorporate predictions from a wide-spectrum of strategies in order to form aggregate predictions of LDOs with high confidence. In this study we demonstrate the performance of WORMHOLE across each combination of query and target species. We show that WORMHOLE is particularly adept at improving LDO prediction performance between distantly related species, expanding the pool of LDOs while maintaining low evolutionary distance and a high level of functional relatedness between genes in LDO pairs. We present extensive validation, including cross-validated prediction of PANTHER LDOs and evaluation of evolutionary divergence and functional similarity, and discuss future applications of machine learning in ortholog prediction. A WORMHOLE web tool has been developed and is available at http://wormhole.jax.org/.

  19. Predictive analysis of beer quality by correlating sensory evaluation with higher alcohol and ester production using multivariate statistics methods.

    PubMed

    Dong, Jian-Jun; Li, Qing-Liang; Yin, Hua; Zhong, Cheng; Hao, Jun-Guang; Yang, Pan-Fei; Tian, Yu-Hong; Jia, Shi-Ru

    2014-10-15

    Sensory evaluation is regarded as a necessary procedure to ensure a reproducible quality of beer. Meanwhile, high-throughput analytical methods provide a powerful tool to analyse various flavour compounds, such as higher alcohol and ester. In this study, the relationship between flavour compounds and sensory evaluation was established by non-linear models such as partial least squares (PLS), genetic algorithm back-propagation neural network (GA-BP), support vector machine (SVM). It was shown that SVM with a Radial Basis Function (RBF) had a better performance of prediction accuracy for both calibration set (94.3%) and validation set (96.2%) than other models. Relatively lower prediction abilities were observed for GA-BP (52.1%) and PLS (31.7%). In addition, the kernel function of SVM played an essential role of model training when the prediction accuracy of SVM with polynomial kernel function was 32.9%. As a powerful multivariate statistics method, SVM holds great potential to assess beer quality. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Can machine-learning improve cardiovascular risk prediction using routine clinical data?

    PubMed Central

    Kai, Joe; Garibaldi, Jonathan M.; Qureshi, Nadeem

    2017-01-01

    Background Current approaches to predict cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine-learning offers opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine-learning can improve cardiovascular risk prediction. Methods Prospective cohort study using routine clinical data of 378,256 patients from UK family practices, free from cardiovascular disease at outset. Four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) were compared to an established algorithm (American College of Cardiology guidelines) to predict first cardiovascular event over 10-years. Predictive accuracy was assessed by area under the ‘receiver operating curve’ (AUC); and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) to predict 7.5% cardiovascular risk (threshold for initiating statins). Findings 24,970 incident cardiovascular events (6.6%) occurred. Compared to the established risk prediction algorithm (AUC 0.728, 95% CI 0.723–0.735), machine-learning algorithms improved prediction: random forest +1.7% (AUC 0.745, 95% CI 0.739–0.750), logistic regression +3.2% (AUC 0.760, 95% CI 0.755–0.766), gradient boosting +3.3% (AUC 0.761, 95% CI 0.755–0.766), neural networks +3.6% (AUC 0.764, 95% CI 0.759–0.769). The highest achieving (neural networks) algorithm predicted 4,998/7,404 cases (sensitivity 67.5%, PPV 18.4%) and 53,458/75,585 non-cases (specificity 70.7%, NPV 95.7%), correctly predicting 355 (+7.6%) more patients who developed cardiovascular disease compared to the established algorithm. Conclusions Machine-learning significantly improves accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others. PMID:28376093

  1. Can machine-learning improve cardiovascular risk prediction using routine clinical data?

    PubMed

    Weng, Stephen F; Reps, Jenna; Kai, Joe; Garibaldi, Jonathan M; Qureshi, Nadeem

    2017-01-01

    Current approaches to predict cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine-learning offers opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine-learning can improve cardiovascular risk prediction. Prospective cohort study using routine clinical data of 378,256 patients from UK family practices, free from cardiovascular disease at outset. Four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) were compared to an established algorithm (American College of Cardiology guidelines) to predict first cardiovascular event over 10-years. Predictive accuracy was assessed by area under the 'receiver operating curve' (AUC); and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) to predict 7.5% cardiovascular risk (threshold for initiating statins). 24,970 incident cardiovascular events (6.6%) occurred. Compared to the established risk prediction algorithm (AUC 0.728, 95% CI 0.723-0.735), machine-learning algorithms improved prediction: random forest +1.7% (AUC 0.745, 95% CI 0.739-0.750), logistic regression +3.2% (AUC 0.760, 95% CI 0.755-0.766), gradient boosting +3.3% (AUC 0.761, 95% CI 0.755-0.766), neural networks +3.6% (AUC 0.764, 95% CI 0.759-0.769). The highest achieving (neural networks) algorithm predicted 4,998/7,404 cases (sensitivity 67.5%, PPV 18.4%) and 53,458/75,585 non-cases (specificity 70.7%, NPV 95.7%), correctly predicting 355 (+7.6%) more patients who developed cardiovascular disease compared to the established algorithm. Machine-learning significantly improves accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others.

  2. A unified algorithm for predicting partition coefficients for PBPK modeling of drugs and environmental chemicals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peyret, Thomas; Poulin, Patrick; Krishnan, Kannan, E-mail: kannan.krishnan@umontreal.ca

    The algorithms in the literature focusing to predict tissue:blood PC (P{sub tb}) for environmental chemicals and tissue:plasma PC based on total (K{sub p}) or unbound concentration (K{sub pu}) for drugs differ in their consideration of binding to hemoglobin, plasma proteins and charged phospholipids. The objective of the present study was to develop a unified algorithm such that P{sub tb}, K{sub p} and K{sub pu} for both drugs and environmental chemicals could be predicted. The development of the unified algorithm was accomplished by integrating all mechanistic algorithms previously published to compute the PCs. Furthermore, the algorithm was structured in such amore » way as to facilitate predictions of the distribution of organic compounds at the macro (i.e. whole tissue) and micro (i.e. cells and fluids) levels. The resulting unified algorithm was applied to compute the rat P{sub tb}, K{sub p} or K{sub pu} of muscle (n = 174), liver (n = 139) and adipose tissue (n = 141) for acidic, neutral, zwitterionic and basic drugs as well as ketones, acetate esters, alcohols, aliphatic hydrocarbons, aromatic hydrocarbons and ethers. The unified algorithm reproduced adequately the values predicted previously by the published algorithms for a total of 142 drugs and chemicals. The sensitivity analysis demonstrated the relative importance of the various compound properties reflective of specific mechanistic determinants relevant to prediction of PC values of drugs and environmental chemicals. Overall, the present unified algorithm uniquely facilitates the computation of macro and micro level PCs for developing organ and cellular-level PBPK models for both chemicals and drugs.« less

  3. Development and Validation of an Algorithm to Identify Planned Readmissions From Claims Data.

    PubMed

    Horwitz, Leora I; Grady, Jacqueline N; Cohen, Dorothy B; Lin, Zhenqiu; Volpe, Mark; Ngo, Chi K; Masica, Andrew L; Long, Theodore; Wang, Jessica; Keenan, Megan; Montague, Julia; Suter, Lisa G; Ross, Joseph S; Drye, Elizabeth E; Krumholz, Harlan M; Bernheim, Susannah M

    2015-10-01

    It is desirable not to include planned readmissions in readmission measures because they represent deliberate, scheduled care. To develop an algorithm to identify planned readmissions, describe its performance characteristics, and identify improvements. Consensus-driven algorithm development and chart review validation study at 7 acute-care hospitals in 2 health systems. For development, all discharges qualifying for the publicly reported hospital-wide readmission measure. For validation, all qualifying same-hospital readmissions that were characterized by the algorithm as planned, and a random sampling of same-hospital readmissions that were characterized as unplanned. We calculated weighted sensitivity and specificity, and positive and negative predictive values of the algorithm (version 2.1), compared to gold standard chart review. In consultation with 27 experts, we developed an algorithm that characterizes 7.8% of readmissions as planned. For validation we reviewed 634 readmissions. The weighted sensitivity of the algorithm was 45.1% overall, 50.9% in large teaching centers and 40.2% in smaller community hospitals. The weighted specificity was 95.9%, positive predictive value was 51.6%, and negative predictive value was 94.7%. We identified 4 minor changes to improve algorithm performance. The revised algorithm had a weighted sensitivity 49.8% (57.1% at large hospitals), weighted specificity 96.5%, positive predictive value 58.7%, and negative predictive value 94.5%. Positive predictive value was poor for the 2 most common potentially planned procedures: diagnostic cardiac catheterization (25%) and procedures involving cardiac devices (33%). An administrative claims-based algorithm to identify planned readmissions is feasible and can facilitate public reporting of primarily unplanned readmissions. © 2015 Society of Hospital Medicine.

  4. The prediction of human exons by oligonucleotide composition and discriminant analysis of spliceable open reading frames

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solovyev, V.V.; Salamov, A.A.; Lawrence, C.B.

    1994-12-31

    Discriminant analysis is applied to the problem of recognition 5`-, internal and 3`-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of particular types. The method based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine the information about significant triplet frequencies of various functional parts of splice site regions and preferences of oligonucleotide in protein coding and nation regions. The accuracy of our splice site recognition function is about 97%. A discriminant function for 5`-exon prediction includes hexanucleotide composition of upstream region, triplet composition around the ATG codon, ORF codingmore » potential, donor splice site potential and composition of downstream introit region. For internal exon prediction, we combine in a discriminant function the characteristics describing the 5`- intron region, donor splice site, coding region, acceptor splice site and Y-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79% and a level of pseudoexon ORF prediction of 99.96%. The recognition quality computed at the level of individual nucleotides is 89%, for exon sequences and 98% for intron sequences. A discriminant function for 3`-exon prediction includes octanucleolide composition of upstream nation region, triplet composition around the stop codon, ORF coding potential, acceptor splice site potential and hexanucleotide composition of downstream region. We unite these three discriminant functions in exon predicting program FEX (find exons). FEX exactly predicts 70% of 1016 exons from the test of 181 complete genes with specificity 73%, and 89% exons are exactly or partially predicted. On the average, 85% of nucleotides were predicted accurately with specificity 91%.« less

  5. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains.

    PubMed

    Bulashevska, Alla; Eils, Roland

    2006-06-14

    The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is the need for further research to improve the accuracy of prediction. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six various datasets; among them are Gram-negative bacteria dataset, data for discriminating outer membrane proteins and apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies as empirical results indicate. The code for the software is available upon request.

  6. Binary Classification using Decision Tree based Genetic Programming and Its Application to Analysis of Bio-mass Data

    NASA Astrophysics Data System (ADS)

    To, Cuong; Pham, Tuan D.

    2010-01-01

    In machine learning, pattern recognition may be the most popular task. "Similar" patterns identification is also very important in biology because first, it is useful for prediction of patterns associated with disease, for example cancer tissue (normal or tumor); second, similarity or dissimilarity of the kinetic patterns is used to identify coordinately controlled genes or proteins involved in the same regulatory process. Third, similar genes (proteins) share similar functions. In this paper, we present an algorithm which uses genetic programming to create decision tree for binary classification problem. The application of the algorithm was implemented on five real biological databases. Base on the results of comparisons with well-known methods, we see that the algorithm is outstanding in most of cases.

  7. Compressed sensing based missing nodes prediction in temporal communication network

    NASA Astrophysics Data System (ADS)

    Cheng, Guangquan; Ma, Yang; Liu, Zhong; Xie, Fuli

    2018-02-01

    The reconstruction of complex network topology is of great theoretical and practical significance. Most research so far focuses on the prediction of missing links. There are many mature algorithms for link prediction which have achieved good results, but research on the prediction of missing nodes has just begun. In this paper, we propose an algorithm for missing node prediction in complex networks. We detect the position of missing nodes based on their neighbor nodes under the theory of compressed sensing, and extend the algorithm to the case of multiple missing nodes using spectral clustering. Experiments on real public network datasets and simulated datasets show that our algorithm can detect the locations of hidden nodes effectively with high precision.

  8. Dynamic Stiffness Transfer Function of an Electromechanical Actuator Using System Identification

    NASA Astrophysics Data System (ADS)

    Kim, Sang Hwa; Tahk, Min-Jea

    2018-04-01

    In the aeroelastic analysis of flight vehicles with electromechanical actuators (EMAs), an accurate prediction of flutter requires dynamic stiffness characteristics of the EMA. The dynamic stiffness transfer function of the EMA with brushless direct current (BLDC) motor can be obtained by conducting complicated mathematical calculations of control algorithms and mechanical/electrical nonlinearities using linearization techniques. Thus, system identification approaches using experimental data, as an alternative, have considerable advantages. However, the test setup for system identification is expensive and complex, and experimental procedures for data collection are time-consuming tasks. To obtain the dynamic stiffness transfer function, this paper proposes a linear system identification method that uses information obtained from a reliable dynamic stiffness model with a control algorithm and nonlinearities. The results of this study show that the system identification procedure is compact, and the transfer function is able to describe the dynamic stiffness characteristics of the EMA. In addition, to verify the validity of the system identification method, the simulation results of the dynamic stiffness transfer function and the dynamic stiffness model were compared with the experimental data for various external loads.

  9. Increasing Prediction the Original Final Year Project of Student Using Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Saragih, Rijois Iboy Erwin; Turnip, Mardi; Sitanggang, Delima; Aritonang, Mendarissan; Harianja, Eva

    2018-04-01

    Final year project is very important forgraduation study of a student. Unfortunately, many students are not seriouslydidtheir final projects. Many of studentsask for someone to do it for them. In this paper, an application of genetic algorithms to predict the original final year project of a studentis proposed. In the simulation, the data of the final project for the last 5 years is collected. The genetic algorithm has several operators namely population, selection, crossover, and mutation. The result suggest that genetic algorithm can do better prediction than other comparable model. Experimental results of predicting showed that 70% was more accurate than the previous researched.

  10. Appendix F. Developmental enforcement algorithm definition document : predictive braking enforcement algorithm definition document.

    DOT National Transportation Integrated Search

    2012-05-01

    The purpose of this document is to fully define and describe the logic flow and mathematical equations for a predictive braking enforcement algorithm intended for implementation in a Positive Train Control (PTC) system.

  11. CAT-PUMA: CME Arrival Time Prediction Using Machine learning Algorithms

    NASA Astrophysics Data System (ADS)

    Liu, Jiajia; Ye, Yudong; Shen, Chenglong; Wang, Yuming; Erdélyi, Robert

    2018-04-01

    CAT-PUMA (CME Arrival Time Prediction Using Machine learning Algorithms) quickly and accurately predicts the arrival of Coronal Mass Ejections (CMEs) of CME arrival time. The software was trained via detailed analysis of CME features and solar wind parameters using 182 previously observed geo-effective partial-/full-halo CMEs and uses algorithms of the Support Vector Machine (SVM) to make its predictions, which can be made within minutes of providing the necessary input parameters of a CME.

  12. A low computation cost method for seizure prediction.

    PubMed

    Zhang, Yanli; Zhou, Weidong; Yuan, Qi; Wu, Qi

    2014-10-01

    The dynamic changes of electroencephalograph (EEG) signals in the period prior to epileptic seizures play a major role in the seizure prediction. This paper proposes a low computation seizure prediction algorithm that combines a fractal dimension with a machine learning algorithm. The presented seizure prediction algorithm extracts the Higuchi fractal dimension (HFD) of EEG signals as features to classify the patient's preictal or interictal state with Bayesian linear discriminant analysis (BLDA) as a classifier. The outputs of BLDA are smoothed by a Kalman filter for reducing possible sporadic and isolated false alarms and then the final prediction results are produced using a thresholding procedure. The algorithm was evaluated on the intracranial EEG recordings of 21 patients in the Freiburg EEG database. For seizure occurrence period of 30 min and 50 min, our algorithm obtained an average sensitivity of 86.95% and 89.33%, an average false prediction rate of 0.20/h, and an average prediction time of 24.47 min and 39.39 min, respectively. The results confirm that the changes of HFD can serve as a precursor of ictal activities and be used for distinguishing between interictal and preictal epochs. Both HFD and BLDA classifier have a low computational complexity. All of these make the proposed algorithm suitable for real-time seizure prediction. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Prediction of Allogeneic Hematopoietic Stem-Cell Transplantation Mortality 100 Days After Transplantation Using a Machine Learning Algorithm: A European Group for Blood and Marrow Transplantation Acute Leukemia Working Party Retrospective Data Mining Study.

    PubMed

    Shouval, Roni; Labopin, Myriam; Bondi, Ori; Mishan-Shamay, Hila; Shimoni, Avichai; Ciceri, Fabio; Esteve, Jordi; Giebel, Sebastian; Gorin, Norbert C; Schmid, Christoph; Polge, Emmanuelle; Aljurf, Mahmoud; Kroger, Nicolaus; Craddock, Charles; Bacigalupo, Andrea; Cornelissen, Jan J; Baron, Frederic; Unger, Ron; Nagler, Arnon; Mohty, Mohamad

    2015-10-01

    Allogeneic hematopoietic stem-cell transplantation (HSCT) is potentially curative for acute leukemia (AL), but carries considerable risk. Machine learning algorithms, which are part of the data mining (DM) approach, may serve for transplantation-related mortality risk prediction. This work is a retrospective DM study on a cohort of 28,236 adult HSCT recipients from the AL registry of the European Group for Blood and Marrow Transplantation. The primary objective was prediction of overall mortality (OM) at 100 days after HSCT. Secondary objectives were estimation of nonrelapse mortality, leukemia-free survival, and overall survival at 2 years. Donor, recipient, and procedural characteristics were analyzed. The alternating decision tree machine learning algorithm was applied for model development on 70% of the data set and validated on the remaining data. OM prevalence at day 100 was 13.9% (n=3,936). Of the 20 variables considered, 10 were selected by the model for OM prediction, and several interactions were discovered. By using a logistic transformation function, the crude score was transformed into individual probabilities for 100-day OM (range, 3% to 68%). The model's discrimination for the primary objective performed better than the European Group for Blood and Marrow Transplantation score (area under the receiver operating characteristics curve, 0.701 v 0.646; P<.001). Calibration was excellent. Scores assigned were also predictive of secondary objectives. The alternating decision tree model provides a robust tool for risk evaluation of patients with AL before HSCT, and is available online (http://bioinfo.lnx.biu.ac.il/∼bondi/web1.html). It is presented as a continuous probabilistic score for the prediction of day 100 OM, extending prediction to 2 years. The DM method has proved useful for clinical prediction in HSCT. © 2015 by American Society of Clinical Oncology.

  14. Prediction of subcellular localization of eukaryotic proteins using position-specific profiles and neural network with weighted inputs.

    PubMed

    Zou, Lingyun; Wang, Zhengzhi; Huang, Jiaomin

    2007-12-01

    Subcellular location is one of the key biological characteristics of proteins. Position-specific profiles (PSP) have been introduced as important characteristics of proteins in this article. In this study, to obtain position-specific profiles, the Position Specific Iterative-Basic Local Alignment Search Tool (PSI-BLAST) has been used to search for protein sequences in a database. Position-specific scoring matrices are extracted from the profiles as one class of characteristics. Four-part amino acid compositions and 1st-7th order dipeptide compositions have also been calculated as the other two classes of characteristics. Therefore, twelve characteristic vectors are extracted from each of the protein sequences. Next, the characteristic vectors are weighed by a simple weighing function and inputted into a BP neural network predictor named PSP-Weighted Neural Network (PSP-WNN). The Levenberg-Marquardt algorithm is employed to adjust the weight matrices and thresholds during the network training instead of the error back propagation algorithm. With a jackknife test on the RH2427 dataset, PSP-WNN has achieved a higher overall prediction accuracy of 88.4% rather than the prediction results by the general BP neural network, Markov model, and fuzzy k-nearest neighbors algorithm on this dataset. In addition, the prediction performance of PSP-WNN has been evaluated with a five-fold cross validation test on the PK7579 dataset and the prediction results have been consistently better than those of the previous method on the basis of several support vector machines, using compositions of both amino acids and amino acid pairs. These results indicate that PSP-WNN is a powerful tool for subcellular localization prediction. At the end of the article, influences on prediction accuracy using different weighting proportions among three characteristic vector categories have been discussed. An appropriate proportion is considered by increasing the prediction accuracy.

  15. Optimisation of a machine learning algorithm in human locomotion using principal component and discriminant function analyses.

    PubMed

    Bisele, Maria; Bencsik, Martin; Lewis, Martin G C; Barnett, Cleveland T

    2017-01-01

    Assessment methods in human locomotion often involve the description of normalised graphical profiles and/or the extraction of discrete variables. Whilst useful, these approaches may not represent the full complexity of gait data. Multivariate statistical methods, such as Principal Component Analysis (PCA) and Discriminant Function Analysis (DFA), have been adopted since they have the potential to overcome these data handling issues. The aim of the current study was to develop and optimise a specific machine learning algorithm for processing human locomotion data. Twenty participants ran at a self-selected speed across a 15m runway in barefoot and shod conditions. Ground reaction forces (BW) and kinematics were measured at 1000 Hz and 100 Hz, respectively from which joint angles (°), joint moments (N.m.kg-1) and joint powers (W.kg-1) for the hip, knee and ankle joints were calculated in all three anatomical planes. Using PCA and DFA, power spectra of the kinematic and kinetic variables were used as a training database for the development of a machine learning algorithm. All possible combinations of 10 out of 20 participants were explored to find the iteration of individuals that would optimise the machine learning algorithm. The results showed that the algorithm was able to successfully predict whether a participant ran shod or barefoot in 93.5% of cases. To the authors' knowledge, this is the first study to optimise the development of a machine learning algorithm.

  16. Optimisation of a machine learning algorithm in human locomotion using principal component and discriminant function analyses

    PubMed Central

    Bisele, Maria; Bencsik, Martin; Lewis, Martin G. C.

    2017-01-01

    Assessment methods in human locomotion often involve the description of normalised graphical profiles and/or the extraction of discrete variables. Whilst useful, these approaches may not represent the full complexity of gait data. Multivariate statistical methods, such as Principal Component Analysis (PCA) and Discriminant Function Analysis (DFA), have been adopted since they have the potential to overcome these data handling issues. The aim of the current study was to develop and optimise a specific machine learning algorithm for processing human locomotion data. Twenty participants ran at a self-selected speed across a 15m runway in barefoot and shod conditions. Ground reaction forces (BW) and kinematics were measured at 1000 Hz and 100 Hz, respectively from which joint angles (°), joint moments (N.m.kg-1) and joint powers (W.kg-1) for the hip, knee and ankle joints were calculated in all three anatomical planes. Using PCA and DFA, power spectra of the kinematic and kinetic variables were used as a training database for the development of a machine learning algorithm. All possible combinations of 10 out of 20 participants were explored to find the iteration of individuals that would optimise the machine learning algorithm. The results showed that the algorithm was able to successfully predict whether a participant ran shod or barefoot in 93.5% of cases. To the authors’ knowledge, this is the first study to optimise the development of a machine learning algorithm. PMID:28886059

  17. Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bossi, Flavia; Fan, Jue; Xiao, Jun

    Here, the molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. As a result, to identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation.more » We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.« less

  18. Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction

    DOE PAGES

    Bossi, Flavia; Fan, Jue; Xiao, Jun; ...

    2017-06-26

    Here, the molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. As a result, to identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation.more » We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.« less

  19. Using Machine Learning To Predict Which Light Curves Will Yield Stellar Rotation Periods

    NASA Astrophysics Data System (ADS)

    Agüeros, Marcel; Teachey, Alexander

    2018-01-01

    Using time-domain photometry to reliably measure a solar-type star's rotation period requires that its light curve have a number of favorable characteristics. The probability of recovering a period will be a non-linear function of these light curve features, which are either astrophysical in nature or set by the observations. We employ standard machine learning algorithms (artificial neural networks and random forests) to predict whether a given light curve will produce a robust rotation period measurement from its Lomb-Scargle periodogram. The algorithms are trained and validated using salient statistics extracted from both simulated light curves and their corresponding periodograms, and we apply these classifiers to the most recent Intermediate Palomar Transient Factory (iPTF) data release. With this pipeline, we anticipate measuring rotation periods for a significant fraction of the ∼4x108 stars in the iPTF footprint.

  20. Circadian Phase Resetting via Single and Multiple Control Targets

    PubMed Central

    Bagheri, Neda; Stelling, Jörg; Doyle, Francis J.

    2008-01-01

    Circadian entrainment is necessary for rhythmic physiological functions to be appropriately timed over the 24-hour day. Disruption of circadian rhythms has been associated with sleep and neuro-behavioral impairments as well as cancer. To date, light is widely accepted to be the most powerful circadian synchronizer, motivating its use as a key control input for phase resetting. Through sensitivity analysis, we identify additional control targets whose individual and simultaneous manipulation (via a model predictive control algorithm) out-perform the open-loop light-based phase recovery dynamics by nearly 3-fold. We further demonstrate the robustness of phase resetting by synchronizing short- and long-period mutant phenotypes to the 24-hour environment; the control algorithm is robust in the presence of model mismatch. These studies prove the efficacy and immediate application of model predictive control in experimental studies and medicine. In particular, maintaining proper circadian regulation may significantly decrease the chance of acquiring chronic illness. PMID:18795146

  1. Predicting consumer behavior: using novel mind-reading approaches.

    PubMed

    Calvert, Gemma A; Brammer, Michael J

    2012-01-01

    Advances in machine learning as applied to functional magnetic resonance imaging (fMRI) data offer the possibility of pretesting and classifying marketing communications using unbiased pattern recognition algorithms. By using these algorithms to analyze brain responses to brands, products, or existing marketing communications that either failed or succeeded in the marketplace and identifying the patterns of brain activity that characterize success or failure, future planned campaigns or new products can now be pretested to determine how well the resulting brain responses match the desired (successful) pattern of brain activity without the need for verbal feedback. This major advance in signal processing is poised to revolutionize the application of these brain-imaging techniques in the marketing sector by offering greater accuracy of prediction in terms of consumer acceptance of new brands, products, and campaigns at a speed that makes them accessible as routine pretesting tools that will clearly demonstrate return on investment.

  2. Sensorless Modeling of Varying Pulse Width Modulator Resolutions in Three-Phase Induction Motors

    PubMed Central

    Marko, Matthew David; Shevach, Glenn

    2017-01-01

    A sensorless algorithm was developed to predict rotor speeds in an electric three-phase induction motor. This sensorless model requires a measurement of the stator currents and voltages, and the rotor speed is predicted accurately without any mechanical measurement of the rotor speed. A model of an electric vehicle undergoing acceleration was built, and the sensorless prediction of the simulation rotor speed was determined to be robust even in the presence of fluctuating motor parameters and significant sensor errors. Studies were conducted for varying pulse width modulator resolutions, and the sensorless model was accurate for all resolutions of sinusoidal voltage functions. PMID:28076418

  3. Sensorless Modeling of Varying Pulse Width Modulator Resolutions in Three-Phase Induction Motors.

    PubMed

    Marko, Matthew David; Shevach, Glenn

    2017-01-01

    A sensorless algorithm was developed to predict rotor speeds in an electric three-phase induction motor. This sensorless model requires a measurement of the stator currents and voltages, and the rotor speed is predicted accurately without any mechanical measurement of the rotor speed. A model of an electric vehicle undergoing acceleration was built, and the sensorless prediction of the simulation rotor speed was determined to be robust even in the presence of fluctuating motor parameters and significant sensor errors. Studies were conducted for varying pulse width modulator resolutions, and the sensorless model was accurate for all resolutions of sinusoidal voltage functions.

  4. Model predictive control design for polytopic uncertain systems by synthesising multi-step prediction scenarios

    NASA Astrophysics Data System (ADS)

    Lu, Jianbo; Xi, Yugeng; Li, Dewei; Xu, Yuli; Gan, Zhongxue

    2018-01-01

    A common objective of model predictive control (MPC) design is the large initial feasible region, low online computational burden as well as satisfactory control performance of the resulting algorithm. It is well known that interpolation-based MPC can achieve a favourable trade-off among these different aspects. However, the existing results are usually based on fixed prediction scenarios, which inevitably limits the performance of the obtained algorithms. So by replacing the fixed prediction scenarios with the time-varying multi-step prediction scenarios, this paper provides a new insight into improvement of the existing MPC designs. The adopted control law is a combination of predetermined multi-step feedback control laws, based on which two MPC algorithms with guaranteed recursive feasibility and asymptotic stability are presented. The efficacy of the proposed algorithms is illustrated by a numerical example.

  5. Mapping the EORTC QLQ-C30 onto the EQ-5D-3L: assessing the external validity of existing mapping algorithms.

    PubMed

    Doble, Brett; Lorgelly, Paula

    2016-04-01

    To determine the external validity of existing mapping algorithms for predicting EQ-5D-3L utility values from EORTC QLQ-C30 responses and to establish their generalizability in different types of cancer. A main analysis (pooled) sample of 3560 observations (1727 patients) and two disease severity patient samples (496 and 93 patients) with repeated observations over time from Cancer 2015 were used to validate the existing algorithms. Errors were calculated between observed and predicted EQ-5D-3L utility values using a single pooled sample and ten pooled tumour type-specific samples. Predictive accuracy was assessed using mean absolute error (MAE) and standardized root-mean-squared error (RMSE). The association between observed and predicted EQ-5D utility values and other covariates across the distribution was tested using quantile regression. Quality-adjusted life years (QALYs) were calculated using observed and predicted values to test responsiveness. Ten 'preferred' mapping algorithms were identified. Two algorithms estimated via response mapping and ordinary least-squares regression using dummy variables performed well on number of validation criteria, including accurate prediction of the best and worst QLQ-C30 health states, predicted values within the EQ-5D tariff range, relatively small MAEs and RMSEs, and minimal differences between estimated QALYs. Comparison of predictive accuracy across ten tumour type-specific samples highlighted that algorithms are relatively insensitive to grouping by tumour type and affected more by differences in disease severity. Two of the 'preferred' mapping algorithms suggest more accurate predictions, but limitations exist. We recommend extensive scenario analyses if mapped utilities are used in cost-utility analyses.

  6. A Systematic Investigation of Computation Models for Predicting Adverse Drug Reactions (ADRs)

    PubMed Central

    Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong

    2014-01-01

    Background Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computation models to obtain general conclusions that can provide useful guidance to construct more effective computational models to predict ADRs. Principal Findings In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that have been earlier used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, final formulas of these algorithms were all converted to linear model in form, based on this finding we propose a new algorithm called the general weighted profile method and it yielded the best overall performance among the algorithms investigated in this paper. Conclusion Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms. PMID:25180585

  7. Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns.

    PubMed

    Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio

    2013-09-01

    Multiple sequence alignments (MSAs) are widely used approaches in bioinformatics to carry out other tasks such as structure predictions, biological function analyses or phylogenetic modeling. However, current tools usually provide partially optimal alignments, as each one is focused on specific biological features. Thus, the same set of sequences can produce different alignments, above all when sequences are less similar. Consequently, researchers and biologists do not agree about which is the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores including further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because structures are more conserved in proteins than sequences, scores with structural information are better suited to evaluate more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was significantly assessed on the BAliBASE benchmark according to the Kruskal-Wallis test (P < 0.01). This algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas it shows results not significantly different to 3D-COFFEE (P > 0.05) with the advantage of being able to use less structures. Structural information is included within the objective function to evaluate more accurately the obtained alignments. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.

  8. Hearing through the noise: Biologically inspired noise reduction

    NASA Astrophysics Data System (ADS)

    Lee, Tyler Paul

    Vocal communication in the natural world demands that a listener perform a remarkably complicated task in real-time. Vocalizations mix with all other sounds in the environment as they travel to the listener, arriving as a jumbled low-dimensional signal. A listener must then use this signal to extract the structure corresponding to individual sound sources. How this computation is implemented in the brain remains poorly understood, yet an accurate description of such mechanisms would impact a variety of medical and technological applications of sound processing. In this thesis, I describe initial work on how neurons in the secondary auditory cortex of the Zebra Finch extract song from naturalistic background noise. I then build on our understanding of the function of these neurons by creating an algorithm that extracts speech from natural background noise using spectrotemporal modulations. The algorithm, implemented as an artificial neural network, can be flexibly applied to any class of signal or noise and performs better than an optimal frequency-based noise reduction algorithm for a variety of background noises and signal-to-noise ratios. One potential drawback to using spectrotemporal modulations for noise reduction, though, is that analyzing the modulations present in an ongoing sound requires a latency set by the slowest temporal modulation computed. The algorithm avoids this problem by reducing noise predictively, taking advantage of the large amount of temporal structure present in natural sounds. This predictive denoising has ties to recent work suggesting that the auditory system uses attention to focus on predicted regions of spectrotemporal space when performing auditory scene analysis.

  9. A digital prediction algorithm for a single-phase boost PFC

    NASA Astrophysics Data System (ADS)

    Qing, Wang; Ning, Chen; Weifeng, Sun; Shengli, Lu; Longxing, Shi

    2012-12-01

    A novel digital control algorithm for digital control power factor correction is presented, which is called the prediction algorithm and has a feature of a higher PF (power factor) with lower total harmonic distortion, and a faster dynamic response with the change of the input voltage or load current. For a certain system, based on the current system state parameters, the prediction algorithm can estimate the track of the output voltage and the inductor current at the next switching cycle and get a set of optimized control sequences to perfectly track the trajectory of input voltage. The proposed prediction algorithm is verified at different conditions, and computer simulation and experimental results under multi-situations confirm the effectiveness of the prediction algorithm. Under the circumstances that the input voltage is in the range of 90-265 V and the load current in the range of 20%-100%, the PF value is larger than 0.998. The startup and the recovery times respectively are about 0.1 s and 0.02 s without overshoot. The experimental results also verify the validity of the proposed method.

  10. Protein docking prediction using predicted protein-protein interface.

    PubMed

    Li, Bin; Kihara, Daisuke

    2012-01-10

    Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.

  11. Pitch-Learning Algorithm For Speech Encoders

    NASA Technical Reports Server (NTRS)

    Bhaskar, B. R. Udaya

    1988-01-01

    Adaptive algorithm detects and corrects errors in sequence of estimates of pitch period of speech. Algorithm operates in conjunction with techniques used to estimate pitch period. Used in such parametric and hybrid speech coders as linear predictive coders and adaptive predictive coders.

  12. Risk-Seeking Versus Risk-Avoiding Investments in Noisy Periodic Environments

    NASA Astrophysics Data System (ADS)

    Navarro-Barrientos, J. Emeterio; Walter, Frank E.; Schweitzer, Frank

    We study the performance of various agent strategies in an artificial investment scenario. Agents are equipped with a budget, x(t), and at each time step invest a particular fraction, q(t), of their budget. The return on investment (RoI), r(t), is characterized by a periodic function with different types and levels of noise. Risk-avoiding agents choose their fraction q(t) proportional to the expected positive RoI, while risk-seeking agents always choose a maximum value qmax if they predict the RoI to be positive ("everything on red"). In addition to these different strategies, agents have different capabilities to predict the future r(t), dependent on their internal complexity. Here, we compare "zero-intelligent" agents using technical analysis (such as moving least squares) with agents using reinforcement learning or genetic algorithms to predict r(t). The performance of agents is measured by their average budget growth after a certain number of time steps. We present results of extensive computer simulations, which show that, for our given artificial environment, (i) the risk-seeking strategy outperforms the risk-avoiding one, and (ii) the genetic algorithm was able to find this optimal strategy itself, and thus outperforms other prediction approaches considered.

  13. PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics.

    PubMed

    von Grotthuss, Marcin; Plewczynski, Dariusz; Ginalski, Krzysztof; Rychlewski, Leszek; Shakhnovich, Eugene I

    2006-02-06

    The number of protein structures from structural genomics centers dramatically increases in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationship. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. We suggest that the structure-based prediction of an EC number should be conducted having the different similarity score cutoff for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe, that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.

  14. Predicting multicellular function through multi-layer tissue networks

    PubMed Central

    Zitnik, Marinka; Leskovec, Jure

    2017-01-01

    Abstract Motivation: Understanding functions of proteins in specific human tissues is essential for insights into disease diagnostics and therapeutics, yet prediction of tissue-specific cellular function remains a critical challenge for biomedicine. Results: Here, we present OhmNet, a hierarchy-aware unsupervised node feature learning approach for multi-layer networks. We build a multi-layer network, where each layer represents molecular interactions in a different human tissue. OhmNet then automatically learns a mapping of proteins, represented as nodes, to a neural embedding-based low-dimensional space of features. OhmNet encourages sharing of similar features among proteins with similar network neighborhoods and among proteins activated in similar tissues. The algorithm generalizes prior work, which generally ignores relationships between tissues, by modeling tissue organization with a rich multiscale tissue hierarchy. We use OhmNet to study multicellular function in a multi-layer protein interaction network of 107 human tissues. In 48 tissues with known tissue-specific cellular functions, OhmNet provides more accurate predictions of cellular function than alternative approaches, and also generates more accurate hypotheses about tissue-specific protein actions. We show that taking into account the tissue hierarchy leads to improved predictive power. Remarkably, we also demonstrate that it is possible to leverage the tissue hierarchy in order to effectively transfer cellular functions to a functionally uncharacterized tissue. Overall, OhmNet moves from flat networks to multiscale models able to predict a range of phenotypes spanning cellular subsystems. Availability and implementation: Source code and datasets are available at http://snap.stanford.edu/ohmnet. Contact: jure@cs.stanford.edu PMID:28881986

  15. Real Time Monitoring and Prediction of the Monsoon Intraseasonal Oscillations: An index based on Nonlinear Laplacian Spectral Analysis Technique

    NASA Astrophysics Data System (ADS)

    Cherumadanakadan Thelliyil, S.; Ravindran, A. M.; Giannakis, D.; Majda, A.

    2016-12-01

    An improved index for real time monitoring and forecast verification of monsoon intraseasonal oscillations (MISO) is introduced using the recently developed Nonlinear Laplacian Spectral Analysis (NLSA) algorithm. Previous studies has demonstrated the proficiency of NLSA in capturing low frequency variability and intermittency of a time series. Using NLSA a hierarchy of Laplace-Beltrami (LB) eigen functions are extracted from the unfiltered daily GPCP rainfall data over the south Asian monsoon region. Two modes representing the full life cycle of complex northeastward propagating boreal summer MISO are identified from the hierarchy of Laplace-Beltrami eigen functions. These two MISO modes have a number of advantages over the conventionally used Extended Empirical Orthogonal Function (EEOF) MISO modes including higher memory and better predictability, higher fractional variance over the western Pacific, Western Ghats and adjoining Arabian Sea regions and more realistic representation of regional heat sources associated with the MISO. The skill of NLSA based MISO indices in real time prediction of MISO is demonstrated using hindcasts of CFSv2 extended range prediction runs. It is shown that these indices yield a higher prediction skill than the other conventional indices supporting the use of NLSA in real time prediction of MISO. Real time monitoring and prediction of MISO finds its application in agriculture, construction and hydro-electric power sectors and hence an important component of monsoon prediction.

  16. Adaptive Trajectory Prediction Algorithm for Climbing Flights

    NASA Technical Reports Server (NTRS)

    Schultz, Charles Alexander; Thipphavong, David P.; Erzberger, Heinz

    2012-01-01

    Aircraft climb trajectories are difficult to predict, and large errors in these predictions reduce the potential operational benefits of some advanced features for NextGen. The algorithm described in this paper improves climb trajectory prediction accuracy by adjusting trajectory predictions based on observed track data. It utilizes rate-of-climb and airspeed measurements derived from position data to dynamically adjust the aircraft weight modeled for trajectory predictions. In simulations with weight uncertainty, the algorithm is able to adapt to within 3 percent of the actual gross weight within two minutes of the initial adaptation. The root-mean-square of altitude errors for five-minute predictions was reduced by 73 percent. Conflict detection performance also improved, with a 15 percent reduction in missed alerts and a 10 percent reduction in false alerts. In a simulation with climb speed capture intent and weight uncertainty, the algorithm improved climb trajectory prediction accuracy by up to 30 percent and conflict detection performance, reducing missed and false alerts by up to 10 percent.

  17. Molecular descriptor subset selection in theoretical peptide quantitative structure-retention relationship model development using nature-inspired optimization algorithms.

    PubMed

    Žuvela, Petar; Liu, J Jay; Macur, Katarzyna; Bączek, Tomasz

    2015-10-06

    In this work, performance of five nature-inspired optimization algorithms, genetic algorithm (GA), particle swarm optimization (PSO), artificial bee colony (ABC), firefly algorithm (FA), and flower pollination algorithm (FPA), was compared in molecular descriptor selection for development of quantitative structure-retention relationship (QSRR) models for 83 peptides that originate from eight model proteins. The matrix with 423 descriptors was used as input, and QSRR models based on selected descriptors were built using partial least squares (PLS), whereas root mean square error of prediction (RMSEP) was used as a fitness function for their selection. Three performance criteria, prediction accuracy, computational cost, and the number of selected descriptors, were used to evaluate the developed QSRR models. The results show that all five variable selection methods outperform interval PLS (iPLS), sparse PLS (sPLS), and the full PLS model, whereas GA is superior because of its lowest computational cost and higher accuracy (RMSEP of 5.534%) with a smaller number of variables (nine descriptors). The GA-QSRR model was validated initially through Y-randomization. In addition, it was successfully validated with an external testing set out of 102 peptides originating from Bacillus subtilis proteomes (RMSEP of 22.030%). Its applicability domain was defined, from which it was evident that the developed GA-QSRR exhibited strong robustness. All the sources of the model's error were identified, thus allowing for further application of the developed methodology in proteomics.

  18. Simultaneous feature selection and parameter optimisation using an artificial ant colony: case study of melting point prediction.

    PubMed

    O'Boyle, Noel M; Palmer, David S; Nigsch, Florian; Mitchell, John Bo

    2008-10-29

    We present a novel feature selection algorithm, Winnowing Artificial Ant Colony (WAAC), that performs simultaneous feature selection and model parameter optimisation for the development of predictive quantitative structure-property relationship (QSPR) models. The WAAC algorithm is an extension of the modified ant colony algorithm of Shen et al. (J Chem Inf Model 2005, 45: 1024-1029). We test the ability of the algorithm to develop a predictive partial least squares model for the Karthikeyan dataset (J Chem Inf Model 2005, 45: 581-590) of melting point values. We also test its ability to perform feature selection on a support vector machine model for the same dataset. Starting from an initial set of 203 descriptors, the WAAC algorithm selected a PLS model with 68 descriptors which has an RMSE on an external test set of 46.6 degrees C and R2 of 0.51. The number of components chosen for the model was 49, which was close to optimal for this feature selection. The selected SVM model has 28 descriptors (cost of 5, epsilon of 0.21) and an RMSE of 45.1 degrees C and R2 of 0.54. This model outperforms a kNN model (RMSE of 48.3 degrees C, R2 of 0.47) for the same data and has similar performance to a Random Forest model (RMSE of 44.5 degrees C, R2 of 0.55). However it is much less prone to bias at the extremes of the range of melting points as shown by the slope of the line through the residuals: -0.43 for WAAC/SVM, -0.53 for Random Forest. With a careful choice of objective function, the WAAC algorithm can be used to optimise machine learning and regression models that suffer from overfitting. Where model parameters also need to be tuned, as is the case with support vector machine and partial least squares models, it can optimise these simultaneously. The moving probabilities used by the algorithm are easily interpreted in terms of the best and current models of the ants, and the winnowing procedure promotes the removal of irrelevant descriptors.

  19. Predicting nucleic acid binding interfaces from structural models of proteins.

    PubMed

    Dror, Iris; Shazman, Shula; Mukherjee, Srayanta; Zhang, Yang; Glaser, Fabian; Mandel-Gutfreund, Yael

    2012-02-01

    The function of DNA- and RNA-binding proteins can be inferred from the characterization and accurate prediction of their binding interfaces. However, the main pitfall of various structure-based methods for predicting nucleic acid binding function is that they are all limited to a relatively small number of proteins for which high-resolution three-dimensional structures are available. In this study, we developed a pipeline for extracting functional electrostatic patches from surfaces of protein structural models, obtained using the I-TASSER protein structure predictor. The largest positive patches are extracted from the protein surface using the patchfinder algorithm. We show that functional electrostatic patches extracted from an ensemble of structural models highly overlap the patches extracted from high-resolution structures. Furthermore, by testing our pipeline on a set of 55 known nucleic acid binding proteins for which I-TASSER produces high-quality models, we show that the method accurately identifies the nucleic acids binding interface on structural models of proteins. Employing a combined patch approach we show that patches extracted from an ensemble of models better predicts the real nucleic acid binding interfaces compared with patches extracted from independent models. Overall, these results suggest that combining information from a collection of low-resolution structural models could be a valuable approach for functional annotation. We suggest that our method will be further applicable for predicting other functional surfaces of proteins with unknown structure. Copyright © 2011 Wiley Periodicals, Inc.

  20. A comparison of algorithms for inference and learning in probabilistic graphical models.

    PubMed

    Frey, Brendan J; Jojic, Nebojsa

    2005-09-01

    Research into methods for reasoning under uncertainty is currently one of the most exciting areas of artificial intelligence, largely because it has recently become possible to record, store, and process large amounts of data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition, face detection, speaker identification, and prediction of gene function, it is even more exciting that researchers are on the verge of introducing systems that can perform large-scale combinatorial analyses of data, decomposing the data into interacting components. For example, computational methods for automatic scene analysis are now emerging in the computer vision community. These methods decompose an input image into its constituent objects, lighting conditions, motion patterns, etc. Two of the main challenges are finding effective representations and models in specific applications and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based probability models and their associated inference and learning algorithms. We review exact techniques and various approximate, computationally efficient techniques, including iterated conditional modes, the expectation maximization (EM) algorithm, Gibbs sampling, the mean field method, variational techniques, structured variational techniques and the sum-product algorithm ("loopy" belief propagation). We describe how each technique can be applied in a vision model of multiple, occluding objects and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.

Top