prediction algorithm based: Topics by Science.gov

Sample records for prediction algorithm based

Minimalist ensemble algorithms for genome-wide protein localization prediction.

PubMed

Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun

2012-07-03

Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
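
The filter-plus-classifier idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the correlation filter is a stand-in for their feature-selection step (the abstract does not specify it), and the predictor names, scores, threshold, and learning rate are all hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length score lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 0.0 if var_a == 0 or var_b == 0 else cov / math.sqrt(var_a * var_b)

def select_predictors(columns, max_corr=0.95):
    """Greedy redundancy filter (assumed): skip a predictor that is nearly a copy of a kept one."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < max_corr for k in kept):
            kept.append(name)
    return kept

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, learning_rate=0.5, epochs=2000):
    """Batch gradient descent for logistic regression; the bias is the last weight."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = sigmoid(weights[-1] + sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad[j] += err * v
            grad[-1] += err
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights

def predict(weights, x):
    """Return 1 if the combined score exceeds 0.5, else 0."""
    return int(sigmoid(weights[-1] + sum(w * v for w, v in zip(weights, x))) > 0.5)

# Hypothetical confidence scores from three individual predictors for six proteins.
columns = {
    "A": [0.9, 0.8, 0.2, 0.1, 0.7, 0.3],
    "B": [0.9, 0.8, 0.2, 0.1, 0.7, 0.3],  # redundant copy of A
    "C": [0.6, 0.4, 0.5, 0.3, 0.9, 0.1],
}
labels = [1, 1, 0, 0, 1, 0]

kept = select_predictors(columns)
rows = [[columns[name][i] for name in kept] for i in range(len(labels))]
weights = fit_logistic(rows, labels)
predictions = [predict(weights, row) for row in rows]
```

The point of the sketch is only that a trained classifier over a reduced set of predictor outputs replaces voting over all of them; a real evaluation would use held-out data.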
Minimalist ensemble algorithms for genome-wide protein localization prediction

PubMed Central

2012-01-01

Background: Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results: This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors.
Conclusions: We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391
A Particle Swarm Optimization-Based Approach with Local Search for Predicting Protein Folding.

PubMed

Yang, Cheng-Hong; Lin, Yu-Shiun; Chuang, Li-Yeh; Chang, Hsueh-Wei

2017-10-01

The hydrophobic-polar (HP) model is commonly used for predicting protein folding structures and hydrophobic interactions. This study developed a particle swarm optimization (PSO)-based algorithm combined with local search algorithms; specifically, the high exploration PSO (HEPSO) algorithm (which can execute global search processes) was combined with three local search algorithms (hill-climbing algorithm, greedy algorithm, and Tabu table), yielding the proposed HE-L-PSO algorithm. By using 20 known protein structures, we evaluated the performance of the HE-L-PSO algorithm in predicting protein folding in the HP model. The proposed HE-L-PSO algorithm exhibited favorable performance in predicting both short and long amino acid sequences with high reproducibility and stability, compared with seven reported algorithms. The HE-L-PSO algorithm yielded optimal solutions for all predicted protein folding structures. All HE-L-PSO-predicted protein folding structures possessed a hydrophobic core that is similar to normal protein folding.
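
For readers unfamiliar with the HP model, the quantity a search algorithm such as the one above tries to minimise can be sketched as below. This assumes the common 2D square-lattice variant with an energy of -1 per hydrophobic contact; the abstract does not state which lattice the study used, and the function name is illustrative.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D square-lattice HP conformation.

    Each pair of H residues that are lattice neighbours but not consecutive
    in the chain contributes -1. `coords` is a list of (x, y) tuples, one per residue.
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation must be self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent lattice sites")

    index_at = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy

# A four-residue chain folded into a unit square brings its two H ends together.
folded = hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
extended = hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)])
```

A global search such as PSO proposes conformations, and local search moves are accepted when they lower this energy.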
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm

PubMed Central

Chen, Jui-Le; Yang, Chu-Sing

2013-01-01

The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although computer science technologies can be used to reduce the costs of pharmaceutical research, the computation time of structure-based protein-ligand docking prediction remains unsatisfactory. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate docking prediction. The proposed algorithm works by leveraging two high-performance operators: (1) a novel migration (information exchange) operator designed specifically for cloud-based environments to reduce the computation time; (2) an efficient operator aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
Testing an earthquake prediction algorithm

USGS Publications Warehouse

Kossobokov, V.G.; Healy, J.H.; Dewey, J.W.

1997-01-01

A test to evaluate earthquake prediction algorithms is being applied to a Russian algorithm known as M8. The M8 algorithm makes intermediate term predictions for earthquakes to occur in a large circle, based on integral counts of transient seismicity in the circle. In a retroactive prediction for the period January 1, 1985 to July 1, 1991 the algorithm as configured for the forward test would have predicted eight of ten strong earthquakes in the test area. A null hypothesis, based on random assignment of predictions, predicts eight earthquakes in 2.87% of the trials. The forward test began July 1, 1991 and will run through December 31, 1997. As of July 1, 1995, the algorithm had forward predicted five out of nine earthquakes in the test area, which success ratio would have been achieved in 53% of random trials with the null hypothesis.
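
The abstract does not say how the random-assignment trials were generated. As a simplified stand-in, assume each strong earthquake independently falls inside an alarm with a fixed probability `p_alarm` (a hypothetical parameter, not given in the source). The chance of matching at least a given number of events is then a binomial tail:

```python
from math import comb

def chance_of_at_least(hits, events, p_alarm):
    """Probability that random alarms cover at least `hits` of `events` earthquakes.

    Assumes independent events with the same coverage probability `p_alarm`,
    which is a simplification of the null hypothesis described in the text.
    """
    if not 0.0 <= p_alarm <= 1.0:
        raise ValueError("p_alarm must be a probability")
    return sum(
        comb(events, k) * p_alarm ** k * (1.0 - p_alarm) ** (events - k)
        for k in range(hits, events + 1)
    )

# Illustrative only: with a coin-flip coverage probability, 8 or more hits out of 10.
example = chance_of_at_least(8, 10, 0.5)
```

A small tail probability means the observed success count is hard to explain by chance; a value near one half, as in the forward test reported above, means it is not.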
Predicting Loss-of-Control Boundaries Toward a Piloting Aid

NASA Technical Reports Server (NTRS)

Barlow, Jonathan; Stepanyan, Vahram; Krishnakumar, Kalmanje

2012-01-01

This work presents an approach to predicting loss-of-control with the goal of providing the pilot a decision aid focused on maintaining the pilot's control action within predicted loss-of-control boundaries. The predictive architecture combines quantitative loss-of-control boundaries, a data-based predictive control boundary estimation algorithm and an adaptive prediction method to estimate Markov model parameters in real-time. The data-based loss-of-control boundary estimation algorithm estimates the boundary of a safe set of control inputs that will keep the aircraft within the loss-of-control boundaries for a specified time horizon. The adaptive prediction model generates estimates of the system Markov Parameters, which are used by the data-based loss-of-control boundary estimation algorithm. The combined algorithm is applied to a nonlinear generic transport aircraft to illustrate the features of the architecture.
SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

PubMed

Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

2015-01-01

Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.
Predicting missing links in complex networks based on common neighbors and distance

PubMed Central

Yang, Jinxuan; Zhang, Xiao-Dong

2016-01-01

Algorithms that use the common neighbors metric to predict missing links in complex networks are very popular, but most of them do not account for missing links between nodes with no common neighbors. In some cases these methods cannot reconstruct networks accurately enough, especially when nodes have few common neighbors. In this paper we propose a new algorithm based on common neighbors and distance to improve the accuracy of link prediction. The proposed algorithm is remarkably effective in predicting missing links between nodes with no common neighbors and performs better than most existing methods on a variety of real-world networks without increasing complexity. PMID:27905526
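
One way to combine the two signals is sketched below. The scoring rule is an assumption chosen for illustration, since the abstract does not give the authors' formula: pairs with common neighbors are scored by their count, and pairs without any fall back on the inverse of their shortest-path distance, so they still receive a nonzero score.

```python
from collections import deque

def shortest_path_length(graph, source, target):
    """Breadth-first search distance in an unweighted graph, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def link_score(graph, u, v):
    """Illustrative score: (common neighbors + 1) / 2, else 1 / distance, else 0."""
    common = graph[u] & graph[v]
    if common:
        return (len(common) + 1) / 2
    distance = shortest_path_length(graph, u, v)
    return 0.0 if distance is None else 1.0 / distance

# Hypothetical undirected network stored as a dict of neighbor sets.
graph = {
    "a": {"b", "e"},
    "b": {"a", "c"},
    "c": {"b", "d", "e"},
    "d": {"c"},
    "e": {"a", "c"},
    "f": set(),
}
score_ac = link_score(graph, "a", "c")  # two common neighbors: b and e
score_ad = link_score(graph, "a", "d")  # no common neighbor, distance 3
score_af = link_score(graph, "a", "f")  # f is isolated
```

With this rule a pair with common neighbors always outranks a pair without, because the latter are at distance three or more.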
Model predictive control design for polytopic uncertain systems by synthesising multi-step prediction scenarios

NASA Astrophysics Data System (ADS)

Lu, Jianbo; Xi, Yugeng; Li, Dewei; Xu, Yuli; Gan, Zhongxue

2018-01-01

Common objectives of model predictive control (MPC) design are a large initial feasible region, a low online computational burden, and satisfactory control performance of the resulting algorithm. It is well known that interpolation-based MPC can achieve a favourable trade-off among these different aspects. However, the existing results are usually based on fixed prediction scenarios, which inevitably limits the performance of the obtained algorithms. By replacing the fixed prediction scenarios with time-varying multi-step prediction scenarios, this paper provides new insight into improving existing MPC designs. The adopted control law is a combination of predetermined multi-step feedback control laws, based on which two MPC algorithms with guaranteed recursive feasibility and asymptotic stability are presented. The efficacy of the proposed algorithms is illustrated by a numerical example.
A range-based predictive localization algorithm for WSID networks

NASA Astrophysics Data System (ADS)

Liu, Yuan; Chen, Junjie; Li, Gang

2017-11-01

Most studies on localization algorithms are conducted on the sensor networks with densely distributed nodes. However, the non-localizable problems are prone to occur in the network with sparsely distributed sensor nodes. To solve this problem, a range-based predictive localization algorithm (RPLA) is proposed in this paper for the wireless sensor networks syncretizing the RFID (WSID) networks. The Gaussian mixture model is established to predict the trajectory of a mobile target. Then, the received signal strength indication is used to reduce the residence area of the target location based on the approximate point-in-triangulation test algorithm. In addition, collaborative localization schemes are introduced to locate the target in the non-localizable situations. Simulation results verify that the RPLA achieves accurate localization for the network with sparsely distributed sensor nodes. The localization accuracy of the RPLA is 48.7% higher than that of the APIT algorithm, 16.8% higher than that of the single Gaussian model-based algorithm and 10.5% higher than that of the Kalman filtering-based algorithm.
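
The APIT algorithm mentioned above approximates a point-in-triangle test using signal-strength comparisons. The exact geometric test it approximates can be sketched as follows; the coordinates and function names are illustrative assumptions, and nothing here reproduces the RPLA itself.

```python
def _cross(origin, a, b):
    """Z-component of the cross product of vectors origin->a and origin->b."""
    return (a[0] - origin[0]) * (b[1] - origin[1]) - (a[1] - origin[1]) * (b[0] - origin[0])

def point_in_triangle(p, a, b, c):
    """True if point p lies inside or on the boundary of triangle abc."""
    d1 = _cross(a, b, p)
    d2 = _cross(b, c, p)
    d3 = _cross(c, a, p)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside when it is on the same side of all three edges.
    return not (has_negative and has_positive)

# Hypothetical anchor positions (metres) and two candidate target positions.
anchors = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
inside = point_in_triangle((1.0, 1.0), *anchors)
outside = point_in_triangle((5.0, 5.0), *anchors)
```

Intersecting the triangles that contain the target narrows the region in which it can reside, which is the role the abstract assigns to the signal-strength step.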
An accelerated non-Gaussianity based multichannel predictive deconvolution method with the limited supporting region of filters

NASA Astrophysics Data System (ADS)

Li, Zhong-xiao; Li, Zhen-chun

2016-09-01

The multichannel predictive deconvolution can be conducted in overlapping temporal and spatial data windows to solve the 2D predictive filter for multiple removal. Generally, the 2D predictive filter can better remove multiples at the cost of more computation time compared with the 1D predictive filter. In this paper we first use the cross-correlation strategy to determine the limited supporting region of filters where the coefficients play a major role for multiple removal in the filter coefficient space. To solve the 2D predictive filter the traditional multichannel predictive deconvolution uses the least squares (LS) algorithm, which requires primaries and multiples are orthogonal. To relax the orthogonality assumption the iterative reweighted least squares (IRLS) algorithm and the fast iterative shrinkage thresholding (FIST) algorithm have been used to solve the 2D predictive filter in the multichannel predictive deconvolution with the non-Gaussian maximization (L1 norm minimization) constraint of primaries. The FIST algorithm has been demonstrated as a faster alternative to the IRLS algorithm. In this paper we introduce the FIST algorithm to solve the filter coefficients in the limited supporting region of filters. Compared with the FIST based multichannel predictive deconvolution without the limited supporting region of filters the proposed method can reduce the computation burden effectively while achieving a similar accuracy. Additionally, the proposed method can better balance multiple removal and primary preservation than the traditional LS based multichannel predictive deconvolution and FIST based single channel predictive deconvolution. Synthetic and field data sets demonstrate the effectiveness of the proposed method.
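
The shrinkage-thresholding iteration referred to above can be illustrated on the textbook problem of minimising 0.5*||Ax - b||^2 + lam*||x||_1. This is a generic sketch, not the paper's formulation: the paper places the L1 measure on the primaries rather than on the filter coefficients, and the step-size bound, variable names, and toy data here are assumptions.

```python
def soft_threshold(value, threshold):
    """Shrinkage operator: move value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fista(A, b, lam, iterations=200):
    """Fast iterative shrinkage-thresholding for 0.5*||Ax - b||^2 + lam*||x||_1.

    A is a list of rows. The squared Frobenius norm of A is used as a safe
    (possibly loose) upper bound on the Lipschitz constant of the gradient.
    """
    m, n = len(A), len(A[0])
    lipschitz = sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(iterations):
        residual = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x_new = [soft_threshold(y[j] - grad[j] / lipschitz, lam / lipschitz) for j in range(n)]
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        # Momentum step that distinguishes FISTA from plain ISTA.
        y = [x_new[j] + ((t - 1.0) / t_new) * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x

# With A equal to the identity the minimiser is simply the soft-thresholded data.
solution = fista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

Restricting the unknowns to a limited supporting region, as the paper proposes, would correspond to shrinking the number of columns of A that enter each iteration.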
Local-search based prediction of medical image registration error

NASA Astrophysics Data System (ADS)

Saygili, Görkem

2018-03-01

Medical image registration is a crucial task in many different medical imaging applications. Hence, a considerable amount of work has been published recently that aims to predict the error in a registration without any human effort. If provided, these error predictions can be used as feedback to the registration algorithm to further improve its performance. Recent methods generally start by extracting image-based and deformation-based features, then apply feature pooling, and finally train a Random Forest (RF) regressor to predict the real registration error. Image-based features can be calculated after applying a single registration but provide limited accuracy, whereas deformation-based features such as the variation of the deformation vector field may require up to 20 registrations, which is highly time-consuming. This paper proposes to use features extracted from a local search algorithm as image-based features to estimate the error of a registration. The proposed method comprises a local search algorithm to find corresponding voxels between registered image pairs; based on the amount of shift and stereo confidence measures, it densely predicts the registration error in millimetres using an RF regressor. Compared to other algorithms in the literature, the proposed algorithm does not require multiple registrations, can be efficiently implemented on a Graphical Processing Unit (GPU), and can still provide highly accurate error predictions in the presence of large registration error. Experimental results with real registrations on a public dataset indicate a substantially high accuracy achieved by using features from the local search algorithm.
Predicting the random drift of MEMS gyroscope based on K-means clustering and OLS RBF Neural Network

NASA Astrophysics Data System (ADS)

Wang, Zhen-yu; Zhang, Li-jie

2017-10-01

Measurement error of a sensor can be effectively compensated with prediction. Aiming at the large random drift error of the MEMS (Micro Electro Mechanical System) gyroscope, an improved learning algorithm for a Radial Basis Function (RBF) Neural Network (NN) based on K-means clustering and Orthogonal Least-Squares (OLS) is proposed in this paper. The algorithm first selects typical samples as the initial cluster centers of the RBF NN, then obtains candidate centers with the K-means algorithm, and finally optimizes the candidate centers with the OLS algorithm, which makes the network structure simpler and the prediction performance better. Experimental results show that the proposed K-means clustering OLS learning algorithm can predict the random drift of the MEMS gyroscope effectively, with a prediction error of 9.8019e-007°/s and a prediction time of 2.4169e-006 s.
Code-based Diagnostic Algorithms for Idiopathic Pulmonary Fibrosis. Case Validation and Improvement.

PubMed

Ley, Brett; Urbania, Thomas; Husson, Gail; Vittinghoff, Eric; Brush, David R; Eisner, Mark D; Iribarren, Carlos; Collard, Harold R

2017-06-01

Population-based studies of idiopathic pulmonary fibrosis (IPF) in the United States have been limited by reliance on diagnostic code-based algorithms that lack clinical validation. This study aimed to validate a well-accepted International Classification of Diseases, Ninth Revision, code-based algorithm for IPF using patient-level information and to develop a modified algorithm for IPF with enhanced predictive value. The traditional IPF algorithm was used to identify potential cases of IPF in the Kaiser Permanente Northern California adult population from 2000 to 2014. Incidence and prevalence were determined overall and by age, sex, and race/ethnicity. A validation subset of cases (n = 150) underwent expert medical record and chest computed tomography review. A modified IPF algorithm was then derived and validated to optimize positive predictive value. From 2000 to 2014, the traditional IPF algorithm identified 2,608 cases among 5,389,627 at-risk adults in the Kaiser Permanente Northern California population. Annual incidence was 6.8/100,000 person-years (95% confidence interval [CI], 6.1-7.7) and was higher in patients with older age, male sex, and white race. The positive predictive value of the IPF algorithm was only 42.2% (95% CI, 30.6 to 54.6%); sensitivity was 55.6% (95% CI, 21.2 to 86.3%). The corrected incidence was estimated at 5.6/100,000 person-years (95% CI, 2.6-10.3). A modified IPF algorithm had improved positive predictive value but reduced sensitivity compared with the traditional algorithm. A well-accepted International Classification of Diseases, Ninth Revision, code-based IPF algorithm performs poorly, falsely classifying many non-IPF cases as IPF and missing a substantial proportion of IPF cases. A modification of the IPF algorithm may be useful for future population-based studies of IPF.
Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm.

PubMed

Lee, Jae-Hong; Kim, Do-Hyung; Jeong, Seong-Nyum; Choi, Seong-Ho

2018-04-01

The aim of the current study was to develop a computer-assisted detection system based on a deep convolutional neural network (CNN) algorithm and to evaluate the potential usefulness and accuracy of this system for the diagnosis and prediction of periodontally compromised teeth (PCT). Combining pretrained deep CNN architecture and a self-trained network, periapical radiographic images were used to determine the optimal CNN algorithm and weights. The diagnostic and predictive accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curve, area under the ROC curve, confusion matrix, and 95% confidence intervals (CIs) were calculated using our deep CNN algorithm, based on a Keras framework in Python. The periapical radiographic dataset was split into training (n=1,044), validation (n=348), and test (n=348) datasets. With the deep learning algorithm, the diagnostic accuracy for PCT was 81.0% for premolars and 76.7% for molars. Using 64 premolars and 64 molars that were clinically diagnosed as severe PCT, the accuracy of predicting extraction was 82.8% (95% CI, 70.1%-91.2%) for premolars and 73.4% (95% CI, 59.9%-84.0%) for molars. We demonstrated that the deep CNN algorithm was useful for assessing the diagnosis and predictability of PCT. Therefore, with further optimization of the PCT dataset and improvements in the algorithm, a computer-aided detection system can be expected to become an effective and efficient method of diagnosing and predicting PCT.
The effect of machine learning regression algorithms and sample size on individualized behavioral prediction with functional connectivity features.

PubMed

Cui, Zaixu; Gong, Gaolang

2018-06-02

Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. 
The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.
Compressed sensing based missing nodes prediction in temporal communication network

NASA Astrophysics Data System (ADS)

Cheng, Guangquan; Ma, Yang; Liu, Zhong; Xie, Fuli

2018-02-01

The reconstruction of complex network topology is of great theoretical and practical significance. Most research so far focuses on the prediction of missing links. There are many mature algorithms for link prediction which have achieved good results, but research on the prediction of missing nodes has just begun. In this paper, we propose an algorithm for missing node prediction in complex networks. We detect the position of missing nodes based on their neighbor nodes under the theory of compressed sensing, and extend the algorithm to the case of multiple missing nodes using spectral clustering. Experiments on real public network datasets and simulated datasets show that our algorithm can detect the locations of hidden nodes effectively with high precision.
Controlling for Frailty in Pharmacoepidemiologic Studies of Older Adults: Validation of an Existing Medicare Claims-based Algorithm.

PubMed

Cuthbertson, Carmen C; Kucharska-Newton, Anna; Faurot, Keturah R; Stürmer, Til; Jonsson Funk, Michele; Palta, Priya; Windham, B Gwen; Thai, Sydney; Lund, Jennifer L

2018-07-01

Frailty is a geriatric syndrome characterized by weakness and weight loss and is associated with adverse health outcomes. It is often an unmeasured confounder in pharmacoepidemiologic and comparative effectiveness studies using administrative claims data. Among the Atherosclerosis Risk in Communities (ARIC) Study Visit 5 participants (2011-2013; n = 3,146), we conducted a validation study to compare a Medicare claims-based algorithm of dependency in activities of daily living (or dependency) developed as a proxy for frailty with a reference standard measure of phenotypic frailty. We applied the algorithm to the ARIC participants' claims data to generate a predicted probability of dependency. Using the claims-based algorithm, we estimated the C-statistic for predicting phenotypic frailty. We further categorized participants by their predicted probability of dependency (<5%, 5% to <20%, and ≥20%) and estimated associations with difficulties in physical abilities, falls, and mortality. The claims-based algorithm showed good discrimination of phenotypic frailty (C-statistic = 0.71; 95% confidence interval [CI] = 0.67, 0.74). Participants classified with a high predicted probability of dependency (≥20%) had higher prevalence of falls and difficulty in physical ability, and a greater risk of 1-year all-cause mortality (hazard ratio = 5.7 [95% CI = 2.5, 13]) than participants classified with a low predicted probability (<5%). Sensitivity and specificity varied across predicted probability of dependency thresholds. The Medicare claims-based algorithm showed good discrimination of phenotypic frailty and high predictive ability with adverse health outcomes. This algorithm can be used in future Medicare claims analyses to reduce confounding by frailty and improve study validity.
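
For a binary outcome, the C-statistic reported above is the probability that a randomly chosen frail participant receives a higher predicted probability than a randomly chosen non-frail one. A minimal sketch of that calculation and of the three probability bands follows; the toy scores are invented, and the label "intermediate" for the middle band is an assumption.

```python
def c_statistic(scores, outcomes):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))

def dependency_band(probability):
    """Bands used in the abstract: <5%, 5% to <20%, and >=20%."""
    if probability < 0.05:
        return "low"
    if probability < 0.20:
        return "intermediate"
    return "high"

# Invented predicted probabilities and frailty labels for four participants.
toy_c = c_statistic([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
bands = [dependency_band(p) for p in (0.02, 0.05, 0.19, 0.20)]
```

A value of 0.5 corresponds to chance ranking and 1.0 to perfect separation.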
A link prediction approach to cancer drug sensitivity prediction.

PubMed

Turki, Turki; Wei, Zhi

2017-10-03

Predicting the response to a drug for cancer disease patients based on genomic information is an important problem in modern clinical oncology. This problem occurs in part because many available drug sensitivity prediction algorithms do not consider better quality cancer cell lines and the adoption of new feature representations; both lead to the accurate prediction of drug responses. By predicting accurate drug responses to cancer, oncologists gain a more complete understanding of the effective treatments for each patient, which is a core goal in precision medicine. In this paper, we model cancer drug sensitivity as a link prediction, which is shown to be an effective technique. We evaluate our proposed link prediction algorithms and compare them with an existing drug sensitivity prediction approach based on clinical trial data. The experimental results based on the clinical trial data show the stability of our link prediction algorithms, which yield the highest area under the ROC curve (AUC) and are statistically significant. We propose a link prediction approach to obtain new feature representation. Compared with an existing approach, the results show that incorporating the new feature representation to the link prediction algorithms has significantly improved the performance.
Medical chart validation of an algorithm for identifying multiple sclerosis relapse in healthcare claims.

PubMed

Chastek, Benjamin J; Oleen-Burkey, Merrikay; Lopez-Bresnahan, Maria V

2010-01-01

Relapse is a common measure of disease activity in relapsing-remitting multiple sclerosis (MS). The objective of this study was to test the content validity of an operational algorithm for detecting relapse in claims data. A claims-based relapse detection algorithm was tested by comparing its detection rate over a 1-year period with relapses identified based on medical chart review. According to the algorithm, MS patients in a US healthcare claims database who had either (1) a primary claim for MS during hospitalization or (2) a corticosteroid claim following an MS-related outpatient visit were designated as having a relapse. Patient charts were examined for explicit indication of relapse or care suggestive of relapse. Positive and negative predictive values were calculated. Medical charts were reviewed for 300 MS patients, half of whom had a relapse according to the algorithm. The claims-based criteria correctly classified 67.3% of patients with relapses (positive predictive value) and 70.0% of patients without relapses (negative predictive value; kappa 0.373; p < 0.001). Alternative algorithms did not improve on the predictive value of the operational algorithm. Limitations of the algorithm include lack of differentiation between relapsing-remitting MS and other types, and that it does not incorporate measures of function and disability. The claims-based algorithm appeared to successfully detect moderate-to-severe MS relapse. This validated definition can be applied to future claims-based MS studies.
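
The reported figures can be checked with a short calculation. The cell counts below are not stated in the abstract; they are back-calculated on the assumption that 150 of the 300 patients were algorithm-positive and 150 algorithm-negative, which makes 101/150 and 105/150 the counts that round to the reported percentages.

```python
def predictive_values_and_kappa(tp, fp, fn, tn):
    """Return (PPV, NPV, Cohen's kappa) for a 2x2 algorithm-versus-chart table."""
    total = tp + fp + fn + tn
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    observed = (tp + tn) / total
    # Chance agreement from the marginal proportions of each rater.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return ppv, npv, kappa

# Assumed counts: tp and fp among 150 algorithm-positive, tn and fn among 150 negative.
ppv, npv, kappa = predictive_values_and_kappa(tp=101, fp=49, fn=45, tn=105)
```

Under these assumed counts the three statistics agree with the values quoted in the abstract after rounding, which suggests the reported kappa describes agreement between the algorithm and chart review.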

Network-based ranking methods for prediction of novel disease associated microRNAs.

PubMed

Le, Duc-Hau

2015-10-01

Many studies have shown the roles of microRNAs in human disease, and a number of computational methods have been proposed to predict such associations by ranking candidate microRNAs according to their relevance to a disease. Among them, machine learning-based methods usually have a limitation in specifying non-disease microRNAs as negative training samples. Meanwhile, network-based methods are becoming dominant since they exploit the "disease module" principle in microRNA functional similarity networks. Of these, the random walk with restart (RWR) algorithm-based method is currently state-of-the-art. The use of this algorithm was inspired by its success in predicting disease genes, because the "disease module" principle also exists in protein interaction networks. Besides, many algorithms designed for webpage ranking have been successfully applied to ranking disease candidate genes because web networks share topological properties with protein interaction networks. However, these algorithms have not yet been utilized for disease microRNA prediction. We constructed microRNA functional similarity networks based on shared targets of microRNAs, and then we integrated them with a recently identified microRNA functional synergistic network. After analyzing topological properties of these networks, in addition to RWR, we assessed the performance of (i) PRINCE (PRIoritizatioN and Complex Elucidation), which was proposed for disease gene prediction; (ii) PageRank with Priors (PRP) and K-Step Markov (KSM), which were used for studying web networks; and (iii) a neighborhood-based algorithm. Analyses of topological properties showed that all microRNA functional similarity networks are small-world and scale-free. The performance of each algorithm was assessed based on average AUC values on 35 disease phenotypes and average rankings of newly discovered disease microRNAs. As a result, the performance on the integrated network was better than that on individual ones. In addition, the performance of PRINCE, PRP and KSM was comparable with that of RWR, whereas it was worst for the neighborhood-based algorithm. Moreover, all the algorithms were stable with the change of parameters. Finally, using the integrated network, we predicted six novel miRNAs (i.e., hsa-miR-101, hsa-miR-181d, hsa-miR-192, hsa-miR-423-3p, hsa-miR-484 and hsa-miR-98) associated with breast cancer. Network-based ranking algorithms, which were successfully applied for either disease gene prediction or for studying social/web networks, can also be used effectively for disease microRNA prediction.
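The random walk with restart ranking mentioned above can be illustrated with a minimal sketch. It assumes an unweighted, undirected toy network and a restart probability of 0.5; the node names are hypothetical, and the study itself works on similarity-weighted microRNA networks whose details are not given in the abstract.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds."""
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` jump back to a seed ...
        nxt = {v: restart * p0[v] for v in nodes}
        # ... otherwise move to a uniformly chosen neighbor.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical functional similarity network (every node has at least one neighbor).
network = {
    "miR-a": ["miR-b", "miR-c"],
    "miR-b": ["miR-a", "miR-c"],
    "miR-c": ["miR-a", "miR-b", "miR-d"],
    "miR-d": ["miR-c", "miR-e"],
    "miR-e": ["miR-d"],
}
known_disease = {"miR-a"}  # seed set: microRNAs already linked to the disease

scores = random_walk_with_restart(network, known_disease)
ranking = sorted((v for v in network if v not in known_disease),
                 key=lambda v: scores[v], reverse=True)
print(ranking)
```

Candidates are ranked by their steady-state probability, so nodes that sit close to the known disease microRNAs in the network come first.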
External validation of the international risk prediction algorithm for major depressive episode in the US general population: the PredictD-US study.

PubMed

Nigatu, Yeshambel T; Liu, Yan; Wang, JianLi

2016-07-22

Multivariable risk prediction algorithms are useful for making clinical decisions and for health planning. While prediction algorithms for new onset of major depression in primary care attendees in Europe and elsewhere have been developed, the performance of these algorithms in different populations is not known. The objective of this study was to validate the PredictD algorithm for new onset of major depressive episode (MDE) in the US general population. A longitudinal study was conducted with approximately 3 years of follow-up data from a nationally representative sample of the US general population. A total of 29,621 individuals who participated in Waves 1 and 2 of the US National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) and who did not have an MDE in the past year at Wave 1 were included. The PredictD algorithm was directly applied to the selected participants. MDE was assessed by the Alcohol Use Disorder and Associated Disabilities Interview Schedule, based on the DSM-IV criteria. Among the participants, 8% developed an MDE over three years. The PredictD algorithm had acceptable discriminative power (C-statistic = 0.708, 95% CI: 0.696, 0.720), but poor calibration (p < 0.001) with the NESARC data. In the European primary care attendees, the algorithm had a C-statistic of 0.790 (95% CI: 0.767, 0.813) with perfect calibration. The PredictD algorithm has acceptable discrimination, but its calibration was poor in the US general population despite re-calibration. Therefore, based on these results, the use of PredictD in the US general population for predicting individual risk of MDE is not encouraged at the current stage. More independent validation research is needed.
Prediction of dynamical systems by symbolic regression

NASA Astrophysics Data System (ADS)

Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.

2016-07-01

We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.
Research on wind field algorithm of wind lidar based on BP neural network and grey prediction

NASA Astrophysics Data System (ADS)

Chen, Yong; Chen, Chun-Li; Luo, Xiong; Zhang, Yan; Yang, Ze-hou; Zhou, Jie; Shi, Xiao-ding; Wang, Lei

2018-01-01

This paper uses a BP neural network and the grey algorithm to forecast the radar wind field. To reduce the residual error of the wind field prediction, the minimum of the residual error function is calculated: the residuals of the grey algorithm are used to train the BP neural network, the trained network model forecasts the residual sequence, and the predicted residual sequence is used to correct the forecast sequence of the grey algorithm. The test data show that the grey algorithm corrected by the BP neural network can effectively reduce the residual value and improve the prediction precision.
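The grey-prediction half of this scheme can be sketched with the standard GM(1,1) model. This is an illustration under assumptions, not the authors' implementation: the input series is made up, and the BP neural network that would be trained on the residual sequence is omitted, so only the residuals it would learn from are computed.

```python
import math


def gm11_fit(x0):
    """Fit GM(1,1) to the series x0 and return the parameters (a, b)."""
    x1, total = [], 0.0
    for v in x0:                      # accumulated generating sequence
        total += v
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    # Ordinary least squares for y = -a * z + b.
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (n * szy - sz * sy) / (n * szz - sz * sz)
    intercept = (sy - slope * sz) / n
    return -slope, intercept


def gm11_predict(x0, a, b, k):
    """Predicted value of the original series at index k (k >= 1, a != 0)."""
    c = x0[0] - b / a
    x1_k = c * math.exp(-a * k) + b / a
    x1_prev = c * math.exp(-a * (k - 1)) + b / a
    return x1_k - x1_prev


# Hypothetical wind-speed-like series with steady growth.
series = [100.0, 110.0, 121.0, 133.1, 146.41]
a, b = gm11_fit(series)
fitted = [series[0]] + [gm11_predict(series, a, b, k) for k in range(1, len(series))]
# These residuals are what a separately trained network would learn to predict.
residuals = [obs - fit for obs, fit in zip(series, fitted)]
forecast = gm11_predict(series, a, b, len(series))
print(round(forecast, 2), [round(r, 3) for r in residuals])
```

In the corrected scheme described above, the predicted residual for the next step would be added to `forecast` before it is reported.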
DVD-COOP: Innovative Conjunction Prediction Using Voronoi-filter based on the Dynamic Voronoi Diagram of 3D Spheres

NASA Astrophysics Data System (ADS)

Cha, J.; Ryu, J.; Lee, M.; Song, C.; Cho, Y.; Schumacher, P.; Mah, M.; Kim, D.

Conjunction prediction is one of the critical operations in space situational awareness (SSA). For geospace objects, common algorithms for conjunction prediction are usually based on all-pairwise check, spatial hash, or kd-tree. Computational load is usually reduced through filters. However, there is a good chance of missing potential collisions between space objects. We present a novel algorithm which both guarantees that no conjunction is missed and efficiently answers a variety of spatial queries, including pairwise conjunction prediction. The algorithm takes only O(k log N) time for N objects in the worst case to answer conjunctions, where k is a constant that is linear in the prediction time length. The proposed algorithm, named DVD-COOP (Dynamic Voronoi Diagram-based Conjunctive Orbital Object Predictor), is based on the dynamic Voronoi diagram of moving spherical balls in 3D space. The algorithm has a preprocessing stage which consists of two steps: the construction of an initial Voronoi diagram (taking O(N) time on average) and the construction of a priority queue for the events of topology changes in the Voronoi diagram (taking O(N log N) time in the worst case). The scalability of the proposed algorithm is also discussed. We hope that the proposed Voronoi approach will change the computational paradigm in spatial reasoning among space objects.
Prediction based active ramp metering control strategy with mobility and safety assessment

NASA Astrophysics Data System (ADS)

Fang, Jie; Tu, Lili

2018-04-01

Ramp metering (RM) is one of the most direct and efficient motorway traffic flow management measures for improving traffic conditions. However, owing to the lack of traffic condition prediction, earlier studies did not quantitatively evaluate the impact of the applied RM control on traffic flow dynamics. This study presents an RM control algorithm that adopts the Model Predictive Control (MPC) framework to predict and assess future traffic conditions, taking both the current traffic conditions and the RM-controlled future traffic states into consideration. The designed RM control algorithm aims to optimize network mobility and safety performance. The algorithm is evaluated in a field-data-based simulation. A comparison of the scenario controlled by the presented algorithm with the uncontrolled scenario shows that the proposed RM control algorithm can effectively relieve congestion in the traffic network with no significant compromise in safety.
Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion.

PubMed

Zafar, Raheel; Dass, Sarat C; Malik, Aamir Saeed

2017-01-01

Electroencephalogram (EEG)-based decoding of human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain-computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, a convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. Comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with currently recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method is currently the most popular feature extraction and prediction method. This method showed an accuracy of 65.7%, whereas the proposed method predicts novel data with an improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction method.
Novel Near-Lossless Compression Algorithm for Medical Sequence Images with Adaptive Block-Based Spatial Prediction.

PubMed

Song, Xiaoying; Huang, Qijun; Chang, Sheng; He, Jin; Wang, Hao

2016-12-01

To address the low compression efficiency of lossless compression and the low image quality of general near-lossless compression, a novel near-lossless compression algorithm based on adaptive spatial prediction is proposed for medical sequence images for possible diagnostic use in this paper. The proposed method employs adaptive block size-based spatial prediction to predict blocks directly in the spatial domain and Lossless Hadamard Transform before quantization to improve the quality of reconstructed images. The block-based prediction breaks the pixel neighborhood constraint and takes full advantage of the local spatial correlations found in medical images. The adaptive block size guarantees a more rational division of images and the improved use of the local structure. The results indicate that the proposed algorithm can efficiently compress medical images and produces a better peak signal-to-noise ratio (PSNR) under the same pre-defined distortion than other near-lossless methods.
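The idea of predicting in the spatial domain and then quantizing the prediction residual under a fixed distortion bound can be sketched as follows. The abstract does not give the quantizer, so this sketch assumes the uniform residual quantizer commonly used in near-lossless coding (maximum absolute error `delta`), uses a simple left-neighbor predictor in place of the adaptive block-based predictor, and leaves out the Hadamard transform and the clamping of pixel values.

```python
def quantize_residual(e, delta):
    """Map a prediction residual to a quantization index (error bound delta)."""
    step = 2 * delta + 1
    if e >= 0:
        return (e + delta) // step
    return -((-e + delta) // step)


def encode_row(pixels, delta):
    """Return (indices, reconstruction) for one row of integer pixels."""
    step = 2 * delta + 1
    indices, recon = [], []
    prev = 0  # predictor state: the previously *reconstructed* pixel
    for p in pixels:
        q = quantize_residual(p - prev, delta)
        indices.append(q)
        prev = prev + q * step        # same value the decoder will compute
        recon.append(prev)
    return indices, recon


def decode_row(indices, delta):
    """Rebuild the row from quantization indices alone."""
    step = 2 * delta + 1
    out, prev = [], 0
    for q in indices:
        prev = prev + q * step
        out.append(prev)
    return out


row = [100, 102, 101, 130, 128, 90]   # hypothetical pixel values
delta = 2
indices, recon = encode_row(row, delta)
max_error = max(abs(p - r) for p, r in zip(row, recon))
print(indices, recon, max_error)
```

Predicting from reconstructed rather than original pixels keeps encoder and decoder in step, so the per-pixel error never exceeds `delta`; setting `delta = 0` makes the scheme lossless.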
Prediction Of The Expected Safety Performance Of Rural Two-Lane Highways

DOT National Transportation Integrated Search

2000-12-01

This report presents an algorithm for predicting the safety performance of a rural two-lane highway. The accident prediction algorithm consists of base models and accident modification factors for both roadway segments and at-grade intersections on r...
Model Predictive Control Based Motion Drive Algorithm for a Driving Simulator

NASA Astrophysics Data System (ADS)

Rehmatullah, Faizan

In this research, we develop a model predictive control based motion drive algorithm for the driving simulator at Toronto Rehabilitation Institute. Motion drive algorithms exploit the limitations of the human vestibular system to formulate a perception of motion within the constrained workspace of a simulator. In the absence of visual cues, the human perception system is unable to distinguish between acceleration and the force of gravity. The motion drive algorithm determines control inputs to displace the simulator platform, and by using the resulting inertial forces and angular rates, creates the perception of motion. By using model predictive control, we can optimize the use of simulator workspace for every maneuver while simulating the vehicle perception. With the ability to handle nonlinear constraints, the model predictive control allows us to incorporate workspace limitations.
An Efficient Deterministic Approach to Model-based Prediction Uncertainty Estimation

NASA Technical Reports Server (NTRS)

Daigle, Matthew J.; Saxena, Abhinav; Goebel, Kai

2012-01-01

Prognostics deals with the prediction of the end of life (EOL) of a system. EOL is a random variable, due to the presence of process noise and uncertainty in the future inputs to the system. Prognostics algorithms must account for this inherent uncertainty. In addition, these algorithms never know exactly the state of the system at the desired time of prediction, or the exact model describing the future evolution of the system, accumulating additional uncertainty into the predicted EOL. Prediction algorithms that do not account for these sources of uncertainty are misrepresenting the EOL and can lead to poor decisions based on their results. In this paper, we explore the impact of uncertainty in the prediction problem. We develop a general model-based prediction algorithm that incorporates these sources of uncertainty, and propose a novel approach to efficiently handle uncertainty in the future input trajectories of a system by using the unscented transformation. Using this approach, we are not only able to reduce the computational load but also estimate the bounds of uncertainty in a deterministic manner, which can be useful to consider during decision-making. Using a lithium-ion battery as a case study, we perform several simulation-based experiments to explore these issues, and validate the overall approach using experimental data from a battery testbed.
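The unscented transformation named above propagates a small, deterministic set of sigma points through a nonlinear function instead of drawing many random samples. A minimal one-dimensional sketch follows; the scalar setting, the choice kappa = 2 and the toy function are assumptions made for illustration and stand in for the much richer EOL simulation of the paper.

```python
import math


def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Approximate the mean and variance of f(x) for scalar x with given mean/variance."""
    n = 1  # dimension of the input
    spread = math.sqrt((n + kappa) * var)
    # Three sigma points: the mean and one point on each side of it.
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(x) for x in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var


# Toy nonlinear map standing in for "simulate the system to end of life".
y_mean, y_var = unscented_transform_1d(mean=3.0, var=0.25, f=lambda x: x * x)
print(y_mean, y_var)
```

For this quadratic example with a Gaussian input, the three function evaluations reproduce the exact mean (mean squared plus variance) and variance, which shows how the method trades sampling for a fixed, repeatable computation.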
A time series based sequence prediction algorithm to detect activities of daily living in smart home.

PubMed

Marufuzzaman, M; Reaz, M B I; Ali, M A M; Rahman, L F

2015-01-01

The goal of smart homes is to create an intelligent environment that adapts to the inhabitants' needs and assists people who need special care and safety in their daily life. This can be achieved by collecting ADL (activities of daily living) data and analyzing it within existing computing elements. In this research, a very recent algorithm named sequence prediction via enhanced episode discovery (SPEED) is modified, and a time component is included in order to improve accuracy. The modified SPEED, or M-SPEED, is a sequence prediction algorithm that modifies the previous SPEED algorithm by using the time duration of an appliance's ON-OFF states to decide the next state. M-SPEED discovered periodic episodes of inhabitant behavior, was trained with the learned episodes, and made decisions based on the obtained knowledge. The results showed that M-SPEED achieves 96.8% prediction accuracy, which is better than other time prediction algorithms such as PUBS, ALZ with temporal rules, and the previous SPEED. Since human behavior shows natural temporal patterns, duration times can be used to predict future events more accurately. This inhabitant activity prediction system is expected to improve smart homes by ensuring safety and better care for elderly and handicapped people.
Feed-Forward Neural Network Soft-Sensor Modeling of Flotation Process Based on Particle Swarm Optimization and Gravitational Search Algorithm

PubMed Central

Wang, Jie-Sheng; Han, Shuang

2015-01-01

To predict the key technology indicators (concentrate grade and tailings recovery rate) of the flotation process, a feed-forward neural network (FNN) based soft-sensor model optimized by a hybrid algorithm combining the particle swarm optimization (PSO) algorithm and the gravitational search algorithm (GSA) is proposed. Although GSA has better optimization capability, it converges slowly and easily falls into local optima. In this paper, therefore, the velocity vector and position vector of GSA are adjusted by the PSO algorithm in order to improve its convergence speed and prediction accuracy. Finally, the proposed hybrid algorithm is adopted to optimize the parameters of the FNN soft-sensor model. Simulation results show that the model has better generalization and prediction accuracy for the concentrate grade and tailings recovery rate and meets the online soft-sensor requirements of real-time control in the flotation process. PMID:26583034
Can Mapping Algorithms Based on Raw Scores Overestimate QALYs Gained by Treatment? A Comparison of Mappings Between the Roland-Morris Disability Questionnaire and the EQ-5D-3L Based on Raw and Differenced Score Data.

PubMed

Madan, Jason; Khan, Kamran A; Petrou, Stavros; Lamb, Sarah E

2017-05-01

Mapping algorithms are increasingly being used to predict health-utility values based on responses or scores from non-preference-based measures, thereby informing economic evaluations. We explored whether predictions in the EuroQol 5-dimension 3-level instrument (EQ-5D-3L) health-utility gains from mapping algorithms might differ if estimated using differenced versus raw scores, using the Roland-Morris Disability Questionnaire (RMQ), a widely used health status measure for low back pain, as an example. We estimated algorithms mapping within-person changes in RMQ scores to changes in EQ-5D-3L health utilities using data from two clinical trials with repeated observations. We also used logistic regression models to estimate response mapping algorithms from these data to predict within-person changes in responses to each EQ-5D-3L dimension from changes in RMQ scores. Predicted health-utility gains from these mappings were compared with predictions based on raw RMQ data. Using differenced scores reduced the predicted health-utility gain from a unit decrease in RMQ score from 0.037 (standard error [SE] 0.001) to 0.020 (SE 0.002). Analysis of response mapping data suggests that the use of differenced data reduces the predicted impact of reducing RMQ scores across EQ-5D-3L dimensions and that patients can experience health-utility gains on the EQ-5D-3L 'usual activity' dimension independent from improvements captured by the RMQ. Mappings based on raw RMQ data overestimate the EQ-5D-3L health utility gains from interventions that reduce RMQ scores. Where possible, mapping algorithms should reflect within-person changes in health outcome and be estimated from datasets containing repeated observations if they are to be used to estimate incremental health-utility gains.
A traveling salesman approach for predicting protein functions.

PubMed

Johnson, Olin; Liu, Jing

2006-10-12

Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm [1] on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Our method is a promising general tool for predicting functions of uncharacterized proteins and a successful example of using computer science knowledge and algorithms to study biological problems.
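The direct neighbor baseline that the authors compare against can be sketched as a majority vote over the annotated functions of a protein's immediate interaction partners. The protein names, annotations and tie-breaking rule below are hypothetical, and the sketch covers only this baseline, not the Traveling Salesman clustering approach.

```python
from collections import Counter


def predict_by_direct_neighbors(protein, interactions, annotations):
    """Return the most common function among the protein's immediate neighbors."""
    votes = Counter()
    for neighbor in interactions.get(protein, ()):
        for function in annotations.get(neighbor, ()):
            votes[function] += 1
    if not votes:
        return None  # no annotated neighbors, so no prediction
    best = max(votes.values())
    # Break ties alphabetically so the result is deterministic.
    return min(fn for fn, count in votes.items() if count == best)


# Hypothetical interaction map and known annotations.
interactions = {
    "P1": ["P2", "P3", "P4"],
    "P5": ["P6"],
}
annotations = {
    "P2": ["transport"],
    "P3": ["transport", "signaling"],
    "P4": ["metabolism"],
}

predicted = predict_by_direct_neighbors("P1", interactions, annotations)
print(predicted)
```

Because only immediate partners vote, a protein whose neighbors are all uncharacterized (such as `P5` here) receives no prediction, which is the limitation that global clustering is meant to address.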
A traveling salesman approach for predicting protein functions

PubMed Central

Johnson, Olin; Liu, Jing

2006-01-01

Background Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Results Here we present a new approach utilizing the classic Traveling Salesman Problem to study the protein-protein interactions and to predict protein functions in budding yeast Saccharomyces cerevisiae. We apply the global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm [1] on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Conclusion Our method is a promising one to be used as a general tool to predict functions of uncharacterized proteins and a successful sample of using computer science knowledge and algorithms to study biological problems. PMID:17147783
The Icarus challenge - Predicting vulnerability to climate change using an algorithm-based species' trait approach

EPA Science Inventory

The Icarus challenge - Predicting vulnerability to climate change using an algorithm-based species’ trait approach. Henry Lee II, Christina Folger, Deborah A. Reusser, Patrick Clinton, and Rene Graham. U.S. EPA, Western Ecology Division, Newport, OR USA. E-mail: lee.henry@ep...
Analysis of energy-based algorithms for RNA secondary structure prediction

PubMed Central

2012-01-01

Background RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. 
Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Conclusions Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets. PMID:22296803
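The two evaluation ingredients described above, the F-measure over base pairs and the bootstrap percentile method for the reliability of an average, can be sketched as follows. The base pairs and per-RNA scores are made-up illustration data, and the resample count, confidence level and seed are assumed defaults rather than the settings of the study.

```python
import random


def f_measure(predicted_pairs, reference_pairs):
    """Harmonic mean of sensitivity and positive predictive value over base pairs."""
    true_positives = len(predicted_pairs & reference_pairs)
    if true_positives == 0:
        return 0.0
    sensitivity = true_positives / len(reference_pairs)
    ppv = true_positives / len(predicted_pairs)
    return 2 * sensitivity * ppv / (sensitivity + ppv)


def bootstrap_percentile_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile confidence interval for the mean of per-RNA scores."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n   # resample with replacement
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# One hypothetical structure: three of four predicted pairs are correct.
predicted = {(1, 10), (2, 9), (3, 8), (4, 7)}
reference = {(1, 10), (2, 9), (3, 8), (5, 6)}
score = f_measure(predicted, reference)

# Hypothetical per-RNA F-measures for a small class of molecules.
scores = [0.62, 0.71, 0.55, 0.80, 0.68, 0.74, 0.59, 0.66]
low, high = bootstrap_percentile_ci(scores)
print(score, low, high)
```

With only a handful of molecules the interval is wide, which mirrors the caution about drawing conclusions from small RNA classes.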
Analysis of energy-based algorithms for RNA secondary structure prediction.

PubMed

Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H

2012-02-01

RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. 
Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.
An improved reversible data hiding algorithm based on modification of prediction errors

NASA Astrophysics Data System (ADS)

Jafar, Iyad F.; Hiary, Sawsan A.; Darabkh, Khalid A.

2014-04-01

Reversible data hiding algorithms are concerned with the ability of hiding data and recovering the original digital image upon extraction. This issue is of interest in medical and military imaging applications. One particular class of such algorithms relies on the idea of histogram shifting of prediction errors. In this paper, we propose an improvement over one popular algorithm in this class. The improvement is achieved by employing a different predictor, the use of more bins in the prediction error histogram in addition to multilevel embedding. The proposed extension shows significant improvement over the original algorithm and its variations.
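The basic mechanism of this class, histogram shifting of prediction errors, can be sketched on a one-dimensional integer signal. This is a simplified illustration under assumptions: it uses a left-neighbor predictor, embeds only in the zero-error bin, and ignores pixel overflow; the different predictor, additional bins and multilevel embedding proposed in the paper are not reproduced.

```python
def embed(signal, bits):
    """Hide bits in an integer signal; returns the marked signal."""
    payload = list(bits)
    marked = [signal[0]]                      # first sample is left untouched
    for i in range(1, len(signal)):
        error = signal[i] - signal[i - 1]     # left-neighbor prediction error
        if error == 0:
            error += payload.pop(0) if payload else 0   # embed one bit here
        elif error > 0:
            error += 1                        # shift to make room for the "1" bin
        marked.append(signal[i - 1] + error)
    if payload:
        raise ValueError("payload exceeds embedding capacity")
    return marked


def extract(marked):
    """Recover (original signal, embedded bits) from a marked signal."""
    restored, bits = [marked[0]], []
    for i in range(1, len(marked)):
        error = marked[i] - restored[i - 1]
        if error in (0, 1):
            bits.append(error)                # this position carried a bit
            error = 0
        elif error > 1:
            error -= 1                        # undo the shift
        restored.append(restored[i - 1] + error)
    return restored, bits


signal = [50, 50, 51, 51, 49, 49, 49, 52]     # hypothetical pixel row
secret = [1, 0, 1, 1]
marked = embed(signal, secret)
restored, recovered = extract(marked)
print(marked, restored, recovered)
```

The capacity equals the number of zero prediction errors, so a better predictor (more errors at zero) carries more payload. If the payload is shorter than the capacity, `extract` returns trailing zero bits that the caller must trim.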

Predictive Model of Linear Antimicrobial Peptides Active against Gram-Negative Bacteria.

PubMed

Vishnepolsky, Boris; Gabrielian, Andrei; Rosenthal, Alex; Hurt, Darrell E; Tartakovsky, Michael; Managadze, Grigol; Grigolava, Maya; Makhatadze, George I; Pirtskhalava, Malak

2018-05-29

Antimicrobial peptides (AMPs) have been identified as a potential new class of anti-infectives for drug development. There are a lot of computational methods that try to predict AMPs. Most of them can only predict if a peptide will show any antimicrobial potency, but to the best of our knowledge, there are no tools which can predict antimicrobial potency against particular strains. Here we present a predictive model of linear AMPs being active against particular Gram-negative strains relying on a semi-supervised machine-learning approach with a density-based clustering algorithm. The algorithm can well distinguish peptides active against particular strains from others which may also be active but not against the considered strain. The available AMP prediction tools cannot carry out this task. The prediction tool based on the algorithm suggested herein is available on https://dbaasp.org.
A system for learning statistical motion patterns.

PubMed

Hu, Weiming; Xiao, Xuejuan; Fu, Zhouyu; Xie, Dan; Tan, Tieniu; Maybank, Steve

2006-09-01

Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast accurate fuzzy K-means algorithm. Growing and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of algorithms for anomaly detection and behavior prediction.
Serious injury prediction algorithm based on large-scale data and under-triage control.

PubMed

Nishimoto, Tetsuya; Mukaigawa, Kosuke; Tominaga, Shigeru; Lubbe, Nils; Kiuchi, Toru; Motomura, Tomokazu; Matsumoto, Hisashi

2017-01-01

The present study was undertaken to construct an algorithm for an advanced automatic collision notification system based on national traffic accident data compiled by Japanese police. While US research into the development of a serious-injury prediction algorithm is based on a logistic regression algorithm using the National Automotive Sampling System/Crashworthiness Data System, the present injury prediction algorithm was based on comprehensive police data covering all accidents that occurred across Japan. The particular focus of this research is to improve the rescue of injured vehicle occupants in traffic accidents, and the present algorithm assumes the use of an onboard event data recorder data from which risk factors such as pseudo delta-V, vehicle impact location, seatbelt wearing or non-wearing, involvement in a single impact or multiple impact crash and the occupant's age can be derived. As a result, a simple and handy algorithm suited for onboard vehicle installation was constructed from a sample of half of the available police data. The other half of the police data was applied to the validation testing of this new algorithm using receiver operating characteristic analysis. An additional validation was conducted using in-depth investigation of accident injuries in collaboration with prospective host emergency care institutes. The validated algorithm, named the TOYOTA-Nihon University algorithm, proved to be as useful as the US URGENCY and other existing algorithms. Furthermore, an under-triage control analysis found that the present algorithm could achieve an under-triage rate of less than 10% by setting a threshold of 8.3%. Copyright © 2016 Elsevier Ltd. All rights reserved.
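
The threshold-based under-triage analysis mentioned above can be illustrated with a small sketch. It assumes the common definition of under-triage as the share of seriously injured occupants whose predicted risk falls below the notification threshold; the data and the function name under_triage_rate are hypothetical, not taken from the study.

```python
def under_triage_rate(risks, serious, threshold):
    """Fraction of seriously injured cases whose predicted risk is below threshold."""
    n_serious = sum(1 for s in serious if s)
    if n_serious == 0:
        raise ValueError("no seriously injured cases in the sample")
    missed = sum(1 for r, s in zip(risks, serious) if s and r < threshold)
    return missed / n_serious


# Hypothetical predicted risks and true injury outcomes (1 = serious injury).
risks = [0.02, 0.05, 0.09, 0.20, 0.50, 0.90]
serious = [0, 1, 0, 1, 1, 1]
rate_at_8_3_percent = under_triage_rate(risks, serious, 0.083)
```

Lowering the threshold reduces under-triage at the cost of more over-triage, so in practice the threshold is chosen against a target rate such as the 10% quoted in the abstract.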
Development of a generally applicable morphokinetic algorithm capable of predicting the implantation potential of embryos transferred on Day 3.

PubMed

Petersen, Bjørn Molt; Boel, Mikkel; Montag, Markus; Gardner, David K

2016-10-01

Can a generally applicable morphokinetic algorithm suitable for Day 3 transfers of time-lapse monitored embryos originating from different culture conditions and fertilization methods be developed for the purpose of supporting the embryologist's decision on which embryo to transfer back to the patient in assisted reproduction? The algorithm presented here can be used independently of culture conditions and fertilization method and provides predictive power not surpassed by other published algorithms for ranking embryos according to their blastocyst formation potential. Generally applicable algorithms have so far been developed only for predicting blastocyst formation. A number of clinics have reported validated implantation prediction algorithms, which have been developed based on clinic-specific culture conditions and clinical environment. However, a generally applicable embryo evaluation algorithm based on actual implantation outcome has not yet been reported. Retrospective evaluation of data extracted from a database of known implantation data (KID) originating from 3275 embryos transferred on Day 3 conducted in 24 clinics between 2009 and 2014. The data represented different culture conditions (reduced and ambient oxygen with various culture medium strategies) and fertilization methods (IVF, ICSI). The capability to predict blastocyst formation was evaluated on an independent set of morphokinetic data from 11 218 embryos which had been cultured to Day 5. The algorithm was developed by applying automated recursive partitioning to a large number of annotation types and derived equations, progressing to a five-fold cross-validation test of the complete data set and a validation test of different incubation conditions and fertilization methods. The results were expressed as receiver operating characteristics curves using the area under the curve (AUC) to establish the predictive strength of the algorithm. 
By applying the here developed algorithm (KIDScore), which was based on six annotations (the number of pronuclei equals 2 at the 1-cell stage, time from insemination to pronuclei fading at the 1-cell stage, time from insemination to the 2-cell stage, time from insemination to the 3-cell stage, time from insemination to the 5-cell stage and time from insemination to the 8-cell stage) and ranking the embryos in five groups, the implantation potential of the embryos was predicted with an AUC of 0.650. On Day 3 the KIDScore algorithm was capable of predicting blastocyst development with an AUC of 0.745 and blastocyst quality with an AUC of 0.679. In a comparison of blastocyst prediction including six other published algorithms and KIDScore, only KIDScore and one more algorithm surpassed an algorithm constructed on conventional Alpha/ESHRE consensus timings in terms of predictive power. Some morphological assessments were not available and consequently three of the algorithms in the comparison were not used in full and may therefore have been put at a disadvantage. Algorithms based on implantation data from Day 3 embryo transfers require adjustments to be capable of predicting the implantation potential of Day 5 embryo transfers. The current study is restricted by its retrospective nature and absence of live birth information. Prospective Randomized Controlled Trials should be used in future studies to establish the value of time-lapse technology and morphokinetic evaluation. Algorithms applicable to different culture conditions can be developed if based on large data sets of heterogeneous origin. This study was funded by Vitrolife A/S, Denmark and Vitrolife AB, Sweden. B.M.P.'s company BMP Analytics is performing consultancy for Vitrolife A/S. M.B. is employed at Vitrolife A/S. M.M.'s company ilabcomm GmbH received honorarium for consultancy from Vitrolife AB. D.K.G. received research support from Vitrolife AB. © The Author 2016. 
Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology.
Development of a generally applicable morphokinetic algorithm capable of predicting the implantation potential of embryos transferred on Day 3

PubMed Central

Petersen, Bjørn Molt; Boel, Mikkel; Montag, Markus; Gardner, David K.

2016-01-01

STUDY QUESTION Can a generally applicable morphokinetic algorithm suitable for Day 3 transfers of time-lapse monitored embryos originating from different culture conditions and fertilization methods be developed for the purpose of supporting the embryologist's decision on which embryo to transfer back to the patient in assisted reproduction? SUMMARY ANSWER The algorithm presented here can be used independently of culture conditions and fertilization method and provides predictive power not surpassed by other published algorithms for ranking embryos according to their blastocyst formation potential. WHAT IS KNOWN ALREADY Generally applicable algorithms have so far been developed only for predicting blastocyst formation. A number of clinics have reported validated implantation prediction algorithms, which have been developed based on clinic-specific culture conditions and clinical environment. However, a generally applicable embryo evaluation algorithm based on actual implantation outcome has not yet been reported. STUDY DESIGN, SIZE, DURATION Retrospective evaluation of data extracted from a database of known implantation data (KID) originating from 3275 embryos transferred on Day 3 conducted in 24 clinics between 2009 and 2014. The data represented different culture conditions (reduced and ambient oxygen with various culture medium strategies) and fertilization methods (IVF, ICSI). The capability to predict blastocyst formation was evaluated on an independent set of morphokinetic data from 11 218 embryos which had been cultured to Day 5. PARTICIPANTS/MATERIALS, SETTING, METHODS The algorithm was developed by applying automated recursive partitioning to a large number of annotation types and derived equations, progressing to a five-fold cross-validation test of the complete data set and a validation test of different incubation conditions and fertilization methods. 
The results were expressed as receiver operating characteristics curves using the area under the curve (AUC) to establish the predictive strength of the algorithm. MAIN RESULTS AND THE ROLE OF CHANCE By applying the here developed algorithm (KIDScore), which was based on six annotations (the number of pronuclei equals 2 at the 1-cell stage, time from insemination to pronuclei fading at the 1-cell stage, time from insemination to the 2-cell stage, time from insemination to the 3-cell stage, time from insemination to the 5-cell stage and time from insemination to the 8-cell stage) and ranking the embryos in five groups, the implantation potential of the embryos was predicted with an AUC of 0.650. On Day 3 the KIDScore algorithm was capable of predicting blastocyst development with an AUC of 0.745 and blastocyst quality with an AUC of 0.679. In a comparison of blastocyst prediction including six other published algorithms and KIDScore, only KIDScore and one more algorithm surpassed an algorithm constructed on conventional Alpha/ESHRE consensus timings in terms of predictive power. LIMITATIONS, REASONS FOR CAUTION Some morphological assessments were not available and consequently three of the algorithms in the comparison were not used in full and may therefore have been put at a disadvantage. Algorithms based on implantation data from Day 3 embryo transfers require adjustments to be capable of predicting the implantation potential of Day 5 embryo transfers. The current study is restricted by its retrospective nature and absence of live birth information. Prospective Randomized Controlled Trials should be used in future studies to establish the value of time-lapse technology and morphokinetic evaluation. WIDER IMPLICATIONS OF THE FINDINGS Algorithms applicable to different culture conditions can be developed if based on large data sets of heterogeneous origin. STUDY FUNDING/COMPETING INTEREST(S) This study was funded by Vitrolife A/S, Denmark and Vitrolife AB, Sweden. 
B.M.P.’s company BMP Analytics is performing consultancy for Vitrolife A/S. M.B. is employed at Vitrolife A/S. M.M.’s company ilabcomm GmbH received honorarium for consultancy from Vitrolife AB. D.K.G. received research support from Vitrolife AB. PMID:27609980
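
The AUC values reported above summarize how well a score ranks positive outcomes above negative ones. A minimal sketch of that calculation, using the pairwise (Mann-Whitney) formulation, follows; the scores and outcomes are hypothetical and the function name auc is an assumption for illustration, not part of KIDScore.

```python
def auc(scores, labels):
    """Area under the ROC curve as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical embryo scores and implantation outcomes (1 = implanted).
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts values such as 0.650 and 0.745 into context.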
Fast Demand Forecast of Electric Vehicle Charging Stations for Cell Phone Application

DOE Office of Scientific and Technical Information (OSTI.GOV)

Majidpour, Mostafa; Qiu, Charlie; Chung, Ching-Yen

This paper describes the core cellphone application algorithm which has been implemented for the prediction of energy consumption at Electric Vehicle (EV) Charging Stations at UCLA. For this interactive user application, the total time of accessing database, processing the data and making the prediction, needs to be within a few seconds. We analyze four relatively fast Machine Learning based time series prediction algorithms for our prediction engine: Historical Average, k-Nearest Neighbor, Weighted k-Nearest Neighbor, and Lazy Learning. The Nearest Neighbor algorithm (k Nearest Neighbor with k=1) shows better performance and is selected to be the prediction algorithm implemented for the cellphone application. Two applications have been designed on top of the prediction algorithm: one predicts the expected available energy at the station and the other one predicts the expected charging finishing time. The total time, including accessing the database, data processing, and prediction is about one second for both applications.
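
A nearest-neighbor time series forecast of the kind described can be sketched in a few lines. The version below is an illustration only, assuming a sliding window over a single series and squared Euclidean distance; the paper's actual features, distance measure and data layout are not given in the abstract, and knn_forecast is a hypothetical name.

```python
def knn_forecast(series, window, k=1):
    """Predict the next value from the k past windows most similar to the latest one."""
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    query = series[-window:]
    candidates = []
    # Every earlier window that has a known successor is a candidate neighbor.
    for start in range(len(series) - window):
        past = series[start:start + window]
        dist = sum((a - b) ** 2 for a, b in zip(past, query))
        candidates.append((dist, series[start + window]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Hypothetical energy readings with a repeating pattern.
history = [1, 2, 3, 1, 2, 3, 1, 2]
next_value = knn_forecast(history, window=2, k=1)
```

With k=1 this reduces to the plain Nearest Neighbor variant that the abstract reports as the selected algorithm; there is no training step, which is consistent with the response-time requirement.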
The SIST-M: Predictive validity of a brief structured Clinical Dementia Rating interview

PubMed Central

Okereke, Olivia I.; Pantoja-Galicia, Norberto; Copeland, Maura; Hyman, Bradley T.; Wanggaard, Taylor; Albert, Marilyn S.; Betensky, Rebecca A.; Blacker, Deborah

2011-01-01

Background We previously established reliability and cross-sectional validity of the SIST-M (Structured Interview and Scoring Tool–Massachusetts Alzheimer's Disease Research Center), a shortened version of an instrument shown to predict progression to Alzheimer disease (AD), even among persons with very mild cognitive impairment (vMCI). Objective To test predictive validity of the SIST-M. Methods Participants were 342 community-dwelling, non-demented older adults in a longitudinal study. Baseline Clinical Dementia Rating (CDR) ratings were determined by either: 1) clinician interviews or 2) a previously developed computer algorithm based on 60 questions (of a possible 131) extracted from clinician interviews. We developed age+gender+education-adjusted Cox proportional hazards models using CDR-sum-of-boxes (CDR-SB) as the predictor, where CDR-SB was determined by either clinician interview or algorithm; models were run for the full sample (n=342) and among those jointly classified as vMCI using clinician- and algorithm-based CDR ratings (n=156). We directly compared predictive accuracy using time-dependent Receiver Operating Characteristic (ROC) curves. Results AD hazard ratios (HRs) were similar for clinician-based and algorithm-based CDR-SB: for a 1-point increment in CDR-SB, respective HRs (95% CI)=3.1 (2.5,3.9) and 2.8 (2.2,3.5); among those with vMCI, respective HRs (95% CI) were 2.2 (1.6,3.2) and 2.1 (1.5,3.0). Similarly high predictive accuracy was achieved: the concordance probability (weighted average of the area-under-the-ROC curves) over follow-up was 0.78 vs. 0.76 using clinician-based vs. algorithm-based CDR-SB. Conclusion CDR scores based on items from this shortened interview had high predictive ability for AD – comparable to that using a lengthy clinical interview. PMID:21986342
Comparison and optimization of in silico algorithms for predicting the pathogenicity of sodium channel variants in epilepsy.

PubMed

Holland, Katherine D; Bouley, Thomas M; Horn, Paul S

2017-07-01

Variants in neuronal voltage-gated sodium channel α-subunits genes SCN1A, SCN2A, and SCN8A are common in early onset epileptic encephalopathies and other autosomal dominant childhood epilepsy syndromes. However, in clinical practice, missense variants are often classified as variants of uncertain significance when missense variants are identified but heritability cannot be determined. Genetic testing reports often include results of computational tests to estimate pathogenicity and the frequency of that variant in population-based databases. The objective of this work was to enhance clinicians' understanding of results by (1) determining how effectively computational algorithms predict epileptogenicity of sodium channel (SCN) missense variants; (2) optimizing their predictive capabilities; and (3) determining if epilepsy-associated SCN variants are present in population-based databases. This will help clinicians better understand the results of indeterminate SCN test results in people with epilepsy. Pathogenic, likely pathogenic, and benign variants in SCNs were identified using databases of sodium channel variants. Benign variants were also identified from population-based databases. Eight algorithms commonly used to predict pathogenicity were compared. In addition, logistic regression was used to determine if a combination of algorithms could better predict pathogenicity. Based on American College of Medical Genetic Criteria, 440 variants were classified as pathogenic or likely pathogenic and 84 were classified as benign or likely benign. Twenty-eight variants previously associated with epilepsy were present in population-based gene databases. The output provided by most computational algorithms had a high sensitivity but low specificity with an accuracy of 0.52-0.77. Accuracy could be improved by adjusting the threshold for pathogenicity. 
Using this adjustment, the Mendelian Clinically Applicable Pathogenicity (M-CAP) algorithm had an accuracy of 0.90 and a combination of algorithms increased the accuracy to 0.92. Potentially pathogenic variants are present in population-based sources. Most computational algorithms overestimate pathogenicity; however, a weighted combination of several algorithms increased classification accuracy to >0.90. Wiley Periodicals, Inc. © 2017 International League Against Epilepsy.
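
The effect of adjusting the pathogenicity threshold can be illustrated with a simple sweep. This sketch assumes each algorithm emits a numeric score where higher means more likely pathogenic; the scores and labels are hypothetical, and best_threshold is an illustrative helper rather than the procedure used in the study.

```python
def accuracy_at(scores, labels, threshold):
    """Accuracy when scores at or above threshold are called pathogenic."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_threshold(scores, labels):
    """Try each observed score as the cut-off and keep the most accurate one."""
    best_acc, best_t = -1.0, None
    for t in sorted(set(scores)):
        acc = accuracy_at(scores, labels, t)
        if acc > best_acc:
            best_acc, best_t = acc, t
    return best_acc, best_t


# Hypothetical predictor scores and labels (1 = pathogenic, 0 = benign).
scores = [0.1, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 1, 1]
default_accuracy = accuracy_at(scores, labels, 0.5)
tuned_accuracy, tuned_threshold = best_threshold(scores, labels)
```

A threshold tuned and evaluated on the same variants will look better than it is, so a held-out set or cross-validation would be needed for an honest estimate.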
Real-time prediction and gating of respiratory motion using an extended Kalman filter and Gaussian process regression

NASA Astrophysics Data System (ADS)

Bukhari, W.; Hong, S.-M.

2015-01-01

Motion-adaptive radiotherapy aims to deliver a conformal dose to the target tumour with minimal normal tissue exposure by compensating for tumour motion in real time. The prediction as well as the gating of respiratory motion have received much attention over the last two decades for reducing the targeting error of the treatment beam due to respiratory motion. In this article, we present a real-time algorithm for predicting and gating respiratory motion that utilizes a model-based and a model-free Bayesian framework by combining them in a cascade structure. The algorithm, named EKF-GPR+, implements a gating function without pre-specifying a particular region of the patient’s breathing cycle. The algorithm first employs an extended Kalman filter (LCM-EKF) to predict the respiratory motion and then uses a model-free Gaussian process regression (GPR) to correct the error of the LCM-EKF prediction. The GPR is a non-parametric Bayesian algorithm that yields predictive variance under Gaussian assumptions. The EKF-GPR+ algorithm utilizes the predictive variance from the GPR component to capture the uncertainty in the LCM-EKF prediction error and systematically identify breathing points with a higher probability of large prediction error in advance. This identification allows us to pause the treatment beam over such instances. EKF-GPR+ implements the gating function by using simple calculations based on the predictive variance with no additional detection mechanism. A sparse approximation of the GPR algorithm is employed to realize EKF-GPR+ in real time. Extensive numerical experiments are performed based on a large database of 304 respiratory motion traces to evaluate EKF-GPR+. The experimental results show that the EKF-GPR+ algorithm effectively reduces the prediction error in a root-mean-square (RMS) sense by employing the gating function, albeit at the cost of a reduced duty cycle. 
As an example, EKF-GPR+ reduces the patient-wise RMS error to 37%, 39% and 42% in percent ratios relative to no prediction for a duty cycle of 80% at lookahead lengths of 192 ms, 384 ms and 576 ms, respectively. The experiments also confirm that EKF-GPR+ controls the duty cycle with reasonable accuracy.
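
The trade-off between duty cycle and prediction error can be illustrated with a toy gating rule. The sketch below is an assumption-laden simplification, not EKF-GPR+ itself: it takes predictive variances as given, picks the variance threshold offline from the whole trace to hit a target duty cycle, and keeps the beam on only where variance is at or below that threshold. All names and numbers are hypothetical.

```python
import math


def gate_by_variance(variances, duty_cycle):
    """Return beam-on flags for the lowest-variance fraction of time points."""
    n_on = round(duty_cycle * len(variances))
    if n_on <= 0:
        return [False] * len(variances)
    threshold = sorted(variances)[n_on - 1]   # offline choice; ties may add points
    return [v <= threshold for v in variances]


def rms(errors):
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical predictive variances and the prediction errors actually observed.
variances = [0.1, 0.5, 0.2, 0.9, 0.3]
errors = [1.0, 2.0, 1.0, 10.0, 1.0]
beam_on = gate_by_variance(variances, duty_cycle=0.8)
gated_rms = rms([e for e, on in zip(errors, beam_on) if on])
ungated_rms = rms(errors)
```

A real-time system cannot sort future variances, so the threshold would have to come from earlier data; the sketch only shows why pausing at high-variance instants lowers the RMS error while reducing the duty cycle.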
Verification of Pharmacogenetics-Based Warfarin Dosing Algorithms in Han-Chinese Patients Undertaking Mechanic Heart Valve Replacement

PubMed Central

Zhao, Li; Chen, Chunxia; Li, Bei; Dong, Li; Guo, Yingqiang; Xiao, Xijun; Zhang, Eryong; Qin, Li

2014-01-01

Objective To study the performance of pharmacogenetics-based warfarin dosing algorithms in the initial and the stable warfarin treatment phases in a cohort of Han-Chinese patients undertaking mechanic heart valve replacement. Methods We searched PubMed, Chinese National Knowledge Infrastructure and Wanfang databases for selecting pharmacogenetics-based warfarin dosing models. Patients with mechanic heart valve replacement were consecutively recruited between March 2012 and July 2012. The predicted warfarin dose of each patient was calculated and compared with the observed initial and stable warfarin doses. The percentage of patients whose predicted dose fell within 20% of their actual therapeutic dose (percentage within 20%), and the mean absolute error (MAE) were utilized to evaluate the predictive accuracy of all the selected algorithms. Results A total of 8 algorithms including Du, Huang, Miao, Wei, Zhang, Lou, Gage, and International Warfarin Pharmacogenetics Consortium (IWPC) model, were tested in 181 patients. The MAE of the Gage, IWPC and 6 Han-Chinese pharmacogenetics-based warfarin dosing algorithms was less than 0.6 mg/day in accuracy and the percentage within 20% exceeded 45% in all of the selected models in both the initial and the stable treatment stages. When patients were stratified according to the warfarin dose range, all of the equations demonstrated better performance in the ideal-dose range (1.88–4.38 mg/day) than the low-dose range (<1.88 mg/day). Among the 8 algorithms compared, the algorithms of Wei, Huang, and Miao showed a lower MAE and higher percentage within 20% in both the initial and the stable warfarin dose prediction and in the low-dose and the ideal-dose ranges. Conclusions All of the selected pharmacogenetics-based warfarin dosing regimens performed similarly in our cohort. 
However, the algorithms of Wei, Huang, and Miao showed a better potential for warfarin prediction in the initial and the stable treatment phases in Han-Chinese patients undertaking mechanic heart valve replacement. PMID:24728385
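
The two evaluation measures used above, mean absolute error and percentage within 20%, are straightforward to compute. The sketch below uses hypothetical doses in mg/day and a hypothetical function name; it follows the definitions given in the abstract.

```python
def dosing_metrics(predicted, actual):
    """Return (MAE, percentage of predictions within 20% of the actual dose)."""
    abs_errors = [abs(p - a) for p, a in zip(predicted, actual)]
    mae = sum(abs_errors) / len(abs_errors)
    hits = sum(e <= 0.2 * a for e, a in zip(abs_errors, actual))
    return mae, 100.0 * hits / len(actual)


# Hypothetical predicted and observed stable warfarin doses (mg/day).
predicted = [3.0, 2.0, 5.0, 1.0]
actual = [3.0, 2.4, 4.0, 2.0]
mae, pct_within_20 = dosing_metrics(predicted, actual)
```

Because the 20% band scales with the actual dose, the same absolute error counts as a miss for a low-dose patient and a hit for a high-dose one, which is relevant to the poorer low-dose performance reported above.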
Verification of pharmacogenetics-based warfarin dosing algorithms in Han-Chinese patients undertaking mechanic heart valve replacement.

PubMed

Zhao, Li; Chen, Chunxia; Li, Bei; Dong, Li; Guo, Yingqiang; Xiao, Xijun; Zhang, Eryong; Qin, Li

2014-01-01

To study the performance of pharmacogenetics-based warfarin dosing algorithms in the initial and the stable warfarin treatment phases in a cohort of Han-Chinese patients undertaking mechanic heart valve replacement. We searched PubMed, Chinese National Knowledge Infrastructure and Wanfang databases for selecting pharmacogenetics-based warfarin dosing models. Patients with mechanic heart valve replacement were consecutively recruited between March 2012 and July 2012. The predicted warfarin dose of each patient was calculated and compared with the observed initial and stable warfarin doses. The percentage of patients whose predicted dose fell within 20% of their actual therapeutic dose (percentage within 20%), and the mean absolute error (MAE) were utilized to evaluate the predictive accuracy of all the selected algorithms. A total of 8 algorithms including Du, Huang, Miao, Wei, Zhang, Lou, Gage, and International Warfarin Pharmacogenetics Consortium (IWPC) model, were tested in 181 patients. The MAE of the Gage, IWPC and 6 Han-Chinese pharmacogenetics-based warfarin dosing algorithms was less than 0.6 mg/day in accuracy and the percentage within 20% exceeded 45% in all of the selected models in both the initial and the stable treatment stages. When patients were stratified according to the warfarin dose range, all of the equations demonstrated better performance in the ideal-dose range (1.88-4.38 mg/day) than the low-dose range (<1.88 mg/day). Among the 8 algorithms compared, the algorithms of Wei, Huang, and Miao showed a lower MAE and higher percentage within 20% in both the initial and the stable warfarin dose prediction and in the low-dose and the ideal-dose ranges. All of the selected pharmacogenetics-based warfarin dosing regimens performed similarly in our cohort. 
However, the algorithms of Wei, Huang, and Miao showed a better potential for warfarin prediction in the initial and the stable treatment phases in Han-Chinese patients undertaking mechanic heart valve replacement.
Multi-agent cooperation rescue algorithm based on influence degree and state prediction

NASA Astrophysics Data System (ADS)

Zheng, Yanbin; Ma, Guangfu; Wang, Linlin; Xi, Pengxue

2018-04-01

Aiming at multi-agent cooperative rescue in disasters, a multi-agent cooperative rescue algorithm based on influence degree and state prediction is proposed. Firstly, based on the influence of the information in the scene on the collaborative task, the influence degree function is used to filter the information. Secondly, the selected information is used to predict the state of the system and Agent behavior. Finally, according to the result of the forecast, the cooperative behavior of the Agent is guided and the efficiency of individual collaboration is improved. The simulation results show that this algorithm can effectively solve the cooperative rescue problem of multi-agent and ensure the efficient completion of the task.
Predicting intensity ranks of peptide fragment ions.

PubMed

Frank, Ari M

2009-05-01

Accurate modeling of peptide fragmentation is necessary for the development of robust scoring functions for peptide-spectrum matches, which are the cornerstone of MS/MS-based identification algorithms. Unfortunately, peptide fragmentation is a complex process that can involve several competing chemical pathways, which makes it difficult to develop generative probabilistic models that describe it accurately. However, the vast amounts of MS/MS data being generated now make it possible to use data-driven machine learning methods to develop discriminative ranking-based models that predict the intensity ranks of a peptide's fragment ions. We use simple sequence-based features that get combined by a boosting algorithm into models that make peak rank predictions with high accuracy. In an accompanying manuscript, we demonstrate how these prediction models are used to significantly improve the performance of peptide identification algorithms. The models can also be useful in the design of optimal multiple reaction monitoring (MRM) transitions, in cases where there is insufficient experimental data to guide the peak selection process. The prediction algorithm can also be run independently through PepNovo+, which is available for download from http://bix.ucsd.edu/Software/PepNovo.html.
Predicting Intensity Ranks of Peptide Fragment Ions

PubMed Central

Frank, Ari M.

2009-01-01

Accurate modeling of peptide fragmentation is necessary for the development of robust scoring functions for peptide-spectrum matches, which are the cornerstone of MS/MS-based identification algorithms. Unfortunately, peptide fragmentation is a complex process that can involve several competing chemical pathways, which makes it difficult to develop generative probabilistic models that describe it accurately. However, the vast amounts of MS/MS data being generated now make it possible to use data-driven machine learning methods to develop discriminative ranking-based models that predict the intensity ranks of a peptide's fragment ions. We use simple sequence-based features that get combined by a boosting algorithm into models that make peak rank predictions with high accuracy. In an accompanying manuscript, we demonstrate how these prediction models are used to significantly improve the performance of peptide identification algorithms. The models can also be useful in the design of optimal MRM transitions, in cases where there is insufficient experimental data to guide the peak selection process. The prediction algorithm can also be run independently through PepNovo+, which is available for download from http://bix.ucsd.edu/Software/PepNovo.html. PMID:19256476
Research on prediction of agricultural machinery total power based on grey model optimized by genetic algorithm

NASA Astrophysics Data System (ADS)

Xie, Yan; Li, Mu; Zhou, Jin; Zheng, Chang-zheng

2009-07-01

Agricultural machinery total power is an important index to reflect and evaluate the level of agricultural mechanization. It is the power source of agricultural production, and is the main factor in enhancing the comprehensive agricultural production capacity, expanding production scale and increasing the income of farmers. Its demand is affected by natural, economic, technological, social and other "grey" factors. Therefore, grey system theory can be used to analyze the development of agricultural machinery total power. A method based on genetic algorithm optimizing grey modeling process is introduced in this paper. This method makes full use of the advantages of the grey prediction model and the ability of the genetic algorithm to find the global optimum, so the prediction model is more accurate. According to data from a province, the GM (1, 1) model for predicting agricultural machinery total power was given based on the grey system theories and genetic algorithm. The result indicates that the model can be used as an effective tool for predicting agricultural machinery total power.
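
For readers unfamiliar with grey prediction, the standard GM(1,1) model can be sketched as follows. This is the textbook least-squares version, not the genetic-algorithm-optimized variant the paper proposes, and the input series is hypothetical. The model accumulates the series, fits x0[k] = -a*z[k] + b against the mean of consecutive accumulated values z[k], and forecasts from the resulting exponential response.

```python
import math


def gm11_forecast(x0, steps=1):
    """Forecast the next values of a positive series with a plain GM(1,1) model."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least four observations")
    # Accumulated generating operation (running sum).
    x1 = []
    total = 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: mean of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(x0[1:])
    # Ordinary least squares for y = slope * z + b, where slope = -a.
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(v * w for v, w in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    b = (sy - slope * sz) / m
    a = -slope                      # a == 0 (a constant series) is not handled

    def x1_hat(k):
        # Time response of the whitened equation, indexed from k = 0.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    # Difference the accumulated forecast to return to the original scale.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


# Hypothetical total-power series growing about 10% per period.
series = [100.0, 110.0, 121.0, 133.1, 146.41]
forecast = gm11_forecast(series, steps=1)
```

In the paper's setting a genetic algorithm would tune parts of this modeling process instead of relying on the closed-form least-squares fit; which parameters are tuned is not stated in the abstract.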
Web of Objects Based Ambient Assisted Living Framework for Emergency Psychiatric State Prediction

PubMed Central

Alam, Md Golam Rabiul; Abedin, Sarder Fakhrul; Al Ameen, Moshaddique; Hong, Choong Seon

2016-01-01

Ambient assisted living can facilitate optimum health and wellness by aiding physical, mental and social well-being. In this paper, patients’ psychiatric symptoms are collected through lightweight biosensors and web-based psychiatric screening scales in a smart home environment and then analyzed through machine learning algorithms to provide ambient intelligence in a psychiatric emergency. The psychiatric states are modeled through a Hidden Markov Model (HMM), and the model parameters are estimated using a Viterbi path counting and scalable Stochastic Variational Inference (SVI)-based training algorithm. The most likely psychiatric state sequence of the corresponding observation sequence is determined, and an emergency psychiatric state is predicted through the proposed algorithm. Moreover, to enable personalized psychiatric emergency care, a web of objects-based service framework is proposed for a smart-home environment. In this framework, the biosensor observations and the psychiatric rating scales are objectified and virtualized in the web space. Then, the web of objects of sensor observations and psychiatric rating scores are used to assess the dweller’s mental health status and to predict an emergency psychiatric state. The proposed psychiatric state prediction algorithm reported 83.03 percent prediction accuracy in an empirical performance study. PMID:27608023
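
Determining the most likely state sequence of an HMM is usually done with the Viterbi algorithm. The sketch below shows standard Viterbi decoding only; the state names, observation symbols and probabilities are hypothetical, and the training step (Viterbi path counting with SVI) described in the abstract is not covered.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence for the observations."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical two-state model: sensor readings are "normal" or "elevated".
states = ["calm", "crisis"]
start_p = {"calm": 0.8, "crisis": 0.2}
trans_p = {"calm": {"calm": 0.9, "crisis": 0.1},
           "crisis": {"calm": 0.3, "crisis": 0.7}}
emit_p = {"calm": {"normal": 0.9, "elevated": 0.1},
          "crisis": {"normal": 0.2, "elevated": 0.8}}
decoded = viterbi(["normal", "elevated", "elevated"],
                  states, start_p, trans_p, emit_p)
```

For long observation sequences the products underflow, so a practical implementation would work with log probabilities instead.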
Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion

PubMed Central

2017-01-01

Electroencephalogram (EEG)-based decoding human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain–computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. Comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with current recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method is the most popular currently used feature extraction and prediction method. This method showed an accuracy of 65.7%. However, the proposed method predicts the novel data with improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction method. PMID:28558002
An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

PubMed

Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

2017-12-01

Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.
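The key idea of choosing initial centroids that lie in dense regions and are well separated can be sketched as follows. This is only an illustration of the idea under stated assumptions, not the authors' published procedure: the density measure (neighbour count within `radius`), the `min_separation` rule, and both parameter names are hypothetical.

```python
import math


def density_seeds(points, k, radius, min_separation):
    """Deterministically pick up to k initial centroids: dense, well-separated points."""
    # Density of a point = number of other points within `radius` of it.
    density = [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    # Visit points from densest to sparsest; ties are broken by index,
    # so repeated runs on the same data give the same seeds.
    order = sorted(range(len(points)), key=lambda i: (-density[i], i))
    seeds = []
    for i in order:
        if all(math.dist(points[i], s) >= min_separation for s in seeds):
            seeds.append(points[i])
        if len(seeds) == k:
            break
    return seeds


# Illustrative data: two tight groups and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
print(density_seeds(data, k=2, radius=1.5, min_separation=5.0))
```

Because no random numbers are involved, a K-Means run started from these seeds returns the same clustering on every execution, which is the property the abstract emphasizes.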
Application of Recursive Partitioning to Derive and Validate a Claims-Based Algorithm for Identifying Keratinocyte Carcinoma (Nonmelanoma Skin Cancer).

PubMed

Chan, An-Wen; Fung, Kinwah; Tran, Jennifer M; Kitchen, Jessica; Austin, Peter C; Weinstock, Martin A; Rochon, Paula A

2016-10-01

Keratinocyte carcinoma (nonmelanoma skin cancer) accounts for substantial burden in terms of high incidence and health care costs but is excluded by most cancer registries in North America. Administrative health insurance claims databases offer an opportunity to identify these cancers using diagnosis and procedural codes submitted for reimbursement purposes. The objective was to apply recursive partitioning to derive and validate a claims-based algorithm for identifying keratinocyte carcinoma with high sensitivity and specificity. This was a retrospective study using population-based administrative databases linked to 602 371 pathology episodes from a community laboratory for adults residing in Ontario, Canada, from January 1, 1992, to December 31, 2009. The final analysis was completed in January 2016. We used recursive partitioning (classification trees) to derive an algorithm based on health insurance claims. The performance of the derived algorithm was compared with 5 prespecified algorithms and validated using an independent academic hospital clinic data set of 2082 patients seen in May and June 2011. The outcome measures were sensitivity, specificity, positive predictive value, and negative predictive value, using the histopathological diagnosis as the criterion standard. We aimed to achieve maximal specificity while maintaining greater than 80% sensitivity. Among 602 371 pathology episodes, 131 562 (21.8%) had a diagnosis of keratinocyte carcinoma. Our final derived algorithm outperformed the 5 simple prespecified algorithms and performed well in both community and hospital data sets in terms of sensitivity (82.6% and 84.9%, respectively), specificity (93.0% and 99.0%, respectively), positive predictive value (76.7% and 69.2%, respectively), and negative predictive value (95.0% and 99.6%, respectively). Algorithm performance did not vary substantially during the 18-year period. 
This algorithm offers a reliable mechanism for ascertaining keratinocyte carcinoma for epidemiological research in the absence of cancer registry data. Our findings also demonstrate the value of recursive partitioning in deriving valid claims-based algorithms.
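The four validation measures named in this abstract follow directly from a 2x2 confusion table against the criterion standard. A minimal sketch of those definitions is given here; the function name and the counts in the usage line are illustrative assumptions and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic accuracy measures from confusion-table counts.

    tp/fn: criterion-standard positives flagged / missed by the algorithm.
    fp/tn: criterion-standard negatives flagged / correctly left unflagged.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly excluded
        "ppv": tp / (tp + fp),          # share of flagged episodes that are true cases
        "npv": tn / (tn + fn),          # share of unflagged episodes that are non-cases
    }


# Illustrative counts only (not data from the paper).
print(diagnostic_metrics(tp=80, fp=10, fn=20, tn=90))
```

Sensitivity and specificity are properties of the algorithm alone, whereas the predictive values also depend on how common the condition is in the data set being tested.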
LMI-Based Generation of Feedback Laws for a Robust Model Predictive Control Algorithm

NASA Technical Reports Server (NTRS)

Acikmese, Behcet; Carson, John M., III

2007-01-01

This technical note provides a mathematical proof of Corollary 1 from the paper 'A Nonlinear Model Predictive Control Algorithm with Proven Robustness and Resolvability' that appeared in the 2006 Proceedings of the American Control Conference. The proof was omitted for brevity in the publication. The paper was based on algorithms developed for the FY2005 R&TD (Research and Technology Development) project for Small-body Guidance, Navigation, and Control [2]. The framework established by the Corollary is for a robustly stabilizing MPC (model predictive control) algorithm for uncertain nonlinear systems that guarantees the resolvability of the associated finite-horizon optimal control problem in a receding-horizon implementation. Additional details of the framework are available in the publication.

Protein docking prediction using predicted protein-protein interface.

PubMed

Li, Bin; Kihara, Daisuke

2012-01-10

Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies from case to case, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pairwise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts by performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by a second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.
Predicting patchy particle crystals: variable box shape simulations and evolutionary algorithms.

PubMed

Bianchi, Emanuela; Doppelbauer, Günther; Filion, Laura; Dijkstra, Marjolein; Kahl, Gerhard

2012-06-07

We consider several patchy particle models that have been proposed in literature and we investigate their candidate crystal structures in a systematic way. We compare two different algorithms for predicting crystal structures: (i) an approach based on Monte Carlo simulations in the isobaric-isothermal ensemble and (ii) an optimization technique based on ideas of evolutionary algorithms. We show that the two methods are equally successful and provide consistent results on crystalline phases of patchy particle systems.
A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

PubMed Central

Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

2006-01-01

The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943
A vertical handoff decision algorithm based on ARMA prediction model

NASA Astrophysics Data System (ADS)

Li, Ru; Shen, Jiao; Chen, Jun; Liu, Qiuhuan

2012-01-01

With the development of computer technology and the increasing demand for mobile communications, the next generation wireless networks will be composed of various wireless networks (e.g., WiMAX and WiFi). Vertical handoff is a key technology of next generation wireless networks. During the vertical handoff procedure, the handoff decision is crucial for efficient mobility. Based on an autoregressive moving average (ARMA) prediction model, we propose a vertical handoff decision algorithm that aims to improve vertical handoff performance and avoid unnecessary handoffs. Using the current and previous received signal strength (RSS), the proposed approach adopts an ARMA model to predict the next RSS. The predicted RSS then determines whether to trigger the link-layer triggering event and complete the vertical handoff. The simulation results indicate that the proposed algorithm outperforms the threshold-based RSS scheme in terms of handoff performance and the number of handoffs.
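The predict-then-decide loop can be illustrated with a deliberately simplified stand-in. The sketch below fits a first-order autoregressive model, AR(1), by least squares and omits the moving-average term, so it is not the paper's ARMA model; the function names, the RSS values and the threshold are all hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x[t+1] = c + phi * x[t]; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        # Constant history: the best linear prediction is that constant.
        return mean_y, 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def predict_next_rss(rss_history):
    """One-step-ahead RSS prediction from the fitted AR(1) model."""
    c, phi = fit_ar1(rss_history)
    return c + phi * rss_history[-1]


def should_trigger_handoff(rss_history, threshold_dbm):
    """Trigger when the *predicted* RSS falls below the threshold."""
    return predict_next_rss(rss_history) < threshold_dbm


# Illustrative RSS samples in dBm (not measurements from the paper).
history = [-60.0, -62.0, -64.0, -66.0, -68.0]
print(predict_next_rss(history), should_trigger_handoff(history, threshold_dbm=-69.0))
```

Deciding on the predicted value rather than the current sample is what lets such a scheme act before the link degrades, and ignore a short dip that the fitted trend does not support.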
[MicroRNA Target Prediction Based on Support Vector Machine Ensemble Classification Algorithm of Under-sampling Technique].

PubMed

Chen, Zhiru; Hong, Wenxue

2016-02-01

Considering the low prediction accuracy on positive samples and the poor overall classification caused by unbalanced MicroRNA (miRNA) target sample data, we propose in this paper a support vector machine (SVM)-integration of under-sampling and weight (IUSM) algorithm, an under-sampling method based on ensemble learning. The algorithm adopts SVM as the learning algorithm and AdaBoost as the integration framework, and embeds clustering-based under-sampling into the iterative process, aiming to reduce the imbalance between positive and negative samples. Meanwhile, during adaptive adjustment of the sample weights, the SVM-IUSM algorithm eliminates abnormal negative samples with a robust sample-weight smoothing mechanism so as to avoid over-learning. Finally, the integrated miRNA target classifier makes its prediction by combining multiple weak classifiers through a voting mechanism. The experiment revealed that SVM-IUSM, compared with other algorithms on a collection of unbalanced datasets, could not only improve the accuracy on positive targets and the overall classification performance, but also enhance the generalization ability of the miRNA target classifier.
The wind power prediction research based on mind evolutionary algorithm

NASA Astrophysics Data System (ADS)

Zhuang, Ling; Zhao, Xinjian; Ji, Tianming; Miao, Jingwen; Cui, Haina

2018-04-01

When wind power is connected to the power grid, its fluctuation, intermittency and randomness affect the stability of the power system. Wind power prediction can help guarantee power quality and reduce the operating cost of the power system. Several traditional wind power prediction methods have limitations. On this basis, a wind power prediction method based on the Mind Evolutionary Algorithm (MEA) is put forward and a prediction model is provided. The experimental results demonstrate that MEA performs efficiently in terms of wind power prediction. The MEA method has broad prospects for engineering application.
Electric Power Engineering Cost Predicting Model Based on the PCA-GA-BP

NASA Astrophysics Data System (ADS)

Wen, Lei; Yu, Jiake; Zhao, Xin

2017-10-01

In this paper, a hybrid prediction algorithm, the PCA-GA-BP model, is proposed. The PCA algorithm is used to reduce the correlation between indicators in the original data and to ease the BP neural network's high-dimensional computation. The BP neural network is established to estimate the cost of power transmission projects. The results show that the PCA-GA-BP algorithm can improve the prediction of electric power engineering cost.
Automated Phase Segmentation for Large-Scale X-ray Diffraction Data Using a Graph-Based Phase Segmentation (GPhase) Algorithm.

PubMed

Xiong, Zheng; He, Yinyan; Hattrick-Simpers, Jason R; Hu, Jianjun

2017-03-13

The creation of composition-processing-structure relationships currently represents a key bottleneck for data analysis for high-throughput experimental (HTE) material studies. Here we propose an automated phase diagram attribution algorithm for HTE data analysis that uses a graph-based segmentation algorithm and Delaunay tessellation to create a crystal phase diagram from high throughput libraries of X-ray diffraction (XRD) patterns. We also propose the sample-pair based objective evaluation measures for the phase diagram prediction problem. Our approach was validated using 278 diffraction patterns from a Fe-Ga-Pd composition spread sample with a prediction precision of 0.934 and a Matthews Correlation Coefficient score of 0.823. The algorithm was then applied to the open Ni-Mn-Al thin-film composition spread sample to obtain the first predicted phase diagram mapping for that sample.
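One common way to score a predicted segmentation against a reference labelling is to count sample pairs: a pair is a positive when both samples share a label. The sketch below shows pair-counting precision and the Matthews Correlation Coefficient under that assumption; the abstract does not spell out its exact measures, so the pairing rule and all names here are illustrative rather than the authors' definitions.

```python
import math
from itertools import combinations


def pair_confusion(true_labels, pred_labels):
    """Count sample pairs by whether they share a label in each labelling."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_pred and same_true:
            tp += 1  # grouped together, and rightly so
        elif same_pred:
            fp += 1  # grouped together, but the reference separates them
        elif same_true:
            fn += 1  # separated, but the reference groups them
        else:
            tn += 1  # separated in both labellings
    return tp, fp, fn, tn


def pair_precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def matthews_cc(tp, fp, fn, tn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative labels for four samples (not data from the paper).
counts = pair_confusion(["A", "A", "B", "B"], [0, 0, 0, 1])
print(counts, pair_precision(counts[0], counts[1]), matthews_cc(*counts))
```

Pair counting makes the score independent of how the predicted clusters are numbered, which is why the predicted labels above can be integers while the reference labels are phase names.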
Evaluation of Machine Learning and Rules-Based Approaches for Predicting Antimicrobial Resistance Profiles in Gram-negative Bacilli from Whole Genome Sequence Data.

PubMed

Pesesky, Mitchell W; Hussain, Tahir; Wallace, Meghan; Patel, Sanket; Andleeb, Saadia; Burnham, Carey-Ann D; Dantas, Gautam

2016-01-01

The time-to-result for culture-based microorganism recovery and phenotypic antimicrobial susceptibility testing necessitates initial use of empiric (frequently broad-spectrum) antimicrobial therapy. If the empiric therapy is not optimal, this can lead to adverse patient outcomes and contribute to increasing antibiotic resistance in pathogens. New, more rapid technologies are emerging to meet this need. Many of these are based on identifying resistance genes, rather than directly assaying resistance phenotypes, and thus require interpretation to translate the genotype into treatment recommendations. These interpretations, like other parts of clinical diagnostic workflows, are likely to be increasingly automated in the future. We set out to evaluate the two major approaches that could be amenable to automation pipelines: rules-based methods and machine learning methods. The rules-based algorithm makes predictions based upon current, curated knowledge of Enterobacteriaceae resistance genes. The machine-learning algorithm predicts resistance and susceptibility based on a model built from a training set of variably resistant isolates. As our test set, we used whole genome sequence data from 78 clinical Enterobacteriaceae isolates, previously identified to represent a variety of phenotypes, from fully-susceptible to pan-resistant strains for the antibiotics tested. We tested three antibiotic resistance determinant databases for their utility in identifying the complete resistome for each isolate. The predictions of the rules-based and machine learning algorithms for these isolates were compared to results of phenotype-based diagnostics. The rules based and machine-learning predictions achieved agreement with standard-of-care phenotypic diagnostics of 89.0 and 90.3%, respectively, across twelve antibiotic agents from six major antibiotic classes. Several sources of disagreement between the algorithms were identified. 
Novel variants of known resistance factors and incomplete genome assembly confounded the rules-based algorithm, resulting in predictions based on gene family, rather than on knowledge of the specific variant found. Low-frequency resistance caused errors in the machine-learning algorithm because those genes were not seen or seen infrequently in the test set. We also identified an example of variability in the phenotype-based results that led to disagreement with both genotype-based methods. Genotype-based antimicrobial susceptibility testing shows great promise as a diagnostic tool, and we outline specific research goals to further refine this methodology.
A Sensor Dynamic Measurement Error Prediction Model Based on NAPSO-SVM.

PubMed

Jiang, Minlan; Jiang, Lan; Jiang, Dingde; Li, Fei; Song, Houbing

2018-01-15

Dynamic measurement error correction is an effective way to improve sensor precision. Dynamic measurement error prediction is an important part of error correction, and support vector machine (SVM) is often used for predicting the dynamic measurement errors of sensors. Traditionally, the SVM parameters were always set manually, which cannot ensure the model's performance. In this paper, an SVM method based on an improved particle swarm optimization (NAPSO) is proposed to predict the dynamic measurement errors of sensors. Natural selection and simulated annealing are added to the PSO to improve its ability to avoid local optima. To verify the performance of NAPSO-SVM, three types of algorithms are selected to optimize the SVM's parameters: the particle swarm optimization algorithm (PSO), the improved PSO optimization algorithm (NAPSO), and the glowworm swarm optimization (GSO). The dynamic measurement error data of two sensors are applied as the test data. The root mean squared error and mean absolute percentage error are employed to evaluate the prediction models' performances. The experimental results show that, among the three tested algorithms, the NAPSO-SVM method has better prediction precision and smaller prediction errors, and it is an effective method for predicting the dynamic measurement errors of sensors.
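The two evaluation measures named in this abstract have standard definitions, sketched below for reference. The function names and sample values are illustrative assumptions, not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large deviations more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, because each error is
    divided by the corresponding actual value.
    """
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative values only.
measured = [2.0, 4.0]
forecast = [1.0, 5.0]
print(rmse(measured, forecast), mape(measured, forecast))
```

RMSE is expressed in the units of the measured quantity, while MAPE is scale-free, so reporting both allows comparison across sensors with different output ranges.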
Reconstructing genome-wide regulatory network of E. coli using transcriptome data and predicted transcription factor activities

PubMed Central

2011-01-01

Background Gene regulatory networks play essential roles in living organisms to control growth, keep internal metabolism running and respond to external environmental changes. Understanding the connections and the activity levels of regulators is important for the research of gene regulatory networks. While relevance score based algorithms that reconstruct gene regulatory networks from transcriptome data can infer genome-wide gene regulatory networks, they are unfortunately prone to false positive results. Transcription factor activities (TFAs) quantitatively reflect the ability of the transcription factor to regulate target genes. However, classic relevance score based gene regulatory network reconstruction algorithms use models that do not include the TFA layer, thus missing a key regulatory element. Results This work integrates TFA prediction algorithms with relevance score based network reconstruction algorithms to reconstruct gene regulatory networks with improved accuracy over classic relevance score based algorithms. This method is called Gene expression and Transcription factor activity based Relevance Network (GTRNetwork). Different combinations of TFA prediction algorithms and relevance score functions have been applied to find the most efficient combination. When the integrated GTRNetwork method was applied to E. coli data, the reconstructed genome-wide gene regulatory network predicted 381 new regulatory links. This reconstructed gene regulatory network, including the predicted new regulatory links, shows promising biological significance. Many of the new links are verified by known TF binding site information, and many other links can be verified from the literature and databases such as EcoCyc. The reconstructed gene regulatory network is applied to a recent transcriptome analysis of E. coli during isobutanol stress. 
In addition to the 16 significantly changed TFAs detected in the original paper, another 7 significantly changed TFAs have been detected by using our reconstructed network. Conclusions The GTRNetwork algorithm introduces the hidden layer TFA into classic relevance score-based gene regulatory network reconstruction processes. Integrating the TFA biological information with regulatory network reconstruction algorithms significantly improves detection of new links and reduces the rate of false positives. The application of GTRNetwork on E. coli gene transcriptome data gives a set of potential regulatory links with promising biological significance for isobutanol stress and other conditions. PMID:21668997
Research prioritization through prediction of future impact on biomedical science: a position paper on inference-analytics.

PubMed

Ganapathiraju, Madhavi K; Orii, Naoki

2013-08-30

Advances in biotechnology have created "big-data" situations in molecular and cellular biology. Several sophisticated algorithms have been developed that process big data to generate hundreds of biomedical hypotheses (or predictions). The bottleneck to translating this large number of biological hypotheses is that each of them needs to be studied by experimentation for interpreting its functional significance. Even when the predictions are estimated to be very accurate, from a biologist's perspective, the choice of which of these predictions is to be studied further is made based on factors like availability of reagents and resources and the possibility of formulating some reasonable hypothesis about its biological relevance. When viewed from a global perspective, say from that of a federal funding agency, ideally the choice of which prediction should be studied would be made based on which of them can make the most translational impact. We propose that algorithms be developed to identify which of the computationally generated hypotheses have potential for high translational impact; this way, funding agencies and scientific community can invest resources and drive the research based on a global view of biomedical impact without being deterred by local view of feasibility. In short, data-analytic algorithms analyze big-data and generate hypotheses; in contrast, the proposed inference-analytic algorithms analyze these hypotheses and rank them by predicted biological impact. We demonstrate this through the development of an algorithm to predict biomedical impact of protein-protein interactions (PPIs) which is estimated by the number of future publications that cite the paper which originally reported the PPI. This position paper describes a new computational problem that is relevant in the era of big-data and discusses the challenges that exist in studying this problem, highlighting the need for the scientific community to engage in this line of research. 
The proposed class of algorithms, namely inference-analytic algorithms, is necessary to ensure that resources are invested in translating those computational outcomes that promise maximum biological impact. Application of this concept to predict biomedical impact of PPIs illustrates not only the concept, but also the challenges in designing these algorithms.
A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

PubMed

Ni, Qianwu; Chen, Lei

2017-01-01

Correct prediction of protein structural class is beneficial to the investigation of protein functions, regulations and interactions. In recent years, several computational methods have been proposed in this regard. However, given the variety of available features, it is still a great challenge to select a proper classification algorithm and extract the essential features for classification. In this study, a feature and algorithm selection method was presented for improving the accuracy of protein structural class prediction. The amino acid compositions and physiochemical features were adopted to represent features and thirty-eight machine learning algorithms collected in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a feature list. Then, several feature sets were constructed by adding features in the list one by one. For each feature set, the thirty-eight algorithms were executed on a dataset in which proteins were represented by the features in the set. The predicted classes yielded by these algorithms and the true class of each protein were collected to construct a dataset, which was analyzed by the mRMR method, yielding an algorithm list. From the algorithm list, algorithms were taken one by one to build an ensemble prediction model. Finally, we selected the ensemble prediction model with the best performance as the optimal ensemble prediction model. Experimental results indicate that the constructed model is much superior to models using a single algorithm and to other models that adopt only the feature selection procedure or only the algorithm selection procedure. The feature selection and algorithm selection procedures are both helpful for building an ensemble prediction model that can yield a better performance. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Increased prognostic accuracy of TBI when a brain electrical activity biomarker is added to loss of consciousness (LOC).

PubMed

Hack, Dallas; Huff, J Stephen; Curley, Kenneth; Naunheim, Roseanne; Ghosh Dastidar, Samanwoy; Prichep, Leslie S

2017-07-01

Extremely high accuracy for predicting CT+ traumatic brain injury (TBI) using a quantitative EEG (QEEG) based multivariate classification algorithm was demonstrated in an independent validation trial, in Emergency Department (ED) patients, using an easy-to-use handheld device. This study compares the predictive power of that algorithm (which includes LOC and amnesia) to the predictive power of LOC alone or LOC plus traumatic amnesia. ED patients aged 18-85 years presenting within 72 h of closed head injury, with GCS 12-15, were study candidates. 680 patients with known absence or presence of LOC were enrolled (145 CT+ and 535 CT- patients). 5-10 min of eyes-closed EEG was acquired using the Ahead 300 handheld device, from frontal and frontotemporal regions. The same classification algorithm methodology was used for both the EEG based and the LOC based algorithms. Predictive power was evaluated using area under the ROC curve (AUC) and odds ratios. The QEEG based classification algorithm demonstrated significant improvement in predictive power compared with LOC alone, both in improved AUC (83% improvement) and odds ratio (increase from 4.65 to 16.22). Adding RGA and/or PTA to LOC did not improve on LOC alone. Rapid triage of TBI relies on strong initial predictors. Addition of an electrophysiological based marker was shown to outperform report of LOC alone or LOC plus amnesia in determining risk of an intracranial bleed. In addition, the ease of use at point-of-care, non-invasiveness, and rapid results of such technology suggest significant value added to standard clinical prediction. Copyright © 2017 Elsevier Inc. All rights reserved.
Predictability of the Lagrangian Motion in the Upper Ocean

NASA Astrophysics Data System (ADS)

Piterbarg, L. I.; Griffa, A.; Griffa, A.; Mariano, A. J.; Ozgokmen, T. M.; Ryan, E. H.

2001-12-01

The complex non-linear dynamics of the upper ocean leads to chaotic behavior of drifter trajectories in the ocean. Our study is focused on estimating the predictability limit for the position of an individual Lagrangian particle or a particle cluster based on the knowledge of mean currents and observations of nearby particles (predictors). The Lagrangian prediction problem, besides being a fundamental scientific problem, is also of great importance for practical applications such as search and rescue operations and for modeling the spread of fish larvae. A stochastic multi-particle model for the Lagrangian motion has been rigorously formulated and is a generalization of the well known "random flight" model for a single particle. Our model is mathematically consistent and includes a few easily interpreted parameters, such as the Lagrangian velocity decorrelation time scale, the turbulent velocity variance, and the velocity decorrelation radius, that can be estimated from data. The top Lyapunov exponent for an isotropic version of the model is explicitly expressed as a function of these parameters enabling us to approximate the predictability limit to first order. Lagrangian prediction errors for two new prediction algorithms are evaluated against simple algorithms and each other and are used to test the predictability limits of the stochastic model for isotropic turbulence. The first algorithm is based on a Kalman filter and uses the developed stochastic model. Its implementation for drifter clusters in both the Tropical Pacific and Adriatic Sea, showed good prediction skill over a period of 1-2 weeks. The prediction error is primarily a function of the data density, defined as the number of predictors within a velocity decorrelation spatial scale from the particle to be predicted. The second algorithm is model independent and is based on spatial regression considerations. 
Preliminary results, based on simulated as well as real data, indicate that it performs better than the Kalman-based algorithm in strong shear flows. An important component of our research is the optimal predictor location problem: where should floats be launched in order to minimize the Lagrangian prediction error? Preliminary Lagrangian sampling results for different flow scenarios will be presented.
Sparse RNA folding revisited: space-efficient minimum free energy structure prediction.

PubMed

Will, Sebastian; Jabbari, Hosna

2016-01-01

RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction was solved only for simple base-pair-based pseudo-energy models. Here, we revisit the problem of space-efficient free energy minimization. Whereas the space-efficient minimization of the free energy has been sketched before, the reconstruction of the optimum structure has not even been discussed. We show that this reconstruction is not possible as a trivial extension of the method for simple energy models. Then, we present the time- and space-efficient sparsified free energy minimization algorithm SparseMFEFold that guarantees MFE structure prediction. In particular, this novel algorithm provides efficient fold reconstruction based on dynamically garbage-collected trace arrows. The complexity of our algorithm depends on two parameters, the number of candidates Z and the number of trace arrows T; both are bounded by [Formula: see text], but are typically much smaller. The time complexity of RNA folding is reduced from [Formula: see text] to [Formula: see text]; the space complexity, from [Formula: see text] to [Formula: see text]. Our empirical results show more than 80% space savings over RNAfold [Vienna RNA package] on the long RNAs from the RNA STRAND database (≥2500 bases). The presented technique is intentionally generalizable to complex prediction algorithms; due to their high space demands, algorithms like pseudoknot prediction and RNA-RNA-interaction prediction are expected to profit even more than "standard" MFE folding. 
SparseMFEFold is free software, available at http://www.bioinf.uni-leipzig.de/~will/Software/SparseMFEFold.
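As a rough illustration of the simple base-pair-based models mentioned in this abstract (not SparseMFEFold itself, and not a free energy model), the sketch below maximizes the number of base pairs with the classic Nussinov-style dynamic program and reconstructs one optimal structure by traceback. The function name, the minimum hairpin loop length of 3, and the set of allowed pairs are assumptions made for this example. It fills the full quadratic table and uses no sparsification, candidates, or trace arrows.

```python
def max_base_pairs(seq, min_loop=3):
    """Return (pair count, dot-bracket string) maximizing the number of base pairs."""
    # Watson-Crick and G-U wobble pairs (an assumption for this sketch).
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: re-derive which case produced each optimal table entry.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain any pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in pairs:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (best[0][n - 1] if n else 0), "".join(structure)


count, structure = max_base_pairs("GGGAAAUCC")
print(count, structure)
```

The traceback here relies on the complete table being kept in memory, which is exactly the space cost that sparsified approaches aim to avoid.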
Systems Biological Approach of Molecular Descriptors Connectivity: Optimal Descriptors for Oral Bioavailability Prediction

PubMed Central

Ahmed, Shiek S. S. J.; Ramakrishnan, V.

2012-01-01

Background: Poor oral bioavailability is an important factor in the failure of drug candidates. Approximately 50% of drugs in development fail because of unfavorable oral bioavailability. In silico prediction of oral bioavailability (%F) based on physicochemical properties is highly needed. Although many computational models have been developed to predict oral bioavailability, their accuracy remains low, with a significant number of false positives. In this study, we present an oral bioavailability model based on a systems biological approach, using a machine learning algorithm coupled with an optimal discriminative set of physicochemical properties. Results: The models were developed based on 247 computationally derived physicochemical descriptors from 2279 molecules, of which 969, 605 and 705 molecules corresponded to the oral bioavailability, intestinal absorption (HIA) and caco-2 permeability data sets, respectively. The partial least squares discriminant analysis showed that 49 descriptors of HIA and 50 descriptors of caco-2 are the major contributing descriptors in classifying into groups. Of these descriptors, 47 were common to HIA and caco-2, which suggests that they play a vital role in classifying oral bioavailability. To determine the best machine learning algorithm, 21 classifiers were compared using a bioavailability data set of 969 molecules with 47 descriptors. Each molecule in the data set was represented by a set of 47 physicochemical properties, with the functional relevance labeled as (+bioavailability/−bioavailability) to indicate good-bioavailability/poor-bioavailability molecules. The best-performing algorithm was the logistic algorithm. The correlation based feature selection (CFS) algorithm was implemented, which confirmed that these 47 descriptors are the fundamental descriptors for oral bioavailability prediction. 
Conclusion: The logistic algorithm with 47 selected descriptors correctly predicted the oral bioavailability, with a predictive accuracy of more than 71%. Overall, the method captures the fundamental molecular descriptors, which can be used as an entity to facilitate prediction of oral bioavailability. PMID:22815781
Lossless medical image compression using geometry-adaptive partitioning and least square-based prediction.

PubMed

Song, Xiaoying; Huang, Qijun; Chang, Sheng; He, Jin; Wang, Hao

2018-06-01

To improve the compression rates for lossless compression of medical images, an efficient algorithm based on irregular segmentation and region-based prediction is proposed in this paper. Considering that the first step of a region-based compression algorithm is segmentation, this paper proposes a hybrid method that combines geometry-adaptive partitioning and quadtree partitioning to achieve adaptive irregular segmentation for medical images. Then, least square (LS)-based predictors are adaptively designed for each region (regular subblock or irregular subregion). The proposed adaptive algorithm not only exploits spatial correlation between pixels but also utilizes local structure similarity, resulting in efficient compression performance. Experimental results show that the average compression performance of the proposed algorithm is 10.48, 4.86, 3.58, and 0.10% better than that of JPEG 2000, CALIC, EDP, and JPEG-LS, respectively.
Long-range prediction of Indian summer monsoon rainfall using data mining and statistical approaches

NASA Astrophysics Data System (ADS)

H, Vathsala; Koolagudi, Shashidhar G.

2017-10-01

This paper presents a hybrid model to better predict Indian summer monsoon rainfall. The algorithm considers suitable techniques for processing dense datasets. The proposed three-step algorithm comprises closed itemset generation-based association rule mining for feature selection, cluster membership for dimensionality reduction, and a simple logistic function for prediction. An application that classifies rainfall into flood, excess, normal, deficit, and drought categories, based on 36 predictors consisting of land and ocean variables, is presented. Results show good accuracy over the considered study period of 37 years (1969-2005).

A Novel Approach to Prediction of Mild Obstructive Sleep Disordered Breathing in a Population-Based Sample: The Sleep Heart Health Study

PubMed Central

Caffo, Brian; Diener-West, Marie; Punjabi, Naresh M.; Samet, Jonathan

2010-01-01

This manuscript considers a data-mining approach for the prediction of mild obstructive sleep disordered breathing, defined as an elevated respiratory disturbance index (RDI), in 5,530 participants in a community-based study, the Sleep Heart Health Study. The prediction algorithm was built using modern ensemble learning algorithms, specifically boosting, which allowed for assessing potential high-dimensional interactions between predictor variables or classifiers. To evaluate the performance of the algorithm, the data were split into training and validation sets for varying thresholds for predicting the probability of a high RDI (≥ 7 events per hour in the given results). Based on a moderate classification threshold from the boosting algorithm, the estimated post-test odds of a high RDI were 2.20 times higher than the pre-test odds given a positive test, while the corresponding post-test odds were decreased by 52% given a negative test (sensitivity and specificity of 0.66 and 0.70, respectively). In rank order, the following variables had the largest impact on prediction performance: neck circumference, body mass index, age, snoring frequency, waist circumference, and snoring loudness. Citation: Caffo B; Diener-West M; Punjabi NM; Samet J. A novel approach to prediction of mild obstructive sleep disordered breathing in a population-based sample: the Sleep Heart Health Study. SLEEP 2010;33(12):1641-1648. PMID:21120126
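The odds figures quoted in this abstract follow from the standard relation between likelihood ratios and test accuracy: post-test odds equal pre-test odds multiplied by the likelihood ratio. The short sketch below recomputes both ratios from the rounded sensitivity and specificity given in the abstract. The function names are illustrative only, and because the inputs are rounded, the negative-test result may differ slightly from the published 52%.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive, negative) likelihood ratios of a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_odds(pre_test_odds, likelihood_ratio):
    """Post-test odds = pre-test odds x likelihood ratio."""
    return pre_test_odds * likelihood_ratio


# Rounded values reported in the abstract.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.66, specificity=0.70)
print(f"odds multiplier after a positive test: {lr_pos:.2f}")
print(f"odds reduction after a negative test: {100 * (1 - lr_neg):.1f}%")
```

With these inputs the positive likelihood ratio is about 2.2, which matches the reported multiplier, and the negative likelihood ratio is a little under 0.5, consistent with a reduction of roughly one half.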
Prediction of Industrial Electric Energy Consumption in Anhui Province Based on GA-BP Neural Network

NASA Astrophysics Data System (ADS)

Zhang, Jiajing; Yin, Guodong; Ni, Youcong; Chen, Jinlan

2018-01-01

To improve the prediction accuracy of industrial electrical energy consumption, a prediction model based on a genetic algorithm and a neural network is proposed. The model uses the genetic algorithm to optimize the weights and thresholds of a BP neural network, and it is applied to predict industrial electric energy consumption in Anhui Province. A comparison experiment between the GA-BP prediction model and the BP neural network model shows that the GA-BP model is more accurate, with a smaller number of neurons in the hidden layer.
Predicting online ratings based on the opinion spreading process

NASA Astrophysics Data System (ADS)

He, Xing-Sheng; Zhou, Ming-Yang; Zhuo, Zhao; Fu, Zhong-Qian; Liu, Jian-Guo

2015-10-01

Predicting users' online ratings is a persistent challenge and has drawn considerable attention. In this paper, we present a rating prediction method that combines the user opinion spreading process with the collaborative filtering algorithm, where user similarity is defined by measuring the amount of opinion a user transfers to another based on the primitive user-item rating matrix. The proposed method can produce a more precise rating prediction for each unrated user-item pair. In addition, we introduce a tunable parameter λ to regulate the preferential diffusion relevant to the degree of both opinion sender and receiver. The numerical results for the Movielens and Netflix data sets show that this algorithm achieves better accuracy than the standard user-based collaborative filtering algorithm using Cosine and Pearson correlation, without increasing computational complexity. By tuning λ, our method can further boost the prediction accuracy when using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as measurements. In the optimal cases, on the Movielens and Netflix data sets, the corresponding algorithmic accuracies (MAE and RMSE) are improved by 11.26% and 8.84%, and by 13.49% and 10.52%, respectively, compared to the item average method.
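For orientation, the baseline this abstract compares against, user-based collaborative filtering with cosine similarity, can be sketched as follows. This is only the standard baseline in a minimal form, not the opinion-spreading method of the paper. The data layout (a dict of per-user rating dicts), the function names, and the toy ratings are all assumptions made for the example.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += sim * other_ratings[item]
        weight_total += abs(sim)
    # None signals that no neighbour has rated the item.
    return weighted_sum / weight_total if weight_total else None


# Hypothetical toy ratings on a 1-5 scale.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"a": 1, "b": 5, "c": 2},
}
print(predict_rating(ratings, "u1", "c"))
```

In this toy case the prediction for u1 lies between the two neighbours' ratings and closer to that of u2, whose rating pattern matches u1 exactly.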
Parameterization of typhoon-induced ocean cooling using temperature equation and machine learning algorithms: an example of typhoon Soulik (2013)

NASA Astrophysics Data System (ADS)

Wei, Jun; Jiang, Guo-Qing; Liu, Xin

2017-09-01

This study proposed three algorithms that can potentially be used to provide sea surface temperature (SST) conditions for typhoon prediction models. Different from traditional data assimilation approaches, which provide prescribed initial/boundary conditions, our proposed algorithms aim to resolve a flow-dependent SST feedback between growing typhoons and oceans in the future time. Two of these algorithms are based on linear temperature equations (TE-based), and the other is based on an innovative technique involving machine learning (ML-based). The algorithms are then implemented into a Weather Research and Forecasting model to simulate the typhoon and assess their effectiveness, and the results show significant improvement in simulated storm intensities when ocean cooling feedback is included. The TE-based algorithm I considers wind-induced ocean vertical mixing and upwelling processes only, and thus obtains a synoptic and relatively smooth sea surface temperature cooling. The TE-based algorithm II incorporates not only typhoon winds but also ocean information, and thus resolves more cooling features. The ML-based algorithm is based on a neural network, consisting of multiple layers of input variables and neurons, and produces the best estimate of the cooling structure in terms of its amplitude and position. Sensitivity analysis indicated that the typhoon-induced ocean cooling is a nonlinear process involving interactions of multiple atmospheric and oceanic variables. Therefore, with an appropriate selection of input variables and neuron sizes, the ML-based algorithm appears to be more efficient in prognosing the typhoon-induced ocean cooling and in predicting typhoon intensity than the algorithms based on linear regression methods.
Optimal Parameter Selection for Support Vector Machine Based on Artificial Bee Colony Algorithm: A Case Study of Grid-Connected PV System Power Prediction.

PubMed

Gao, Xiang-Ming; Yang, Shi-Feng; Pan, San-Bo

2017-01-01

To predict the output power of photovoltaic systems, which is nonstationary and random, an output power prediction model for grid-connected PV systems is proposed based on empirical mode decomposition (EMD) and a support vector machine (SVM) optimized with an artificial bee colony (ABC) algorithm. First, according to the weather forecast data sets on the prediction date, the time series data of output power on a similar day with 15-minute intervals are built. Second, the time series data of the output power are decomposed into a series of components, including some intrinsic mode components IMFn and a trend component Res, at different scales using EMD. The corresponding SVM prediction model is established for each IMF component and trend component, and the SVM model parameters are optimized with the artificial bee colony algorithm. Finally, the prediction results of each model are reconstructed, and the predicted values of the output power of the grid-connected PV system can be obtained. The prediction model is tested with actual data, and the results show that the power prediction model based on the EMD and ABC-SVM has a faster calculation speed and higher prediction accuracy than do the single SVM prediction model and the EMD-SVM prediction model without optimization.
Optimal Parameter Selection for Support Vector Machine Based on Artificial Bee Colony Algorithm: A Case Study of Grid-Connected PV System Power Prediction

PubMed Central

2017-01-01

To predict the output power of photovoltaic systems, which is nonstationary and random, an output power prediction model for grid-connected PV systems is proposed based on empirical mode decomposition (EMD) and a support vector machine (SVM) optimized with an artificial bee colony (ABC) algorithm. First, according to the weather forecast data sets on the prediction date, the time series data of output power on a similar day with 15-minute intervals are built. Second, the time series data of the output power are decomposed into a series of components, including some intrinsic mode components IMFn and a trend component Res, at different scales using EMD. The corresponding SVM prediction model is established for each IMF component and trend component, and the SVM model parameters are optimized with the artificial bee colony algorithm. Finally, the prediction results of each model are reconstructed, and the predicted values of the output power of the grid-connected PV system can be obtained. The prediction model is tested with actual data, and the results show that the power prediction model based on the EMD and ABC-SVM has a faster calculation speed and higher prediction accuracy than do the single SVM prediction model and the EMD-SVM prediction model without optimization. PMID:28912803
Research on Mechanical Fault Prediction Algorithm for Circuit Breaker Based on Sliding Time Window and ANN

NASA Astrophysics Data System (ADS)

Wang, Xiaohua; Rong, Mingzhe; Qiu, Juan; Liu, Dingxin; Su, Biao; Wu, Yi

A new type of algorithm for predicting the mechanical faults of a vacuum circuit breaker (VCB) based on an artificial neural network (ANN) is proposed in this paper. There are two types of mechanical faults in a VCB: operation mechanism faults and tripping circuit faults. An angle displacement sensor is used to measure the main axle angle displacement which reflects the displacement of the moving contact, to obtain the state of the operation mechanism in the VCB, while a Hall current sensor is used to measure the trip coil current, which reflects the operation state of the tripping circuit. Then an ANN prediction algorithm based on a sliding time window is proposed in this paper and successfully used to predict mechanical faults in a VCB. The research results in this paper provide a theoretical basis for the realization of online monitoring and fault diagnosis of a VCB.
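The abstract does not spell out how the sliding time window is constructed. A common way to prepare such data for a neural network predictor is to cut the sensor series into fixed-length input windows, each paired with a later value as the training target. The sketch below shows that generic construction only; the function name, window length, forecast horizon, and sample values are assumptions for illustration and are not taken from the paper.

```python
def sliding_windows(series, window, horizon=1):
    """Split a series into (input window, target) training samples.

    Each sample uses `window` consecutive values as input and the value
    `horizon` steps after the end of the window as the prediction target.
    """
    samples = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((inputs, target))
    return samples


# Hypothetical sensor readings, for example successive trip coil current features.
readings = [1.0, 1.1, 1.3, 1.2, 1.6, 1.9]
for inputs, target in sliding_windows(readings, window=3):
    print(inputs, "->", target)
```

As new measurements arrive, the window advances by one step, so the predictor always sees the most recent `window` values.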
Design and experiment of vehicular charger AC/DC system based on predictive control algorithm

NASA Astrophysics Data System (ADS)

He, Guangbi; Quan, Shuhai; Lu, Yuzhang

2018-06-01

For a vehicular charger whose rectifier stage is uncontrolled, this paper proposes a predictive control algorithm for the DC/DC converter. The prediction model is established by the state space average method, with its optimal mathematical description obtained by calculation, and the prediction algorithm is analyzed by Simulink simulation. The charger structure is designed to meet the required rated output power with an adjustable output voltage: in the first stage, a three-phase uncontrolled rectifier produces the DC voltage Ud through the filter capacitor, and a double-phase interleaved buck-boost circuit then provides the required output voltage over a wide range. The working principle of the circuit is analyzed, and the parameters are designed and the components selected. The analysis of current ripple shows that the double staggered parallel connection has the advantages of reducing the output current ripple and reducing the loss. The simulation experiment of the whole charging circuit is carried out in software, and the result is in line with the design requirements of the system. Finally, software and hardware circuits are combined to charge the system according to the requirements. The experimental platform demonstrates the feasibility and effectiveness of the proposed predictive control algorithm for the vehicular charging system, and the experimental results are consistent with the simulation results.
Improving personalized link prediction by hybrid diffusion

NASA Astrophysics Data System (ADS)

Liu, Jin-Hu; Zhu, Yu-Xiao; Zhou, Tao

2016-04-01

Inspired by traditional link prediction, and to solve the problem of recommending friends in social networks, we introduce personalized link prediction in this paper, in which each individual receives an equal number of diverse predictions. The performance of many classical algorithms is not satisfactory under this framework, so new algorithms are urgently needed. Motivated by previous research in other fields, we generalize the heat conduction process to the framework of personalized link prediction and find that this method outperforms many classical similarity-based algorithms, especially in terms of diversity. In addition, we demonstrate that adding one ground node that is supposed to connect all the nodes in the system greatly benefits the performance of heat conduction. Finally, better hybrid algorithms composed of local random walk and heat conduction are proposed. Numerical results show that the hybrid algorithms can outperform other algorithms simultaneously in all four adopted metrics: AUC, precision, recall and hamming distance. In short, this work may shed some light on the in-depth understanding of the effect of physical processes in personalized link prediction.
Autoregressive-moving-average hidden Markov model for vision-based fall prediction-An application for walker robot.

PubMed

Taghvaei, Sajjad; Jahanandish, Mohammad Hasan; Kosuge, Kazuhiro

2017-01-01

Population aging requires providing the elderly with safe and dependable assistive technologies for daily life activities. Improving fall detection algorithms can play a major role in achieving this goal. This article proposes a real-time fall prediction algorithm based on visual data acquired by a depth sensor from a user of a walking assistive system. In the absence of a coupled dynamic model of the human and the assistive walker, a hybrid "system identification-machine learning" approach is used. An autoregressive-moving-average (ARMA) model is fitted on the time-series walking data to forecast the upcoming states, and a hidden Markov model (HMM) based classifier is built on top of the ARMA model to predict falling in the upcoming time frames. The performance of the algorithm is evaluated through experiments with four subjects, including an experienced physiotherapist, while using a walker robot in five different falling scenarios: fall forward, fall down, fall back, fall left, and fall right. The algorithm successfully predicts the fall with a rate of 84.72%.
Fundamental Algorithms of the Goddard Battery Model

NASA Technical Reports Server (NTRS)

Jagielski, J. M.

1985-01-01

The Goddard Space Flight Center (GSFC) is currently producing a computer model to predict Nickel Cadmium (NiCd) performance in a Low Earth Orbit (LEO) cycling regime. The model proper is currently still in development, but the inherent, fundamental algorithms (or methodologies) of the model are defined. At present, the model is closely dependent on empirical data and the data base currently used is of questionable accuracy. Even so, very good correlations have been determined between model predictions and actual cycling data. A more accurate and encompassing data base has been generated to serve dual functions: show the limitations of the current data base, and be inbred in the model properly for more accurate predictions. The fundamental algorithms of the model, and the present data base and its limitations, are described and a brief preliminary analysis of the new data base and its verification of the model's methodology are presented.
A comprehensive performance evaluation on the prediction results of existing cooperative transcription factors identification algorithms.

PubMed

Lai, Fu-Jou; Chang, Hong-Tsun; Huang, Yueh-Min; Wu, Wei-Sheng

2014-01-01

Eukaryotic transcriptional regulation is known to be highly connected through the networks of cooperative transcription factors (TFs). Measuring the cooperativity of TFs is helpful for understanding the biological relevance of these TFs in regulating genes. Recent advances in computational techniques have led to various predictions of cooperative TF pairs in yeast. As each algorithm integrated different data resources and was developed based on different rationales, each had its own merits and claimed to outperform the others. However, such claims were prone to subjectivity because each algorithm was compared with only a few other algorithms and used only a small set of performance indices for comparison. This motivated us to propose a series of indices to objectively evaluate the prediction performance of existing algorithms. Based on the proposed performance indices, we conducted a comprehensive performance evaluation. We collected 14 sets of predicted cooperative TF pairs (PCTFPs) in yeast from 14 existing algorithms in the literature. Using the eight performance indices we adopted/proposed, the cooperativity of each PCTFP was measured, and a ranking score according to the mean cooperativity of the set was given to each set of PCTFPs under evaluation for each performance index. It was seen that the ranking scores of a set of PCTFPs vary with different performance indices, implying that an algorithm used in predicting cooperative TF pairs may be strong in some respects but weak in others. We finally made a comprehensive ranking for these 14 sets. The results showed that Wang J's study obtained the best performance evaluation on the prediction of cooperative TF pairs in yeast. In this study, we adopted/proposed eight performance indices to make a comprehensive performance evaluation on the prediction results of 14 existing cooperative TFs identification algorithms. 
Most importantly, these proposed indices can be easily applied to measure the performance of new algorithms developed in the future, thus expediting progress in this research field.
Limitations and potentials of current motif discovery algorithms

PubMed Central

Hu, Jianjun; Li, Bin; Kihara, Daisuke

2005-01-01

Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved to be useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms using large datasets generated from Escherichia coli RegulonDB. Factors that affect the prediction accuracy, scalability and reliability are characterized. It is revealed that the nucleotide and the binding site level accuracy are very low, while the motif level accuracy is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit diverse predictions from multiple runs of one or more algorithms, a consensus ensemble algorithm has been developed, which achieved 6–45% improvement over the base algorithms by increasing both the sensitivity and specificity. Our study illustrates limitations and potentials of existing sequence-based motif discovery algorithms. Taking advantage of the revealed potentials, several promising directions for further improvements are discussed. Since the sequence-based algorithms are the baseline of most of the modern motif discovery algorithms, this paper suggests substantial improvements would be possible for them. PMID:16284194
Genetic algorithm based adaptive neural network ensemble and its application in predicting carbon flux

USGS Publications Warehouse

Xue, Y.; Liu, S.; Hu, Y.; Yang, J.; Chen, Q.

2007-01-01

To improve the accuracy in prediction, Genetic Algorithm based Adaptive Neural Network Ensemble (GA-ANNE) is presented. Intersections are allowed between different training sets based on the fuzzy clustering analysis, which ensures the diversity as well as the accuracy of individual Neural Networks (NNs). Moreover, to improve the accuracy of the adaptive weights of individual NNs, GA is used to optimize the cluster centers. Empirical results in predicting carbon flux of Duke Forest reveal that GA-ANNE can predict the carbon flux more accurately than Radial Basis Function Neural Network (RBFNN), Bagging NN ensemble, and ANNE. © 2007 IEEE.
Assessing the external validity of algorithms to estimate EQ-5D-3L from the WOMAC.

PubMed

Kiadaliri, Aliasghar A; Englund, Martin

2016-10-04

The use of mapping algorithms has been suggested as a solution to predict health utilities when no preference-based measure is included in the study. However, the validity and predictive performance of these algorithms are highly variable, and hence assessing the accuracy and validity of algorithms before using them in a new setting is of importance. The aim of the current study was to assess the predictive accuracy of three mapping algorithms to estimate the EQ-5D-3L from the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) among Swedish people with knee disorders. Two of these algorithms were developed using ordinary least squares (OLS) models and one was developed using a mixture model. The data from 1078 subjects, mean (SD) age 69.4 (7.2) years, with frequent knee pain and/or knee osteoarthritis from the Malmö Osteoarthritis study in Sweden were used. The algorithms' performance was assessed using mean error, mean absolute error, and root mean squared error. Two types of prediction were estimated for the mixture model: weighted average (WA), and conditional on estimated component (CEC). The overall mean was overpredicted by an OLS model and underpredicted by the two other algorithms (P < 0.001). All predictions but the CEC predictions of the mixture model had a narrower range than the observed scores (22 to 90 %). All algorithms suffered from overprediction for severe health states and underprediction for mild health states, though to a lesser extent for the mixture model. While the mixture model outperformed the OLS models at the extremes of the EQ-5D-3L distribution, it underperformed around the center of the distribution. While the algorithm based on the mixture model reflected the distribution of EQ-5D-3L data more accurately than the OLS models, all algorithms suffered from systematic bias. This calls for caution in applying these mapping algorithms in a new setting, particularly in samples with milder knee problems than the original sample. 
Assessing the impact of the choice of these algorithms on cost-effectiveness studies through sensitivity analysis is recommended.
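The three performance measures named in this abstract (mean error, mean absolute error, and root mean squared error) have standard definitions, sketched below. The sign convention for mean error (predicted minus observed, so a positive value means overprediction) and the toy utility values are assumptions for this example; the abstract does not state the convention used.

```python
import math


def prediction_errors(observed, predicted):
    """Return (mean error, mean absolute error, root mean squared error)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed convention: error = predicted - observed.
    diffs = [p - o for o, p in zip(observed, predicted)]
    n = len(diffs)
    mean_error = sum(diffs) / n
    mean_abs_error = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return mean_error, mean_abs_error, rmse


# Hypothetical observed and predicted utility scores.
observed = [0.5, 0.7, 0.9]
predicted = [0.6, 0.7, 0.8]
me, mae, rmse = prediction_errors(observed, predicted)
print(f"ME={me:.3f} MAE={mae:.3f} RMSE={rmse:.3f}")
```

The toy data also show why mean error alone can mislead: overprediction at the low end and underprediction at the high end cancel out to a mean error near zero, while MAE and RMSE remain positive.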
Mental Health Risk Adjustment with Clinical Categories and Machine Learning.

PubMed

Shrestha, Akritee; Bergquist, Savannah; Montz, Ellen; Rose, Sherri

2017-12-15

To propose nonparametric ensemble machine learning for mental health and substance use disorders (MHSUD) spending risk adjustment formulas, including considering Clinical Classification Software (CCS) categories as diagnostic covariates over the commonly used Hierarchical Condition Category (HCC) system. 2012-2013 Truven MarketScan database. We implement 21 algorithms to predict MHSUD spending, as well as a weighted combination of these algorithms called super learning. The algorithm collection included seven unique algorithms that were supplied with three differing sets of MHSUD-related predictors alongside demographic covariates: HCC, CCS, and HCC + CCS diagnostic variables. Performance was evaluated based on cross-validated R² and predictive ratios. Results show that super learning had the best performance based on both metrics. The top single algorithm was random forests, which improved on ordinary least squares regression by 10 percent with respect to relative efficiency. CCS categories-based formulas were generally more predictive of MHSUD spending compared to HCC-based formulas. Literature supports the potential benefit of implementing a separate MHSUD spending risk adjustment formula. Our results suggest there is an incentive to explore machine learning for MHSUD-specific risk adjustment, as well as considering CCS categories over HCCs. © Health Research and Educational Trust.
ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography.

PubMed

Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

2016-07-07

Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
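The Dice similarity coefficient used here to quantify segmentation accuracy is defined as twice the size of the overlap between two regions divided by the sum of their sizes. A minimal sketch follows, assuming each segmentation is represented as a set of voxel coordinates. This representation, the function name, and the toy masks are illustrative choices, not the ATLAAS implementation.

```python
def dice_coefficient(segmentation, reference):
    """Dice similarity coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    a, b = set(segmentation), set(reference)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical (a convention)
    return 2.0 * len(a & b) / (len(a) + len(b))


# Hypothetical 2D masks given as (row, column) voxel coordinates.
predicted_contour = {(0, 0), (0, 1), (1, 1)}
true_contour = {(0, 1), (1, 1), (1, 2)}
print(round(dice_coefficient(predicted_contour, true_contour), 3))
```

A value of 1 indicates perfect overlap with the reference contour and 0 indicates no overlap, which gives a scale for reading the DSC values reported in the abstract.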
ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

NASA Astrophysics Data System (ADS)

Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

2016-07-01

Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
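The ATLAAS abstract quantifies segmentation accuracy with the Dice similarity coefficient (DSC). As a minimal sketch of that metric only (not of ATLAAS itself), the DSC of two contours can be computed from the sets of voxels they enclose. The function name `dice_similarity` and the toy voxel sets are assumptions made for illustration.

```python
def dice_similarity(voxels_a, voxels_b):
    """Dice similarity coefficient of two voxel sets: 2|A & B| / (|A| + |B|)."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        # Convention chosen here: two empty contours agree perfectly.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Toy 2D example (hypothetical data): the true contour covers 4 voxels,
# the predicted contour covers 6 voxels, and 4 voxels are shared.
truth = {(r, c) for r in (1, 2) for c in (1, 2)}
predicted = {(r, c) for r in (1, 2) for c in (1, 2, 3)}

dsc = dice_similarity(truth, predicted)
print(dsc)  # 2 * 4 / (4 + 6) = 0.8
```

A DSC of 1 means the two contours coincide and 0 means they do not overlap at all, which is the scale on which the values 0.881 and 0.819 in the abstract are reported.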
Choosing the appropriate forecasting model for predictive parameter control.

PubMed

Aleti, Aldeida; Moser, Irene; Meedeniya, Indika; Grunske, Lars

2014-01-01

All commonly used stochastic optimisation algorithms have to be parameterised to perform effectively. Adaptive parameter control (APC) is an effective method used for this purpose. APC repeatedly adjusts parameter values during the optimisation process for optimal algorithm performance. The assignment of parameter values for a given iteration is based on previously measured performance. In recent research, time series prediction has been proposed as a method of projecting the probabilities to use for parameter value selection. In this work, we examine the suitability of a variety of prediction methods for the projection of future parameter performance based on previous data. All considered prediction methods have assumptions the time series data has to conform to for the prediction method to provide accurate projections. Looking specifically at parameters of evolutionary algorithms (EAs), we find that all standard EA parameters with the exception of population size conform largely to the assumptions made by the considered prediction methods. Evaluating the performance of these prediction methods, we find that linear regression provides the best results by a very small and statistically insignificant margin. Regardless of the prediction method, predictive parameter control outperforms state of the art parameter control methods when the performance data adheres to the assumptions made by the prediction method. When a parameter's performance data does not adhere to the assumptions made by the forecasting method, the use of prediction does not have a notable adverse impact on the algorithm's performance.
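The abstract above reports that linear regression projected future parameter performance about as well as any other method considered. A minimal sketch of that idea, assuming the performance history of one parameter value is a plain list of numbers indexed by iteration, is an ordinary least-squares line extrapolated one step ahead. The function name `forecast_next` and the sample history are hypothetical and not taken from the paper.

```python
def forecast_next(values):
    """Fit y = intercept + slope * t by least squares and predict the next point."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a line")
    t_mean = (n - 1) / 2.0
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept + slope * n  # projection for time index n


# Hypothetical success rates of one parameter value over four iterations.
history = [0.10, 0.20, 0.30, 0.40]
print(forecast_next(history))  # close to 0.5 for this perfectly linear history
```

In a predictive parameter control loop, projections like this one would be computed for each candidate parameter value and then turned into selection probabilities; that step is left out here because the abstract does not specify it.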
The Role of miRNAs in the Progression of Prostate Cancer from Androgen-Dependent to Androgen-Independent Stages

DTIC Science & Technology

2012-09-01

regulated by miR-99a/let7c/125b-2 cluster. Using bioinformatic prediction algorithm TargetScan, we identified 7 genes that are commonly targeted by miR-99a...HPeak, a Hidden Markov Model (HMM)-based peak identifying algorithm (http://www.sph.umich.edu/csg/qin/HPeak/). Seven AR binding sites were reported by...and ARBS2 by ALGGEN-PROMO, a matrix algorithm for predicting transcription factor binding sites based on TRANSFAC (http://alggen.lsi.upc.es/cgi-bin

Cloud Based Metalearning System for Predictive Modeling of Biomedical Data

PubMed Central

Vukićević, Milan

2014-01-01

Rapid growth and storage of biomedical data enabled many opportunities for predictive modeling and improvement of healthcare processes. On the other hand, analysis of such large amounts of data is a difficult and computationally intensive task for most existing data mining algorithms. This problem is addressed by proposing a cloud based system that integrates a metalearning framework for ranking and selection of the best predictive algorithms for the data at hand with open source big data technologies for analysis of biomedical data. PMID:24892101
Qualitative Event-Based Diagnosis: Case Study on the Second International Diagnostic Competition

NASA Technical Reports Server (NTRS)

Daigle, Matthew; Roychoudhury, Indranil

2010-01-01

We describe a diagnosis algorithm entered into the Second International Diagnostic Competition. We focus on the first diagnostic problem of the industrial track of the competition in which a diagnosis algorithm must detect, isolate, and identify faults in an electrical power distribution testbed and provide corresponding recovery recommendations. The diagnosis algorithm embodies a model-based approach, centered around qualitative event-based fault isolation. Faults produce deviations in measured values from model-predicted values. The sequence of these deviations is matched to those predicted by the model in order to isolate faults. We augment this approach with model-based fault identification, which determines fault parameters and helps to further isolate faults. We describe the diagnosis approach, provide diagnosis results from running the algorithm on provided example scenarios, and discuss the issues faced, and lessons learned, from implementing the approach.
A Turn-Projected State-Based Conflict Resolution Algorithm

NASA Technical Reports Server (NTRS)

Butler, Ricky W.; Lewis, Timothy A.

2013-01-01

State-based conflict detection and resolution (CD&R) algorithms detect conflicts and resolve them on the basis of current state information without the use of additional intent information from aircraft flight plans. Therefore, the prediction of the trajectory of aircraft is based solely upon the position and velocity vectors of the traffic aircraft. Most CD&R algorithms project the traffic state using only the current state vectors. However, the past state vectors can be used to make a better prediction of the future trajectory of the traffic aircraft. This paper explores the idea of using past state vectors to detect traffic turns and resolve conflicts caused by these turns using a non-linear projection of the traffic state. A new algorithm based on this idea is presented and validated using a fast-time simulator developed for this study.
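The abstract above contrasts its turn-aware projection with the usual approach of projecting the traffic state linearly from the current position and velocity vectors. As a sketch of that conventional baseline only (not the turn-projected algorithm of the paper), the following computes the time and distance of closest approach in the horizontal plane under constant relative velocity. The function names, units and sample numbers are assumptions made for illustration.

```python
import math


def time_of_closest_approach(rel_pos, rel_vel):
    """Time (>= 0) at which two aircraft flying straight are closest.

    rel_pos and rel_vel are (x, y) tuples: traffic minus ownship.
    """
    speed_sq = rel_vel[0] ** 2 + rel_vel[1] ** 2
    if speed_sq == 0.0:
        return 0.0  # no relative motion: separation never changes
    t = -(rel_pos[0] * rel_vel[0] + rel_pos[1] * rel_vel[1]) / speed_sq
    return max(t, 0.0)  # a negative time means the aircraft are diverging


def detect_conflict(rel_pos, rel_vel, min_sep, lookahead):
    """Return (conflict?, time of closest approach, miss distance)."""
    t = min(time_of_closest_approach(rel_pos, rel_vel), lookahead)
    dx = rel_pos[0] + rel_vel[0] * t
    dy = rel_pos[1] + rel_vel[1] * t
    miss = math.hypot(dx, dy)
    return miss < min_sep, t, miss


# Hypothetical head-on geometry: positions in nautical miles, speeds in knots,
# time in hours. Traffic is 10 nmi ahead, offset 1 nmi, closing at 600 kt.
conflict, t_cpa, miss = detect_conflict((10.0, 1.0), (-600.0, 0.0),
                                        min_sep=5.0, lookahead=0.1)
print(conflict, t_cpa, miss)
```

Because this projection assumes constant velocity, it cannot anticipate a turning intruder, which is the gap the paper addresses by using past state vectors.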
Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity.

PubMed

Kim, Hui Kwon; Min, Seonwoo; Song, Myungjae; Jung, Soobin; Choi, Jae Woo; Kim, Younggwang; Lee, Sangeun; Yoon, Sungroh; Kim, Hyongbum Henry

2018-03-01

We present two algorithms to predict the activity of AsCpf1 guide RNAs. Indel frequencies for 15,000 target sequences were used in a deep-learning framework based on a convolutional neural network to train Seq-deepCpf1. We then incorporated chromatin accessibility information to create the better-performing DeepCpf1 algorithm for cell lines for which such information is available and show that both algorithms outperform previous machine learning algorithms on our own and published data sets.
A Sensor Dynamic Measurement Error Prediction Model Based on NAPSO-SVM

PubMed Central

Jiang, Minlan; Jiang, Lan; Jiang, Dingde; Li, Fei

2018-01-01

Dynamic measurement error correction is an effective way to improve sensor precision. Dynamic measurement error prediction is an important part of error correction, and support vector machine (SVM) is often used for predicting the dynamic measurement errors of sensors. Traditionally, the SVM parameters were always set manually, which cannot ensure the model’s performance. In this paper, an SVM method based on an improved particle swarm optimization (NAPSO) is proposed to predict the dynamic measurement errors of sensors. Natural selection and simulated annealing are added in the PSO to raise the ability to avoid local optima. To verify the performance of NAPSO-SVM, three types of algorithms are selected to optimize the SVM’s parameters: the particle swarm optimization algorithm (PSO), the improved PSO optimization algorithm (NAPSO), and the glowworm swarm optimization (GSO). The dynamic measurement error data of two sensors are applied as the test data. The root mean squared error and mean absolute percentage error are employed to evaluate the prediction models’ performances. The experimental results show that among the three tested algorithms the NAPSO-SVM method has better prediction precision and smaller prediction errors, and it is an effective method for predicting the dynamic measurement errors of sensors. PMID:29342942
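The two evaluation measures named in the abstract above, root mean squared error (RMSE) and mean absolute percentage error (MAPE), have standard definitions that are easy to state in code. The sketch below uses those textbook definitions; the sample values are hypothetical and are not data from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted)^2))."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so the caller must avoid that.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical measured and predicted sensor errors.
actual = [2.0, 4.0]
predicted = [1.0, 5.0]
print(rmse(actual, predicted))  # sqrt((1 + 1) / 2) = 1.0
print(mape(actual, predicted))  # 100 * (0.5 + 0.25) / 2 = 37.5
```

RMSE is expressed in the units of the measured quantity, while MAPE is scale-free, which is one reason the two are often reported together.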
Aircraft Engine Thrust Estimator Design Based on GSA-LSSVM

NASA Astrophysics Data System (ADS)

Sheng, Hanlin; Zhang, Tianhong

2017-08-01

Direct thrust control of an aircraft engine requires a highly precise and reliable thrust estimator. Based on support vector regression (SVR), the least squares support vector machine (LSSVM) and a new optimization algorithm, the gravitational search algorithm (GSA), a GSA-LSSVM-based thrust estimator design is proposed that combines integrated modelling with parameter optimization. The results show that, compared to the particle swarm optimization (PSO) algorithm, GSA finds the unknown optimization parameters better and gives the resulting model better prediction and generalization ability. The model can better predict aircraft engine thrust and thus fulfills the need of direct thrust control of aircraft engines.
The Behavioral and Neural Mechanisms Underlying the Tracking of Expertise

PubMed Central

Boorman, Erie D.; O’Doherty, John P.; Adolphs, Ralph; Rangel, Antonio

2013-01-01

Summary Evaluating the abilities of others is fundamental for successful economic and social behavior. We investigated the computational and neurobiological basis of ability tracking by designing an fMRI task that required participants to use and update estimates of both people and algorithms’ expertise through observation of their predictions. Behaviorally, we find a model-based algorithm characterized subject predictions better than several alternative models. Notably, when the agent’s prediction was concordant rather than discordant with the subject’s own likely prediction, participants credited people more than algorithms for correct predictions and penalized them less for incorrect predictions. Neurally, many components of the mentalizing network—medial prefrontal cortex, anterior cingulate gyrus, temporoparietal junction, and precuneus—represented or updated expertise beliefs about both people and algorithms. Moreover, activity in lateral orbitofrontal and medial prefrontal cortex reflected behavioral differences in learning about people and algorithms. These findings provide basic insights into the neural basis of social learning. PMID:24360551
Historical feature pattern extraction based network attack situation sensing algorithm.

PubMed

Zeng, Yong; Liu, Dacheng; Lei, Zhou

2014-01-01

The situation sequence contains a series of complicated and multivariate random trends, which are sudden, uncertain, and difficult for traditional algorithms to recognize and describe. To address these problems, estimating the parameters of a very long situation sequence is essential but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, the HFPE algorithm seeks similar indications in the recorded historical situation sequence and weighs the link intensity between an observed indication and its subsequent effect. Then it calculates the probability that a certain effect reappears given the current indication and makes a prediction after weighting. Meanwhile, the HFPE method provides an evolution algorithm to derive the prediction deviation in terms of pattern and accuracy. This algorithm can continuously improve the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in the sequence as far as possible, does not need data preprocessing, and can track and adapt to the variation of the situation sequence continuously.
Historical Feature Pattern Extraction Based Network Attack Situation Sensing Algorithm

PubMed Central

Zeng, Yong; Liu, Dacheng; Lei, Zhou

2014-01-01

The situation sequence contains a series of complicated and multivariate random trends, which are sudden, uncertain, and difficult for traditional algorithms to recognize and describe. To address these problems, estimating the parameters of a very long situation sequence is essential but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, the HFPE algorithm seeks similar indications in the recorded historical situation sequence and weighs the link intensity between an observed indication and its subsequent effect. Then it calculates the probability that a certain effect reappears given the current indication and makes a prediction after weighting. Meanwhile, the HFPE method provides an evolution algorithm to derive the prediction deviation in terms of pattern and accuracy. This algorithm can continuously improve the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in the sequence as far as possible, does not need data preprocessing, and can track and adapt to the variation of the situation sequence continuously. PMID:24892054
Adaptive Trajectory Prediction Algorithm for Climbing Flights

NASA Technical Reports Server (NTRS)

Schultz, Charles Alexander; Thipphavong, David P.; Erzberger, Heinz

2012-01-01

Aircraft climb trajectories are difficult to predict, and large errors in these predictions reduce the potential operational benefits of some advanced features for NextGen. The algorithm described in this paper improves climb trajectory prediction accuracy by adjusting trajectory predictions based on observed track data. It utilizes rate-of-climb and airspeed measurements derived from position data to dynamically adjust the aircraft weight modeled for trajectory predictions. In simulations with weight uncertainty, the algorithm is able to adapt to within 3 percent of the actual gross weight within two minutes of the initial adaptation. The root-mean-square of altitude errors for five-minute predictions was reduced by 73 percent. Conflict detection performance also improved, with a 15 percent reduction in missed alerts and a 10 percent reduction in false alerts. In a simulation with climb speed capture intent and weight uncertainty, the algorithm improved climb trajectory prediction accuracy by up to 30 percent and conflict detection performance, reducing missed and false alerts by up to 10 percent.
Analysis of Bioactive Amino Acids from Fish Hydrolysates with a New Bioinformatic Intelligent System Approach.

PubMed

Elaziz, Mohamed Abd; Hemdan, Ahmed Monem; Hassanien, AboulElla; Oliva, Diego; Xiong, Shengwu

2017-09-07

The current economics of the fish protein industry demand rapid, accurate and expressive prediction algorithms at every step of protein production, especially with the challenge of global climate change. Such algorithms help to predict and analyze functional and nutritional quality and consequently to control food allergies in hyperallergic patients. Measuring these concentrations by laboratory experiments is expensive and time-consuming, especially for large-scale projects. Therefore, this paper introduces a new intelligent algorithm using an adaptive neuro-fuzzy inference system based on the whale optimization algorithm. This algorithm is used to predict the concentration levels of bioactive amino acids in fish protein hydrolysates at different times during the year. The whale optimization algorithm is used to determine the optimal parameters of the adaptive neuro-fuzzy inference system. The results of the proposed algorithm are compared with those of other methods and indicate the higher performance of the proposed algorithm.
A learning-based autonomous driver: emulate human driver's intelligence in low-speed car following

NASA Astrophysics Data System (ADS)

Wei, Junqing; Dolan, John M.; Litkouhi, Bakhtiar

2010-04-01

In this paper, an offline learning mechanism based on the genetic algorithm is proposed for autonomous vehicles to emulate human driver behaviors. The autonomous driving ability is implemented based on a Prediction- and Cost function-Based algorithm (PCB). PCB is designed to emulate a human driver's decision process, which is modeled as traffic scenario prediction and evaluation. This paper focuses on using a learning algorithm to optimize PCB with very limited training data, so that PCB can have the ability to predict and evaluate traffic scenarios similarly to human drivers. 80 seconds of human driving data was collected in low-speed (< 30 miles/h) car-following scenarios. In the low-speed car-following tests, PCB was able to perform more human-like car-following after learning. A more general 120 kilometer-long simulation showed that PCB performs robustly even in scenarios that are not part of the training set.
Link prediction based on local community properties

NASA Astrophysics Data System (ADS)

Yang, Xu-Hua; Zhang, Hai-Feng; Ling, Fei; Cheng, Zhi; Weng, Guo-Qing; Huang, Yu-Jiao

2016-09-01

The link prediction algorithm is one of the key technologies to reveal the inherent rule of network evolution. This paper proposes a novel link prediction algorithm based on the properties of the local community, which is composed of the common neighbor nodes of any two nodes in the network and the links between these nodes. By referring to the node degree and the condition of assortativity or disassortativity in a network, we comprehensively consider the effect of the shortest path and edge clustering coefficient within the local community on node similarity. We numerically show that the proposed method provides good link prediction results.
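The abstract above defines the local community of two nodes as their common neighbors together with the links among those neighbors. The sketch below extracts exactly those two ingredients for every non-adjacent node pair in an undirected graph. It is not the similarity index proposed in the paper, whose formula is not given in the abstract; the function name `local_community_counts` and the toy graph are assumptions made for illustration.

```python
def local_community_counts(edges):
    """Map each non-adjacent pair with at least one common neighbor to
    (number of common neighbors, number of links among those neighbors)."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    nodes = sorted(neighbors)
    counts = {}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v in neighbors[u]:
                continue  # already linked: nothing to predict
            shared = neighbors[u] & neighbors[v]
            if not shared:
                continue
            # Count each internal link once by requiring x < y.
            internal = sum(1 for x in shared for y in shared
                           if x < y and y in neighbors[x])
            counts[(u, v)] = (len(shared), internal)
    return counts


# Toy undirected graph (hypothetical): a and b share neighbors c and d,
# and c and d are themselves linked.
edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"), ("a", "e")]
for pair, (common, internal) in sorted(local_community_counts(edges).items()):
    print(pair, common, internal)
```

Ranking candidate pairs by the first count alone gives the classic common-neighbors baseline; a local-community method would additionally weight the internal links, node degrees and the other properties the abstract lists.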
Kalman/Map filtering-aided fast normalized cross correlation-based Wi-Fi fingerprinting location sensing.

PubMed

Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin

2013-11-13

A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results.
Kalman/Map Filtering-Aided Fast Normalized Cross Correlation-Based Wi-Fi Fingerprinting Location Sensing

PubMed Central

Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin

2013-01-01

A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results. PMID:24233027
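The fingerprinting step described above rests on normalized cross correlation (NCC) between on-line received signal strength (RSS) readings and stored reference-point fingerprints. As a sketch of the basic zero-lag NCC only (the paper's fast variant and the Kalman/map filter are not reproduced), the following scores each reference point and picks the best match. The function name `ncc`, the reference-point labels and the RSS values in dBm are hypothetical.

```python
import math


def ncc(x, y):
    """Zero-lag normalized cross correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    if den == 0.0:
        return 0.0  # a constant sequence carries no pattern to correlate
    return num / den


# Hypothetical RSS readings (dBm) from three access points.
online = [-40.0, -55.0, -70.0]
fingerprints = {
    "RP1": [-42.0, -57.0, -72.0],  # same pattern, shifted by 2 dB
    "RP2": [-70.0, -55.0, -40.0],  # reversed pattern
}

scores = {rp: ncc(online, rss) for rp, rss in fingerprints.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because NCC subtracts the means and normalizes by the spread, a constant offset between devices does not change the score, which is why RP1 matches perfectly here despite the 2 dB shift.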
NEP: web server for epitope prediction based on antibody neutralization of viral strains with diverse sequences

PubMed Central

Chuang, Gwo-Yu; Liou, David; Kwong, Peter D.; Georgiev, Ivelin S.

2014-01-01

Delineation of the antigenic site, or epitope, recognized by an antibody can provide clues about functional vulnerabilities and resistance mechanisms, and can therefore guide antibody optimization and epitope-based vaccine design. Previously, we developed an algorithm for antibody-epitope prediction based on antibody neutralization of viral strains with diverse sequences and validated the algorithm on a set of broadly neutralizing HIV-1 antibodies. Here we describe the implementation of this algorithm, NEP (Neutralization-based Epitope Prediction), as a web-based server. The users must supply as input: (i) an alignment of antigen sequences of diverse viral strains; (ii) neutralization data for the antibody of interest against the same set of antigen sequences; and (iii) (optional) a structure of the unbound antigen, for enhanced prediction accuracy. The prediction results can be downloaded or viewed interactively on the antigen structure (if supplied) from the web browser using a JSmol applet. Since neutralization experiments are typically performed as one of the first steps in the characterization of an antibody to determine its breadth and potency, the NEP server can be used to predict antibody-epitope information at no additional experimental costs. NEP can be accessed on the internet at http://exon.niaid.nih.gov/nep. PMID:24782517
Next Day Building Load Predictions based on Limited Input Features Using an On-Line Laterally Primed Adaptive Resonance Theory Artificial Neural Network.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jones, Christian Birk; Robinson, Matt; Yasaei, Yasser

Optimal integration of thermal energy storage within commercial building applications requires accurate load predictions. Several methods exist that provide an estimate of a building's future needs. Methods include component-based models and data-driven algorithms. This work implemented a previously untested algorithm for this application that is called a Laterally Primed Adaptive Resonance Theory (LAPART) artificial neural network (ANN). The LAPART algorithm provided accurate results over a two-month period where minimal historical data and a small number of input types were available. These results are significant, because common practice has often overlooked the implementation of an ANN. ANNs have often been perceived to be too complex and to require large amounts of data to provide accurate results. The LAPART neural network was implemented in an on-line learning manner. On-line learning refers to the continuous updating of training data as time passes. For this experiment, training began with a single day and grew to two months of data. This approach provides a platform for immediate implementation that requires minimal time and effort. The results from the LAPART algorithm were compared with statistical regression and a component-based model. The comparison was based on the predictions' linear relationship with the measured data, mean squared error, mean bias error, and cost savings achieved by the respective prediction techniques. The results show that the LAPART algorithm provided a reliable and cost effective means to predict the building load for the next day.
Can administrative health utilisation data provide an accurate diabetes prevalence estimate for a geographical region?

PubMed

Chan, Wing Cheuk; Papaconstantinou, Dean; Lee, Mildred; Telfer, Kendra; Jo, Emmanuel; Drury, Paul L; Tobias, Martin

2018-05-01

To validate the New Zealand Ministry of Health (MoH) Virtual Diabetes Register (VDR) using longitudinal laboratory results and to develop an improved algorithm for estimating diabetes prevalence at a population level. The assigned diabetes status of individuals based on the 2014 version of the MoH VDR is compared to the diabetes status based on the laboratory results stored in the Auckland regional laboratory result repository (TestSafe) using the New Zealand diabetes diagnostic criteria. The existing VDR algorithm is refined by reviewing the sensitivity and positive predictive value of each of the VDR algorithm rules individually and as a combination. The diabetes prevalence estimate based on the original 2014 MoH VDR was 17% higher (n = 108,505) than the corresponding TestSafe prevalence estimate (n = 92,707). Compared to the diabetes prevalence based on TestSafe, the original VDR has a sensitivity of 89%, specificity of 96%, positive predictive value of 76% and negative predictive value of 98%. The modified VDR algorithm has improved the positive predictive value by 6.1% and the specificity by 1.4% with modest reductions in sensitivity of 2.2% and negative predictive value of 0.3%. At an aggregated level the overall diabetes prevalence estimated by the modified VDR is 5.7% higher than the corresponding estimate based on TestSafe. The Ministry of Health Virtual Diabetes Register algorithm has been refined to provide a more accurate diabetes prevalence estimate at a population level. The comparison highlights the potential value of a national population long term condition register constructed from both laboratory results and administrative data.
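The four validation measures reported above (sensitivity, specificity, positive and negative predictive value) all derive from one two-by-two table comparing the register against the reference standard. The sketch below uses the standard definitions. The counts are invented for illustration only and are not the study's data, and the function name `diagnostic_metrics` is an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table.

    tp: flagged by the register and confirmed by the reference standard
    fp: flagged by the register but not confirmed
    fn: missed by the register but confirmed by the reference standard
    tn: not flagged and not confirmed
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = diagnostic_metrics(tp=80, fp=20, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A register can combine high specificity with a much lower positive predictive value, as in the abstract, because when the condition is uncommon even a small false-positive rate among the many unaffected people yields a sizeable share of false positives among those flagged.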
A Grammatical Approach to RNA-RNA Interaction Prediction

NASA Astrophysics Data System (ADS)

Kato, Yuki; Akutsu, Tatsuya; Seki, Hiroyuki

2007-11-01

Much attention has been paid to two interacting RNA molecules involved in post-transcriptional control of gene expression. Although there have been a few studies on RNA-RNA interaction prediction based on dynamic programming algorithms, no grammar-based approach has been proposed. The purpose of this paper is to provide a new model for RNA-RNA interaction based on multiple context-free grammar (MCFG). We present a polynomial time parsing algorithm for finding the most likely derivation tree for the stochastic version of MCFG, which is applicable to RNA joint secondary structure prediction including kissing hairpin loops. Also, elementary tests on RNA-RNA interaction prediction have shown that the proposed method is comparable to Alkan et al.'s method.
A digital prediction algorithm for a single-phase boost PFC

NASA Astrophysics Data System (ADS)

Qing, Wang; Ning, Chen; Weifeng, Sun; Shengli, Lu; Longxing, Shi

2012-12-01

A novel digital control algorithm for digital control power factor correction is presented, which is called the prediction algorithm and has a feature of a higher PF (power factor) with lower total harmonic distortion, and a faster dynamic response with the change of the input voltage or load current. For a certain system, based on the current system state parameters, the prediction algorithm can estimate the track of the output voltage and the inductor current at the next switching cycle and get a set of optimized control sequences to perfectly track the trajectory of input voltage. The proposed prediction algorithm is verified at different conditions, and computer simulation and experimental results under multi-situations confirm the effectiveness of the prediction algorithm. Under the circumstances that the input voltage is in the range of 90-265 V and the load current in the range of 20%-100%, the PF value is larger than 0.998. The startup and the recovery times respectively are about 0.1 s and 0.02 s without overshoot. The experimental results also verify the validity of the proposed method.

Neural network-based run-to-run controller using exposure and resist thickness adjustment

NASA Astrophysics Data System (ADS)

Geary, Shane; Barry, Ronan

2003-06-01

This paper describes the development of a run-to-run control algorithm using a feedforward neural network, trained using the backpropagation training method. The algorithm is used to predict the critical dimension of the next lot using previous lot information. It is compared to a common prediction algorithm - the exponentially weighted moving average (EWMA) and is shown to give superior prediction performance in simulations. The manufacturing implementation of the final neural network showed significantly improved process capability when compared to the case where no run-to-run control was utilised.
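The abstract above uses the exponentially weighted moving average (EWMA) as the baseline predictor for the next lot. A minimal sketch of that baseline, assuming one scalar measurement per lot, is given below. The function name `ewma_predictions`, the weight of 0.5 and the critical dimension values are hypothetical.

```python
def ewma_predictions(observations, weight, initial):
    """One-step-ahead EWMA forecasts.

    Element i of the result is the forecast for observation i made before it
    is seen; the final element is the forecast for the next, unseen lot.
    """
    if not 0.0 < weight <= 1.0:
        raise ValueError("weight must lie in (0, 1]")
    prediction = initial
    forecasts = []
    for observed in observations:
        forecasts.append(prediction)
        # Blend the newest measurement with the previous forecast.
        prediction = weight * observed + (1.0 - weight) * prediction
    forecasts.append(prediction)
    return forecasts


# Hypothetical critical dimension measurements (nm) for three lots.
measured = [100.0, 104.0, 102.0]
print(ewma_predictions(measured, weight=0.5, initial=100.0))
# [100.0, 100.0, 102.0, 102.0]
```

A larger weight makes the forecast react faster to process drift but also pass more measurement noise through; EWMA's fixed linear form is the limitation that motivates the neural network predictor in the paper.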
SIFTER search: a web server for accurate phylogeny-based protein function prediction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.
SIFTER search: a web server for accurate phylogeny-based protein function prediction

DOE PAGES

Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.

2015-05-15

We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.
An improved stochastic fractal search algorithm for 3D protein structure prediction.

PubMed

Zhou, Changjun; Sun, Chuan; Wang, Bin; Wang, Xiaojun

2018-05-03

Protein structure prediction (PSP) is a significant area for biological information research, disease treatment, and drug development. In this paper, three-dimensional structures of proteins are predicted from known amino acid sequences, and the structure prediction problem is transformed into a typical NP problem by an AB off-lattice model. This work applies a novel improved Stochastic Fractal Search algorithm (ISFS) to solve the problem. The Stochastic Fractal Search algorithm (SFS) is an effective evolutionary algorithm that performs well in exploring the search space but sometimes falls into local minima. To avoid this weakness, Lévy flight and internal feedback information are introduced in ISFS. In the experiments, simulations are conducted with the ISFS algorithm on Fibonacci sequences and real peptide sequences. Experimental results show that ISFS is more efficient and robust in finding the global minimum and in avoiding getting stuck in local minima.
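For readers unfamiliar with the AB off-lattice model, the sketch below evaluates the energy form commonly attributed to the original toy model: a bending term over consecutive bonds plus a species-dependent Lennard-Jones term over non-adjacent residues. This is an assumption about the general model family; the exact 3D variant used in the paper may include further terms, and the name `ab_energy` is hypothetical. Unit bond lengths are assumed but not enforced.

```python
import math

def ab_energy(coords, sequence):
    """Energy of a chain of 'A'/'B' residues at the given 3D coordinates.

    Assumed form: sum of (1 - cos(theta)) / 4 over bend angles, plus
    4 * (r**-12 - C * r**-6) over residue pairs at least two apart,
    with C = 1 (A-A), 0.5 (B-B), -0.5 (A-B).
    """
    n = len(coords)
    if n != len(sequence):
        raise ValueError("coords and sequence must have the same length")

    def bond(p, q):
        return tuple(b - a for a, b in zip(p, q))

    bending = 0.0
    for i in range(1, n - 1):
        u = bond(coords[i - 1], coords[i])
        v = bond(coords[i], coords[i + 1])
        dot = sum(a * b for a, b in zip(u, v))
        cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
        bending += 0.25 * (1.0 - cos_theta)  # zero for a straight chain

    sign = {"A": 1, "B": -1}
    nonbonded = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = math.dist(coords[i], coords[j])
            xi, xj = sign[sequence[i]], sign[sequence[j]]
            c = (1 + xi + xj + 5 * xi * xj) / 8.0
            nonbonded += 4.0 * (r ** -12 - c * r ** -6)
    return bending + nonbonded
```

An optimizer such as SFS or PSO would search over chain conformations to minimize this value; the search itself is not shown here.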
Automated Assessment of Existing Patient's Revised Cardiac Risk Index Using Algorithmic Software.

PubMed

Hofer, Ira S; Cheng, Drew; Grogan, Tristan; Fujimoto, Yohei; Yamada, Takashige; Beck, Lauren; Cannesson, Maxime; Mahajan, Aman

2018-05-25

Previous work in the field of medical informatics has shown that rules-based algorithms can be created to identify patients with various medical conditions; however, these techniques have not been compared to actual clinician notes nor has the ability to predict complications been tested. We hypothesize that a rules-based algorithm can successfully identify patients with the diseases in the Revised Cardiac Risk Index (RCRI). Patients undergoing surgery at the University of California, Los Angeles Health System between April 1, 2013 and July 1, 2016 and who had at least 2 previous office visits were included. For each disease in the RCRI except renal failure (that is, congestive heart failure, ischemic heart disease, cerebrovascular disease, and diabetes mellitus), diagnosis algorithms were created based on diagnostic and standard clinical treatment criteria. For each disease state, the prevalence of the disease as determined by the algorithm, International Classification of Disease (ICD) code, and anesthesiologist's preoperative note were determined. Additionally, 400 American Society of Anesthesiologists classes III and IV cases were randomly chosen for manual review by an anesthesiologist. The sensitivity, specificity, accuracy, positive predictive value, negative predictive value, and area under the receiver operating characteristic curve were determined using the manual review as a gold standard. Last, the ability of the RCRI as calculated by each of the methods to predict in-hospital mortality was determined, and the time necessary to run the algorithms was calculated. A total of 64,151 patients met inclusion criteria for the study. In general, the incidence of definite or likely disease determined by the algorithms was higher than that detected by the anesthesiologist. Additionally, in all disease states, the prevalence of disease was always lowest for the ICD codes, followed by the preoperative note, followed by the algorithms.
In the subset of patients for whom the records were manually reviewed, the algorithms were generally the most sensitive and the ICD codes the most specific. When computing the modified RCRI using each of the methods, the modified RCRI from the algorithms predicted in-hospital mortality with an area under the receiver operating characteristic curve of 0.70 (0.67-0.73), which compared to 0.70 (0.67-0.72) for ICD codes and 0.64 (0.61-0.67) for the preoperative note. On average, the algorithms took 12.64 ± 1.20 minutes to run on 1.4 million patients. Rules-based algorithms for disease in the RCRI can be created that perform with a similar discriminative ability as compared to physician notes and ICD codes but with significantly increased economies of scale.
Developing a local least-squares support vector machines-based neuro-fuzzy model for nonlinear and chaotic time series prediction.

PubMed

Miranian, A; Abdollahzade, M

2013-02-01

Local modeling approaches, owing to their ability to model different operating regimes of nonlinear systems and processes by independent local models, seem appealing for modeling, identification, and prediction applications. In this paper, we propose a local neuro-fuzzy (LNF) approach based on the least-squares support vector machines (LSSVMs). The proposed LNF approach employs LSSVMs, which are powerful in modeling and predicting time series, as local models and uses hierarchical binary tree (HBT) learning algorithm for fast and efficient estimation of its parameters. The HBT algorithm heuristically partitions the input space into smaller subdomains by axis-orthogonal splits. In each partitioning, the validity functions automatically form a unity partition and therefore normalization side effects, e.g., reactivation, are prevented. Integration of LSSVMs into the LNF network as local models, along with the HBT learning algorithm, yield a high-performance approach for modeling and prediction of complex nonlinear time series. The proposed approach is applied to modeling and predictions of different nonlinear and chaotic real-world and hand-designed systems and time series. Analysis of the prediction results and comparisons with recent and old studies demonstrate the promising performance of the proposed LNF approach with the HBT learning algorithm for modeling and prediction of nonlinear and chaotic systems and time series.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Zawisza, I; Yan, H; Yin, F

Purpose: To assure that tumor motion is within the radiation field during high-dose and high-precision radiosurgery, real-time imaging and surrogate monitoring are employed. These methods are useful in providing real-time tumor/surrogate motion but no future information is available. In order to anticipate future tumor/surrogate motion and track target location precisely, an algorithm is developed and investigated for estimating surrogate motion multiple steps ahead. Methods: The study utilized a one-dimensional surrogate motion signal divided into three components: (a) a training component containing the primary data, from the first frame to the beginning of the input subsequence; (b) an input subsequence component of the surrogate signal used as input to the prediction algorithm; (c) an output subsequence component, the remaining signal, used as the known output of the prediction algorithm for validation. The prediction algorithm consists of three major steps: (1) extracting subsequences from the training component which best match the input subsequence according to a given criterion; (2) calculating weighting factors from these best-matched subsequences; (3) collecting the parts that follow these subsequences and combining them with the assigned weighting factors to form the output. The prediction algorithm was examined for several patients, and its performance was assessed based on the correlation between the prediction and the known output. Results: Respiratory motion data was collected for 20 patients using the RPM system. The output subsequence is the last 50 samples (∼2 seconds) of a surrogate signal, and the input subsequence was the 100 frames (∼3 seconds) prior to the output subsequence. Based on the analysis of the correlation coefficient between predicted and known output subsequences, the average correlation is 0.9644±0.0394 and 0.9789±0.0239 for equal-weighting and relative-weighting strategies, respectively.
Conclusion: Preliminary results indicate that the prediction algorithm is effective in estimating surrogate motion multiple steps in advance. The relative-weighting method shows better prediction accuracy than the equal-weighting method. More parameters of this algorithm are under investigation.
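The three steps listed in the Methods can be sketched as follows. The abstract does not state the matching criterion or the weighting formula, so this example assumes Euclidean distance for matching and inverse-distance weights for the relative-weighting strategy; the function name `predict_by_matching` and all parameter names are hypothetical.

```python
import math

def predict_by_matching(signal, input_len, horizon, n_matches=3, relative=True):
    """Predict the `horizon` samples that follow `signal`.

    The last `input_len` samples are the input subsequence; everything
    before them is the training component.
    """
    query = signal[-input_len:]
    training = signal[:-input_len]
    candidates = []
    # Step 1: slide over the training component. Each candidate window must
    # be followed by `horizon` samples that also lie in the training data.
    for start in range(len(training) - input_len - horizon + 1):
        window = training[start:start + input_len]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
        follow = training[start + input_len:start + input_len + horizon]
        candidates.append((dist, follow))
    if not candidates:
        raise ValueError("training component too short")
    best = sorted(candidates, key=lambda c: c[0])[:n_matches]
    # Step 2: weighting factors (assumed: inverse distance, or equal weights).
    if relative:
        weights = [1.0 / (d + 1e-9) for d, _ in best]
    else:
        weights = [1.0] * len(best)
    total = sum(weights)
    # Step 3: weighted combination of the samples following each match.
    return [sum(w * f[k] for w, (_, f) in zip(weights, best)) / total
            for k in range(horizon)]
```

Passing `relative=False` gives the equal-weighting variant, which makes it easy to compare the two strategies on the same signal.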
Derivation and Validation of a Biomarker-Based Clinical Algorithm to Rule Out Sepsis From Noninfectious Systemic Inflammatory Response Syndrome at Emergency Department Admission: A Multicenter Prospective Study.

PubMed

Mearelli, Filippo; Fiotti, Nicola; Giansante, Carlo; Casarsa, Chiara; Orso, Daniele; De Helmersen, Marco; Altamura, Nicola; Ruscio, Maurizio; Castello, Luigi Mario; Colonetti, Efrem; Marino, Rossella; Barbati, Giulia; Bregnocchi, Andrea; Ronco, Claudio; Lupia, Enrico; Montrucchio, Giuseppe; Muiesan, Maria Lorenza; Di Somma, Salvatore; Avanzi, Gian Carlo; Biolo, Gianni

2018-05-07

To derive and validate a predictive algorithm integrating a nomogram-based prediction of the pretest probability of infection with a panel of serum biomarkers, which could robustly differentiate sepsis/septic shock from noninfectious systemic inflammatory response syndrome. Multicenter prospective study. At emergency department admission in five University hospitals. Nine-hundred forty-seven adults in inception cohort and 185 adults in validation cohort. None. A nomogram, including age, Sequential Organ Failure Assessment score, recent antimicrobial therapy, hyperthermia, leukocytosis, and high C-reactive protein values, was built in order to take data from 716 infected patients and 120 patients with noninfectious systemic inflammatory response syndrome to predict pretest probability of infection. Then, the best combination of procalcitonin, soluble phospholipase A2 group IIA, presepsin, soluble interleukin-2 receptor α, and soluble triggering receptor expressed on myeloid cell-1 was applied in order to categorize patients as "likely" or "unlikely" to be infected. The predictive algorithm required only procalcitonin backed up with soluble phospholipase A2 group IIA determined in 29% of the patients to rule out sepsis/septic shock with a negative predictive value of 93%. In a validation cohort of 158 patients, the predictive algorithm reached a negative predictive value of 100%, requiring biomarker measurements in 18% of the population. We have developed and validated a high-performing, reproducible, and parsimonious algorithm to assist emergency department physicians in distinguishing sepsis/septic shock from noninfectious systemic inflammatory response syndrome.
A systematic investigation of computation models for predicting Adverse Drug Reactions (ADRs).

PubMed

Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong

2014-01-01

Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computation models to obtain general conclusions that can provide useful guidance to construct more effective computational models to predict ADRs. In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that have been earlier used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, we found that the final formulas of these algorithms could all be converted to a linear model in form. Based on this finding, we propose a new algorithm called the general weighted profile method, which yielded the best overall performance among the algorithms investigated in this paper. Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms.
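To make the role of the Jaccard coefficient concrete, the sketch below scores candidate ADRs for a query drug as a similarity-weighted average of the ADR profiles of known drugs. This is a generic linear weighted-profile scheme written for illustration, not the paper's general weighted profile method; the names `jaccard` and `weighted_profile_scores` and the data layout are assumptions.

```python
def jaccard(a, b):
    """Jaccard coefficient of two feature collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weighted_profile_scores(query_features, known):
    """Score each ADR for a query drug.

    known maps a drug name to (feature set, set of ADRs). Each known drug
    votes for its ADRs with a weight equal to its Jaccard similarity to the
    query; scores are normalised by the total similarity.
    """
    scores = {}
    total = 0.0
    for features, adrs in known.values():
        sim = jaccard(query_features, features)
        total += sim
        for adr in adrs:
            scores[adr] = scores.get(adr, 0.0) + sim
    if total == 0.0:
        return {adr: 0.0 for adr in scores}
    return {adr: s / total for adr, s in scores.items()}
```

Because each score is a weighted sum of known associations, the scheme is linear in the association matrix, which is the structural property the abstract highlights.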
Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity.

PubMed

Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D

2014-03-25

A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model's behaviour for the specific query. Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The interpretations produced link closely with understood mechanisms for Ames mutagenicity. This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.
Prediction in complex systems: The case of the international trade network

NASA Astrophysics Data System (ADS)

Vidmer, Alexandre; Zeng, An; Medo, Matúš; Zhang, Yi-Cheng

2015-10-01

Predicting the future evolution of complex systems is one of the main challenges in complexity science. Based on a current snapshot of a network, link prediction algorithms aim to predict its future evolution. We apply here link prediction algorithms to data on the international trade between countries. This data can be represented as a complex network where links connect countries with the products that they export. Link prediction techniques based on heat and mass diffusion processes are employed to obtain predictions for products exported in the future. These baseline predictions are improved using a recent metric of country fitness and product similarity. The overall best results are achieved with a newly developed metric of product similarity which takes advantage of causality in the network evolution.
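The mass diffusion baseline mentioned here can be illustrated on a small country-product network. The sketch follows the usual two-step resource spreading on a bipartite graph (products to countries, then countries back to products); it is an assumption that this matches the variant used in the paper, and the name `mass_diffusion_scores` and the input layout are hypothetical.

```python
def mass_diffusion_scores(exports, country):
    """Score products the given country does not yet export.

    exports maps each country to the set of products it exports.
    """
    # Invert the network: product -> set of exporting countries.
    exporters = {}
    for c, products in exports.items():
        for p in products:
            exporters.setdefault(p, set()).add(c)

    # Step 1: each product exported by `country` holds one unit of resource
    # and splits it equally among all countries that export it.
    country_resource = {}
    for p in exports[country]:
        share = 1.0 / len(exporters[p])
        for c in exporters[p]:
            country_resource[c] = country_resource.get(c, 0.0) + share

    # Step 2: each country splits its resource equally among its products.
    scores = {}
    for c, resource in country_resource.items():
        share = resource / len(exports[c])
        for p in exports[c]:
            scores[p] = scores.get(p, 0.0) + share

    # Only products not already exported are candidate predictions.
    return {p: s for p, s in scores.items() if p not in exports[country]}
```

Products with the highest scores are the predicted future exports; products unreachable in two steps receive no score at all.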
A Probability Model for Drought Prediction Using Fusion of Markov Chain and SAX Methods

NASA Astrophysics Data System (ADS)

Jouybari-Moghaddam, Y.; Saradjian, M. R.; Forati, A. M.

2017-09-01

Drought is one of the most powerful natural disasters and affects many aspects of the environment. This phenomenon is most often severe in arid and semi-arid areas. Monitoring and predicting the severity of drought can be useful in managing the natural disasters caused by drought. Many indices have been used in predicting droughts, such as SPI, VCI, and TVX. In this paper, based on three data sets (rainfall, NDVI, and land surface temperature) acquired from MODIS satellite imagery, time series of SPI, VCI, and TVX from winter 2000 to summer 2015 were created for the eastern region of Isfahan province. Using these indices and a fusion of symbolic aggregation approximation (SAX) and a hidden Markov chain, drought was predicted for fall 2015. For this purpose, each time series was first transformed into a set of qualitative data based on the state of drought (5 groups) using the SAX algorithm; the probability matrix for the future state was then created using the hidden Markov chain. The fall drought severity was predicted by fusing the probability matrix with the state of drought severity in summer 2015. The prediction is based on the likelihood of each state of drought: severe drought, middle drought, normal drought, severe wet, and middle wet. The analysis and experimental results show that the output of the proposed algorithm is acceptable and that the algorithm is appropriate and efficient for predicting drought using remote sensing data.
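The two stages described above, symbolization and a state-transition probability matrix, can be sketched as follows. This example makes several simplifying assumptions: it skips the piecewise aggregate averaging step of full SAX, uses equiprobable Gaussian breakpoints, and estimates a plain first-order Markov chain over observed states rather than a hidden Markov model. The function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def sax_symbols(series, alphabet_size=5):
    """Map each value to a symbol 0..alphabet_size-1 after z-normalisation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("series is constant")
    # Breakpoints that split the standard normal into equiprobable bins.
    cuts = [NormalDist().inv_cdf(k / alphabet_size)
            for k in range(1, alphabet_size)]
    symbols = []
    for x in series:
        z = (x - mu) / sigma
        symbols.append(sum(z > c for c in cuts))  # number of cuts exceeded
    return symbols

def transition_matrix(symbols, n_states=5):
    """Row-normalised counts of transitions between consecutive symbols."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Unvisited states fall back to a uniform row.
        matrix.append([c / total if total else 1.0 / n_states for c in row])
    return matrix
```

Given the state observed in the last season, the corresponding row of the matrix gives the estimated likelihood of each state in the next season.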
An improved shuffled frog leaping algorithm based evolutionary framework for currency exchange rate prediction

NASA Astrophysics Data System (ADS)

Dash, Rajashree

2017-11-01

Forecasting the purchasing power of one currency with respect to another is always an interesting topic in the field of financial time series prediction. Despite the existence of several traditional and computational models for currency exchange rate forecasting, there is always a need for simpler and more efficient models with better prediction capability. In this paper, an evolutionary framework is proposed by using an improved shuffled frog leaping (ISFL) algorithm with a computationally efficient functional link artificial neural network (CEFLANN) for prediction of currency exchange rate. The model is validated by observing the monthly prediction measures obtained for three currency exchange data sets (USD/CAD, USD/CHF, and USD/JPY) accumulated within the same period of time. The model performance is also compared with two other evolutionary learning techniques, the shuffled frog leaping algorithm and the particle swarm optimization algorithm. Practical analysis of the results suggests that the proposed model developed using the ISFL algorithm with the CEFLANN network is a promising predictor model for currency exchange rate prediction compared to the other models included in the study.
A New Inversion-Based Algorithm for Retrieval of Over-Water Rain Rate from SSM/I Multichannel Imagery

NASA Technical Reports Server (NTRS)

Petty, Grant W.; Stettner, David R.

1994-01-01

This paper discusses certain aspects of a new inversion-based algorithm for the retrieval of rain rate over the open ocean from the special sensor microwave/imager (SSM/I) multichannel imagery. This algorithm takes a more detailed physical approach to the retrieval problem than previously discussed algorithms that perform explicit forward radiative transfer calculations based on detailed model hydrometeor profiles and attempt to match the observations to the predicted brightness temperature.
Identifying Psoriasis and Psoriatic Arthritis Patients in Retrospective Databases When Diagnosis Codes Are Not Available: A Validation Study Comparing Medication/Prescriber Visit-Based Algorithms with Diagnosis Codes.

PubMed

Dobson-Belaire, Wendy; Goodfield, Jason; Borrelli, Richard; Liu, Fei Fei; Khan, Zeba M

2018-01-01

Using diagnosis code-based algorithms is the primary method of identifying patient cohorts for retrospective studies; nevertheless, many databases lack reliable diagnosis code information. To develop precise algorithms based on medication claims/prescriber visits (MCs/PVs) to identify psoriasis (PsO) patients and psoriatic patients with arthritic conditions (PsO-AC), a proxy for psoriatic arthritis, in Canadian databases lacking diagnosis codes. Algorithms were developed using medications with narrow indication profiles in combination with prescriber specialty to define PsO and PsO-AC. For a 3-year study period from July 1, 2009, algorithms were validated using the PharMetrics Plus database, which contains both adjudicated medication claims and diagnosis codes. Positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity of the developed algorithms were assessed using diagnosis code as the reference standard. Chosen algorithms were then applied to Canadian drug databases to profile the algorithm-identified PsO and PsO-AC cohorts. In the selected database, 183,328 patients were identified for validation. The highest PPVs for PsO (85%) and PsO-AC (65%) occurred when a predictive algorithm of two or more MCs/PVs was compared with the reference standard of one or more diagnosis codes. NPV and specificity were high (99%-100%), whereas sensitivity was low (≤30%). Reducing the number of MCs/PVs or increasing diagnosis claims decreased the algorithms' PPVs. We have developed an MC/PV-based algorithm to identify PsO patients with a high degree of accuracy, but accuracy for PsO-AC requires further investigation. Such methods allow researchers to conduct retrospective studies in databases in which diagnosis codes are absent. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
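The four validation measures reported here all derive from a 2x2 table of algorithm result against the reference standard. The helper below computes them from the standard definitions; the name `diagnostic_metrics` is hypothetical, and any counts passed to it are the caller's own, not figures from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 confusion table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives against the reference standard.
    """
    def ratio(numerator, denominator):
        # Undefined ratios (empty row or column) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases detected
        "specificity": ratio(tn, tn + fp),  # share of non-cases excluded
        "ppv": ratio(tp, tp + fp),          # share of flagged patients who are cases
        "npv": ratio(tn, tn + fn),          # share of unflagged patients who are not
    }
```

The definitions show why a strict algorithm can combine high PPV with low sensitivity: requiring more claims removes false positives from the flagged group while leaving many true cases unflagged.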
A utility/cost analysis of breast cancer risk prediction algorithms

NASA Astrophysics Data System (ADS)

Abbey, Craig K.; Wu, Yirong; Burnside, Elizabeth S.; Wunderlich, Adam; Samuelson, Frank W.; Boone, John M.

2016-03-01

Breast cancer risk prediction algorithms are used to identify subpopulations that are at increased risk for developing breast cancer. They can be based on many different sources of data such as demographics, relatives with cancer, gene expression, and various phenotypic features such as breast density. Women who are identified as high risk may undergo a more extensive (and expensive) screening process that includes MRI or ultrasound imaging in addition to the standard full-field digital mammography (FFDM) exam. Given that there are many ways that risk prediction may be accomplished, it is of interest to evaluate them in terms of expected cost, which includes the costs of diagnostic outcomes. In this work we perform an expected-cost analysis of risk prediction algorithms that is based on a published model that includes the costs associated with diagnostic outcomes (true-positive, false-positive, etc.). We assume the existence of a standard screening method and an enhanced screening method with higher scan cost, higher sensitivity, and lower specificity. We then assess expected cost of using a risk prediction algorithm to determine who gets the enhanced screening method under the strong assumption that risk and diagnostic performance are independent. We find that if risk prediction leads to a high enough positive predictive value, it will be cost-effective regardless of the size of the subpopulation. Furthermore, in terms of the hit-rate and false-alarm rate of the risk prediction algorithm, iso-cost contours are lines with slope determined by properties of the available diagnostic systems for screening.
An unsupervised classification scheme for improving predictions of prokaryotic TIS.

PubMed

Tech, Maike; Meinicke, Peter

2006-03-09

Although it is not difficult for state-of-the-art gene finders to identify coding regions in prokaryotic genomes, exact prediction of the corresponding translation initiation sites (TIS) is still a challenging problem. Recently a number of post-processing tools have been proposed for improving the annotation of prokaryotic TIS. However, inherent difficulties of these approaches arise from the considerable variation of TIS characteristics across different species. Therefore prior assumptions about the properties of prokaryotic gene starts may cause suboptimal predictions for newly sequenced genomes with TIS signals differing from those of well-investigated genomes. We introduce a clustering algorithm for completely unsupervised scoring of potential TIS, based on positionally smoothed probability matrices. The algorithm requires an initial gene prediction and the genomic sequence of the organism to perform the reannotation. As compared with other methods for improving predictions of gene starts in bacterial genomes, our approach is not based on any specific assumptions about prokaryotic TIS. Despite the generality of the underlying algorithm, the prediction rate of our method is competitive on experimentally verified test data from E. coli and B. subtilis. Regarding genomes with high G+C content, in contrast to some previously proposed methods, our algorithm also provides good performance on P. aeruginosa, B. pseudomallei and R. solanacearum. On reliable test data we showed that our method provides good results in post-processing the predictions of the widely-used program GLIMMER. The underlying clustering algorithm is robust with respect to variations in the initial TIS annotation and does not require specific assumptions about prokaryotic gene starts. These features are particularly useful on genomes with high G+C content. The algorithm has been implemented in the tool "TICO" (TIs COrrector) which is publicly available from our web site.
Application of Avco data analysis and prediction techniques (ADAPT) to prediction of sunspot activity

NASA Technical Reports Server (NTRS)

Hunter, H. E.; Amato, R. A.

1972-01-01

The results are presented of the application of Avco Data Analysis and Prediction Techniques (ADAPT) to derivation of new algorithms for the prediction of future sunspot activity. The ADAPT derived algorithms show a factor of 2 to 3 reduction in the expected 2-sigma errors in the estimates of the 81-day running average of the Zurich sunspot numbers. The report presents: (1) the best estimates for sunspot cycles 20 and 21, (2) a comparison of the ADAPT performance with conventional techniques, and (3) specific approaches to further reduction in the errors of estimated sunspot activity and to recovery of earlier sunspot historical data. The ADAPT programs are used both to derive regression algorithm for prediction of the entire 11-year sunspot cycle from the preceding two cycles and to derive extrapolation algorithms for extrapolating a given sunspot cycle based on any available portion of the cycle.
Social Media: Menagerie of Metrics

DTIC Science & Technology

2010-01-27

intelligence, an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA...Cloning - 22 Animals were cloned to date; genetic algorithms can help prediction (e.g. “elitism” - attempts to ensure selection by including performers...28, 2010 Evolutionary Algorithm • Evolutionary algorithm From Wikipedia, the free encyclopedia Artificial intelligence portal In artificial
A prediction algorithm for first onset of major depression in the general population: development and validation.

PubMed

Wang, JianLi; Sareen, Jitender; Patten, Scott; Bolton, James; Schmitz, Norbert; Birney, Arden

2014-05-01

Prediction algorithms are useful for making clinical decisions and for population health planning. However, such prediction algorithms for first onset of major depression do not exist. The objective of this study was to develop and validate a prediction algorithm for first onset of major depression in the general population. Longitudinal study design with approximately 3-year follow-up. The study was based on data from a nationally representative sample of the US general population. A total of 28 059 individuals who participated in Waves 1 and 2 of the US National Epidemiologic Survey on Alcohol and Related Conditions and who had not had major depression at Wave 1 were included. The prediction algorithm was developed using logistic regression modelling in 21 813 participants from three census regions. The algorithm was validated in participants from the 4th census region (n=6246). Major depression occurring since Wave 1 of the National Epidemiologic Survey on Alcohol and Related Conditions was assessed by the Alcohol Use Disorder and Associated Disabilities Interview Schedule-diagnostic and statistical manual for mental disorders IV. A prediction algorithm containing 17 unique risk factors was developed. The algorithm had good discriminative power (C statistic=0.7538, 95% CI 0.7378 to 0.7699) and excellent calibration (F-adjusted test=1.00, p=0.448) with the weighted data. In the validation sample, the algorithm had a C statistic of 0.7259 and excellent calibration (Hosmer-Lemeshow χ²=3.41, p=0.906). The developed prediction algorithm has good discrimination and calibration capacity. It can be used by clinicians, mental health policy-makers and service planners and the general public to predict future risk of having major depression. The application of the algorithm may lead to increased personalisation of treatment, better clinical decisions and more optimal mental health service planning.
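The C statistic quoted for discrimination is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. A minimal unweighted sketch follows; it ignores the survey weights used in the study, and the name `c_statistic` is hypothetical.

```python
def c_statistic(scores, outcomes):
    """Concordance (area under the ROC curve) for binary outcomes.

    scores are predicted risks; outcomes are 1 for cases and 0 for
    non-cases. Ties between a case and a non-case count as half.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

A value of 0.5 corresponds to chance and 1.0 to perfect separation. The pairwise loop is quadratic, which is fine for illustration but slow for samples of the size used in the study.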

Prediction of protein-protein interaction network using a multi-objective optimization approach.

PubMed

Chowdhury, Archana; Rakshit, Pratyusha; Konar, Amit

2016-06-01

Protein-Protein Interactions (PPIs) are very important as they coordinate almost all cellular processes. This paper attempts to formulate PPI prediction problem in a multi-objective optimization framework. The scoring functions for the trial solution deal with simultaneous maximization of functional similarity, strength of the domain interaction profiles, and the number of common neighbors of the proteins predicted to be interacting. The above optimization problem is solved using the proposed Firefly Algorithm with Nondominated Sorting. Experiments undertaken reveal that the proposed PPI prediction technique outperforms existing methods, including gene ontology-based Relative Specific Similarity, multi-domain-based Domain Cohesion Coupling method, domain-based Random Decision Forest method, Bagging with REP Tree, and evolutionary/swarm algorithm-based approaches, with respect to sensitivity, specificity, and F1 score.
Algorithm for predicting the evolution of series of dynamics of complex systems in solving information problems

NASA Astrophysics Data System (ADS)

Kasatkina, T. I.; Dushkin, A. V.; Pavlov, V. A.; Shatovkin, R. R.

2018-03-01

Neural network methods have recently been applied in the development of information systems and software for predicting series of dynamics (time series). They are more flexible than existing analogues and are capable of taking into account the nonlinearities of the series. In this paper, we propose a modified algorithm for predicting the series of dynamics, which includes a method for training neural networks and an approach to describing and presenting input data, based on prediction by the multilayer perceptron method. To construct the neural network, the values of the series of dynamics at the extremum points and the corresponding time values, formed using the sliding window method, are used as input data. The proposed algorithm can act as an independent approach to predicting the series of dynamics or as one part of a forecasting system. The efficiency of predicting the evolution of the dynamics series for short-term one-step and long-term multi-step forecasts by the classical multilayer perceptron method and by the modified algorithm is compared using synthetic and real data. The modification minimizes the iterative error that arises when previously predicted values are fed back as inputs to the neural network, and increases the accuracy of the iterative prediction.
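The sliding window construction and the iterated multi-step forecast described above can be sketched independently of any particular network. In this example `one_step` is a hypothetical stand-in for a trained multilayer perceptron, and the windows are built from raw series values rather than the extremum points used in the paper.

```python
def sliding_windows(series, width):
    """Build (input window, next value) training pairs from a series."""
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]

def iterate_forecast(history, width, steps, one_step):
    """Multi-step forecast by feeding each prediction back as an input.

    one_step maps a window (list of `width` values) to the next value.
    """
    window = list(history[-width:])
    forecasts = []
    for _ in range(steps):
        next_value = one_step(window)
        forecasts.append(next_value)
        # Drop the oldest value and append the prediction; any error in
        # next_value therefore propagates into later steps.
        window = window[1:] + [next_value]
    return forecasts
```

The feedback line is where the iterative error discussed in the abstract accumulates, since every later window contains predicted rather than observed values.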
Improved hybrid optimization algorithm for 3D protein structure prediction.

PubMed

Zhou, Changjun; Hou, Caixia; Wei, Xiaopeng; Zhang, Qiang

2014-07-01

A new improved hybrid optimization algorithm, the PGATS algorithm, based on the toy off-lattice model, is presented for three-dimensional protein structure prediction problems. The algorithm combines the particle swarm optimization (PSO), genetic algorithm (GA), and tabu search (TS) algorithms. In addition, we adopt several improvement strategies: a stochastic disturbance factor is added to the particle swarm optimization to improve its search ability; the crossover and mutation operations of the genetic algorithm are changed to a kind of random linear method; and finally the tabu search algorithm is improved by appending a mutation operator. Through the combination of these strategies and algorithms, protein structure prediction (PSP) in a 3D off-lattice model is achieved. The PSP problem is NP-hard, but it can be treated as a global optimization problem with multiple extrema and multiple parameters. This is the theoretical principle of the hybrid optimization algorithm proposed in this paper. The algorithm combines local search and global search, which overcomes the shortcomings of a single algorithm and gives full play to the advantages of each algorithm. The method is verified on the current standard benchmark sequences: Fibonacci sequences and real protein sequences. Experiments show that the proposed method outperforms single algorithms in the accuracy of the calculated protein sequence energy value, which shows it to be an effective way to predict the structure of proteins.
Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

PubMed Central

Reeder, Jens; Giegerich, Robert

2004-01-01

Background The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n^6) time and O(n^4) space algorithm by Rivas and Eddy is currently the best available program. Results We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n^4) time and O(n^2) space to predict the energetically optimal structure of an RNA sequence, possibly containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. Conclusions RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm. PMID:15294028
Predicting drug-target interactions by dual-network integrated logistic matrix factorization

NASA Astrophysics Data System (ADS)

Hao, Ming; Bryant, Stephen H.; Wang, Yanli

2017-01-01

In this work, we propose a dual-network integrated logistic matrix factorization (DNILMF) algorithm to predict potential drug-target interactions (DTI). The prediction procedure consists of four steps: (1) inferring new drug/target profiles and constructing profile kernel matrix; (2) diffusing drug profile kernel matrix with drug structure kernel matrix; (3) diffusing target profile kernel matrix with target sequence kernel matrix; and (4) building DNILMF model and smoothing new drug/target predictions based on their neighbors. We compare our algorithm with the state-of-the-art method based on the benchmark dataset. Results indicate that the DNILMF algorithm outperforms the previously reported approaches in terms of AUPR (area under precision-recall curve) and AUC (area under curve of receiver operating characteristic) based on the 5 trials of 10-fold cross-validation. We conclude that the performance improvement depends not only on the proposed objective function but also on the nonlinear diffusion technique used, which is important but understudied in the DTI prediction field. In addition, we compile a new DTI dataset to increase the diversity of currently available benchmark datasets. The top prediction results for the new dataset are confirmed by experimental studies or supported by other computational research.
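Logistic matrix factorization, the core model named in this abstract, scores a drug-target pair by passing the inner product of two latent vectors through a logistic function. The Python sketch below shows only that scoring step; the kernel diffusion and neighbor smoothing of DNILMF are not reproduced, and the latent vectors are hypothetical values rather than fitted parameters.

```python
import math


def interaction_probability(drug_vec, target_vec):
    """Logistic matrix factorization score: sigmoid(u . v), where u and v are
    the latent vectors of one drug and one target."""
    dot = sum(u * v for u, v in zip(drug_vec, target_vec))
    return 1.0 / (1.0 + math.exp(-dot))


# Hypothetical latent vectors (in practice these are learned from data)
drug = [0.8, -0.1]
target = [1.2, 0.4]
print(interaction_probability(drug, target))
```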
City traffic flow breakdown prediction based on fuzzy rough set

NASA Astrophysics Data System (ADS)

Yang, Xu; Da-wei, Hu; Bing, Su; Duo-jia, Zhang

2017-05-01

In city traffic management, traffic breakdown is a very important issue, which is defined as a speed drop of a certain amount within a dense traffic situation. In order to predict city traffic flow breakdown accurately, in this paper, we propose a novel city traffic flow breakdown prediction algorithm based on fuzzy rough set. Firstly, we illustrate the city traffic flow breakdown problem, in which three definitions are given, that is, 1) Pre-breakdown flow rate, 2) Rate, density, and speed of the traffic flow breakdown, and 3) Duration of the traffic flow breakdown. Moreover, we define a hazard function to represent the probability of the breakdown ending at a given time point. Secondly, as there are many redundant and irrelevant attributes in city flow breakdown prediction, we propose an attribute reduction algorithm using the fuzzy rough set. Thirdly, we discuss how to predict the city traffic flow breakdown based on attribute reduction and an SVM classifier. Finally, experiments are conducted on data collected from the I-405 Freeway in Irvine, California. Experimental results demonstrate that the proposed algorithm achieves a lower average error rate in city traffic flow breakdown prediction.
A statistical analysis of RNA folding algorithms through thermodynamic parameter perturbation.

PubMed

Layton, D M; Bundschuh, R

2005-01-01

Computational RNA secondary structure prediction is rather well established. However, such prediction algorithms always depend on a large number of experimentally measured parameters. Here, we study how sensitive structure prediction algorithms are to changes in these parameters. We find that even for changes corresponding to the actual experimental error with which these parameters have been determined, 30% of the structure is falsely predicted, whereas the ground state structure is preserved under parameter perturbation in only 5% of all cases. We establish that base-pairing probabilities calculated in a thermal ensemble are a viable, although not perfect, measure of the reliability of the prediction of individual structure elements. Here, a new measure of stability using parameter perturbation is proposed, and its limitations are discussed.
I-TASSER: fully automated protein structure prediction in CASP8.

PubMed

Zhang, Yang

2009-01-01

The I-TASSER algorithm for 3D protein structure prediction was tested in CASP8, with the procedure fully automated in both the Server and Human sections. The quality of the server models is close to that of human ones but the human predictions incorporate more diverse templates from other servers which improve the human predictions in some of the distant homology targets. For the first time, the sequence-based contact predictions from machine learning techniques are found helpful for both template-based modeling (TBM) and template-free modeling (FM). In TBM, although the accuracy of the sequence based contact predictions is on average lower than that from template-based ones, the novel contacts in the sequence-based predictions, which are complementary to the threading templates in the weakly or unaligned regions, are important to improve the global and local packing in these regions. Moreover, the newly developed atomic structural refinement algorithm was tested in CASP8 and found to improve the hydrogen-bonding networks and the overall TM-score, which is mainly due to its ability of removing steric clashes so that the models can be generated from cluster centroids. Nevertheless, one of the major issues of the I-TASSER pipeline is the model selection where the best models could not be appropriately recognized when the correct templates are detected only by the minority of the threading algorithms. There are also problems related with domain-splitting and mirror image recognition which mainly influences the performance of I-TASSER modeling in the FM-based structure predictions. Copyright 2009 Wiley-Liss, Inc.
An Injury Severity-, Time Sensitivity-, and Predictability-Based Advanced Automatic Crash Notification Algorithm Improves Motor Vehicle Crash Occupant Triage.

PubMed

Stitzel, Joel D; Weaver, Ashley A; Talton, Jennifer W; Barnard, Ryan T; Schoell, Samantha L; Doud, Andrea N; Martin, R Shayn; Meredith, J Wayne

2016-06-01

Advanced Automatic Crash Notification algorithms use vehicle telemetry measurements to predict risk of serious motor vehicle crash injury. The objective of the study was to develop an Advanced Automatic Crash Notification algorithm to reduce response time, increase triage efficiency, and improve patient outcomes by minimizing undertriage (<5%) and overtriage (<50%), as recommended by the American College of Surgeons. A list of injuries associated with a patient's need for Level I/II trauma center treatment known as the Target Injury List was determined using an approach based on 3 facets of injury: severity, time sensitivity, and predictability. Multivariable logistic regression was used to predict an occupant's risk of sustaining an injury on the Target Injury List based on crash severity and restraint factors for occupants in the National Automotive Sampling System - Crashworthiness Data System 2000-2011. The Advanced Automatic Crash Notification algorithm was optimized and evaluated to minimize triage rates, per American College of Surgeons recommendations. The following rates were achieved: <50% overtriage and <5% undertriage in side impacts and 6% to 16% undertriage in other crash modes. Nationwide implementation of our algorithm is estimated to improve triage decisions for 44% of undertriaged and 38% of overtriaged occupants. Annually, this translates to more appropriate care for >2,700 seriously injured occupants and reduces unnecessary use of trauma center resources for >162,000 minimally injured occupants. The algorithm could be incorporated into vehicles to inform emergency personnel of recommended motor vehicle crash triage decisions. Lower under- and overtriage was achieved, and nationwide implementation of the algorithm would yield improved triage decision making for an estimated 165,000 occupants annually. Copyright © 2016. Published by Elsevier Inc.
Applying network analysis and Nebula (neighbor-edges based and unbiased leverage algorithm) to ToxCast data.

PubMed

Ye, Hao; Luo, Heng; Ng, Hui Wen; Meehan, Joe; Ge, Weigong; Tong, Weida; Hong, Huixiao

2016-01-01

ToxCast data have been used to develop models for predicting in vivo toxicity. To predict the in vivo toxicity of a new chemical using a ToxCast data based model, its ToxCast bioactivity data are needed but not normally available. The capability of predicting ToxCast bioactivity data is necessary to fully utilize ToxCast data in the risk assessment of chemicals. We aimed to understand and elucidate the relationships between the chemicals and bioactivity data of the assays in ToxCast and to develop a network analysis based method for predicting ToxCast bioactivity data. We conducted modularity analysis on a quantitative network constructed from ToxCast data to explore the relationships between the assays and chemicals. We further developed Nebula (neighbor-edges based and unbiased leverage algorithm) for predicting ToxCast bioactivity data. Modularity analysis on the network constructed from ToxCast data yielded seven modules. Assays and chemicals in the seven modules were distinct. Leave-one-out cross-validation yielded a Q² of 0.5416, indicating ToxCast bioactivity data can be predicted by Nebula. Prediction domain analysis showed some types of ToxCast assay data could be more reliably predicted by Nebula than others. Network analysis is a promising approach to understand ToxCast data. Nebula is an effective algorithm for predicting ToxCast bioactivity data, helping fully utilize ToxCast data in the risk assessment of chemicals. Published by Elsevier Ltd.
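The cross-validated coefficient reported in this abstract is commonly computed as one minus the ratio of the predictive residual sum of squares to the total sum of squares. The Python sketch below uses that common definition as an assumption; the abstract does not state the exact formula the authors used, and the sample values are hypothetical.

```python
def q_squared(observed, predicted):
    """Cross-validated Q^2 = 1 - PRESS / TSS, where each entry of `predicted`
    is assumed to come from a model trained without that observation."""
    mean_obs = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    total = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - press / total


# Hypothetical bioactivity values and their leave-one-out predictions
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print(q_squared(observed, predicted))
```

A value of 1 means perfect out-of-sample prediction, and a value of 0 means the model predicts no better than the mean of the observations.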
A content-boosted collaborative filtering algorithm for personalized training in interpretation of radiological imaging.

PubMed

Lin, Hongli; Yang, Xuedong; Wang, Weisheng

2014-08-01

Devising a method that can select cases based on the performance levels of trainees and the characteristics of cases is essential for developing a personalized training program in radiology education. In this paper, we propose a novel hybrid prediction algorithm called content-boosted collaborative filtering (CBCF) to predict the difficulty level of each case for each trainee. The CBCF utilizes a content-based filtering (CBF) method to enhance existing trainee-case ratings data and then provides final predictions through a collaborative filtering (CF) algorithm. The CBCF algorithm incorporates the advantages of both CBF and CF, while not inheriting the disadvantages of either. The CBCF method is compared with the pure CBF and pure CF approaches using three datasets. The experimental data are then evaluated in terms of the MAE metric. Our experimental results show that the CBCF outperforms the pure CBF and CF methods by 13.33% and 12.17%, respectively, in terms of prediction precision. This also suggests that the CBCF can be used in the development of personalized training systems in radiology education.
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides

PubMed Central

Luo, Heng; Ye, Hao; Ng, Hui Wen; Sakkiah, Sugunadevi; Mendrick, Donna L.; Hong, Huixiao

2016-01-01

Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. This algorithm can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system. PMID:27558848
The lucky image-motion prediction for simple scene observation based soft-sensor technology

NASA Astrophysics Data System (ADS)

Li, Yan; Su, Yun; Hu, Bin

2015-08-01

High resolution is important for earth remote sensors, and vibration of the sensor platform is a major factor restricting high-resolution imaging. Image-motion prediction and real-time compensation are key technologies for solving this problem. Because the traditional autocorrelation image algorithm cannot meet the demands of simple-scene image stabilization, this paper proposes using soft-sensor technology for image-motion prediction and focuses on algorithm optimization for image-motion prediction during imaging. Simulation results indicate that the improved lucky image-motion stabilization algorithm, which combines the Back Propagation Network (BP NN) and the support vector machine (SVM), is the most suitable for simple-scene image stabilization. The relative error of the image-motion prediction based on soft-sensor technology is below 5%, and the training speed of the mathematical prediction model is fast enough for real-time image stabilization in aerial photography.
Voidage correction algorithm for unresolved Euler-Lagrange simulations

NASA Astrophysics Data System (ADS)

Askarishahi, Maryam; Salehi, Mohammad-Sadegh; Radl, Stefan

2018-04-01

The effect of grid coarsening on the predicted total drag force and heat exchange rate in dense gas-particle flows is investigated using the Euler-Lagrange (EL) approach. We demonstrate that grid coarsening may reduce the predicted total drag force and exchange rate. Surprisingly, exchange coefficients predicted by the EL approach deviate more significantly from the exact value compared to results of Euler-Euler (EE)-based calculations. The voidage gradient is identified as the root cause of this peculiar behavior. Consequently, we propose a correction algorithm based on a sigmoidal function to predict the voidage experienced by individual particles. Our correction algorithm can significantly improve the prediction of exchange coefficients in EL models, which is tested for simulations involving Euler grid cell sizes between 2d_p and 12d_p. It is most relevant in simulations of dense polydisperse particle suspensions featuring steep voidage profiles. For these suspensions, classical approaches may result in an error of the total exchange rate of up to 30%.
Accelerated probabilistic inference of RNA structure evolution

PubMed Central

Holmes, Ian

2005-01-01

Background Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms, by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. Results We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. Conclusion A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License. PMID:15790387
Load balancing prediction method of cloud storage based on analytic hierarchy process and hybrid hierarchical genetic algorithm.

PubMed

Zhou, Xiuze; Lin, Fan; Yang, Lvqing; Nie, Jing; Tan, Qian; Zeng, Wenhua; Zhang, Nian

2016-01-01

With the continuous expansion of cloud computing platforms and the rapid growth of users and applications, how to use system resources efficiently to improve the overall performance of cloud computing has become a crucial issue. To address this issue, this paper proposes a method that uses an analytic hierarchy process group decision (AHPGD) to evaluate the load state of server nodes. Training is carried out using a hybrid hierarchical genetic algorithm (HHGA) to optimize a radial basis function neural network (RBFNN). The AHPGD produces an aggregate indicator for the virtual machines in the cloud, which becomes the input parameter of the predictive RBFNN. This paper also proposes a new dynamic load balancing scheduling algorithm combined with a weighted round-robin algorithm, which uses the predicted periodic load values of nodes, based on AHPGD and the HHGA-optimized RBFNN, to calculate the corresponding node weights and update them continually. It thereby keeps the advantages and avoids the shortcomings of the static weighted round-robin algorithm.
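The scheduling idea in this abstract, recomputing round-robin weights from predicted node load, can be sketched in Python as below. The mapping from predicted load to weight is an assumption made for illustration (the abstract does not give the formula), and the scheduler shown is the generic smooth weighted round-robin scheme rather than the authors' implementation. Node names and load values are hypothetical.

```python
def weights_from_predicted_load(predicted_load, scale=10):
    """Assumed mapping: a node with predicted load in [0, 1] gets an integer
    weight proportional to its spare capacity (at least 1)."""
    return {node: max(1, round((1.0 - load) * scale))
            for node, load in predicted_load.items()}


def weighted_round_robin(weights, n_requests):
    """Smooth weighted round-robin: each step adds every node's weight to its
    running counter, picks the largest counter, and subtracts the weight sum."""
    current = {node: 0 for node in weights}
    total = sum(weights.values())
    order = []
    for _ in range(n_requests):
        for node, weight in weights.items():
            current[node] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        order.append(chosen)
    return order


# Hypothetical predicted loads, e.g. the output of a load-prediction model
weights = weights_from_predicted_load({"n1": 0.2, "n2": 0.5, "n3": 0.9})
print(weights)
print(weighted_round_robin(weights, 14))
```

In a dynamic scheme the weights would be recomputed whenever a new load prediction arrives, which is what distinguishes it from static weighted round-robin.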
Ensemble-based prediction of RNA secondary structures.

PubMed

Aghaeepour, Nima; Hoos, Holger H

2013-04-24

Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach.
In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.
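The trade-off between false negative and false positive base pairs described in this abstract can be illustrated with a simplified weighted-vote ensemble in Python. This is a sketch under stated assumptions, not the published AveRNA procedure: each component predictor contributes its weight to every base pair it predicts, and a threshold decides which pairs are kept. The predictor outputs, weights, and threshold values are hypothetical, and the sketch does not check that the kept pairs form a valid secondary structure.

```python
def ensemble_base_pairs(predictions, weights, threshold):
    """Keep a base pair (i, j) if the summed weight of the predictors that
    propose it reaches `threshold`. Lower thresholds admit more pairs (fewer
    false negatives, more false positives); higher thresholds do the opposite."""
    support = {}
    for pairs, weight in zip(predictions, weights):
        for pair in pairs:
            support[pair] = support.get(pair, 0.0) + weight
    return {pair for pair, score in support.items() if score >= threshold}


# Hypothetical outputs of three component predictors (sets of base pairs)
predictions = [{(1, 10), (2, 9)}, {(1, 10), (3, 8)}, {(1, 10), (2, 9)}]
weights = [0.5, 0.3, 0.2]
print(ensemble_base_pairs(predictions, weights, 0.6))
print(ensemble_base_pairs(predictions, weights, 0.2))
```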
Conformal Prediction Based on K-Nearest Neighbors for Discrimination of Ginsengs by a Home-Made Electronic Nose

PubMed Central

Sun, Xiyang; Miao, Jiacheng; Wang, You; Luo, Zhiyuan; Li, Guang

2017-01-01

Estimating the reliability of predictions is essential in electronic nose applications, but it has not received enough attention. An algorithm framework called conformal prediction is introduced in this work for discriminating different kinds of ginsengs with a home-made electronic nose instrument. A nonconformity measure based on k-nearest neighbors (KNN) is implemented separately as the underlying algorithm of conformal prediction. In offline mode, the conformal predictor achieves a classification rate of 84.44% based on 1NN and 80.63% based on 3NN, which is better than that of simple KNN. In addition, it provides an estimate of reliability for each prediction. In online mode, the validity of predictions is guaranteed, which means that the error rate of region predictions never exceeds the significance level set by a user. The potential of this framework for detecting borderline examples and outliers in the application of E-nose is also investigated. The result shows that conformal prediction is a promising framework for the application of electronic nose to make predictions with reliability and validity. PMID:28805721
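A region prediction of the kind described in this abstract can be sketched in Python with a 1NN nonconformity measure. The measure used here (distance to the nearest example of the same label divided by distance to the nearest example of a different label) is a standard choice assumed for illustration; the abstract does not give the authors' exact formula. The one-dimensional sensor features are hypothetical, and the sketch assumes at least two examples per class and no duplicate points across classes.

```python
import math


def _nearest(x, points):
    """Euclidean distance from x to the closest point in `points`."""
    return min(math.dist(x, p) for p in points)


def nonconformity(x, label, data):
    """1NN nonconformity: small when x sits close to its own class and far
    from the other classes."""
    same = [p for p, l in data if l == label]
    other = [p for p, l in data if l != label]
    return _nearest(x, same) / _nearest(x, other)


def conformal_region(x, train, labels, epsilon):
    """Return every label whose p-value exceeds the significance level."""
    region = []
    for label in labels:
        bag = train + [(x, label)]  # tentatively assign the candidate label
        scores = [nonconformity(p, l, bag[:k] + bag[k + 1:])
                  for k, (p, l) in enumerate(bag)]
        p_value = sum(s >= scores[-1] for s in scores) / len(bag)
        if p_value > epsilon:
            region.append(label)
    return region


# Hypothetical one-feature training set with two classes
train = [((0.0,), "A"), ((0.1,), "A"), ((0.2,), "A"),
         ((1.0,), "B"), ((1.1,), "B"), ((1.2,), "B")]
print(conformal_region((0.15,), train, ["A", "B"], epsilon=0.2))
```

A region containing more than one label flags a borderline example, and an empty region flags a possible outlier.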
Conditional nonlinear optimal perturbations based on the particle swarm optimization and their applications to the predictability problems

NASA Astrophysics Data System (ADS)

Zheng, Qin; Yang, Zubin; Sha, Jianxin; Yan, Jun

2017-02-01

In predictability problem research, the conditional nonlinear optimal perturbation (CNOP) describes the initial perturbation that satisfies a certain constraint condition and causes the largest prediction error at the prediction time. The CNOP has been successfully applied in estimation of the lower bound of maximum predictable time (LBMPT). Generally, CNOPs are calculated by a gradient descent algorithm based on the adjoint model, which is called ADJ-CNOP. This study, through the two-dimensional Ikeda model, investigates the impacts of the nonlinearity on ADJ-CNOP and the corresponding precision problems when using ADJ-CNOP to estimate the LBMPT. Our conclusions are that (1) when the initial perturbation is large or the prediction time is long, the strong nonlinearity of the dynamical model in the prediction variable will lead to failure of the ADJ-CNOP method, and (2) when the objective function has multiple extreme values, ADJ-CNOP has a large probability of producing local CNOPs, hence making a false estimation of the LBMPT. Furthermore, the particle swarm optimization (PSO) algorithm, one kind of intelligent algorithm, is introduced to solve this problem. The method using PSO to compute CNOP is called PSO-CNOP. The results of numerical experiments show that even with a large initial perturbation and long prediction time, or when the objective function has multiple extreme values, PSO-CNOP can always obtain the global CNOP. Since the PSO algorithm is a heuristic search algorithm based on the population, it can overcome the impact of nonlinearity and the disturbance from multiple extremes of the objective function. In addition, to check the estimation accuracy of the LBMPT presented by PSO-CNOP and ADJ-CNOP, we partition the constraint domain of initial perturbations into sufficiently fine grid meshes and take the LBMPT obtained by the filtering method as a benchmark. 
The result shows that the estimation presented by PSO-CNOP is closer to the true value than the one by ADJ-CNOP with the forecast time increasing.
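The population-based search described in this abstract can be sketched in Python as a particle swarm that maximizes an objective over perturbations constrained to a ball. This is a generic PSO with projection onto the constraint set, not the authors' PSO-CNOP code: the inertia and acceleration coefficients, the projection step, and the toy objective standing in for the prediction error are all assumptions made for illustration.

```python
import math
import random


def project(x, radius):
    """Scale x back onto the ball of the given radius if it lies outside."""
    norm = math.sqrt(sum(v * v for v in x))
    return x if norm <= radius else [v * radius / norm for v in x]


def pso_maximize(objective, dim, radius, n_particles=30, n_iter=200, seed=0):
    """Maximize `objective` over perturbations with Euclidean norm <= radius."""
    rng = random.Random(seed)
    pos = [project([rng.uniform(-radius, radius) for _ in range(dim)], radius)
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # per-particle best positions
    best_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * rng.random() * (best_pos[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (g_pos[d] - pos[i][d]))
            pos[i] = project([p + v for p, v in zip(pos[i], vel[i])], radius)
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_error(delta):
    """Hypothetical multi-extremum objective standing in for prediction error."""
    return math.sin(3.0 * delta[0]) + delta[1] ** 2


print(pso_maximize(toy_error, dim=2, radius=1.0))
```

Because no gradient or adjoint model is needed, the search is not trapped by a single local extremum in the way a gradient descent started from one initial guess can be.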
Predicting DNA hybridization kinetics from sequence

NASA Astrophysics Data System (ADS)

Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu

2018-01-01

Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on similarity reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
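The weighted neighbour voting idea in this abstract can be illustrated with a short Python sketch in which every reaction with a known rate constant votes on the prediction, with votes that decay as the feature distance to the query grows. The exponential vote, the bandwidth parameter, and the feature values are assumptions made for illustration; the abstract does not specify the authors' features or weighting function.

```python
import math


def wnv_predict(query, known, feature_weights, bandwidth=1.0):
    """Predict log10(rate constant) of `query` as a vote-weighted average over
    known reactions. `known` is a list of (features, log10_rate) pairs."""
    numerator = denominator = 0.0
    for features, log_rate in known:
        distance = math.sqrt(sum(w * (q - f) ** 2
                                 for w, q, f in zip(feature_weights, query, features)))
        vote = math.exp(-distance / bandwidth)  # closer neighbours vote more
        numerator += vote * log_rate
        denominator += vote
    return numerator / denominator


# Hypothetical single-feature reactions with known log10 rate constants
known = [((0.0,), 5.0), ((1.0,), 7.0)]
print(wnv_predict((0.25,), known, feature_weights=(1.0,)))
```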

A Systematic Investigation of Computation Models for Predicting Adverse Drug Reactions (ADRs)

PubMed Central

Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong

2014-01-01

Background Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computation models to obtain general conclusions that can provide useful guidance to construct more effective computational models to predict ADRs. Principal Findings In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that were previously used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, we found that the final formulas of these algorithms could all be converted to a linear model in form. Based on this finding, we propose a new algorithm called the general weighted profile method, which yielded the best overall performance among the algorithms investigated in this paper. Conclusion Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms. PMID:25180585
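The Jaccard coefficient and the linear, similarity-weighted form mentioned in this abstract can be sketched in Python as follows. The weighted profile score shown is the generic similarity-weighted average, offered as an assumption about the general shape of such methods; it is not the paper's "general weighted profile method", whose exact formula is not given here. Drug profiles and ADR names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient |A intersect B| / |A union B| of two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_profile_score(similarities, has_adr):
    """Similarity-weighted average of 0/1 labels: how strongly drugs similar
    to the query are associated with a given ADR."""
    total = sum(similarities)
    if total == 0:
        return 0.0
    return sum(s * y for s, y in zip(similarities, has_adr)) / total


# Hypothetical ADR profiles of a query drug and two known drugs
query = {"nausea", "rash"}
known = [{"rash", "headache"}, {"nausea", "rash", "dizziness"}]
sims = [jaccard(query, k) for k in known]
# Does each known drug cause "dizziness"? (0 = no, 1 = yes)
print(weighted_profile_score(sims, [0, 1]))
```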
SANDPUMA: ensemble predictions of nonribosomal peptide chemistry reveal biosynthetic diversity across Actinobacteria.

PubMed

Chevrette, Marc G; Aicheler, Fabian; Kohlbacher, Oliver; Currie, Cameron R; Medema, Marnix H

2017-10-15

Nonribosomally synthesized peptides (NRPs) are natural products with widespread applications in medicine and biotechnology. Many algorithms have been developed to predict the substrate specificities of nonribosomal peptide synthetase adenylation (A) domains from DNA sequences, which enables prioritization and dereplication, and integration with other data types in discovery efforts. However, insufficient training data and a lack of clarity regarding prediction quality have impeded optimal use. Here, we introduce prediCAT, a new phylogenetics-inspired algorithm, which quantitatively estimates the degree of predictability of each A-domain. We then systematically benchmarked all algorithms on a newly gathered, independent test set of 434 A-domain sequences, showing that active-site-motif-based algorithms outperform whole-domain-based methods. Subsequently, we developed SANDPUMA, a powerful ensemble algorithm, based on newly trained versions of all high-performing algorithms, which significantly outperforms individual methods. Finally, we deployed SANDPUMA in a systematic investigation of 7635 Actinobacteria genomes, suggesting that NRP chemical diversity is much higher than previously estimated. SANDPUMA has been integrated into the widely used antiSMASH biosynthetic gene cluster analysis pipeline and is also available as an open-source, standalone tool. SANDPUMA is freely available at https://bitbucket.org/chevrm/sandpuma and as a docker image at https://hub.docker.com/r/chevrm/sandpuma/ under the GNU Public License 3 (GPL3). Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
Evaluation of genotype-guided acenocoumarol dosing algorithms in Russian patients.

PubMed

Sychev, Dmitriy Alexeyevich; Rozhkov, Aleksandr Vladimirovich; Ananichuk, Anna Viktorovna; Kazakov, Ruslan Evgenyevich

2017-05-24

Acenocoumarol dose is normally determined via a step-by-step adjustment process based on International Normalized Ratio (INR) measurements. During this time, the risk of adverse reactions is especially high. Several genotype-based acenocoumarol dosing algorithms have been created to predict ideal doses at the start of anticoagulant therapy. Nine dosing algorithms were selected through a literature search. These were evaluated using a cohort of 63 patients with atrial fibrillation receiving acenocoumarol therapy. None of the existing algorithms could predict the ideal acenocoumarol dose in 50% of Russian patients. The Wolkanin-Bartnik algorithm, based on a European population, was the best-performing one, with the highest correlation value (r=0.397) and a mean absolute error (MAE) of 0.82 (±0.61). EU-PACT also managed to give an estimate within the ideal range in 43% of the cases. The two least accurate results were yielded by the Indian population-based algorithms. Among patients receiving amiodarone, algorithms by Schie and Tong proved to be the most effective with the MAE of 0.48±0.42 mg/day and 0.56±0.31 mg/day, respectively. Patient ethnicity and amiodarone intake are factors that must be considered when building future algorithms. Further research is required to find the perfect dosing formula of acenocoumarol maintenance doses in Russian patients.
MultiMiTar: a novel multi objective optimization based miRNA-target prediction method.

PubMed

Mitra, Ramkrishna; Bandyopadhyay, Sanghamitra

2011-01-01

Machine learning based miRNA-target prediction algorithms often fail to obtain a balanced prediction accuracy in terms of both sensitivity and specificity due to the lack of a gold standard of negative examples, of miRNA-targeting site context specific relevant features, and of an efficient feature selection process. Moreover, all the sequence, structure and machine learning based algorithms are unable to distribute the true positive predictions preferentially at the top of the ranked list; hence the algorithms become unreliable for biologists. In addition, these algorithms fail to obtain a considerable combination of precision and recall for the target transcripts that are translationally repressed at the protein level. In this article, we introduce an efficient miRNA-target prediction system, MultiMiTar, a Support Vector Machine (SVM) based classifier integrated with a multiobjective metaheuristic based feature selection technique. The robust performance of the proposed method is mainly the result of using high quality negative examples and selecting biologically relevant miRNA-targeting site context specific features. The features are selected by using a novel feature selection technique, AMOSA-SVM, that integrates the multi objective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) and SVM. MultiMiTar is found to achieve a much higher Matthews correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 compared to the other target prediction methods on a completely independent test data set. The MCC and ACA values obtained by the other algorithms range from -0.269 to 0.155 and 0.321 to 0.582, respectively. Moreover, it shows a more balanced result in terms of precision and sensitivity (recall) for the translationally repressed data set as compared to all the other existing methods.
An important aspect is that the true positive predictions are distributed preferentially at the top of the ranked list, which makes MultiMiTar reliable for biologists. MultiMiTar is now available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm. MultiMiTar software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.
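The two summary scores quoted in this abstract can be computed from a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and ACA is assumed here to be the mean of sensitivity and specificity, which the abstract does not spell out.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def average_classwise_accuracy(tp, fp, tn, fn):
    """Assumed definition: mean of per-class accuracies (sensitivity, specificity)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2.0


# Hypothetical counts, for illustration only (not data from the paper).
print(mcc(tp=40, fp=20, tn=30, fn=10))
print(average_classwise_accuracy(tp=40, fp=20, tn=30, fn=10))
```

MCC ranges from -1 to 1, so negative values such as the -0.269 reported for one of the compared methods indicate predictions that are anticorrelated with the true labels.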
Family-Based Benchmarking of Copy Number Variation Detection Software.

PubMed

Nutsua, Marcel Elie; Fischer, Annegret; Nebel, Almut; Hofmann, Sylvia; Schreiber, Stefan; Krawczak, Michael; Nothnagel, Michael

2015-01-01

The analysis of structural variants, in particular of copy-number variations (CNVs), has proven valuable in unraveling the genetic basis of human diseases. Hence, a large number of algorithms have been developed for the detection of CNVs in SNP array signal intensity data. Using the European and African HapMap trio data, we undertook a comparative evaluation of six commonly used CNV detection software tools, namely Affymetrix Power Tools (APT), QuantiSNP, PennCNV, GLAD, R-gada and VEGA, and assessed their level of pair-wise prediction concordance. The tool-specific CNV prediction accuracy was assessed in silico by way of intra-familial validation. Software tools differed greatly in terms of the number and length of the CNVs predicted as well as the number of markers included in a CNV. All software tools predicted substantially more deletions than duplications. Intra-familial validation revealed consistently low levels of prediction accuracy as measured by the proportion of validated CNVs (34-60%). Moreover, up to 20% of apparent family-based validations were found to be due to chance alone. Software using Hidden Markov models (HMM) showed a trend to predict fewer CNVs than segmentation-based algorithms albeit with greater validity. PennCNV yielded the highest prediction accuracy (60.9%). Finally, the pairwise concordance of CNV prediction was found to vary widely with the software tools involved. We recommend HMM-based software, in particular PennCNV, rather than segmentation-based algorithms when validity is the primary concern of CNV detection. QuantiSNP may be used as an additional tool to detect sets of CNVs not detectable by the other tools. Our study also reemphasizes the need for laboratory-based validation, such as qPCR, of CNVs predicted in silico.
Research on cross-project software defect prediction based on transfer learning

NASA Astrophysics Data System (ADS)

Chen, Ya; Ding, Xiaoming

2018-04-01

Cross-project software defect prediction faces two challenges: distribution differences between the source project and target project datasets, and class imbalance within the datasets. To address them, a cross-project software defect prediction method based on transfer learning, named NTrA, is proposed. First, the class imbalance of the source project data is resolved using the Augmented Neighborhood Cleaning Algorithm. Second, the data gravity method is used to assign different weights based on the attribute similarity of source project and target project data. Finally, a defect prediction model is constructed using the TrAdaBoost algorithm. Experiments were conducted using NASA and SOFTLAB data from a published PROMISE dataset. The results show that the method achieves good recall and F-measure values and good prediction results.
Comparing Binaural Pre-processing Strategies I: Instrumental Evaluation.

PubMed

Baumgärtel, Regina M; Krawczyk-Becker, Martin; Marquardt, Daniel; Völker, Christoph; Hu, Hongmei; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Ernst, Stephan M A; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

2015-12-30

In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios. © The Author(s) 2015.
Comparing Binaural Pre-processing Strategies I

PubMed Central

Krawczyk-Becker, Martin; Marquardt, Daniel; Völker, Christoph; Hu, Hongmei; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Ernst, Stephan M. A.; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias

2015-01-01

In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios. PMID:26721920
Information theory-based algorithm for in silico prediction of PCR products with whole genomic sequences as templates.

PubMed

Cao, Youfang; Wang, Lianjie; Xu, Kexue; Kou, Chunhai; Zhang, Yulei; Wei, Guifang; He, Junjian; Wang, Yunfang; Zhao, Liping

2005-07-26

A new algorithm for assessing similarity between primer and template has been developed based on the hypothesis that annealing of primer to template is an information transfer process. Primer sequence is converted to a vector of the full potential hydrogen numbers (3 for G or C, 2 for A or T), while template sequence is converted to a vector of the actual hydrogen bond numbers formed after primer annealing. The former is considered as source information and the latter destination information. An information coefficient is calculated as a measure for fidelity of this information transfer process and thus a measure of similarity between primer and potential annealing site on template. Successful prediction of PCR products from whole genomic sequences with a computer program based on the algorithm demonstrated the potential of this new algorithm in areas like in silico PCR and gene finding.
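The encoding described in this abstract can be illustrated with a short sketch. The abstract does not give the formula for the information coefficient, so `bond_fraction` below is only a stand-in ratio, not the paper's measure. Treating a mismatched position as forming zero hydrogen bonds is also an assumption, as are all function names.

```python
# Full potential hydrogen-bond numbers per base, as stated in the abstract.
H_BONDS = {"G": 3, "C": 3, "A": 2, "T": 2}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def potential_bonds(primer):
    """Source vector: bonds each primer base could form when perfectly paired."""
    return [H_BONDS[b] for b in primer.upper()]


def actual_bonds(primer, site):
    """Destination vector: bonds actually formed at each position.

    `site` is the template strand written so that site[i] faces primer[i].
    Assumption: a mismatch contributes 0 bonds.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return [
        H_BONDS[p] if COMPLEMENT[p] == t else 0
        for p, t in zip(primer.upper(), site.upper())
    ]


def bond_fraction(primer, site):
    """Stand-in similarity score: formed bonds / potential bonds."""
    return sum(actual_bonds(primer, site)) / sum(potential_bonds(primer))


print(potential_bonds("ACGT"))        # [2, 3, 3, 2]
print(actual_bonds("ACGT", "TGAA"))   # [2, 3, 0, 2]
print(bond_fraction("ACGT", "TGAA"))  # 0.7
```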
A unified algorithm for predicting partition coefficients for PBPK modeling of drugs and environmental chemicals

DOE Office of Scientific and Technical Information (OSTI.GOV)

Peyret, Thomas; Poulin, Patrick; Krishnan, Kannan, E-mail: kannan.krishnan@umontreal.ca

The algorithms in the literature that aim to predict the tissue:blood partition coefficient (PC) (P_tb) for environmental chemicals and the tissue:plasma PC based on total (K_p) or unbound concentration (K_pu) for drugs differ in their consideration of binding to hemoglobin, plasma proteins and charged phospholipids. The objective of the present study was to develop a unified algorithm such that P_tb, K_p and K_pu for both drugs and environmental chemicals could be predicted. The development of the unified algorithm was accomplished by integrating all mechanistic algorithms previously published to compute the PCs. Furthermore, the algorithm was structured in such a way as to facilitate predictions of the distribution of organic compounds at the macro (i.e. whole tissue) and micro (i.e. cells and fluids) levels. The resulting unified algorithm was applied to compute the rat P_tb, K_p or K_pu of muscle (n = 174), liver (n = 139) and adipose tissue (n = 141) for acidic, neutral, zwitterionic and basic drugs as well as ketones, acetate esters, alcohols, aliphatic hydrocarbons, aromatic hydrocarbons and ethers. The unified algorithm adequately reproduced the values predicted previously by the published algorithms for a total of 142 drugs and chemicals. The sensitivity analysis demonstrated the relative importance of the various compound properties reflective of specific mechanistic determinants relevant to prediction of PC values of drugs and environmental chemicals. Overall, the present unified algorithm uniquely facilitates the computation of macro and micro level PCs for developing organ and cellular-level PBPK models for both chemicals and drugs.
Development and Validation of an Algorithm to Identify Planned Readmissions From Claims Data.

PubMed

Horwitz, Leora I; Grady, Jacqueline N; Cohen, Dorothy B; Lin, Zhenqiu; Volpe, Mark; Ngo, Chi K; Masica, Andrew L; Long, Theodore; Wang, Jessica; Keenan, Megan; Montague, Julia; Suter, Lisa G; Ross, Joseph S; Drye, Elizabeth E; Krumholz, Harlan M; Bernheim, Susannah M

2015-10-01

It is desirable not to include planned readmissions in readmission measures because they represent deliberate, scheduled care. To develop an algorithm to identify planned readmissions, describe its performance characteristics, and identify improvements. Consensus-driven algorithm development and chart review validation study at 7 acute-care hospitals in 2 health systems. For development, all discharges qualifying for the publicly reported hospital-wide readmission measure. For validation, all qualifying same-hospital readmissions that were characterized by the algorithm as planned, and a random sampling of same-hospital readmissions that were characterized as unplanned. We calculated weighted sensitivity and specificity, and positive and negative predictive values of the algorithm (version 2.1), compared to gold standard chart review. In consultation with 27 experts, we developed an algorithm that characterizes 7.8% of readmissions as planned. For validation we reviewed 634 readmissions. The weighted sensitivity of the algorithm was 45.1% overall, 50.9% in large teaching centers and 40.2% in smaller community hospitals. The weighted specificity was 95.9%, positive predictive value was 51.6%, and negative predictive value was 94.7%. We identified 4 minor changes to improve algorithm performance. The revised algorithm had a weighted sensitivity 49.8% (57.1% at large hospitals), weighted specificity 96.5%, positive predictive value 58.7%, and negative predictive value 94.5%. Positive predictive value was poor for the 2 most common potentially planned procedures: diagnostic cardiac catheterization (25%) and procedures involving cardiac devices (33%). An administrative claims-based algorithm to identify planned readmissions is feasible and can facilitate public reporting of primarily unplanned readmissions. © 2015 Society of Hospital Medicine.
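Because the validation reviewed every readmission flagged as planned but only a random sample of those flagged as unplanned, the reported metrics are weighted. The sketch below shows one plausible way to do this, assuming each reviewed chart carries a weight equal to the inverse of its sampling probability. The record layout, function name, and counts are hypothetical and are not taken from the study.

```python
def weighted_metrics(records):
    """Weighted sensitivity, specificity, PPV and NPV.

    Each record is (algorithm_says_planned, chart_says_planned, weight),
    where weight is assumed to be 1 / sampling probability of that chart.
    """
    tp = fp = tn = fn = 0.0
    for predicted, actual, weight in records:
        if predicted and actual:
            tp += weight
        elif predicted and not actual:
            fp += weight
        elif not predicted and actual:
            fn += weight
        else:
            tn += weight
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical review: all flagged-planned charts reviewed (weight 1),
# a 10% sample of flagged-unplanned charts reviewed (weight 10).
records = (
    [(True, True, 1.0)] * 30
    + [(True, False, 1.0)] * 20
    + [(False, True, 10.0)] * 3
    + [(False, False, 10.0)] * 47
)
print(weighted_metrics(records))
```

Without the weights, the three missed planned readmissions in the sample would count as 3 rather than 30, and sensitivity would be overstated.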
An adaptive transmission protocol for managing dynamic shared states in collaborative surgical simulation.

PubMed

Qin, J; Choi, K S; Ho, Simon S M; Heng, P A

2008-01-01

A force prediction algorithm is proposed to facilitate virtual-reality (VR) based collaborative surgical simulation by reducing the effect of network latencies. State regeneration is used to correct the estimated prediction. This algorithm is incorporated into an adaptive transmission protocol in which auxiliary features such as view synchronization and coupling control are equipped to ensure the system consistency. We implemented this protocol using multi-threaded technique on a cluster-based network architecture.
Detecting REM sleep from the finger: an automatic REM sleep algorithm based on peripheral arterial tone (PAT) and actigraphy.

PubMed

Herscovici, Sarah; Pe'er, Avivit; Papyan, Surik; Lavie, Peretz

2007-02-01

Scoring of REM sleep based on polysomnographic recordings is a laborious and time-consuming process. The growing number of ambulatory devices designed for cost-effective home-based diagnostic sleep recordings necessitates the development of a reliable automatic REM sleep detection algorithm that is not based on the traditional trio of electroencephalographic, electrooculographic and electromyographic recordings. This paper presents an automatic REM detection algorithm based on the peripheral arterial tone (PAT) signal and actigraphy, which are recorded with an ambulatory wrist-worn device (Watch-PAT100). The PAT signal is a measure of the pulsatile volume changes at the finger tip reflecting sympathetic tone variations. The algorithm was developed using a training set of 30 patients recorded simultaneously with polysomnography and Watch-PAT100. Sleep records were divided into 5 min intervals and two time series were constructed from the PAT amplitudes and PAT-derived inter-pulse periods in each interval. A prediction function, based on 16 features extracted from the above time series, that determines the likelihood of detecting a REM epoch was developed. The coefficients of the prediction function were determined using a genetic algorithm (GA) optimizing process tuned to maximize a price function depending on the sensitivity, specificity and agreement of the algorithm in comparison with the gold standard of polysomnographic manual scoring. Based on a separate validation set of 30 patients, the overall sensitivity, specificity and agreement of the automatic algorithm in identifying standard 30 s epochs of REM sleep were 78%, 92% and 89%, respectively. Deploying this REM detection algorithm in a wrist-worn device could be very useful for unattended ambulatory sleep monitoring. The innovative method of optimization using a genetic algorithm has been shown to yield robust results in the validation set.
PDC-SGB: Prediction of effective drug combinations using a stochastic gradient boosting algorithm.

PubMed

Xu, Qian; Xiong, Yi; Dai, Hao; Kumari, Kotni Meena; Xu, Qin; Ou, Hong-Yu; Wei, Dong-Qing

2017-03-21

Combinatorial therapy is a promising strategy for combating complex diseases by improving efficacy and reducing side effects. To facilitate the identification of drug combinations in pharmacology, we proposed a new computational model, termed PDC-SGB, to predict effective drug combinations by integrating biological, chemical and pharmacological information based on a stochastic gradient boosting algorithm. To begin with, a set of 352 golden positive samples was collected from the public drug combination database. Then, a 732-dimensional feature vector involving biological, chemical and pharmaceutical information was constructed for each drug combination to describe its properties. To avoid overfitting, the maximum relevance & minimum redundancy (mRMR) method was applied to select useful features by removing redundant ones. Based on the selected features, three different types of classification algorithms were employed to build the drug combination prediction models. Our results demonstrated that the model based on the stochastic gradient boosting algorithm yielded the best performance. Furthermore, the results indicated that the feature patterns of therapy had a strong ability to discriminate effective drug combinations from non-effective ones. Analysis of the various features showed that the enriched features that occur frequently in golden positive samples can help predict novel drug combinations. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Localization Method for Underwater Wireless Sensor Networks Based on Mobility Prediction and Particle Swarm Optimization Algorithms

PubMed Central

Zhang, Ying; Liang, Jixing; Jiang, Shengming; Chen, Wei

2016-01-01

Due to their special environment, Underwater Wireless Sensor Networks (UWSNs) are usually deployed over a large sea area and the nodes are usually floating. This results in a lower beacon node distribution density, a longer time for localization, and more energy consumption. Currently, most localization algorithms in this field do not give enough consideration to the mobility of the nodes. In this paper, by analyzing the mobility patterns of water near the seashore, a localization method for UWSNs based on Mobility Prediction and a Particle Swarm Optimization algorithm (MP-PSO) is proposed. In this method, the range-based PSO algorithm is used to locate the beacon nodes, and their velocities can then be calculated. The velocity of an unknown node is calculated by using the spatial correlation of underwater objects' mobility, and then its location can be predicted. The range-based PSO algorithm may cause considerable energy consumption and its computational complexity is somewhat high; nevertheless, the number of beacon nodes is relatively small, so the calculation for the large number of unknown nodes is succinct, and this method can clearly decrease the energy consumption and time cost of localizing these mobile nodes. The simulation results indicate that this method has higher localization accuracy and a better localization coverage rate compared with some other widely used localization methods in this field. PMID:26861348
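The mobility-prediction step can be sketched as follows. The abstract does not state how spatial correlation is turned into a velocity estimate, so the inverse-distance weighting used here is an assumption, as are the function names and the restriction to two dimensions.

```python
import math


def estimate_velocity(node_pos, beacons, eps=1e-9):
    """Estimate an unknown node's velocity from nearby beacon velocities.

    `beacons` is a list of ((x, y), (vx, vy)) pairs. Assumption: closer
    beacons are more strongly correlated, so weights are 1 / distance.
    """
    weight_sum = 0.0
    vx = vy = 0.0
    for (bx, by), (bvx, bvy) in beacons:
        distance = math.hypot(node_pos[0] - bx, node_pos[1] - by)
        weight = 1.0 / (distance + eps)  # eps avoids division by zero
        weight_sum += weight
        vx += weight * bvx
        vy += weight * bvy
    return (vx / weight_sum, vy / weight_sum)


def predict_position(pos, velocity, dt):
    """Constant-velocity prediction of the node position after dt seconds."""
    return (pos[0] + velocity[0] * dt, pos[1] + velocity[1] * dt)


# Two equidistant beacons with different drift speeds along x.
beacons = [((1.0, 0.0), (1.0, 0.0)), ((-1.0, 0.0), (3.0, 0.0))]
v = estimate_velocity((0.0, 0.0), beacons)
print(v, predict_position((0.0, 0.0), v, dt=2.0))
```

Only the beacon nodes need the more expensive range-based localization in this scheme; the remaining nodes reuse the beacon velocities, which is where the energy saving described in the abstract would come from.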
RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model.

PubMed

Jabbari, Hosna; Wark, Ian; Montemagno, Carlo

2018-01-01

RNA is a biopolymer with various applications inside the cell and in biotechnology. The structure of an RNA molecule mainly determines its function and is essential to guide nanostructure design. Since experimental structure determination is time-consuming and expensive, accurate computational prediction of RNA structure is of great importance. Prediction of RNA secondary structure is relatively simpler than prediction of its tertiary structure and provides information about the tertiary structure; therefore, RNA secondary structure prediction has received attention in the past decades. Numerous methods with different folding approaches have been developed for RNA secondary structure prediction. While methods for prediction of RNA pseudoknot-free structure (structures with no crossing base pairs) have greatly improved in terms of their accuracy, methods for prediction of RNA pseudoknotted secondary structure (structures with crossing base pairs) still have room for improvement. A long-standing question for improving the prediction accuracy of RNA pseudoknotted secondary structure is whether to focus on the prediction algorithm or the underlying energy model, as there is a trade-off between the computational cost of the prediction algorithm and the generality of the method. The aim of this work is to argue that, when comparing different methods for RNA pseudoknotted structure prediction, the combination of algorithm and energy model should be considered, and a method should not be considered superior or inferior to others if they do not use the same scoring model. We demonstrate that while the folding approach is important in structure prediction, it is not the only important factor in the prediction accuracy of a given method, as the underlying energy model is also of great value. Therefore, we encourage researchers to pay particular attention when comparing methods with different energy models.
Disk storage management for LHCb based on Data Popularity estimator

NASA Astrophysics Data System (ADS)

Hushchyn, Mikhail; Charpentier, Philippe; Ustyuzhanin, Andrey

2015-12-01

This paper presents an algorithm providing recommendations for optimizing the LHCb data storage. The LHCb data storage system is a hybrid system. All datasets are kept as archives on magnetic tapes. The most popular datasets are kept on disks. The algorithm takes the dataset usage history and metadata (size, type, configuration etc.) to generate a recommendation report. This article presents how we use machine learning algorithms to predict future data popularity. Using these predictions it is possible to estimate which datasets should be removed from disk. We use regression algorithms and time series analysis to find the optimal number of replicas for datasets that are kept on disk. Based on the data popularity and the number of replicas optimization, the algorithm minimizes a loss function to find the optimal data distribution. The loss function represents all requirements for data distribution in the data storage system. We demonstrate how our algorithm helps to save disk space and to reduce waiting times for jobs using this data.
Gross domestic product estimation based on electricity utilization by artificial neural network

NASA Astrophysics Data System (ADS)

Stevanović, Mirjana; Vujičić, Slađana; Gajić, Aleksandar M.

2018-01-01

The main goal of the paper was to estimate gross domestic product (GDP) based on electricity utilization by means of an artificial neural network (ANN). Electricity utilization was analyzed for different sources, namely renewable, coal and nuclear sources. The ANN was trained with two training algorithms, the extreme learning method and the back-propagation algorithm, in order to produce the best prediction results for the GDP. According to the results, it can be concluded that the ANN model with the extreme learning method could produce an acceptable prediction of the GDP based on electricity utilization.
Machine Learning for Flood Prediction in Google Earth Engine

NASA Astrophysics Data System (ADS)

Kuhn, C.; Tellman, B.; Max, S. A.; Schwarz, B.

2015-12-01

With the increasing availability of high-resolution satellite imagery, dynamic flood mapping in near real time is becoming a reachable goal for decision-makers. This talk describes a newly developed framework for predicting biophysical flood vulnerability using public data, cloud computing and machine learning. Our objective is to define an approach to flood inundation modeling using statistical learning methods deployed in a cloud-based computing platform. Traditionally, static flood extent maps grounded in physically based hydrologic models can require hours of human expertise to construct at significant financial cost. In addition, desktop modeling software and limited local server storage can impose restraints on the size and resolution of input datasets. Data-driven, cloud-based processing holds promise for predictive watershed modeling at a wide range of spatio-temporal scales. However, these benefits come with constraints. In particular, parallel computing limits a modeler's ability to simulate the flow of water across a landscape, rendering traditional routing algorithms unusable in this platform. Our project pushes these limits by testing the performance of two machine learning algorithms, Support Vector Machine (SVM) and Random Forests, at predicting flood extent. Constructed in Google Earth Engine, the model mines a suite of publicly available satellite imagery layers to use as algorithm inputs. Results are cross-validated using MODIS-based flood maps created using the Dartmouth Flood Observatory detection algorithm. Model uncertainty highlights the difficulty of deploying unbalanced training data sets based on rare extreme events.
Ab-initio conformational epitope structure prediction using genetic algorithm and SVM for vaccine design.

PubMed

Moghram, Basem Ameen; Nabil, Emad; Badr, Amr

2018-01-01

T-cell epitope structure identification is a significant and challenging immunoinformatic problem within epitope-based vaccine design. Epitopes or antigenic peptides are a set of amino acids that bind with the Major Histocompatibility Complex (MHC) molecules. The resulting complexes are presented by Antigen Presenting Cells to be inspected by T-cells. MHC-molecule-binding epitopes are responsible for triggering the immune response to antigens. The epitope's three-dimensional (3D) molecular structure (i.e., tertiary structure) reflects its proper function. Therefore, the identification of the structure of MHC class-II epitopes is a significant step towards epitope-based vaccine design and understanding of the immune system. In this paper, we propose a new technique using a Genetic Algorithm for Predicting the Epitope Structure (GAPES), to predict the structure of MHC class-II epitopes based on their sequence. The proposed Elitist-based genetic algorithm for predicting the epitope's tertiary structure is based on the Ab-Initio Empirical Conformational Energy Program for Peptides (ECEPP) Force Field Model. The developed secondary structure prediction technique relies on the Ramachandran plot. We used two alignment algorithms: the ROSS alignment and TM-Score alignment. We applied four different alignment approaches to calculate the similarity scores of the dataset under test. We utilized the support vector machine (SVM) classifier to evaluate the prediction performance. The prediction accuracy and the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) were calculated as measures of performance. The calculations were performed on twelve similarity-reduced datasets of the Immune Epitope Data Base (IEDB) and a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES was reliable and very accurate. We achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 in the IEDB dataset. 
Also, we achieved an accuracy of 95.125% and an AUC of 0.987 on the HLA-DRB1*0101 allele of the Wang benchmark dataset. The results indicate that the proposed prediction technique "GAPES" is a promising technique that will help researchers and scientists to predict the protein structure and it will assist them in the intelligent design of new epitope-based vaccines. Copyright © 2017 Elsevier B.V. All rights reserved.

Ads' click-through rates predicting based on gated recurrent unit neural networks

NASA Astrophysics Data System (ADS)

Chen, Qiaohong; Guo, Zixuan; Dong, Wen; Jin, Lingzi

2018-05-01

To improve the effectiveness of online advertising and increase advertising revenue, a gated recurrent unit (GRU) neural network model is used to predict ads' click-through rates (CTR). Taking into account the gated unit structure and the time-sequence characteristics of the data, the model is trained with the backpropagation through time (BPTT) algorithm. Furthermore, the step length algorithm of the GRU network is optimized so that the model reaches the optimal point better and faster, in fewer iterations. The experimental results show that the GRU-based model with the optimized step length algorithm performs better at ads' CTR prediction, which helps advertisers, media and audiences achieve a mutually beneficial outcome in the Three-Side Game.
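For readers unfamiliar with the gated unit, the forward pass of a GRU can be sketched with scalar state. This is the standard GRU formulation, not the authors' model: the parameter names, the scalar simplification, and the sigmoid output layer used to map the final hidden state to a CTR are all assumptions, and training with BPTT is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h, p):
    """One GRU step with scalar input x and scalar hidden state h.

    p holds hypothetical weights: w* act on the input, u* on the previous
    hidden state, b* are biases, for the update (z), reset (r) and
    candidate (h) paths.
    """
    z = sigmoid(p["wz"] * x + p["uz"] * h + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h + p["br"])  # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h) + p["bh"])
    # One common convention; some texts swap the roles of z and (1 - z).
    return (1.0 - z) * h + z * h_cand


def predict_ctr(sequence, p, w_out=1.0, b_out=0.0):
    """Run the GRU over a feature sequence and squash to a probability."""
    h = 0.0
    for x in sequence:
        h = gru_step(x, h, p)
    return sigmoid(w_out * h + b_out)


# Arbitrary illustrative weights, not fitted to any data.
params = {"wz": 0.5, "uz": 0.1, "bz": 0.0,
          "wr": 0.3, "ur": 0.2, "br": 0.0,
          "wh": 0.8, "uh": 0.4, "bh": 0.0}
print(predict_ctr([0.2, 0.7, 0.1], params))
```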
Investigation of energy management strategies for photovoltaic systems - A predictive control algorithm

NASA Technical Reports Server (NTRS)

Cull, R. C.; Eltimsahy, A. H.

1983-01-01

The present investigation is concerned with the formulation of energy management strategies for stand-alone photovoltaic (PV) systems, taking into account a basic control algorithm for a possible predictive (and adaptive) controller. The control system controls the flow of energy in the system according to the amount of energy available, and predicts the appropriate control set-points based on the energy (insolation) available by using an appropriate system model. Aspects of adaptation to the conditions of the system are also considered. Attention is given to a statistical analysis technique, the analysis inputs, the analysis procedure, and details regarding the basic control algorithm.
Robust prediction of consensus secondary structures using averaged base pairing probability matrices.

PubMed

Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi

2007-02-15

Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
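The averaging step of the McCaskill-MEA method can be sketched as below, assuming each sequence's base pairing probability matrix has already been computed and mapped onto alignment columns (gap handling is not shown). The final thresholding is a simplification used only for illustration; the method described in the abstract instead predicts the structure that maximizes expected accuracy. Function names are hypothetical.

```python
def average_bpp(matrices):
    """Average per-sequence base pairing probability matrices.

    `matrices` is a list of n x n lists, all indexed by alignment column.
    """
    count = len(matrices)
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(n)]
        for i in range(n)
    ]


def confident_pairs(avg, threshold=0.5):
    """Simplified stand-in for MEA decoding: keep pairs above a threshold."""
    n = len(avg)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if avg[i][j] > threshold
    ]


# Two toy 3-column matrices; only columns 0 and 2 pair with any weight.
m1 = [[0.0, 0.0, 0.9], [0.0, 0.0, 0.0], [0.9, 0.0, 0.0]]
m2 = [[0.0, 0.0, 0.5], [0.0, 0.0, 0.0], [0.5, 0.0, 0.0]]
avg = average_bpp([m1, m2])
print(avg[0][2], confident_pairs(avg))
```

Raising or lowering `threshold` plays a role loosely similar to the sensitivity/specificity parameter mentioned in the abstract: a higher value keeps fewer, more reliable pairs.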
Development and Implementation of a Hardware In-the-Loop Test Bed for Unmanned Aerial Vehicle Control Algorithms

NASA Technical Reports Server (NTRS)

Nyangweso, Emmanuel; Bole, Brian

2014-01-01

Successful prediction and management of battery life using prognostic algorithms through ground and flight tests is important for performance evaluation of electrical systems. This paper details the design of test beds suitable for replicating loading profiles that would be encountered in deployed electrical systems. The test bed data will be used to develop and validate prognostic algorithms for predicting battery discharge time and battery failure time. Online battery prognostic algorithms will enable health management strategies. The platform used for algorithm demonstration is the EDGE 540T electric unmanned aerial vehicle (UAV). The fully designed test beds developed and detailed in this paper can be used to conduct battery life tests by controlling current and recording voltage and temperature to develop a model that makes a prediction of end-of-charge and end-of-life of the system based on rapid state of health (SOH) assessment.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Dall-Anese, Emiliano; Simonetto, Andrea

This paper focuses on the design of online algorithms based on prediction-correction steps to track the optimal solution of a time-varying constrained problem. Existing prediction-correction methods have been shown to work well for unconstrained convex problems and for settings where obtaining the inverse of the Hessian of the cost function can be computationally affordable. The prediction-correction algorithm proposed in this paper addresses the limitations of existing methods by tackling constrained problems and by designing a first-order prediction step that relies on the Hessian of the cost function (and does not require the computation of its inverse). Analytical results are established to quantify the tracking error. Numerical simulations corroborate the analytical results and showcase performance and benefits of the algorithms.
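To make the prediction-correction idea concrete, the sketch below tracks the constrained minimizer of a scalar toy problem. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the cost f(x; t) = 0.5 (x - sin t)^2, the interval constraint [-0.8, 0.8], the step sizes, and the function names. The prediction step takes projected gradient steps on a first-order (Taylor) approximation of the gradient at the next time instant, which uses the Hessian but never its inverse; the correction step takes projected gradient steps on the true cost once the new time instant arrives.

```python
import math


def project(x, lo=-0.8, hi=0.8):
    """Projection onto the (assumed) constraint interval [lo, hi]."""
    return min(max(x, lo), hi)


def track(steps=200, h=0.1, alpha=0.5, n_pred=3, n_corr=1):
    """Track argmin of f(x; t) = 0.5 * (x - sin t)^2 over [-0.8, 0.8].

    Returns the tracking error after each sampling period of length h.
    """
    x, errors = 0.0, []
    for k in range(steps):
        t = k * h
        grad = x - math.sin(t)      # gradient of f at (x_k, t_k)
        hess = 1.0                  # Hessian of f with respect to x
        grad_tx = -math.cos(t)      # time derivative of the gradient

        # Prediction: projected gradient steps on the Taylor-approximated
        # gradient at t + h (Hessian-vector products only, no inverse).
        x_hat = x
        for _ in range(n_pred):
            approx_grad = grad + hess * (x_hat - x) + h * grad_tx
            x_hat = project(x_hat - alpha * approx_grad)

        # Correction: projected gradient steps on the true cost at t + h.
        t_next = t + h
        x = x_hat
        for _ in range(n_corr):
            x = project(x - alpha * (x - math.sin(t_next)))

        errors.append(abs(x - project(math.sin(t_next))))
    return errors
```

Calling `track(n_pred=0)` disables the prediction step and leaves a correction-only scheme, which gives a simple baseline for comparing tracking errors on this toy problem.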
Algorithm aversion: people erroneously avoid algorithms after seeing them err.

PubMed

Dietvorst, Berkeley J; Simmons, Joseph P; Massey, Cade

2015-02-01

Research shows that evidence-based algorithms more accurately predict the future than do human forecasters. Yet when forecasters are deciding whether to use a human forecaster or a statistical algorithm, they often choose the human forecaster. This phenomenon, which we call algorithm aversion, is costly, and it is important to understand its causes. We show that people are especially averse to algorithmic forecasters after seeing them perform, even when they see them outperform a human forecaster. This is because people more quickly lose confidence in algorithmic than human forecasters after seeing them make the same mistake. In 5 studies, participants either saw an algorithm make forecasts, a human make forecasts, both, or neither. They then decided whether to tie their incentives to the future predictions of the algorithm or the human. Participants who saw the algorithm perform were less confident in it, and less likely to choose it over an inferior human forecaster. This was true even among those who saw the algorithm outperform the human.
Enhanced clinical pharmacy service targeting tools: risk-predictive algorithms.

PubMed

El Hajji, Feras W D; Scullin, Claire; Scott, Michael G; McElnay, James C

2015-04-01

This study aimed to determine the value of using a mix of clinical pharmacy data and routine hospital admission spell data in the development of predictive algorithms. Exploration of risk factors in hospitalized patients, together with the targeting strategies devised, will enable the prioritization of clinical pharmacy services to optimize patient outcomes. Predictive algorithms were developed in a number of detailed steps using a 75% sample of integrated medicines management (IMM) patients, and validated using the remaining 25%. IMM patients receive targeted clinical pharmacy input throughout their hospital stay. The algorithms were applied to the validation sample, and predicted risk probability was generated for each patient from the coefficients. Risk thresholds for the algorithms were determined by identifying the cut-off points of risk scores at which the algorithm would have the highest discriminative performance. Clinical pharmacy staffing levels were obtained from the pharmacy department staffing database. Numbers of previous emergency admissions and admission medicines together with age-adjusted co-morbidity and diuretic receipt formed a 12-month post-discharge and/or readmission risk algorithm. Age-adjusted co-morbidity proved to be the best index to predict mortality. Increased numbers of clinical pharmacy staff at ward level were correlated with a reduction in risk-adjusted mortality index (RAMI). Algorithms created were valid in predicting risk of in-hospital and post-discharge mortality and risk of hospital readmission 3, 6 and 12 months post-discharge. The provision of ward-based clinical pharmacy services is a key component to reducing RAMI and enabling the full benefits of pharmacy input to patient care to be realized. © 2014 John Wiley & Sons, Ltd.
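The two computational steps mentioned above, generating a risk probability from model coefficients and choosing a cut-off with the highest discriminative performance, might look like the following sketch. The abstract does not name the model form or the discrimination criterion, so the logistic link and Youden's J statistic (sensitivity + specificity - 1) used here are assumptions, as are the function names.

```python
import math


def risk_probability(coefs, intercept, features):
    """Predicted risk from (assumed) logistic regression coefficients."""
    z = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))


def best_cutoff(risks, outcomes):
    """Return (cut-off, J) maximizing Youden's J over the observed risk scores.

    A patient is flagged as high risk when risk >= cut-off.
    outcomes holds 1 for an event (e.g. readmission) and 0 otherwise.
    """
    pos = sum(outcomes)
    neg = len(outcomes) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(risks)):
        tp = sum(1 for r, y in zip(risks, outcomes) if r >= t and y == 1)
        tn = sum(1 for r, y in zip(risks, outcomes) if r < t and y == 0)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

In a workflow like the one described, the coefficients would be fitted on the 75% development sample and the cut-off evaluated on the 25% validation sample.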
Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs

PubMed Central

2017-01-01

Prediction of RNA tertiary structure from sequence is an important problem, but generating accurate structure models for even short sequences remains difficult. Predictions of RNA tertiary structure tend to be least accurate in loop regions, where non-canonical pairs are important for determining the details of structure. Non-canonical pairs can be predicted using a knowledge-based model of structure that scores nucleotide cyclic motifs, or NCMs. In this work, a partition function algorithm is introduced that allows the estimation of base pairing probabilities for both canonical and non-canonical interactions. Pairs that are predicted to be probable are more likely to be found in the true structure than pairs of lower probability. Pair probability estimates can be further improved by predicting the structure conserved across multiple homologous sequences using the TurboFold algorithm. These pairing probabilities, used in concert with prior knowledge of the canonical secondary structure, allow accurate inference of non-canonical pairs, an important step towards accurate prediction of the full tertiary structure. Software to predict non-canonical base pairs and pairing probabilities is now provided as part of the RNAstructure software package. PMID:29107980
TargetSpy: a supervised machine learning approach for microRNA target prediction.

PubMed

Sturm, Martin; Hackenberg, Michael; Langenberger, David; Frishman, Dmitrij

2010-05-28

Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently, however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. 
TargetSpy was trained on mouse and performs well in human and drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool http://www.targetspy.org.
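The seed match requirement that separates class I from class II predictions can be illustrated with a short sketch. It is not TargetSpy code: the function names are hypothetical, and the seed definition used here (microRNA nucleotides 2 to 8, matched by perfect Watson-Crick complementarity) is an assumption, since the abstract does not define the seed. The postfilter simply keeps candidate sites that contain the seed match.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna):
    """Target-side sequence complementary to microRNA nucleotides 2-8 (assumed seed)."""
    seed = mirna[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def has_seed_match(mirna, target_site):
    """True if the candidate target site contains a perfect seed match."""
    return seed_site(mirna) in target_site


def postfilter_seed(mirna, predictions):
    """Keep (site_sequence, score) predictions that contain a seed match."""
    return [(site, score) for site, score in predictions if has_seed_match(mirna, site)]
```

A conserved seed match filter (class III) would additionally require the match to be present in orthologous sequences, which needs alignment data not modeled here.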
TargetSpy: a supervised machine learning approach for microRNA target prediction

PubMed Central

2010-01-01

Background Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. Results We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. Conclusion Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. 
TargetSpy was trained on mouse and performs well in human and drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool http://www.targetspy.org. PMID:20509939
NEP: web server for epitope prediction based on antibody neutralization of viral strains with diverse sequences.

PubMed

Chuang, Gwo-Yu; Liou, David; Kwong, Peter D; Georgiev, Ivelin S

2014-07-01

Delineation of the antigenic site, or epitope, recognized by an antibody can provide clues about functional vulnerabilities and resistance mechanisms, and can therefore guide antibody optimization and epitope-based vaccine design. Previously, we developed an algorithm for antibody-epitope prediction based on antibody neutralization of viral strains with diverse sequences and validated the algorithm on a set of broadly neutralizing HIV-1 antibodies. Here we describe the implementation of this algorithm, NEP (Neutralization-based Epitope Prediction), as a web-based server. The users must supply as input: (i) an alignment of antigen sequences of diverse viral strains; (ii) neutralization data for the antibody of interest against the same set of antigen sequences; and (iii) (optional) a structure of the unbound antigen, for enhanced prediction accuracy. The prediction results can be downloaded or viewed interactively on the antigen structure (if supplied) from the web browser using a JSmol applet. Since neutralization experiments are typically performed as one of the first steps in the characterization of an antibody to determine its breadth and potency, the NEP server can be used to predict antibody-epitope information at no additional experimental costs. NEP can be accessed on the internet at http://exon.niaid.nih.gov/nep. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
CrossLink: a novel method for cross-condition classification of cancer subtypes.

PubMed

Ma, Chifeng; Sastry, Konduru S; Flore, Mario; Gehani, Salah; Al-Bozom, Issam; Feng, Yusheng; Serpedin, Erchin; Chouchane, Lotfi; Chen, Yidong; Huang, Yufei

2016-08-22

We considered the prediction of cancer classes (e.g. subtypes) using patient gene expression profiles that contain both systematic and condition-specific biases when compared with the training reference dataset. The conventional normalization-based approaches cannot guarantee that the gene signatures in the reference and prediction datasets always have the same distribution for all different conditions as the class-specific gene signatures change with the condition. Therefore, the trained classifier would work well under one condition but not under another. To address the problem of current normalization approaches, we propose a novel algorithm called CrossLink (CL). CL recognizes that there is no universal, condition-independent normalization mapping of signatures. In contrast, it exploits the fact that the signature is unique to its associated class under any condition and thus employs an unsupervised clustering algorithm to discover this unique signature. We assessed the performance of CL for cross-condition predictions of PAM50 subtypes of breast cancer by using a simulated dataset modeled after TCGA BRCA tumor samples with a cross-validation scheme, and datasets with known and unknown PAM50 classification. CL achieved prediction accuracy >73 %, highest among other methods we evaluated. We also applied the algorithm to a set of breast cancer tumors derived from Arabic population to assign a PAM50 classification to each tumor based on their gene expression profiles. A novel algorithm CrossLink for cross-condition prediction of cancer classes was proposed. In all test datasets, CL showed robust and consistent improvement in prediction performance over other state-of-the-art normalization and classification algorithms.
Predicting Positive and Negative Relationships in Large Social Networks.

PubMed

Wang, Guan-Nan; Gao, Hui; Chen, Lian; Mensah, Dennis N A; Fu, Yan

2015-01-01

In a social network, users hold and express positive and negative attitudes (e.g. support/opposition) towards other users. Those attitudes exhibit some kind of binary relationships among the users, which play an important role in social network analysis. However, some of those binary relationships are likely to be latent as the scale of social network increases. The problem of predicting latent binary relationships has recently begun to draw researchers' attention. In this paper, we propose a machine learning algorithm for predicting positive and negative relationships in social networks inspired by structural balance theory and social status theory. More specifically, we show that when two users in the network have fewer common neighbors, the prediction accuracy of the relationship between them deteriorates. Accordingly, in the training phase, we propose a segment-based training framework to divide the training data into two subsets according to the number of common neighbors between users, and build a prediction model for each subset based on support vector machine (SVM). Moreover, to deal with large-scale social network data, we employ a sampling strategy that selects a small amount of training data while maintaining high accuracy of prediction. We compare our algorithm with traditional algorithms and adaptive boosting of them. Experimental results on typical data sets show that our algorithm can deal with large social networks and consistently outperforms other methods.
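The segment-based split described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: edges are treated as undirected when counting common neighbors, the `threshold` value is hypothetical, and training one SVM per subset is left out.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected neighbor sets from (u, v, sign) triples."""
    adj = defaultdict(set)
    for u, v, _sign in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of users adjacent to both u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def segment_edges(edges, threshold):
    """Split signed edges into (few-common-neighbors, many-common-neighbors) subsets."""
    adj = build_adjacency(edges)
    sparse, dense = [], []
    for u, v, sign in edges:
        target = dense if common_neighbors(adj, u, v) >= threshold else sparse
        target.append((u, v, sign))
    return sparse, dense
```

Each returned subset would then be used to fit its own classifier, so that sparsely embedded pairs do not degrade the model for well-embedded ones.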
Prostate cancer prediction using the random forest algorithm that takes into account transrectal ultrasound findings, age, and serum levels of prostate-specific antigen.

PubMed

Xiao, Li-Hong; Chen, Pei-Ran; Gou, Zhong-Ping; Li, Yong-Zhong; Li, Mei; Xiang, Liang-Cheng; Feng, Ping

2017-01-01

The aim of this study is to evaluate the ability of the random forest algorithm that combines data on transrectal ultrasound findings, age, and serum levels of prostate-specific antigen to predict prostate carcinoma. Clinico-demographic data were analyzed for 941 patients with prostate diseases treated at our hospital, including age, serum prostate-specific antigen levels, transrectal ultrasound findings, and pathology diagnosis based on ultrasound-guided needle biopsy of the prostate. These data were compared between patients with and without prostate cancer using the Chi-square test, and then entered into the random forest model to predict diagnosis. Patients with and without prostate cancer differed significantly in age and serum prostate-specific antigen levels (P < 0.001), as well as in all transrectal ultrasound characteristics (P < 0.05) except uneven echo (P = 0.609). The random forest model based on age, prostate-specific antigen and ultrasound predicted prostate cancer with an accuracy of 83.10%, sensitivity of 65.64%, and specificity of 93.83%. Positive predictive value was 86.72%, and negative predictive value was 81.64%. By integrating age, prostate-specific antigen levels and transrectal ultrasound findings, the random forest algorithm shows better diagnostic performance for prostate cancer than either diagnostic indicator on its own. This algorithm may help improve diagnosis of the disease by identifying patients at high risk for biopsy.
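The five reported performance figures are all functions of one confusion matrix, which the sketch below makes explicit. The counts used in the usage note (TP = 235, FN = 123, TN = 547, FP = 36, summing to 941) are not given in the abstract; they were back-calculated here so that the formulas reproduce the reported percentages, and should be read as an illustrative assumption rather than study data.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic performance measures from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }
```

For example, `diagnostic_metrics(tp=235, fn=123, tn=547, fp=36)` yields values that round to the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value quoted above.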
Optimum location of external markers using feature selection algorithms for real‐time tumor tracking in external‐beam radiotherapy: a virtual phantom study

PubMed Central

Nankali, Saber; Miandoab, Payam Samadi; Baghizadeh, Amin

2016-01-01

In external‐beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation‐based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two “Genetic” and “Ranker” searching procedures. The performance of these algorithms has been evaluated using four‐dimensional extended cardiac‐torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro‐fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F‐test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. 
But for liver tumors, a correlation‐based feature selection algorithm, in combination with a genetic search algorithm, proved to yield best performance accuracy for selecting optimum markers. PACS numbers: 87.55.km, 87.56.Fc PMID:26894358
Optimum location of external markers using feature selection algorithms for real-time tumor tracking in external-beam radiotherapy: a virtual phantom study.

PubMed

Nankali, Saber; Torshabi, Ahmad Esmaili; Miandoab, Payam Samadi; Baghizadeh, Amin

2016-01-08

In external-beam radiotherapy, using external markers is one of the most reliable tools to predict tumor position, in clinical applications. The main challenge in this approach is tumor motion tracking with highest accuracy that depends heavily on external markers location, and this issue is the objective of this study. Four commercially available feature selection algorithms entitled 1) Correlation-based Feature Selection, 2) Classifier, 3) Principal Components, and 4) Relief were proposed to find optimum location of external markers in combination with two "Genetic" and "Ranker" searching procedures. The performance of these algorithms has been evaluated using four-dimensional extended cardiac-torso anthropomorphic phantom. Six tumors in lung, three tumors in liver, and 49 points on the thorax surface were taken into account to simulate internal and external motions, respectively. The root mean square error of an adaptive neuro-fuzzy inference system (ANFIS) as prediction model was considered as metric for quantitatively evaluating the performance of proposed feature selection algorithms. To do this, the thorax surface region was divided into nine smaller segments and predefined tumors motion was predicted by ANFIS using external motion data of given markers at each small segment, separately. Our comparative results showed that all feature selection algorithms can reasonably select specific external markers from those segments where the root mean square error of the ANFIS model is minimum. Moreover, the performance accuracy of proposed feature selection algorithms was compared, separately. For this, each tumor motion was predicted using motion data of those external markers selected by each feature selection algorithm. Duncan statistical test, followed by F-test, on final results reflected that all proposed feature selection algorithms have the same performance accuracy for lung tumors. 
But for liver tumors, a correlation-based feature selection algorithm, in combination with a genetic search algorithm, proved to yield best performance accuracy for selecting optimum markers.
LS-DYNA Simulation of Hemispherical-punch Stamping Process Using an Efficient Algorithm for Continuum Damage Based Elastoplastic Constitutive Equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Salajegheh, Nima; Abedrabbo, Nader; Pourboghrat, Farhang

An efficient integration algorithm for continuum damage based elastoplastic constitutive equations is implemented in LS-DYNA. The isotropic damage parameter is defined as the ratio of the damaged surface area over the total cross section area of the representative volume element. This parameter is incorporated into the integration algorithm as an internal variable. The developed damage model is then implemented in the FEM code LS-DYNA as user material subroutine (UMAT). Pure stretch experiments of a hemispherical punch are carried out for copper sheets and the results are compared against the predictions of the implemented damage model. Evaluation of damage parameters is carried out and the optimized values that correctly predicted the failure in the sheet are reported. Prediction of failure in the numerical analysis is performed through element deletion using the critical damage value. The set of failure parameters which accurately predict the failure behavior in copper sheets compared to experimental data is reported as well.
Training radial basis function networks for wind speed prediction using PSO enhanced differential search optimizer

PubMed Central

2018-01-01

This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training of neural networks is still a challenging exercise in the machine learning domain. Traditional training algorithms in general become trapped in local optima and converge prematurely, which makes them ineffective when applied to datasets with diverse features. Training algorithms based on evolutionary computations are becoming popular due to their robust nature in overcoming the drawbacks of the traditional algorithms. Accordingly, this paper proposes a hybrid training procedure with differential search (DS) algorithm functionally integrated with the particle swarm optimization (PSO). To surmount the local trapping of the search procedure, a new population initialization scheme is proposed using a Logistic chaotic sequence, which enhances the population diversity and aids the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analysis is performed on 7 publicly available benchmark datasets. Subsequently, experiments were conducted on a practical application case for wind speed prediction to expound the superiority of the proposed RBF training algorithm in terms of prediction accuracy. PMID:29768463
Training radial basis function networks for wind speed prediction using PSO enhanced differential search optimizer.

PubMed

Rani R, Hannah Jessie; Victoire T, Aruldoss Albert

2018-01-01

This paper presents an integrated hybrid optimization algorithm for training the radial basis function neural network (RBF NN). Training of neural networks is still a challenging exercise in the machine learning domain. Traditional training algorithms in general become trapped in local optima and converge prematurely, which makes them ineffective when applied to datasets with diverse features. Training algorithms based on evolutionary computations are becoming popular due to their robust nature in overcoming the drawbacks of the traditional algorithms. Accordingly, this paper proposes a hybrid training procedure with differential search (DS) algorithm functionally integrated with the particle swarm optimization (PSO). To surmount the local trapping of the search procedure, a new population initialization scheme is proposed using a Logistic chaotic sequence, which enhances the population diversity and aids the search capability. To demonstrate the effectiveness of the proposed RBF hybrid training algorithm, experimental analysis is performed on 7 publicly available benchmark datasets. Subsequently, experiments were conducted on a practical application case for wind speed prediction to expound the superiority of the proposed RBF training algorithm in terms of prediction accuracy.
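A population initialization based on a Logistic chaotic sequence, as mentioned above, is commonly built on the logistic map x_{n+1} = r x_n (1 - x_n). The sketch below is a minimal version under assumptions not stated in the abstract: r = 4 (the fully chaotic regime), a seed value x0 = 0.7, and a linear scaling of each chaotic value into per-dimension search bounds. The function name is hypothetical.

```python
def logistic_chaotic_population(pop_size, dim, lower, upper, x0=0.7, r=4.0):
    """Generate pop_size candidate vectors of length dim from a logistic map.

    x0 should avoid 0, 0.25, 0.5, 0.75 and 1, which lead to fixed points of
    the map for r = 4.
    """
    x = x0
    population = []
    for _ in range(pop_size):
        individual = []
        for d in range(dim):
            x = r * x * (1.0 - x)  # next value of the chaotic sequence in [0, 1]
            individual.append(lower[d] + x * (upper[d] - lower[d]))
        population.append(individual)
    return population
```

The resulting vectors could serve as the starting population for an evolutionary optimizer such as DS or PSO in place of uniform random sampling.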
Automated diagnoses of attention deficit hyperactive disorder using magnetic resonance imaging.

PubMed

Eloyan, Ani; Muschelli, John; Nebel, Mary Beth; Liu, Han; Han, Fang; Zhao, Tuo; Barber, Anita D; Joel, Suresh; Pekar, James J; Mostofsky, Stewart H; Caffo, Brian

2012-01-01

Successful automated diagnoses of attention deficit hyperactive disorder (ADHD) using imaging and functional biomarkers would have fundamental consequences on the public health impact of the disease. In this work, we show results on the predictability of ADHD using imaging biomarkers and discuss the scientific and diagnostic impacts of the research. We created a prediction model using the landmark ADHD 200 data set focusing on resting state functional connectivity (rs-fc) and structural brain imaging. We predicted ADHD status and subtype, obtained by behavioral examination, using imaging data, intelligence quotients and other covariates. The novel contributions of this manuscript include a thorough exploration of prediction and image feature extraction methodology on this form of data, including the use of singular value decompositions (SVDs), CUR decompositions, random forest, gradient boosting, bagging, voxel-based morphometry, and support vector machines as well as important insights into the value, and potentially lack thereof, of imaging biomarkers of disease. The key results include the CUR-based decomposition of the rs-fc-fMRI along with gradient boosting and the prediction algorithm based on a motor network parcellation and random forest algorithm. We conjecture that the CUR decomposition is largely diagnosing common population directions of head motion. Of note, a byproduct of this research is a potential automated method for detecting subtle in-scanner motion. The final prediction algorithm, a weighted combination of several algorithms, had an external test set specificity of 94% with sensitivity of 21%. The most promising imaging biomarker was a correlation graph from a motor network parcellation. In summary, we have undertaken a large-scale statistical exploratory prediction exercise on the unique ADHD 200 data set. The exercise produced several potential leads for future scientific exploration of the neurological basis of ADHD.

Automated diagnoses of attention deficit hyperactive disorder using magnetic resonance imaging

PubMed Central

Eloyan, Ani; Muschelli, John; Nebel, Mary Beth; Liu, Han; Han, Fang; Zhao, Tuo; Barber, Anita D.; Joel, Suresh; Pekar, James J.; Mostofsky, Stewart H.; Caffo, Brian

2012-01-01

Successful automated diagnoses of attention deficit hyperactive disorder (ADHD) using imaging and functional biomarkers would have fundamental consequences on the public health impact of the disease. In this work, we show results on the predictability of ADHD using imaging biomarkers and discuss the scientific and diagnostic impacts of the research. We created a prediction model using the landmark ADHD 200 data set focusing on resting state functional connectivity (rs-fc) and structural brain imaging. We predicted ADHD status and subtype, obtained by behavioral examination, using imaging data, intelligence quotients and other covariates. The novel contributions of this manuscript include a thorough exploration of prediction and image feature extraction methodology on this form of data, including the use of singular value decompositions (SVDs), CUR decompositions, random forest, gradient boosting, bagging, voxel-based morphometry, and support vector machines as well as important insights into the value, and potentially lack thereof, of imaging biomarkers of disease. The key results include the CUR-based decomposition of the rs-fc-fMRI along with gradient boosting and the prediction algorithm based on a motor network parcellation and random forest algorithm. We conjecture that the CUR decomposition is largely diagnosing common population directions of head motion. Of note, a byproduct of this research is a potential automated method for detecting subtle in-scanner motion. The final prediction algorithm, a weighted combination of several algorithms, had an external test set specificity of 94% with sensitivity of 21%. The most promising imaging biomarker was a correlation graph from a motor network parcellation. In summary, we have undertaken a large-scale statistical exploratory prediction exercise on the unique ADHD 200 data set. The exercise produced several potential leads for future scientific exploration of the neurological basis of ADHD. PMID:22969709
A hybrid clustering and classification approach for predicting crash injury severity on rural roads.

PubMed

Hasheminejad, Seyed Hessam-Allah; Zahedi, Mohsen; Hasheminejad, Seyed Mohammad Hossein

2018-03-01

Traffic crashes threaten transportation systems and have wide-ranging social consequences for governments. They are increasing in developing countries, and Iran, as a developing country, is not immune to this risk. Several studies in the literature predict traffic crash severity using artificial neural networks (ANNs), support vector machines and decision trees. This paper investigates crash injury severity on rural roads using a hybrid clustering and classification approach, comparing the performance of classification algorithms before and after clustering is applied. A novel rule-based genetic algorithm (GA) is proposed to predict crash injury severity and is evaluated against classification algorithms such as ANN using standard performance criteria. Analysis of 13,673 crashes (5600 property damage, 778 fatal crashes, 4690 slight injuries and 2605 severe injuries) on rural roads in Tehran Province of Iran during 2011-2013 showed that the proposed GA method outperforms the other classification algorithms on metrics such as precision (86%), recall (88%) and accuracy (87%). Moreover, the proposed GA method is the most interpretable, is easy to understand and provides feedback to analysts.
[Prediction of regional soil quality based on mutual information theory integrated with decision tree algorithm].

PubMed

Lin, Fen-Fang; Wang, Ke; Yang, Ning; Yan, Shi-Guang; Zheng, Xin-Yu

2012-02-01

In this paper, the main factors that affect soil quality, such as soil type, land use pattern, lithology type, topography, road, and industry type, were used to obtain the spatial distribution characteristics of regional soil quality. Mutual information theory was adopted to select the main environmental factors, and the decision tree algorithm See 5.0 was applied to predict the grade of regional soil quality. The main factors affecting regional soil quality were soil type, land use, lithology type, distance to town, distance to water area, altitude, distance to road, and distance to industrial land. The prediction accuracy of the decision tree model built on the variables selected by mutual information was clearly higher than that of the model built on all variables, and the former model exceeded 80% prediction accuracy in both its decision tree and decision rule forms. Working with both continuous and categorical data, mutual information theory integrated with a decision tree not only reduced the number of input parameters for the decision tree algorithm but also predicted and assessed regional soil quality effectively.
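The factor-selection step in this abstract can be illustrated with a minimal sketch that ranks categorical factors by their mutual information with the soil-quality grade. The function names and the toy data are hypothetical, and the sketch does not reproduce See 5.0 or the authors' preprocessing.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two categorical sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_factors(factors, grades):
    """Return (factor name, MI) pairs sorted by decreasing mutual information."""
    scores = {name: mutual_information(values, grades)
              for name, values in factors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: six sites with a soil-quality grade and two factors.
grades = ["high", "high", "low", "low", "high", "low"]
factors = {
    "soil_type": ["loam", "loam", "clay", "clay", "loam", "clay"],
    "land_use": ["farm", "forest", "farm", "forest", "farm", "forest"],
}
ranking = rank_factors(factors, grades)
```

Factors at the top of `ranking` would be passed to the decision tree; where to cut the list is a modelling choice that the abstract does not specify.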
Wheel life prediction model - an alternative to the FASTSIM algorithm for RCF

NASA Astrophysics Data System (ADS)

Hossein-Nia, Saeed; Sichani, Matin Sh.; Stichel, Sebastian; Casanueva, Carlos

2018-07-01

In this article, a wheel life prediction model considering wear and rolling contact fatigue (RCF) is developed and applied to a heavy-haul locomotive. For wear calculations, a methodology based on Archard's wear calculation theory is used. The simulated wear depth is compared with profile measurements within 100,000 km. For RCF, a shakedown-based theory is applied locally, using the FaStrip algorithm to estimate the tangential stresses instead of FASTSIM. The differences between the two algorithms on damage prediction models are studied. The running distance between the two reprofiling due to RCF is estimated based on a Wöhler-like relationship developed from laboratory test results from the literature and the Palmgren-Miner rule. The simulated crack locations and their angles are compared with a five-year field study. Calculations to study the effects of electro-dynamic braking, track gauge, harder wheel material and the increase of axle load on the wheel life are also carried out.
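As a rough illustration of how a Wöhler-like relationship and the Palmgren-Miner rule combine into a running-distance estimate, the sketch below accumulates linear damage over a repeated load block. The power-law form, its coefficients and the load spectrum are assumptions made for the example, not values from the article.

```python
def wohler_cycles_to_failure(stress, coeff, exponent):
    """Cycles to failure from an assumed power law N = coeff * stress**(-exponent)."""
    return coeff * stress ** (-exponent)


def miner_damage(blocks):
    """Palmgren-Miner damage: sum of n_i / N_i over (applied, to-failure) pairs."""
    return sum(n_applied / n_failure for n_applied, n_failure in blocks)


# Hypothetical load spectrum for one 1000 km block: (cycles applied, stress level).
spectrum = [(2000, 100.0), (500, 200.0)]
COEFF, EXPONENT = 1e10, 2.0  # assumed material constants

blocks = [(n, wohler_cycles_to_failure(s, COEFF, EXPONENT)) for n, s in spectrum]
damage_per_block = miner_damage(blocks)

# Failure (here: reprofiling) is predicted when accumulated damage reaches 1.
distance_to_reprofiling_km = 1000.0 / damage_per_block
```

The linear-damage assumption ignores load-sequence effects, which is the usual caveat attached to the Palmgren-Miner rule.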
Feature selection method based on multi-fractal dimension and harmony search algorithm and its application

NASA Astrophysics Data System (ADS)

Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na

2016-10-01

Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data-sets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.
Research of maneuvering target prediction and tracking technology based on IMM algorithm

NASA Astrophysics Data System (ADS)

Cao, Zheng; Mao, Yao; Deng, Chao; Liu, Qiong; Chen, Jing

2016-09-01

Maneuvering target prediction and tracking technology is widely used in both military and civilian applications, and it has long been a research hotspot and a difficult problem. In electro-optical acquisition-tracking-pointing (ATP) systems, the traditional maneuvering targets are mainly ballistic targets, large aircraft and other big targets. These targets move fast along strongly regular trajectories, so Kalman filtering and polynomial fitting track them well. In recent years, small unmanned aerial vehicles have developed rapidly because they are small, nimble and simple to operate. Although they are close-in, slow and small targets, they are highly maneuverable within the ATP observation system. Moreover, these vehicles are operated manually, so their acceleration changes greatly and they move erratically. As a result, prediction and tracking precision is low when traditional algorithms are used to track their maneuvers, such as speeding up, turning and climbing. The interacting multiple model (IMM) algorithm uses multiple models, which interact with one another, to match the target's real trajectory. It switches between models according to a Markov chain to adapt to changes in the trajectory, so its better adaptability to irregular movement makes it suitable for predicting and tracking small unmanned aerial vehicles. This paper sets up a model set consisting of the constant velocity (CV) model, the constant acceleration (CA) model, the constant turning (CT) model and the current statistical model. Simulation and analysis of real trajectory data from small unmanned aerial vehicles show that prediction and tracking based on the IMM algorithm achieves lower tracking error and better tracking precision than traditional algorithms.
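The Markov-chain model switching described in this abstract can be sketched through the two model-probability steps of a standard IMM cycle: mixing before filtering and a likelihood-weighted update after it. The per-model filters (CV, CA, CT and so on) are omitted, and the transition matrix and likelihood values below are hypothetical.

```python
def imm_mixing(mu, transition):
    """Predicted model probabilities c[j] and mixing weights mix[i][j].

    mu[i] is the current probability of model i and transition[i][j] is the
    Markov probability of switching from model i to model j.
    """
    n = len(mu)
    c = [sum(transition[i][j] * mu[i] for i in range(n)) for j in range(n)]
    mix = [[transition[i][j] * mu[i] / c[j] for j in range(n)] for i in range(n)]
    return c, mix


def imm_update(c, likelihoods):
    """Posterior model probabilities from each model's measurement likelihood."""
    unnormalized = [lik * cj for lik, cj in zip(likelihoods, c)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical two-model example (e.g. constant velocity vs. constant turning).
mu = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.1, 0.9]]
c, mix = imm_mixing(mu, transition)

# Assume the second model explains the latest measurement better.
mu_new = imm_update(c, likelihoods=[0.2, 0.8])
```

In a full tracker, `mix` would blend the per-model state estimates before each filter runs, and `mu_new` would weight the filters' outputs in the combined estimate.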
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms

NASA Astrophysics Data System (ADS)

Yadav, B.; Hatfield, K.

2017-12-01

We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from the daily streamflow hydrograph using the HYSEP filter. The surrogate catchment attributes were compiled from multiple sources, including digital elevation models, soil, land use and climate data, and other publicly available ancillary and geospatial data. Eighty percent of the catchments were used to train the ML algorithms, and the remaining 20% were used as an independent test set to measure the generalization performance of the fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables, selected after careful evaluation of the bias-variance tradeoff, include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation.
The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
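The evaluation protocol in this abstract (an 80/20 hold-out split, k-fold cross-validation and an r-square score) can be sketched with the standard library alone. The helper names are hypothetical, and the regression models and grid search themselves are left out.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, validation


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical catchment identifiers standing in for the gauged basins.
catchments = list(range(800))
train_ids, test_ids = train_test_split(catchments)
folds = list(k_fold_indices(len(train_ids), k=5))
```

Each candidate hyperparameter setting would be scored by averaging `r_squared` over the validation folds, with the held-out `test_ids` touched only once at the end.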
Adaptive MPC based on MIMO ARX-Laguerre model.

PubMed

Ben Abdelwahed, Imen; Mbarek, Abdelkader; Bouzrara, Kais

2017-03-01

This paper proposes a method for synthesizing an adaptive predictive controller using a reduced complexity model. The model is obtained by projecting the ARX model onto Laguerre bases. The resulting model is called MIMO ARX-Laguerre and is characterized by an easy recursive representation. The adaptive predictive control law is computed based on multi-step-ahead finite-element predictors, identified directly from experimental input/output data. The model is tuned in each iteration by an online identification algorithm for both the model parameters and the Laguerre poles. The proposed approach avoids the time-consuming numerical optimization algorithms associated with most common linear predictive control strategies, which makes it suitable for real-time implementation. The method is used to synthesize adaptive predictive controllers for the CSTR process benchmark and to test them in numerical simulations. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
A universal deep learning approach for modeling the flow of patients under different severities.

PubMed

Jiang, Shancheng; Chin, Kwai-Sang; Tsui, Kwok L

2018-02-01

The Accident and Emergency Department (A&ED) is the frontline for providing emergency care in hospitals. Unfortunately, relative A&ED resources have failed to keep up with continuously increasing demand in recent years, which leads to overcrowding in A&ED. Knowing the fluctuation of patient arrival volume in advance is a significant premise to relieve this pressure. Based on this motivation, the objective of this study is to explore an integrated framework with high accuracy for predicting A&ED patient flow under different triage levels, by combining a novel feature selection process with deep neural networks. Administrative data is collected from an actual A&ED and categorized into five groups based on different triage levels. A genetic algorithm (GA)-based feature selection algorithm is improved and implemented as a pre-processing step for this time-series prediction problem, in order to explore key features affecting patient flow. In our improved GA, a fitness-based crossover is proposed to maintain the joint information of multiple features during iterative process, instead of traditional point-based crossover. Deep neural networks (DNN) is employed as the prediction model to utilize their universal adaptability and high flexibility. In the model-training process, the learning algorithm is well-configured based on a parallel stochastic gradient descent algorithm. Two effective regularization strategies are integrated in one DNN framework to avoid overfitting. All introduced hyper-parameters are optimized efficiently by grid-search in one pass. As for feature selection, our improved GA-based feature selection algorithm has outperformed a typical GA and four state-of-the-art feature selection algorithms (mRMR, SAFS, VIFR, and CFR). 
As for the prediction accuracy of proposed integrated framework, compared with other frequently used statistical models (GLM, seasonal-ARIMA, ARIMAX, and ANN) and modern machine models (SVM-RBF, SVM-linear, RF, and R-LASSO), the proposed integrated "DNN-I-GA" framework achieves higher prediction accuracy on both MAPE and RMSE metrics in pairwise comparisons. The contribution of our study is two-fold. Theoretically, the traditional GA-based feature selection process is improved to have less hyper-parameters and higher efficiency, and the joint information of multiple features is maintained by fitness-based crossover operator. The universal property of DNN is further enhanced by merging different regularization strategies. Practically, features selected by our improved GA can be used to acquire an underlying relationship between patient flows and input features. Predictive values are significant indicators of patients' demand and can be used by A&ED managers to make resource planning and allocation. High accuracy achieved by the present framework in different cases enhances the reliability of downstream decision makings. Copyright © 2017 Elsevier B.V. All rights reserved.
TaDb: A time-aware diffusion-based recommender algorithm

NASA Astrophysics Data System (ADS)

Li, Wen-Jun; Xu, Yuan-Yuan; Dong, Qiang; Zhou, Jun-Lin; Fu, Yan

2015-02-01

Traditional recommender algorithms usually employ the early and recent records indiscriminately, which overlooks the change of user interests over time. In this paper, we show that the interests of a user remain stable in a short-term interval and drift during a long-term period. Based on this observation, we propose a time-aware diffusion-based (TaDb) recommender algorithm, which assigns different temporal weights to the leading links existing before the target user's collection and the following links appearing after that in the diffusion process. Experiments on four real datasets, Netflix, MovieLens, FriendFeed and Delicious show that TaDb algorithm significantly improves the prediction accuracy compared with the algorithms not considering temporal effects.
Comparison of Two Phenotypic Algorithms To Detect Carbapenemase-Producing Enterobacteriaceae

PubMed Central

Dortet, Laurent; Bernabeu, Sandrine; Gonzalez, Camille

2017-01-01

ABSTRACT A novel algorithm designed for the screening of carbapenemase-producing Enterobacteriaceae (CPE), based on faropenem and temocillin disks, was compared to that of the Committee of the Antibiogram of the French Society of Microbiology (CA-SFM), which is based on ticarcillin-clavulanate, imipenem, and temocillin disks. The two algorithms presented comparable negative predictive values (98.6% versus 97.5%) for CPE screening among carbapenem-nonsusceptible Enterobacteriaceae. However, since 46.2% (n = 49) of the CPE were correctly identified as OXA-48-like producers by the faropenem/temocillin-based algorithm, it significantly decreased the number of complementary tests needed (42.2% versus 62.6% with the CA-SFM algorithm). PMID:28607010
Development and evaluation of a predictive algorithm for telerobotic task complexity

NASA Technical Reports Server (NTRS)

Gernhardt, M. L.; Hunter, R. C.; Hedgecock, J. C.; Stephenson, A. G.

1993-01-01

There is a wide range of complexity in the various telerobotic servicing tasks performed in subsea, space, and hazardous material handling environments. Experience with telerobotic servicing has evolved into a knowledge base used to design tasks to be 'telerobot friendly.' This knowledge base generally resides in a small group of people. Written documentation and requirements are limited in conveying this knowledge base to serviceable equipment designers and are subject to misinterpretation. A mathematical model of task complexity based on measurable task parameters and telerobot performance characteristics would be a valuable tool to designers and operational planners. Oceaneering Space Systems and TRW have performed an independent research and development project to develop such a tool for telerobotic orbital replacement unit (ORU) exchange. This algorithm was developed to predict an ORU exchange degree of difficulty rating (based on the Cooper-Harper rating used to assess piloted operations). It is based on measurable parameters of the ORU, attachment receptacle and quantifiable telerobotic performance characteristics (e.g., link length, joint ranges, positional accuracy, tool lengths, number of cameras, and locations). The resulting algorithm can be used to predict task complexity as the ORU parameters, receptacle parameters, and telerobotic characteristics are varied.
Dynamic Bus Travel Time Prediction Models on Road with Multiple Bus Routes

PubMed Central

Bai, Cong; Peng, Zhong-Ren; Lu, Qing-Chang; Sun, Jian

2015-01-01

Accurate and real-time travel time information for buses can help passengers better plan their trips and minimize waiting times. A dynamic travel time prediction model for buses addressing the cases on road with multiple bus routes is proposed in this paper, based on support vector machines (SVMs) and Kalman filtering-based algorithm. In the proposed model, the well-trained SVM model predicts the baseline bus travel times from the historical bus trip data; the Kalman filtering-based dynamic algorithm can adjust bus travel times with the latest bus operation information and the estimated baseline travel times. The performance of the proposed dynamic model is validated with the real-world data on road with multiple bus routes in Shenzhen, China. The results show that the proposed dynamic model is feasible and applicable for bus travel time prediction and has the best prediction performance among all the five models proposed in the study in terms of prediction accuracy on road with multiple bus routes. PMID:26294903
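The division of labour described here, a baseline prediction corrected by the latest observation, can be illustrated with a scalar Kalman-style update. This is a generic sketch rather than the paper's formulation: the baseline is treated as the prior, and the noise variances `q` and `r` are assumed values.

```python
def kalman_adjust(baseline, observed, p_prev, q=4.0, r=9.0):
    """Blend a baseline travel time with the latest observed travel time.

    baseline: prior estimate in seconds (e.g. from a trained regression model)
    observed: travel time of the most recent bus on the same segment
    p_prev:   error variance carried over from the previous step
    q, r:     assumed process and measurement noise variances
    """
    p_prior = p_prev + q                      # predict: uncertainty grows
    gain = p_prior / (p_prior + r)            # Kalman gain in [0, 1]
    estimate = baseline + gain * (observed - baseline)
    p_post = (1.0 - gain) * p_prior           # update: uncertainty shrinks
    return estimate, p_post


# Hypothetical segment: baseline 300 s, latest bus took 360 s.
estimate, variance = kalman_adjust(baseline=300.0, observed=360.0, p_prev=5.0)
# estimate == 330.0, variance == 4.5
```

A larger `r` makes the filter trust the baseline more, while a larger `q` pulls the estimate toward the latest observation.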
Development and validation of a prediction algorithm for the onset of common mental disorders in a working population.

PubMed

Fernandez, Ana; Salvador-Carulla, Luis; Choi, Isabella; Calvo, Rafael; Harvey, Samuel B; Glozier, Nicholas

2018-01-01

Common mental disorders are the most common reason for long-term sickness absence in most developed countries. Prediction algorithms for the onset of common mental disorders may help target indicated work-based prevention interventions. We aimed to develop and validate a risk algorithm to predict the onset of common mental disorders at 12 months in a working population. We conducted a secondary analysis of the Household, Income and Labour Dynamics in Australia Survey, a longitudinal, nationally representative household panel in Australia. Data from the 6189 working participants who did not meet the criteria for a common mental disorder at baseline were non-randomly split into training and validation databases, based on state of residence. Common mental disorders were assessed with the mental component score of the 36-Item Short Form Health Survey questionnaire (score ⩽45). Risk algorithms were constructed following recommendations made by the Transparent Reporting of a multivariable prediction model for Prevention Or Diagnosis statement. Different risk factors were identified among women and men for the final risk algorithms. In the training data, the model for women had a C-index of 0.73 and effect size (Hedges' g) of 0.91. In men, the C-index was 0.76 and the effect size was 1.06. In the validation data, the C-index was 0.66 for women and 0.73 for men, with positive predictive values of 0.28 and 0.26, respectively. Conclusion: It is possible to develop an algorithm with good discrimination for the onset identifying overall and modifiable risks of common mental disorders among working men. Such models have the potential to change the way that prevention of common mental disorders at the workplace is conducted, but different models may be required for women.
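For a binary outcome such as disorder onset, the C-index reported here is the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. A minimal sketch with hypothetical scores follows; ties are counted as half-concordant, which is a common convention.

```python
def c_index(scores, outcomes):
    """Concordance index for binary outcomes (1 = case, 0 = non-case)."""
    cases = [s for s, o in zip(scores, outcomes) if o == 1]
    controls = [s for s, o in zip(scores, outcomes) if o == 0]
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical risk scores for four participants, two of whom developed a disorder.
scores = [0.9, 0.4, 0.35, 0.2]
outcomes = [1, 0, 1, 0]
value = c_index(scores, outcomes)  # 3 of 4 case/non-case pairs are concordant: 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking.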
LaSVM-based big data learning system for dynamic prediction of air pollution in Tehran.

PubMed

Ghaemi, Z; Alimohammadi, A; Farnaghi, M

2018-04-20

Due to critical impacts of air pollution, prediction and monitoring of air quality in urban areas are important tasks. However, because of the dynamic nature and high spatio-temporal variability, prediction of the air pollutant concentrations is a complex spatio-temporal problem. Distribution of pollutant concentration is influenced by various factors such as the historical pollution data and weather conditions. Conventional methods such as the support vector machine (SVM) or artificial neural networks (ANN) show some deficiencies when huge amount of streaming data have to be analyzed for urban air pollution prediction. In order to overcome the limitations of the conventional methods and improve the performance of urban air pollution prediction in Tehran, a spatio-temporal system is designed using a LaSVM-based online algorithm. Pollutant concentration and meteorological data along with geographical parameters are continually fed to the developed online forecasting system. Performance of the system is evaluated by comparing the prediction results of the Air Quality Index (AQI) with those of a traditional SVM algorithm. Results show an outstanding increase of speed by the online algorithm while preserving the accuracy of the SVM classifier. Comparison of the hourly predictions for next coming 24 h, with those of the measured pollution data in Tehran pollution monitoring stations shows an overall accuracy of 0.71, root mean square error of 0.54 and coefficient of determination of 0.81. These results are indicators of the practical usefulness of the online algorithm for real-time spatial and temporal prediction of the urban air quality.
RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

PubMed

Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch

2017-06-06

An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptor selection, model development and predictions for test set samples using applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
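The core RANSAC loop (fit to a random minimal sample, count inliers, keep the best consensus set, refit) can be sketched for a one-descriptor linear model. This is a generic illustration with hypothetical data and thresholds; it does not include the descriptor selection or applicability-domain steps described in the abstract.

```python
import random


def fit_line(points):
    """Ordinary least-squares line through (x, y) points: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, n_iter=200, threshold=0.5, seed=0):
    """Return (slope, intercept, inliers) for the largest consensus set found."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)  # minimal sample for a line
        if x1 == x2:
            continue  # vertical candidate, cannot be written as y = a*x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    slope, intercept = fit_line(best_inliers)  # refit on the consensus set
    return slope, intercept, best_inliers


# Hypothetical data: ten samples on y = 2x + 1 plus two gross outliers.
outliers = [(3, 40.0), (7, -25.0)]
data = [(x, 2.0 * x + 1.0) for x in range(10)] + outliers
slope, intercept, inliers = ransac_line(data)
```

Samples left out of `inliers` are the ones RANSAC treats as outliers; in a QSAR setting the threshold would be set from the expected measurement noise.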
Cost-effectiveness of the non-laboratory based Framingham algorithm in primary prevention of cardiovascular disease: A simulated analysis of a cohort of African American adults.

PubMed

Kariuki, Jacob K; Gona, Philimon; Leveille, Suzanne G; Stuart-Shor, Eileen M; Hayman, Laura L; Cromwell, Jerry

2018-06-01

The non-lab Framingham algorithm, which substitutes body mass index for lipids in the laboratory-based (lab-based) Framingham algorithm, has been validated among African Americans (AAs). However, its cost-effectiveness and economic tradeoffs have not been evaluated. This study examines the incremental cost-effectiveness ratio (ICER) of two cardiovascular disease (CVD) prevention programs guided by the non-lab versus lab-based Framingham algorithm. We simulated the World Health Organization CVD prevention guidelines on a cohort of 2690 AA participants in the Atherosclerosis Risk in Communities (ARIC) cohort. Costs were estimated using Medicare fee schedules (diagnostic tests, drugs & visits), Bureau of Labor Statistics (RN wages), and estimates for managing incident CVD events. Outcomes were assumed to be true positive cases detected at a data driven treatment threshold. Both algorithms had the best balance of sensitivity/specificity at the moderate risk threshold (>10% risk). Over 12 years, 82% and 77% of 401 incident CVD events were accurately predicted via the non-lab and lab-based Framingham algorithms, respectively. There were 20 fewer false negative cases in the non-lab approach, translating into over $900,000 in savings over 12 years. The ICER was -$57,153 for every extra CVD event prevented when using the non-lab algorithm. The approach guided by the non-lab Framingham strategy dominated the lab-based approach with respect to both costs and predictive ability. Consequently, the non-lab Framingham algorithm could potentially provide a highly effective screening tool at lower cost to address the high burden of CVD, especially among AA and in resource-constrained settings where lab tests are unavailable. Copyright © 2017 Elsevier Inc. All rights reserved.
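The ICER is the difference in cost between two strategies divided by the difference in effect, and a strategy "dominates" when it is both cheaper and at least as effective. The sketch below uses purely hypothetical costs and event counts; it does not reproduce the study's cost model or its reported figure.

```python
def icer(cost_new, cost_ref, effect_new, effect_ref):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    return (cost_new - cost_ref) / (effect_new - effect_ref)


def dominates(cost_new, cost_ref, effect_new, effect_ref):
    """True when the new strategy costs less and is at least as effective."""
    return cost_new < cost_ref and effect_new >= effect_ref


# Hypothetical programme totals: cost in dollars, effect in events prevented.
new = {"cost": 500_000, "effect": 120}
ref = {"cost": 650_000, "effect": 110}

ratio = icer(new["cost"], ref["cost"], new["effect"], ref["effect"])  # -15000.0
new_dominates = dominates(new["cost"], ref["cost"], new["effect"], ref["effect"])
```

A negative ratio on its own is ambiguous, since it also arises when the new strategy is costlier and less effective, which is why the dominance check is stated separately.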
An Impact-Location Estimation Algorithm for Subsonic Uninhabited Aircraft

NASA Technical Reports Server (NTRS)

Bauer, Jeffrey E.; Teets, Edward

1997-01-01

An impact-location estimation algorithm is being used at the NASA Dryden Flight Research Center to support range safety for uninhabited aerial vehicle flight tests. The algorithm computes an impact location based on the descent rate, mass, and altitude of the vehicle and current wind information. The predicted impact location is continuously displayed on the range safety officer's moving map display so that the flightpath of the vehicle can be routed to avoid ground assets if the flight must be terminated. The algorithm easily adapts to different vehicle termination techniques and has been shown to be accurate to the extent required to support range safety for subsonic uninhabited aerial vehicles. This paper describes how the algorithm functions, how the algorithm is used at NASA Dryden, and how various termination techniques are handled by the algorithm. Other approaches to predicting the impact location and the reasons why they were not selected for real-time implementation are also discussed.
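To give a feel for the kind of calculation involved, the sketch below estimates the horizontal offset of an impact point under strong simplifying assumptions: a constant descent rate, a uniform wind, and a vehicle that drifts with the wind. The function and its inputs are hypothetical; the NASA Dryden algorithm also accounts for vehicle mass and the termination technique, which are not modelled here.

```python
def impact_offset(altitude_m, descent_rate_mps, wind_east_mps, wind_north_mps):
    """Horizontal drift (east, north) in metres from the termination point.

    Assumes constant descent rate and a uniform wind over the whole descent.
    """
    fall_time_s = altitude_m / descent_rate_mps
    return wind_east_mps * fall_time_s, wind_north_mps * fall_time_s


# Hypothetical case: termination at 3000 m, descending at 10 m/s,
# wind carrying the vehicle 5 m/s toward the east and 2 m/s toward the south.
east_m, north_m = impact_offset(3000.0, 10.0, 5.0, -2.0)  # (1500.0, -600.0)
```

Recomputing this offset as altitude and wind data change is what would let a predicted impact point move continuously on a map display.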
Syndromic Algorithms for Detection of Gambiense Human African Trypanosomiasis in South Sudan

PubMed Central

Palmer, Jennifer J.; Surur, Elizeous I.; Goch, Garang W.; Mayen, Mangar A.; Lindner, Andreas K.; Pittet, Anne; Kasparian, Serena; Checchi, Francesco; Whitty, Christopher J. M.

2013-01-01

Background Active screening by mobile teams is considered the best method for detecting human African trypanosomiasis (HAT) caused by Trypanosoma brucei gambiense but the current funding context in many post-conflict countries limits this approach. As an alternative, non-specialist health care workers (HCWs) in peripheral health facilities could be trained to identify potential cases who need testing based on their symptoms. We explored the predictive value of syndromic referral algorithms to identify symptomatic cases of HAT among a treatment-seeking population in Nimule, South Sudan. Methodology/Principal Findings Symptom data from 462 patients (27 cases) presenting for a HAT test via passive screening over a 7 month period were collected to construct and evaluate over 14,000 four item syndromic algorithms considered simple enough to be used by peripheral HCWs. For comparison, algorithms developed in other settings were also tested on our data, and a panel of expert HAT clinicians were asked to make referral decisions based on the symptom dataset. The best performing algorithms consisted of three core symptoms (sleep problems, neurological problems and weight loss), with or without a history of oedema, cervical adenopathy or proximity to livestock. They had a sensitivity of 88.9–92.6%, a negative predictive value of up to 98.8% and a positive predictive value in this context of 8.4–8.7%. In terms of sensitivity, these out-performed more complex algorithms identified in other studies, as well as the expert panel. The best-performing algorithm is predicted to identify about 9/10 treatment-seeking HAT cases, though only 1/10 patients referred would test positive. Conclusions/Significance In the absence of regular active screening, improving referrals of HAT patients through other means is essential. Systematic use of syndromic algorithms by peripheral HCWs has the potential to increase case detection and would increase their participation in HAT programmes. 
The algorithms proposed here, though promising, should be validated elsewhere. PMID:23350005
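The screening statistics quoted in this abstract all derive from a two-by-two table of algorithm result against test result. In the sketch below the counts are illustrative: they sum to the 462 patients and 27 cases in the abstract and land near the upper end of the reported ranges, but they are not taken from the paper.

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly referred
        "specificity": tn / (tn + fp),  # non-cases correctly not referred
        "ppv": tp / (tp + fp),          # referred patients who are cases
        "npv": tn / (tn + fn),          # non-referred patients who are not cases
    }


# Illustrative counts (assumed): 27 cases, 435 non-cases, 462 patients in total.
metrics = screening_metrics(tp=25, fp=264, fn=2, tn=171)
```

With a low prevalence such as 27 cases in 462 patients, a high sensitivity can coexist with a low positive predictive value, which matches the abstract's remark that about nine in ten cases are found while only about one in ten referrals tests positive.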

IDMA-Based MAC Protocol for Satellite Networks with Consideration on Channel Quality

PubMed Central

2014-01-01

In order to overcome the shortcomings of existing medium access control (MAC) protocols based on TDMA or CDMA in satellite networks, the interleave division multiple access (IDMA) technique is introduced into satellite communication networks. A novel wide-band IDMA MAC protocol based on channel quality is proposed in this paper, consisting of a dynamic power allocation algorithm, a rate adaptation algorithm, and a call admission control (CAC) scheme. Firstly, the power allocation algorithm, combining the technique of IDMA SINR-evolution and channel quality prediction, is developed to guarantee high power efficiency even in poor channel conditions. Secondly, an effective rate adaptation algorithm is realized, based on accurate channel information per timeslot and by means of rate degradation. Furthermore, based on channel quality prediction, a CAC scheme combining the new power allocation algorithm, rate scheduling, and buffering strategies is proposed for the emerging IDMA systems; it can support a variety of traffic types and meet quality of service (QoS) requirements corresponding to different priority levels. Simulation results show that the new wide-band IDMA MAC protocol can accurately estimate the available resources considering the effect of multiuser detection (MUD) and the QoS requirements of multimedia traffic, leading to low outage probability as well as high overall system throughput. PMID:25126592
The utility and limitations of current web-available algorithms to predict peptides recognized by CD4 T cells in response to pathogen infection

PubMed Central

Chaves, Francisco A.; Lee, Alvin H.; Nayak, Jennifer; Richards, Katherine A.; Sant, Andrea J.

2012-01-01

The ability to track CD4 T cells elicited in response to pathogen infection or vaccination is critical because of the role these cells play in protective immunity. Coupled with advances in genome sequencing of pathogenic organisms, there is considerable appeal for implementation of computer-based algorithms to predict peptides that bind to the class II molecules, forming the complex recognized by CD4 T cells. Despite recent progress in this area, there is a paucity of data regarding their success in identifying actual pathogen-derived epitopes. In this study, we sought to rigorously evaluate the performance of multiple web-available algorithms by comparing their predictions and our results using purely empirical methods for epitope discovery in influenza that utilized overlapping peptides and cytokine Elispots, for three independent class II molecules. We analyzed the data in different ways, trying to anticipate how an investigator might use these computational tools for epitope discovery. We come to the conclusion that currently available algorithms can indeed facilitate epitope discovery, but all shared a high degree of false positive and false negative predictions. Therefore, efficiencies were low. We also found dramatic disparities among algorithms and between predicted IC50 values and true dissociation rates of peptide:MHC class II complexes. We suggest that improved success of predictive algorithms will depend less on changes in computational methods or increased data sets and more on changes in parameters used to “train” the algorithms that factor in elements of T cell repertoire and peptide acquisition by class II molecules. PMID:22467652
3D Protein structure prediction with genetic tabu search algorithm

PubMed Central

2010-01-01

Background Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task. Results In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods. Conclusions The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. 
It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill-climbing capability in the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively. PMID:20522256
Prediction model of dissolved oxygen in ponds based on ELM neural network

NASA Astrophysics Data System (ADS)

Li, Xinfei; Ai, Jiaoyan; Lin, Chunhuan; Guan, Haibin

2018-02-01

Dissolved oxygen in ponds is affected by many factors, and its distribution is unbalanced. In this paper, in order to improve the imbalance of dissolved oxygen distribution more effectively, a dissolved oxygen prediction model using the Extreme Learning Machine (ELM) algorithm is established, based on the method of improving dissolved oxygen distribution by artificial push flow. Lake Jing of Guangxi University was selected as the experimental area. Using the model to predict the dissolved oxygen concentration for pumps at different voltages, the results show that the ELM prediction accuracy is higher than that of the BP algorithm, with a mean square error MSE_ELM = 0.0394 and a correlation coefficient R_ELM = 0.9823. The prediction results for push flow with the 24 V pump show that the discrete prediction curve approximates the measured values well. The model can provide a basis for decisions on artificially improving the dissolved oxygen distribution.
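An ELM regressor of the kind named above can be sketched in a few lines: the hidden-layer weights are drawn at random and left fixed, and only the output weights are solved by least squares. The code below is a minimal illustration on synthetic data. The function names, the tanh activation, the hidden-layer size and the data are all assumptions, not the authors' model or measurements.

```python
import numpy as np


def elm_fit(X, y, n_hidden=30, seed=0):
    """Fit a single-hidden-layer ELM: random hidden weights, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases, never trained
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None) # output weights by least squares
    return W, b, beta


def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta


# Synthetic stand-in for (input features -> dissolved oxygen); not real pond data.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]

model = elm_fit(X[:150], y[:150])
pred = elm_predict(model, X[150:])
mse = float(np.mean((pred - y[150:]) ** 2))
r = float(np.corrcoef(pred, y[150:])[0, 1])
print(f"held-out MSE={mse:.4f}, correlation R={r:.4f}")
```

The held-out MSE and correlation coefficient computed at the end are the same two quantities the abstract reports for its own data.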
PRESS-based EFOR algorithm for the dynamic parametrical modeling of nonlinear MDOF systems

NASA Astrophysics Data System (ADS)

Liu, Haopeng; Zhu, Yunpeng; Luo, Zhong; Han, Qingkai

2017-09-01

In response to the identification problem concerning multi-degree of freedom (MDOF) nonlinear systems, this study presents the extended forward orthogonal regression (EFOR) based on predicted residual sums of squares (PRESS) to construct a nonlinear dynamic parametrical model. The proposed parametrical model is based on the non-linear autoregressive with exogenous inputs (NARX) model and aims to explicitly reveal the physical design parameters of the system. The PRESS-based EFOR algorithm is proposed to identify such a model for MDOF systems. By using the algorithm, we built a common-structured model based on the fundamental concept of evaluating its generalization capability through cross-validation. The resulting model aims to prevent over-fitting with poor generalization performance caused by the average error reduction ratio (AERR)-based EFOR algorithm. Then, a functional relationship is established between the coefficients of the terms and the design parameters of the unified model. Moreover, a 5-DOF nonlinear system is taken as a case to illustrate the modeling of the proposed algorithm. Finally, a dynamic parametrical model of a cantilever beam is constructed from experimental data. Results indicate that the dynamic parametrical model of nonlinear systems, which depends on the PRESS-based EFOR, can accurately predict the output response, thus providing a theoretical basis for the optimal design of modeling methods for MDOF nonlinear systems.
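The PRESS criterion mentioned above is the sum of squared leave-one-out prediction errors. For an ordinary least-squares fit it can be computed without refitting, using the leverages on the diagonal of the hat matrix. The sketch below shows only that criterion on a synthetic polynomial regression and checks it against explicit leave-one-out refits. It does not implement the EFOR term-selection procedure, and the data and function names are assumptions.

```python
import numpy as np


def press(X, y):
    """PRESS for ordinary least squares via the leverage shortcut e_i / (1 - h_ii)."""
    H = X @ np.linalg.pinv(X)               # hat matrix, maps y to fitted values
    residuals = y - H @ y
    loo_errors = residuals / (1.0 - np.diag(H))
    return float(np.sum(loo_errors ** 2))


def press_brute_force(X, y):
    """Same quantity by refitting the model n times, each time leaving one sample out."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ coef) ** 2
    return float(total)


# Synthetic regression problem with candidate terms 1, u, u^2.
rng = np.random.default_rng(0)
u = rng.normal(size=40)
X = np.column_stack([np.ones(40), u, u ** 2])
y = 1.0 + 2.0 * u - 0.5 * u ** 2 + 0.1 * rng.normal(size=40)

print(press(X, y), press_brute_force(X, y))
```

Because PRESS measures error on samples the fit has not seen, a term that only fits noise tends to increase it, which is why it can guard against over-fitting where an in-sample error reduction ratio cannot.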
Real-time prediction and gating of respiratory motion in 3D space using extended Kalman filters and Gaussian process regression network

NASA Astrophysics Data System (ADS)

Bukhari, W.; Hong, S.-M.

2016-03-01

The prediction as well as the gating of respiratory motion have received much attention over the last two decades for reducing the targeting error of the radiation treatment beam due to respiratory motion. In this article, we present a real-time algorithm for predicting respiratory motion in 3D space and realizing a gating function without pre-specifying a particular phase of the patient’s breathing cycle. The algorithm, named EKF-GPRN+, first employs an extended Kalman filter (EKF) independently along each coordinate to predict the respiratory motion and then uses a Gaussian process regression network (GPRN) to correct the prediction error of the EKF in 3D space. The GPRN is a nonparametric Bayesian algorithm for modeling input-dependent correlations between the output variables in multi-output regression. Inference in GPRN is intractable and we employ variational inference with mean field approximation to compute an approximate predictive mean and predictive covariance matrix. The approximate predictive mean is used to correct the prediction error of the EKF. The trace of the approximate predictive covariance matrix is utilized to capture the uncertainty in EKF-GPRN+ prediction error and systematically identify breathing points with a higher probability of large prediction error in advance. This identification enables us to pause the treatment beam over such instances. EKF-GPRN+ implements a gating function by using simple calculations based on the trace of the predictive covariance matrix. Extensive numerical experiments are performed based on a large database of 304 respiratory motion traces to evaluate EKF-GPRN+. The experimental results show that the EKF-GPRN+ algorithm reduces the patient-wise prediction error to 38%, 40% and 40% in root-mean-square, compared to no prediction, at lookahead lengths of 192 ms, 384 ms and 576 ms, respectively. 
The EKF-GPRN+ algorithm can further reduce the prediction error by employing the gating function, albeit at the cost of reduced duty cycle. The error reduction allows the clinical target volume to planning target volume (CTV-PTV) margin to be reduced, leading to decreased normal-tissue toxicity and possible dose escalation. The CTV-PTV margin is also evaluated to quantify clinical benefits of EKF-GPRN+ prediction.
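The gating idea described above, pausing the beam when the trace of the predictive covariance is large, can be illustrated with a simple threshold rule. The sketch below assumes the 3x3 predictive covariance matrices are already available. The threshold value and the example matrices are hypothetical, and the EKF and GPRN stages themselves are not implemented here.

```python
import numpy as np


def gate_beam(pred_covs, threshold):
    """Return a boolean beam-on mask: True where trace(covariance) is below the threshold.

    pred_covs has shape (n_timesteps, 3, 3), one predictive covariance per time step.
    """
    traces = np.trace(pred_covs, axis1=1, axis2=2)
    return traces < threshold


# Hypothetical predictive covariances (isotropic for simplicity).
covs = np.stack([np.eye(3) * s for s in (0.1, 0.2, 1.5, 0.3, 2.0)])

beam_on = gate_beam(covs, threshold=1.0)   # threshold is an assumed value
duty_cycle = float(beam_on.mean())         # fraction of time the beam stays on
print(beam_on, duty_cycle)
```

Lowering the threshold pauses the beam more often, so the expected prediction error while the beam is on falls at the cost of a lower duty cycle, the trade-off the abstract notes.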
Paroxysmal atrial fibrillation prediction based on HRV analysis and non-dominated sorting genetic algorithm III.

PubMed

Boon, K H; Khalil-Hani, M; Malarvili, M B

2018-01-01

This paper presents a method that is able to predict paroxysmal atrial fibrillation (PAF). The method uses shorter heart rate variability (HRV) signals than existing methods and achieves good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinically important because it increases the possibility to electrically stabilize and prevent the onset of atrial arrhythmias with different pacing techniques. We propose a multi-objective optimization algorithm based on the non-dominated sorting genetic algorithm III for optimizing the baseline PAF prediction system, which consists of the stages of pre-processing, HRV feature extraction, and a support vector machine (SVM) model. The pre-processing stage comprises heart rate correction, interpolation, and signal detrending. After that, time-domain, frequency-domain, and non-linear HRV features are extracted from the pre-processed data in the feature extraction stage. Then, these features are used as input to the SVM for predicting the PAF event. The proposed optimization algorithm is used to optimize the parameters and settings of various HRV feature extraction algorithms, select the best feature subsets, and tune the SVM parameters simultaneously for maximum prediction performance. The proposed method achieves an accuracy rate of 87.7%, which significantly outperforms most of the previous works. This accuracy rate is achieved even with the HRV signal length being reduced from the typical 30 min to just 5 min (a reduction of 83%). Furthermore, another significant result is that the sensitivity rate, which is considered more important than other performance metrics in this paper, can be improved at the cost of lower specificity. Copyright © 2017 Elsevier B.V. All rights reserved.
Machine-Learning Based Channel Quality and Stability Estimation for Stream-Based Multichannel Wireless Sensor Networks.

PubMed

Rehan, Waqas; Fischer, Stefan; Rehan, Maaz

2016-09-12

Wireless sensor networks (WSNs) have become more and more diversified and are today able to also support high data rate applications, such as multimedia. In this case, per-packet channel handshaking/switching may result in inducing additional overheads, such as energy consumption, delays and, therefore, data loss. One of the solutions is to perform stream-based channel allocation where channel handshaking is performed once before transmitting the whole data stream. Deciding stream-based channel allocation is more critical in case of multichannel WSNs where channels of different quality/stability are available and the wish for high performance requires sensor nodes to switch to the best among the available channels. In this work, we will focus on devising mechanisms that perform channel quality/stability estimation in order to improve the accommodation of stream-based communication in multichannel wireless sensor networks. For performing channel quality assessment, we have formulated a composite metric, which we call channel rank measurement (CRM), that can demarcate channels into good, intermediate and bad quality on the basis of the standard deviation of the received signal strength indicator (RSSI) and the average of the link quality indicator (LQI) of the received packets. CRM is then used to generate a data set for training a supervised machine learning-based algorithm (which we call Normal Equation based Channel quality prediction (NEC) algorithm) in such a way that it may perform instantaneous channel rank estimation of any channel. Subsequently, two robust extensions of the NEC algorithm are proposed (which we call Normal Equation based Weighted Moving Average Channel quality prediction (NEWMAC) algorithm and Normal Equation based Aggregate Maturity Criteria with Beta Tracking based Channel weight prediction (NEAMCBTC) algorithm), that can perform channel quality estimation on the basis of both current and past values of channel rank estimation. 
In the end, simulations are made using MATLAB, and the results show that the Extended version of NEAMCBTC algorithm (Ext-NEAMCBTC) outperforms the compared techniques in terms of channel quality and stability assessment. It also minimizes channel switching overheads (in terms of switching delays and energy consumption) for accommodating stream-based communication in multichannel WSNs.
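The abstract names a normal-equation regression for instantaneous channel rank and a weighted moving average over past estimates. A minimal sketch of those two ingredients follows. The feature layout (RSSI standard deviation, mean LQI), the synthetic training targets and the weights are assumptions made for illustration, and the code does not reproduce the NEC, NEWMAC or NEAMCBTC algorithms as specified by the authors.

```python
import numpy as np


def fit_normal_equation(features, targets):
    """Linear regression via the normal equation theta = (X^T X)^{-1} X^T y."""
    X = np.column_stack([np.ones(len(features)), features])  # prepend intercept column
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ targets)


def predict_rank(theta, features):
    X = np.column_stack([np.ones(len(features)), features])
    return X @ theta


def weighted_moving_average(values, weights):
    """Blend the most recent estimates; weights are ordered oldest to newest."""
    recent = np.asarray(values[-len(weights):], dtype=float)
    w = np.asarray(weights[-len(recent):], dtype=float)
    return float(recent @ w / w.sum())


# Synthetic training set: columns are [RSSI std, mean LQI]; the target rank is hypothetical.
rng = np.random.default_rng(0)
feats = np.column_stack([rng.uniform(0, 5, 50), rng.uniform(50, 110, 50)])
ranks = 1.0 - 0.2 * feats[:, 0] + 0.01 * feats[:, 1]

theta = fit_normal_equation(feats, ranks)
history = predict_rank(theta, feats[:4])
smoothed = weighted_moving_average(history, weights=[1, 2, 3])
print(theta, smoothed)
```

Weighting recent estimates more heavily lets the smoothed rank follow channel changes while damping single-packet fluctuations.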
Machine-Learning Based Channel Quality and Stability Estimation for Stream-Based Multichannel Wireless Sensor Networks

PubMed Central

Rehan, Waqas; Fischer, Stefan; Rehan, Maaz

2016-01-01

Wireless sensor networks (WSNs) have become more and more diversified and are today able to also support high data rate applications, such as multimedia. In this case, per-packet channel handshaking/switching may result in inducing additional overheads, such as energy consumption, delays and, therefore, data loss. One of the solutions is to perform stream-based channel allocation where channel handshaking is performed once before transmitting the whole data stream. Deciding stream-based channel allocation is more critical in case of multichannel WSNs where channels of different quality/stability are available and the wish for high performance requires sensor nodes to switch to the best among the available channels. In this work, we will focus on devising mechanisms that perform channel quality/stability estimation in order to improve the accommodation of stream-based communication in multichannel wireless sensor networks. For performing channel quality assessment, we have formulated a composite metric, which we call channel rank measurement (CRM), that can demarcate channels into good, intermediate and bad quality on the basis of the standard deviation of the received signal strength indicator (RSSI) and the average of the link quality indicator (LQI) of the received packets. CRM is then used to generate a data set for training a supervised machine learning-based algorithm (which we call Normal Equation based Channel quality prediction (NEC) algorithm) in such a way that it may perform instantaneous channel rank estimation of any channel. Subsequently, two robust extensions of the NEC algorithm are proposed (which we call Normal Equation based Weighted Moving Average Channel quality prediction (NEWMAC) algorithm and Normal Equation based Aggregate Maturity Criteria with Beta Tracking based Channel weight prediction (NEAMCBTC) algorithm), that can perform channel quality estimation on the basis of both current and past values of channel rank estimation. 
In the end, simulations are made using MATLAB, and the results show that the Extended version of NEAMCBTC algorithm (Ext-NEAMCBTC) outperforms the compared techniques in terms of channel quality and stability assessment. It also minimizes channel switching overheads (in terms of switching delays and energy consumption) for accommodating stream-based communication in multichannel WSNs. PMID:27626429
Prediction model for peninsular Indian summer monsoon rainfall using data mining and statistical approaches

NASA Astrophysics Data System (ADS)

Vathsala, H.; Koolagudi, Shashidhar G.

2017-01-01

In this paper we discuss a data mining application for predicting peninsular Indian summer monsoon rainfall, and propose an algorithm that combines data mining and statistical techniques. We select likely predictors based on association rules that have the highest confidence levels. We then cluster the selected predictors to reduce their dimensions and use cluster membership values for classification. We derive the predictors from local conditions in southern India, including mean sea level pressure, wind speed, and maximum and minimum temperatures. The global condition variables include southern oscillation and Indian Ocean dipole conditions. The algorithm predicts rainfall in five categories: Flood, Excess, Normal, Deficit and Drought. We use closed itemset mining, cluster membership calculations and a multilayer perceptron function in the algorithm to predict monsoon rainfall in peninsular India. Using Indian Institute of Tropical Meteorology data, we found the prediction accuracy of our proposed approach to be exceptionally good.
A model of proto-object based saliency

PubMed Central

Russell, Alexander F.; Mihalaş, Stefan; von der Heydt, Rudiger; Niebur, Ernst; Etienne-Cummings, Ralph

2013-01-01

Organisms use the process of selective attention to optimally allocate their computational resources to the instantaneously most relevant subsets of a visual scene, ensuring that they can parse the scene in real time. Many models of bottom-up attentional selection assume that elementary image features, like intensity, color and orientation, attract attention. Gestalt psychologists, however, argue that humans perceive whole objects before they analyze individual features. This is supported by recent psychophysical studies that show that objects predict eye-fixations better than features. In this report we present a neurally inspired algorithm of object based, bottom-up attention. The model rivals the performance of state of the art non-biologically plausible feature based algorithms (and outperforms biologically plausible feature based algorithms) in its ability to predict perceptual saliency (eye fixations and subjective interest points) in natural scenes. The model achieves this by computing saliency as a function of proto-objects that establish the perceptual organization of the scene. All computational mechanisms of the algorithm have direct neural correlates, and our results provide evidence for the interface theory of attention. PMID:24184601
Application of the Polynomial-Based Least Squares and Total Least Squares Models for the Attenuated Total Reflection Fourier Transform Infrared Spectra of Binary Mixtures of Hydroxyl Compounds.

PubMed

Shan, Peng; Peng, Silong; Zhao, Yuhui; Tang, Liang

2016-03-01

An analysis of binary mixtures of hydroxyl compounds by attenuated total reflection Fourier transform infrared spectroscopy (ATR FT-IR) and classical least squares (CLS) yields large model error due to the presence of unmodeled components such as H-bonded components. To accommodate these spectral variations, polynomial-based least squares (LSP) and polynomial-based total least squares (TLSP) are proposed to capture the nonlinear absorbance-concentration relationship. LSP assumes that only absorbance noise exists, while TLSP takes both absorbance noise and concentration noise into consideration. In addition, based on different solving strategies, two optimization algorithms (the limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm and the Levenberg-Marquardt (LM) algorithm) are combined with TLSP, forming two TLSP versions (termed TLSP-LBFGS and TLSP-LM). The optimum order of each nonlinear model is determined by cross-validation. Comparison and analyses of the four models are made from two aspects: absorbance prediction and concentration prediction. The results for water-ethanol solution and ethanol-ethyl lactate solution show that LSP, TLSP-LBFGS, and TLSP-LM can, for both absorbance prediction and concentration prediction, obtain smaller root mean square error of prediction than CLS. Additionally, they can also greatly enhance the accuracy of estimated pure component spectra. However, from the view of concentration prediction, the Wilcoxon signed rank test shows that there is no statistically significant difference between each nonlinear model and CLS. © The Author(s) 2016.
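Choosing the polynomial order by cross-validation, as the abstract describes, can be sketched for a single wavenumber with ordinary polynomial least squares. The example below uses synthetic absorbance data with a mild quadratic term. It corresponds loosely to the LSP idea (noise in absorbance only) and does not implement TLSP. All names and values are assumptions.

```python
import numpy as np


def cv_rmse(x, y, order, n_folds=5):
    """Cross-validated RMSE of a polynomial fit of the given order."""
    idx = np.arange(len(x))
    errors = []
    for fold in range(n_folds):
        test = idx % n_folds == fold                 # every n_folds-th sample is held out
        coef = np.polyfit(x[~test], y[~test], order) # fit on the remaining samples
        errors.append(y[test] - np.polyval(coef, x[test]))
    return float(np.sqrt(np.mean(np.concatenate(errors) ** 2)))


# Synthetic absorbance vs. concentration with a nonlinear (quadratic) contribution.
rng = np.random.default_rng(0)
conc = np.linspace(0.0, 1.0, 40)
absorb = 0.8 * conc + 0.3 * conc ** 2 + 0.005 * rng.normal(size=40)

scores = {order: cv_rmse(conc, absorb, order) for order in (1, 2, 3)}
best_order = min(scores, key=scores.get)
print(scores, best_order)
```

A purely linear model (order 1, analogous to CLS at one wavenumber) leaves the quadratic contribution in the residual, so its cross-validated error is larger than that of the higher-order fits.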
Connecting clinical and actuarial prediction with rule-based methods.

PubMed

Fokkema, Marjolein; Smits, Niels; Kelderman, Henk; Penninx, Brenda W J H

2015-06-01

Meta-analyses comparing the accuracy of clinical versus actuarial prediction have shown actuarial methods to outperform clinical methods, on average. However, actuarial methods are still not widely used in clinical practice, and there has been a call for the development of actuarial prediction methods for clinical practice. We argue that rule-based methods may be more useful than the linear main effect models usually employed in prediction studies, from a data and decision analytic as well as a practical perspective. In addition, decision rules derived with rule-based methods can be represented as fast and frugal trees, which, unlike main effects models, can be used in a sequential fashion, reducing the number of cues that have to be evaluated before making a prediction. We illustrate the usability of rule-based methods by applying RuleFit, an algorithm for deriving decision rules for classification and regression problems, to a dataset on prediction of the course of depressive and anxiety disorders from Penninx et al. (2011). The RuleFit algorithm provided a model consisting of 2 simple decision rules, requiring evaluation of only 2 to 4 cues. Predictive accuracy of the 2-rule model was very similar to that of a logistic regression model incorporating 20 predictor variables, originally applied to the dataset. In addition, the 2-rule model required, on average, evaluation of only 3 cues. Therefore, the RuleFit algorithm appears to be a promising method for creating decision tools that are less time consuming and easier to apply in psychological practice, and with accuracy comparable to traditional actuarial methods. (c) 2015 APA, all rights reserved.
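The sequential use of cues in a fast and frugal tree can be illustrated as follows. Each node inspects one cue and either exits with a prediction or passes the case to the next node, so many cases are decided before all cues are read. The cue names, cutoffs and labels below are hypothetical placeholders, not the rules RuleFit derived from the Penninx et al. data.

```python
# Hypothetical rules (assumption): (cue name, cutoff, label returned if cue > cutoff).
RULES = [
    ("severity_score", 30.0, "chronic"),
    ("duration_months", 24.0, "chronic"),
]
DEFAULT_LABEL = "remitting"


def fast_frugal_predict(patient, rules=RULES, default=DEFAULT_LABEL):
    """Evaluate cues one at a time; return (prediction, number of cues evaluated)."""
    for n_used, (cue, cutoff, label) in enumerate(rules, start=1):
        if patient[cue] > cutoff:
            return label, n_used      # early exit: later cues are never looked at
    return default, len(rules)


print(fast_frugal_predict({"severity_score": 35, "duration_months": 1}))
print(fast_frugal_predict({"severity_score": 10, "duration_months": 3}))
```

A main-effects model such as logistic regression needs every predictor before it can produce a score, whereas the first call above stops after a single cue.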
Crystal-structure prediction via the Floppy-Box Monte Carlo algorithm: Method and application to hard (non)convex particles

NASA Astrophysics Data System (ADS)

de Graaf, Joost; Filion, Laura; Marechal, Matthieu; van Roij, René; Dijkstra, Marjolein

2012-12-01

In this paper, we describe the way to set up the floppy-box Monte Carlo (FBMC) method [L. Filion, M. Marechal, B. van Oorschot, D. Pelt, F. Smallenburg, and M. Dijkstra, Phys. Rev. Lett. 103, 188302 (2009), 10.1103/PhysRevLett.103.188302] to predict crystal-structure candidates for colloidal particles. The algorithm is explained in detail to ensure that it can be straightforwardly implemented on the basis of this text. The handling of hard-particle interactions in the FBMC algorithm is given special attention, as (soft) short-range and semi-long-range interactions can be treated in an analogous way. We also discuss two types of algorithms for checking for overlaps between polyhedra, the method of separating axes and a triangular-tessellation based technique. These can be combined with the FBMC method to enable crystal-structure prediction for systems composed of highly shape-anisotropic particles. Moreover, we present the results for the dense crystal structures predicted using the FBMC method for 159 (non)convex faceted particles, on which the findings in [J. de Graaf, R. van Roij, and M. Dijkstra, Phys. Rev. Lett. 107, 155501 (2011), 10.1103/PhysRevLett.107.155501] were based. Finally, we comment on the process of crystal-structure prediction itself and the choices that can be made in these simulations.
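The method of separating axes mentioned above rests on one fact: two convex shapes do not overlap if and only if some axis exists along which their projections are disjoint. The sketch below shows the test for convex polygons in 2D, where the candidate axes are the edge normals. For 3D polyhedra the candidate set also includes face normals and cross products of edge pairs, which this simplified example does not cover. Function names are illustrative.

```python
import numpy as np


def _edge_normals(poly):
    """Normals of all edges of a polygon given as an (n, 2) array of vertices."""
    edges = np.roll(poly, -1, axis=0) - poly
    return np.column_stack([-edges[:, 1], edges[:, 0]])


def convex_overlap(poly_a, poly_b):
    """Separating-axis test for two convex 2D polygons (touching counts as overlap)."""
    a = np.asarray(poly_a, dtype=float)
    b = np.asarray(poly_b, dtype=float)
    for axis in np.vstack([_edge_normals(a), _edge_normals(b)]):
        proj_a = a @ axis   # projections of all vertices onto the candidate axis
        proj_b = b @ axis
        if proj_a.max() < proj_b.min() or proj_b.max() < proj_a.min():
            return False    # projections are disjoint: a separating axis exists
    return True             # no separating axis found among the edge normals


triangle = [(0, 0), (2, 0), (0, 2)]
far_square = [(1.2, 1.2), (2, 1.2), (2, 2), (1.2, 2)]   # beyond the hypotenuse
near_square = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (0.5, 1.5)]

print(convex_overlap(triangle, far_square), convex_overlap(triangle, near_square))
```

The first pair shows why bounding boxes are not enough: the boxes of the triangle and the far square intersect, but the normal of the hypotenuse separates the two shapes.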
Network congestion control algorithm based on Actor-Critic reinforcement learning model

NASA Astrophysics Data System (ADS)

Xu, Tao; Gong, Lina; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen

2018-04-01

Aiming at the network congestion control problem, a congestion control algorithm based on Actor-Critic reinforcement learning model is designed. Through the genetic algorithm in the congestion control strategy, the network congestion problems can be better found and prevented. According to Actor-Critic reinforcement learning, the simulation experiment of network congestion control algorithm is designed. The simulation experiments verify that the AQM controller can predict the dynamic characteristics of the network system. Moreover, the learning strategy is adopted to optimize the network performance, and the dropping probability of packets is adaptively adjusted so as to improve the network performance and avoid congestion. Based on the above finding, it is concluded that the network congestion control algorithm based on Actor-Critic reinforcement learning model can effectively avoid the occurrence of TCP network congestion.
Investigation of a New Handover Approach in LTE and WiMAX

PubMed Central

Hindia, Mohammad Nour; Reza, Ahmed Wasif; Noordin, Kamarul Ariffin

2014-01-01

Nowadays, one of the most important challenges in heterogeneous networks is the connection consistency between the mobile station and the base stations. Furthermore, along the roaming process between the mobile station and the base station, the system performance degrades significantly due to the interferences from neighboring base stations, handovers to inaccurate base station and inappropriate technology selection. In this paper, several algorithms are proposed to improve mobile station performance and seamless mobility across the long-term evolution (LTE) and Worldwide Interoperability for Microwave Access (WiMAX) technologies, along with a minimum number of redundant handovers. Firstly, the enhanced global positioning system (GPS) and the novel received signal strength (RSS) prediction approaches are suggested to predict the target base station accurately. Then, the multiple criteria with two thresholds algorithm is proposed to prioritize the selection between LTE and WiMAX as the target technology. In addition, this study also covers the intercell and cochannel interference reduction by adjusting the frequency reuse ratio 3 (FRR3) to work with LTE and WiMAX. The obtained results demonstrate high next base station prediction efficiency and high accuracy for both horizontal and vertical handovers. Moreover, the received signal strength is kept at levels higher than the threshold, while maintaining low connection cost and delay within acceptable levels. In order to highlight the combination of the proposed algorithms' performance, it is compared with the existing RSS and multiple criteria handover decision algorithms. PMID:25379524
Investigation of a new handover approach in LTE and WiMAX.

PubMed

Hindia, Mohammad Nour; Reza, Ahmed Wasif; Noordin, Kamarul Ariffin

2014-01-01

Nowadays, one of the most important challenges in heterogeneous networks is the connection consistency between the mobile station and the base stations. Furthermore, along the roaming process between the mobile station and the base station, the system performance degrades significantly due to the interferences from neighboring base stations, handovers to inaccurate base station and inappropriate technology selection. In this paper, several algorithms are proposed to improve mobile station performance and seamless mobility across the long-term evolution (LTE) and Worldwide Interoperability for Microwave Access (WiMAX) technologies, along with a minimum number of redundant handovers. Firstly, the enhanced global positioning system (GPS) and the novel received signal strength (RSS) prediction approaches are suggested to predict the target base station accurately. Then, the multiple criteria with two thresholds algorithm is proposed to prioritize the selection between LTE and WiMAX as the target technology. In addition, this study also covers the intercell and cochannel interference reduction by adjusting the frequency reuse ratio 3 (FRR3) to work with LTE and WiMAX. The obtained results demonstrate high next base station prediction efficiency and high accuracy for both horizontal and vertical handovers. Moreover, the received signal strength is kept at levels higher than the threshold, while maintaining low connection cost and delay within acceptable levels. In order to highlight the combination of the proposed algorithms' performance, it is compared with the existing RSS and multiple criteria handover decision algorithms.
A New Ensemble Canonical Correlation Prediction Scheme for Seasonal Precipitation

NASA Technical Reports Server (NTRS)

Kim, Kyu-Myong; Lau, William K. M.; Li, Guilong; Shen, Samuel S. P.; Lau, William K. M. (Technical Monitor)

2001-01-01

Department of Mathematical Sciences, University of Alberta, Edmonton, Canada. This paper describes the fundamental theory of the ensemble canonical correlation (ECC) algorithm for seasonal climate forecasting. The algorithm is a statistical regression scheme based on maximal correlation between the predictor and predictand. The prediction error is estimated by a spectral method using the basis of empirical orthogonal functions. The ECC algorithm treats the predictors and predictands as continuous fields and is an improvement on the traditional canonical correlation prediction. The improvements include the use of an area factor, estimation of prediction error, and the optimal ensemble of multiple forecasts. The ECC is applied to seasonal forecasting over various parts of the world. The example presented here is for North American precipitation. The predictor is the sea surface temperature (SST) from different ocean basins. The Climate Prediction Center's reconstructed SST (1951-1999) is used as the predictor's historical data. The optimally interpolated global monthly precipitation is used as the predictand's historical data. Our forecast experiments show that the ECC algorithm renders very high skill and that the optimal ensemble is very important to this high skill.
Hypoglycemia early alarm systems based on recursive autoregressive partial least squares models.

PubMed

Bayrak, Elif Seyma; Turksoy, Kamuran; Cinar, Ali; Quinn, Lauretta; Littlejohn, Elizabeth; Rollins, Derrick

2013-01-01

Hypoglycemia caused by intensive insulin therapy is a major challenge for artificial pancreas systems. Early detection and prevention of potential hypoglycemia are essential for the acceptance of fully automated artificial pancreas systems. Many of the proposed alarm systems are based on interpretation of recent values or trends in glucose values. In the present study, subject-specific linear models are introduced to capture glucose variations and predict future blood glucose concentrations. These models can be used in early alarm systems of potential hypoglycemia. A recursive autoregressive partial least squares (RARPLS) algorithm is used to model the continuous glucose monitoring sensor data and predict future glucose concentrations for use in hypoglycemia alarm systems. The partial least squares models constructed are updated recursively at each sampling step with a moving window. An early hypoglycemia alarm algorithm using these models is proposed and evaluated. Glucose prediction models based on real-time filtered data have a root mean squared error of 7.79 and a sum of squares of glucose prediction error of 7.35% for six-step-ahead (30 min) glucose predictions. The early alarm systems based on RARPLS show good performance. A sensitivity of 86% and a false alarm rate of 0.42 false positive/day are obtained for the early alarm system based on six-step-ahead predicted glucose values with an average early detection time of 25.25 min. The RARPLS models developed provide satisfactory glucose prediction with relatively smaller error than other proposed algorithms and are good candidates to forecast and warn about potential hypoglycemia unless preventive action is taken far in advance. © 2012 Diabetes Technology Society.
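The alarm logic described above (refit a subject-specific autoregressive model on a moving window, forecast six steps ahead, compare with a threshold) can be sketched as follows. Ordinary least squares is swapped in for the partial least squares step, so this is not the RARPLS algorithm. The window length, model order, 70 mg/dL threshold and glucose traces are assumptions.

```python
import numpy as np


def fit_ar(window, order):
    """Least-squares AR(order) model with intercept, fitted on one window of samples."""
    rows = [window[i:i + order] for i in range(len(window) - order)]
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    coef, *_ = np.linalg.lstsq(X, window[order:], rcond=None)
    return coef


def forecast(window, coef, steps):
    """Iterate the AR model forward, feeding each prediction back in as an input."""
    order = len(coef) - 1
    history = list(window[-order:])
    for _ in range(steps):
        history.append(coef[0] + np.dot(coef[1:], history[-order:]))
    return float(history[-1])


def hypo_alarm(glucose, window_len=24, order=3, steps=6, threshold=70.0):
    """Refit on the latest window, predict `steps` samples ahead, and flag low values."""
    window = np.asarray(glucose[-window_len:], dtype=float)
    predicted = forecast(window, fit_ar(window, order), steps)
    return predicted, bool(predicted < threshold)


# Hypothetical 5-minute sensor readings: one falling trace, one stable trace.
falling = [160.0 - 3.0 * t for t in range(30)]
stable = [110.0] * 30

print(hypo_alarm(falling))
print(hypo_alarm(stable))
```

Calling `hypo_alarm` again after each new reading gives the recursive, moving-window behaviour: old samples drop out of the window and the model coefficients follow the subject's current dynamics.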
Hypoglycemia Early Alarm Systems Based on Recursive Autoregressive Partial Least Squares Models

PubMed Central

Bayrak, Elif Seyma; Turksoy, Kamuran; Cinar, Ali; Quinn, Lauretta; Littlejohn, Elizabeth; Rollins, Derrick

2013-01-01

Background: Hypoglycemia caused by intensive insulin therapy is a major challenge for artificial pancreas systems. Early detection and prevention of potential hypoglycemia are essential for the acceptance of fully automated artificial pancreas systems. Many of the proposed alarm systems are based on interpretation of recent values or trends in glucose values. In the present study, subject-specific linear models are introduced to capture glucose variations and predict future blood glucose concentrations. These models can be used in early alarm systems of potential hypoglycemia. Methods: A recursive autoregressive partial least squares (RARPLS) algorithm is used to model the continuous glucose monitoring sensor data and predict future glucose concentrations for use in hypoglycemia alarm systems. The partial least squares models constructed are updated recursively at each sampling step with a moving window. An early hypoglycemia alarm algorithm using these models is proposed and evaluated. Results: Glucose prediction models based on real-time filtered data have a root mean squared error of 7.79 and a sum of squares of glucose prediction error of 7.35% for six-step-ahead (30 min) glucose predictions. The early alarm systems based on RARPLS show good performance. A sensitivity of 86% and a false alarm rate of 0.42 false positive/day are obtained for the early alarm system based on six-step-ahead predicted glucose values with an average early detection time of 25.25 min. Conclusions: The RARPLS models developed provide satisfactory glucose prediction with relatively smaller error than other proposed algorithms and are good candidates to forecast and warn about potential hypoglycemia unless preventive action is taken far in advance. PMID:23439179
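The moving-window, six-step-ahead alarm idea in this record can be illustrated with a minimal sketch. It does not implement RARPLS: an ordinary least-squares AR(1) model is swapped in for the partial least squares model, and the window size, the 70 mg/dL threshold, and all function names are assumptions made for illustration only.

```python
def fit_ar1(window):
    """Least-squares fit of g[t] = a * g[t-1] + b over one window of readings."""
    x = window[:-1]
    y = window[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        return 1.0, 0.0  # flat window: fall back to persisting the last value
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict_ahead(window, steps=6):
    """Iterate the fitted AR(1) model `steps` samples past the end of the window."""
    a, b = fit_ar1(window)
    g = window[-1]
    for _ in range(steps):
        g = a * g + b
    return g


def early_alarm(readings, window_size=12, steps=6, threshold=70.0):
    """Refit on each moving window; flag when the predicted value is below threshold."""
    alarms = []
    for end in range(window_size, len(readings) + 1):
        window = readings[end - window_size:end]
        alarms.append(predict_ahead(window, steps) < threshold)
    return alarms


# Hypothetical sensor trace falling by 5 units per sample.
falling = [200 - 5 * i for i in range(24)]
print(early_alarm(falling))
```

With a 5 min sampling interval, six steps correspond to the 30 min horizon quoted in the abstract; refitting on every window mirrors the recursive update only loosely.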

RNA design using simulated SHAPE data.

PubMed

Lotfi, Mohadeseh; Zare-Mirakabad, Fatemeh; Montaseri, Soheila

2018-05-03

It has long been established that in addition to being involved in protein translation, RNA plays essential roles in numerous other cellular processes, including gene regulation and DNA replication. Such roles are known to be dictated by higher-order structures of RNA molecules. It is therefore of prime importance to find an RNA sequence that can fold to acquire a particular function that is desirable for use in pharmaceuticals and basic research. The challenge of finding an RNA sequence for a given structure is known as the RNA design problem. Although there are several algorithms to solve this problem, they mainly consider hard constraints, such as minimum free energy, to evaluate the predicted sequences. Recently, SHAPE data has emerged as a new soft constraint for RNA secondary structure prediction. To take advantage of this new experimental constraint, we report here a new method for accurate design of RNA sequences based on their secondary structures using SHAPE data as pseudo-free energy. We then compare our algorithm with four others: INFO-RNA, ERD, MODENA and RNAifold 2.0. Our algorithm precisely predicts 26 out of 29 new sequences for the structures extracted from the Rfam dataset, while the other four algorithms predict no more than 22 out of 29. The proposed algorithm is comparable to the above algorithms on RNA-SSD datasets, where they can predict up to 33 appropriate sequences for RNA secondary structures out of 34.
Firefly Algorithm for Structural Search.

PubMed

Avendaño-Franco, Guillermo; Romero, Aldo H

2016-07-12

The problem of computational structure prediction of materials is approached using the firefly (FF) algorithm. Starting from the chemical composition and optionally using prior knowledge of similar structures, the FF method is able to predict not only known stable structures but also a variety of novel competitive metastable structures. This article focuses on the strengths and limitations of the algorithm as a multimodal global searcher. The algorithm has been implemented in the software package PyChemia (https://github.com/MaterialsDiscovery/PyChemia), an open source Python library for materials analysis. We present applications of the method to van der Waals clusters and crystal structures. The FF method is shown to be competitive when compared to other population-based global searchers.
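For readers unfamiliar with the firefly heuristic, the sketch below shows its basic move rule on a toy continuous function. It is not the PyChemia implementation and does not touch atomic structures; the parameter names (`beta0`, `gamma`, `alpha`), their default values, and the sphere objective are assumptions chosen for illustration.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=50,
                     beta0=1.0, gamma=1.0, alpha=0.1, seed=0):
    """Basic firefly search: each firefly moves toward every brighter one."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds]
             for _ in range(n_fireflies)]
    cost = [objective(p) for p in swarm]
    initial_best = min(cost)
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter (lower cost) than i
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction decays with distance
                    swarm[i] = [
                        min(max(a + beta * (b - a) + alpha * (rng.random() - 0.5), lo), hi)
                        for a, b, (lo, hi) in zip(swarm[i], swarm[j], bounds)
                    ]
                    cost[i] = objective(swarm[i])
    best = min(range(n_fireflies), key=cost.__getitem__)
    return swarm[best], cost[best], initial_best


def sphere(p):
    """Toy objective with a single minimum at the origin."""
    return sum(x * x for x in p)


position, best_cost, initial_best = firefly_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(position, best_cost, initial_best)
```

In this variant the current best firefly never moves, so the best cost cannot get worse between iterations. Because attraction falls off with distance, separate groups can settle in different basins, which is the multimodal behaviour the abstract emphasises.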
Correlation approach to identify coding regions in DNA sequences

NASA Technical Reports Server (NTRS)

Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.

1994-01-01

Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
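The correlation property this record relies on can be probed with a very small sketch: map the sequence to a +1/-1 series and measure its autocorrelation at a given lag. The pyrimidine/purine mapping and the function names are assumptions; the published algorithm for locating coding regions is more elaborate than this.

```python
def to_steps(seq):
    """Map pyrimidines (C, T) to +1 and purines (A, G) to -1."""
    return [1 if base in "CT" else -1 for base in seq.upper()]


def autocorrelation(steps, lag):
    """C(lag) = <u_i * u_(i+lag)> - <u>^2 for a +1/-1 series, with lag < len(steps)."""
    n = len(steps) - lag
    mean = sum(steps) / len(steps)
    return sum(steps[i] * steps[i + lag] for i in range(n)) / n - mean ** 2


steps = to_steps("CACACACA")
print(autocorrelation(steps, 1), autocorrelation(steps, 2))
```

Scanning how quickly this quantity decays with lag over a sliding window is one simple way to contrast short-range and long-range behaviour along a sequence.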
Probability of coding of a DNA sequence: an algorithm to predict translated reading frames from their thermodynamic characteristics.

PubMed Central

Tramontano, A; Macchiato, M F

1986-01-01

An algorithm to determine the probability that a reading frame codes for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761
Fast prediction of RNA-RNA interaction using heuristic algorithm.

PubMed

Montaseri, Soheila

2015-01-01

Interaction between two RNA molecules plays a crucial role in many medical and biological processes such as gene expression regulation. In this process, an RNA molecule prohibits the translation of another RNA molecule by establishing stable interactions with it. Several algorithms have been developed to predict the structure of the RNA-RNA interaction. High computational time is a common challenge in most of the presented algorithms. In this context, a heuristic method is introduced to accurately predict the interaction between two RNAs based on minimum free energy (MFE). This algorithm uses a few dot matrices for finding the secondary structure of each RNA and binding sites between two RNAs. Furthermore, a parallel version of this method is presented. We describe the algorithm's concurrency and parallelism for a multicore chip. The proposed algorithm has been tested on some datasets including CopA-CopT, R1inv-R2inv, Tar-Tar*, DIS-DIS, and IncRNA54-RepZ in Escherichia coli bacteria. The method has high validity and efficiency, and it runs in low computational time in comparison to other approaches.
Behavioral-Based Predictors of Workplace Violence in the Army STARRS

DTIC Science & Technology

2014-10-01

Dawes RM, Faust D, Meehl PE. Clinical versus actuarial judgment. Science. 1989;243(4899):1668-1674. 46. Grove WM, Zald DH, Lebow BS, Snitz BE, Nelson...develop an actuarial risk algorithm predicting suicide in the 12 months after US Army soldier inpatient treatment of a psychiatric disorder to target...generate an actuarial post-hospitalization suicide risk algorithm. Previous research has revealed that actuarial suicide prediction is much more
Graph regularized nonnegative matrix factorization for temporal link prediction in dynamic networks

NASA Astrophysics Data System (ADS)

Ma, Xiaoke; Sun, Penggang; Wang, Yu

2018-04-01

Many networks derived from society and nature are temporal and incomplete. The temporal link prediction problem in networks is to predict links at time T + 1 based on a given temporal network from time 1 to T, which is essential to important applications. The current algorithms predict the temporal links either by collapsing the dynamic networks or by collapsing features derived from each network, and are criticized for ignoring the connection among slices. To overcome this issue, we propose a novel graph regularized nonnegative matrix factorization algorithm (GrNMF) for the temporal link prediction problem without collapsing the dynamic networks. To obtain the feature for each network from 1 to t, GrNMF factorizes the matrix associated with networks by setting the remaining networks as regularization, which provides a better way to characterize the topological information of temporal links. Then, the GrNMF algorithm collapses the feature matrices to predict temporal links. Compared with state-of-the-art methods, the proposed algorithm exhibits significantly improved accuracy by avoiding the collapse of temporal networks. Experimental results on a number of artificial and real temporal networks illustrate that the proposed method is not only more accurate but also more robust than state-of-the-art approaches.
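To make the contrast concrete, the sketch below shows the kind of "collapse the snapshots" baseline that this record argues against, not GrNMF itself. The exponential decay weighting, the weighted common-neighbour score, and the function names are assumptions for illustration.

```python
def collapse_snapshots(snapshots, decay=0.5):
    """Weighted sum of adjacency matrices; the most recent snapshot has weight 1."""
    n = len(snapshots[0])
    last = len(snapshots) - 1
    collapsed = [[0.0] * n for _ in range(n)]
    for t, adj in enumerate(snapshots):
        weight = decay ** (last - t)  # older slices count less
        for i in range(n):
            for j in range(n):
                collapsed[i][j] += weight * adj[i][j]
    return collapsed


def common_neighbor_scores(weights):
    """Score each node pair by the weighted two-step paths connecting it."""
    n = len(weights)
    return [
        [sum(weights[i][k] * weights[k][j] for k in range(n)) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]


# Two hypothetical snapshots on three nodes: edge 0-1 at time 1, edge 1-2 at time 2.
t1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
t2 = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
scores = common_neighbor_scores(collapse_snapshots([t1, t2]))
print(scores[0][2])
```

Once the slices are summed, the order in which edges appeared survives only through the decay weights, which is the loss of inter-slice information the abstract refers to.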
Developing a NIR multispectral imaging for prediction and visualization of peanut protein content using variable selection algorithms

NASA Astrophysics Data System (ADS)

Cheng, Jun-Hu; Jin, Huali; Liu, Zhiwei

2018-01-01

The feasibility of developing a multispectral imaging method using important wavelengths from hyperspectral images selected by genetic algorithm (GA), successive projection algorithm (SPA) and regression coefficient (RC) methods for modeling and predicting protein content in peanut kernel was investigated for the first time. A partial least squares regression (PLSR) calibration model was established between the spectral data from the selected optimal wavelengths and the reference measured protein content, which ranged from 23.46% to 28.43%. The RC-PLSR model established using eight key wavelengths (1153, 1567, 1972, 2143, 2288, 2339, 2389 and 2446 nm) showed the best predictive results with a coefficient of determination of prediction (R2P) of 0.901, a root mean square error of prediction (RMSEP) of 0.108 and a residual predictive deviation (RPD) of 2.32. Based on the obtained best model and image processing algorithms, the distribution maps of protein content were generated. The overall results of this study indicated that developing a rapid and online multispectral imaging system using the feature wavelengths and PLSR analysis is feasible for determination of the protein content in peanut kernels.
A Stochastic Framework for Evaluating Seizure Prediction Algorithms Using Hidden Markov Models

PubMed Central

Wong, Stephen; Gardner, Andrew B.; Krieger, Abba M.; Litt, Brian

2007-01-01

Responsive, implantable stimulation devices to treat epilepsy are now in clinical trials. New evidence suggests that these devices may be more effective when they deliver therapy before seizure onset. Despite years of effort, prospective seizure prediction, which could improve device performance, remains elusive. In large part, this is explained by lack of agreement on a statistical framework for modeling seizure generation and a method for validating algorithm performance. We present a novel stochastic framework based on a three-state hidden Markov model (HMM) (representing interictal, preictal, and seizure states) with the feature that periods of increased seizure probability can transition back to the interictal state. This notion reflects clinical experience and may enhance interpretation of published seizure prediction studies. Our model accommodates clipped EEG segments and formalizes intuitive notions regarding statistical validation. We derive equations for type I and type II errors as a function of the number of seizures, duration of interictal data, and prediction horizon length and we demonstrate the model’s utility with a novel seizure detection algorithm that appeared to predict seizure onset. We propose this framework as a vital tool for designing and validating prediction algorithms and for facilitating collaborative research in this area. PMID:17021032
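The state structure described in this record, in which a preictal period can fall back to the interictal state without a seizure, can be sketched as a three-state Markov chain. Only the hidden-state chain is shown; there is no emission model, and every transition probability below is a hypothetical value chosen for illustration, not one taken from the paper.

```python
STATES = ("interictal", "preictal", "seizure")

# Hypothetical per-step transition probabilities; each row sums to 1.
TRANSITIONS = {
    "interictal": {"interictal": 0.98, "preictal": 0.02, "seizure": 0.0},
    "preictal": {"interictal": 0.10, "preictal": 0.80, "seizure": 0.10},
    "seizure": {"interictal": 1.0, "preictal": 0.0, "seizure": 0.0},
}


def step(dist):
    """Advance a probability distribution over states by one time step."""
    return {s: sum(dist[p] * TRANSITIONS[p][s] for p in STATES) for s in STATES}


def propagate(dist, n_steps):
    """Apply `step` repeatedly, starting from `dist`."""
    for _ in range(n_steps):
        dist = step(dist)
    return dist


start = {"interictal": 1.0, "preictal": 0.0, "seizure": 0.0}
print(propagate(start, 2))
```

The nonzero preictal-to-interictal entry is what distinguishes this structure from a chain in which every preictal period must end in a seizure.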
Self-Adaptive Prediction of Cloud Resource Demands Using Ensemble Model and Subtractive-Fuzzy Clustering Based Fuzzy Neural Network

PubMed Central

Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong

2015-01-01

In an IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate prediction of resource demands is essential. For this purpose, this paper proposes a self-adaptive prediction method using an ensemble model and a subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characteristics of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of the fuzzy neural network are studied. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experimental results show that the method is accurate and effective in predicting the resource demands. PMID:25691896
Use of registration-based contour propagation in texture analysis for esophageal cancer pathologic response prediction

NASA Astrophysics Data System (ADS)

Yip, Stephen S. F.; Coroller, Thibaud P.; Sanford, Nina N.; Huynh, Elizabeth; Mamon, Harvey; Aerts, Hugo J. W. L.; Berbeco, Ross I.

2016-01-01

Change in PET-based textural features has shown promise in predicting cancer response to treatment. However, contouring tumour volumes on longitudinal scans is time-consuming. This study investigated the usefulness of contour propagation in texture analysis for the purpose of pathologic response prediction in esophageal cancer. Forty-five esophageal cancer patients underwent PET/CT scans before and after chemo-radiotherapy. Patients were classified into responders and non-responders after the surgery. Physician-defined tumour ROIs on pre-treatment PET were propagated onto the post-treatment PET using rigid and ten deformable registration algorithms. PET images were converted into 256 discrete values. Co-occurrence, run-length, and size zone matrix textures were computed within all ROIs. The relative difference of each texture at different treatment time-points was used to predict the pathologic responders. Their predictive value was assessed using the area under the receiver-operating-characteristic curve (AUC). Propagated ROIs from different algorithms were compared using Dice similarity index (DSI). Contours propagated by the fast-demons, fast-free-form and rigid algorithms did not fully capture the high FDG uptake regions of tumours. Fast-demons propagated ROIs had the least agreement with other contours (DSI = 58%). Moderate to substantial overlap was found in the ROIs propagated by all other algorithms (DSI = 69%-79%). Rigidly propagated ROIs with co-occurrence texture failed to significantly differentiate between responders and non-responders (AUC = 0.58, q-value = 0.33), while the differentiation was significant with other textures (AUC = 0.71‒0.73, p < 0.009). Among the deformable algorithms, fast-demons (AUC = 0.68‒0.70, q-value < 0.03) and fast-free-form (AUC = 0.69‒0.74, q-value < 0.04) were the least predictive. ROIs propagated by all other deformable algorithms with any texture significantly predicted pathologic responders (AUC = 0.72‒0.78, q-value < 0.01). Propagated ROIs using deformable registration for all textures can lead to accurate prediction of pathologic response, potentially expediting the temporal texture analysis process. However, fast-demons, fast-free-form, and rigid algorithms should be applied with care due to their inferior performance compared to other algorithms.
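The Dice similarity index used here to compare propagated ROIs has the standard form DSI = 2|A ∩ B| / (|A| + |B|). A minimal sketch follows, assuming each ROI is represented as a collection of voxel index tuples (a representation chosen for illustration, not taken from the study).

```python
def dice_similarity(roi_a, roi_b):
    """Dice index of two ROIs given as collections of voxel index tuples."""
    a, b = set(roi_a), set(roi_b)
    if not a and not b:
        return 1.0  # two empty regions are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


# Two hypothetical 2x2 regions that share one column of voxels.
roi_1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
roi_2 = {(0, 1), (1, 1), (0, 2), (1, 2)}
print(dice_similarity(roi_1, roi_2))
```

A value of 1 means identical regions and 0 means no shared voxels, so the 58% reported for fast-demons indicates markedly less overlap than the 69%-79% range of the other algorithms.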
Predictive Cache Modeling and Analysis

DTIC Science & Technology

2011-11-01

metaheuristic/bin-packing algorithm to optimize task placement based on task communication characterization. Our previous work on task allocation showed...Cache Miss Minimization Technology To efficiently explore combinations and discover nearly-optimal task-assignment algorithms, we extended to our...it was possible to use our algorithmic techniques to decrease network bandwidth consumption by ~25%. In this effort, we adapted these existing
Prediction of pKa Values for Neutral and Basic Drugs based on Hybrid Artificial Intelligence Methods.

PubMed

Li, Mengshan; Zhang, Huaijing; Chen, Bingsheng; Wu, Yan; Guan, Lixin

2018-03-05

The pKa value of drugs is an important parameter in drug design and pharmacology. In this paper, an improved particle swarm optimization (PSO) algorithm was proposed based on the population entropy diversity. In the improved algorithm, when the population entropy was higher than the set maximum threshold, the convergence strategy was adopted; when the population entropy was lower than the set minimum threshold the divergence strategy was adopted; when the population entropy was between the maximum and minimum threshold, the self-adaptive adjustment strategy was maintained. The improved PSO algorithm was applied in the training of radial basis function artificial neural network (RBF ANN) model and the selection of molecular descriptors. A quantitative structure-activity relationship model based on RBF ANN trained by the improved PSO algorithm was proposed to predict the pKa values of 74 kinds of neutral and basic drugs and then validated by another database containing 20 molecules. The validation results showed that the model had a good prediction performance. The absolute average relative error, root mean square error, and squared correlation coefficient were 0.3105, 0.0411, and 0.9685, respectively. The model can be used as a reference for exploring other quantitative structure-activity relationships.
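The threshold rule described in this abstract can be sketched as a small selector. How the authors define population entropy is not stated in the abstract, so the one-dimensional histogram-based Shannon entropy below, the bin count, and the function names are assumptions; only the three-way threshold rule comes from the text.

```python
import math


def population_entropy(positions, lo, hi, n_bins=10):
    """Shannon entropy of particle positions (assumed within [lo, hi]) over equal bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in positions:
        idx = min(int((x - lo) / width), n_bins - 1)  # clamp x == hi into the last bin
        counts[idx] += 1
    total = len(positions)
    return -sum((c / total) * math.log(c / total) for c in counts if c)


def choose_strategy(entropy, h_min, h_max):
    """Select the swarm update strategy from the entropy thresholds."""
    if entropy > h_max:
        return "convergence"
    if entropy < h_min:
        return "divergence"
    return "self-adaptive"


spread = [0.05 + 0.1 * i for i in range(10)]  # one particle per bin
clumped = [0.5] * 10                          # every particle in one bin
print(choose_strategy(population_entropy(spread, 0.0, 1.0), 0.5, 2.0))
print(choose_strategy(population_entropy(clumped, 0.0, 1.0), 0.5, 2.0))
```

A widely scattered swarm has high entropy and is pushed to converge, while a collapsed swarm has low entropy and is pushed to diverge, which keeps diversity inside a target band.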
Sensory prediction on a whiskered robot: a tactile analogy to “optical flow”

PubMed Central

Schroeder, Christopher L.; Hartmann, Mitra J. Z.

2012-01-01

When an animal moves an array of sensors (e.g., the hand, the eye) through the environment, spatial and temporal gradients of sensory data are related by the velocity of the moving sensory array. In vision, the relationship between spatial and temporal brightness gradients is quantified in the “optical flow” equation. In the present work, we suggest an analog to optical flow for the rodent vibrissal (whisker) array, in which the perceptual intensity that “flows” over the array is bending moment. Changes in bending moment are directly related to radial object distance, defined as the distance between the base of a whisker and the point of contact with the object. Using both simulations and a 1×5 array (row) of artificial whiskers, we demonstrate that local object curvature can be estimated based on differences in radial distance across the array. We then develop two algorithms, both based on tactile flow, to predict the future contact points that will be obtained as the whisker array translates along the object. The translation of the robotic whisker array represents the rat's head velocity. The first algorithm uses a calculation of the local object slope, while the second uses a calculation of the local object curvature. Both algorithms successfully predict future contact points for simple surfaces. The algorithm based on curvature was found to more accurately predict future contact points as surfaces became more irregular. We quantify the inter-related effects of whisker spacing and the object's spatial frequencies, and examine the issues that arise in the presence of real-world noise, friction, and slip. PMID:23097641
Sensory prediction on a whiskered robot: a tactile analogy to "optical flow".

PubMed

Schroeder, Christopher L; Hartmann, Mitra J Z

2012-01-01

When an animal moves an array of sensors (e.g., the hand, the eye) through the environment, spatial and temporal gradients of sensory data are related by the velocity of the moving sensory array. In vision, the relationship between spatial and temporal brightness gradients is quantified in the "optical flow" equation. In the present work, we suggest an analog to optical flow for the rodent vibrissal (whisker) array, in which the perceptual intensity that "flows" over the array is bending moment. Changes in bending moment are directly related to radial object distance, defined as the distance between the base of a whisker and the point of contact with the object. Using both simulations and a 1×5 array (row) of artificial whiskers, we demonstrate that local object curvature can be estimated based on differences in radial distance across the array. We then develop two algorithms, both based on tactile flow, to predict the future contact points that will be obtained as the whisker array translates along the object. The translation of the robotic whisker array represents the rat's head velocity. The first algorithm uses a calculation of the local object slope, while the second uses a calculation of the local object curvature. Both algorithms successfully predict future contact points for simple surfaces. The algorithm based on curvature was found to more accurately predict future contact points as surfaces became more irregular. We quantify the inter-related effects of whisker spacing and the object's spatial frequencies, and examine the issues that arise in the presence of real-world noise, friction, and slip.
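The difference between the slope-based and curvature-based predictors can be illustrated on plain (x, y) contact points. This is a geometric sketch under stated assumptions: it uses finite differences on equally spaced contacts and a second derivative as a stand-in for curvature, whereas the paper derives its estimates from bending moment and radial distance. The function names are hypothetical.

```python
def predict_with_slope(points, dx):
    """Extrapolate the next contact from the local slope of the last two contacts."""
    (x1, y1), (x2, y2) = points[-2:]
    slope = (y2 - y1) / (x2 - x1)
    return x2 + dx, y2 + slope * dx


def predict_with_curvature(points, dx):
    """Second-order extrapolation from the last three equally spaced contacts."""
    (x0, y0), (x1, y1), (x2, y2) = points[-3:]
    h = x2 - x1  # assumes x1 - x0 equals x2 - x1
    slope = (3 * y2 - 4 * y1 + y0) / (2 * h)   # backward difference at x2
    second = (y2 - 2 * y1 + y0) / h ** 2       # second derivative estimate
    return x2 + dx, y2 + slope * dx + 0.5 * second * dx ** 2


# Contacts sampled from the curved surface y = x**2.
contacts = [(0, 0), (1, 1), (2, 4)]
print(predict_with_slope(contacts, 1))
print(predict_with_curvature(contacts, 1))
```

On this curved profile the second-order predictor recovers the true next point (3, 9) while the slope-only predictor undershoots, consistent with the abstract's finding that the curvature-based algorithm copes better with irregular surfaces.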
Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits.

PubMed

Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H

2006-01-01

Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
Numerical Algorithms for Acoustic Integrals - The Devil is in the Details

NASA Technical Reports Server (NTRS)

Brentner, Kenneth S.

1996-01-01

The accurate prediction of the aeroacoustic field generated by aerospace vehicles or nonaerospace machinery is necessary for designers to control and reduce source noise. Powerful computational aeroacoustic methods, based on various acoustic analogies (primarily the Lighthill acoustic analogy) and Kirchhoff methods, have been developed for prediction of noise from complicated sources, such as rotating blades. Both methods ultimately predict the noise through a numerical evaluation of an integral formulation. In this paper, we consider three generic acoustic formulations and several numerical algorithms that have been used to compute the solutions to these formulations. Algorithms for retarded-time formulations are the most efficient and robust, but they are difficult to implement for supersonic-source motion. Collapsing-sphere and emission-surface formulations are good alternatives when supersonic-source motion is present, but the numerical implementations of these formulations are more computationally demanding. New algorithms - which utilize solution adaptation to provide a specified error level - are needed.
A parallel strategy for predicting the secondary structure of polycistronic microRNAs.

PubMed

Han, Dianwei; Tang, Guiliang; Zhang, Jun

2013-01-01

The biogenesis of a functional microRNA is largely dependent on the secondary structure of the microRNA precursor (pre-miRNA). Recently, it has been shown that microRNAs are present in the genome in the form of polycistronic transcriptional units in plants and animals. It will be important to design efficient computational methods to predict such structures for microRNA discovery and its applications in gene silencing. In this paper, we propose a parallel algorithm based on the master-slave architecture to predict the secondary structure from an input sequence. We conducted some experiments to verify the effectiveness of our parallel algorithm. The experimental results show that our algorithm is able to produce the optimal secondary structure of polycistronic microRNAs.
A Novel Admixture-Based Pharmacogenetic Approach to Refine Warfarin Dosing in Caribbean Hispanics.

PubMed

Duconge, Jorge; Ramos, Alga S; Claudio-Campos, Karla; Rivera-Miranda, Giselle; Bermúdez-Bosch, Luis; Renta, Jessicca Y; Cadilla, Carmen L; Cruz, Iadelisse; Feliu, Juan F; Vergara, Cunegundo; Ruaño, Gualberto

2016-01-01

This study is aimed at developing a novel admixture-adjusted pharmacogenomic approach to individually refine warfarin dosing in Caribbean Hispanic patients. A multiple linear regression analysis of effective warfarin doses versus relevant genotypes, admixture, clinical and demographic factors was performed in 255 patients and further validated externally in another cohort of 55 individuals. The admixture-adjusted, genotype-guided warfarin dosing refinement algorithm developed in Caribbean Hispanics showed better predictability (R2 = 0.70, MAE = 0.72mg/day) than a clinical algorithm that excluded genotypes and admixture (R2 = 0.60, MAE = 0.99mg/day), and outperformed two prior pharmacogenetic algorithms in predicting effective dose in this population. For patients at the highest risk of adverse events, 45.5% of the dose predictions using the developed pharmacogenetic model resulted in ideal dose as compared with only 29% when using the clinical non-genetic algorithm (p<0.001). The admixture-driven pharmacogenetic algorithm predicted 58% of warfarin dose variance when externally validated in 55 individuals from an independent validation cohort (MAE = 0.89 mg/day, 24% mean bias). Results supported our rationale to incorporate individual's genotypes and unique admixture metrics into pharmacogenetic refinement models in order to increase predictability when expanding them to admixed populations like Caribbean Hispanics. ClinicalTrials.gov NCT01318057.
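The two figures of merit quoted in this record, R2 and MAE, can be computed as in the sketch below. The definitions are the standard ones (coefficient of determination as 1 minus residual over total sum of squares, and mean absolute error); whether the study computed R2 in exactly this way is an assumption, and the dose values are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted doses."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical effective and predicted doses in mg/day.
observed = [2.0, 4.0, 6.0, 8.0]
estimated = [3.0, 4.0, 6.0, 7.0]
print(mean_absolute_error(observed, estimated), r_squared(observed, estimated))
```

MAE is expressed in the dose unit itself, which is why the abstract can report it in mg/day alongside the unitless R2.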
Case-Mix for Performance Management: A Risk Algorithm Based on ICD-10-CM.

PubMed

Gao, Jian; Moran, Eileen; Almenoff, Peter L

2018-06-01

Accurate risk adjustment is the key to a reliable comparison of cost and quality performance among providers and hospitals. However, the existing case-mix algorithms based on age, sex, and diagnoses can only explain up to 50% of the cost variation. More accurate risk adjustment is desired for provider performance assessment and improvement. The objective was to develop a case-mix algorithm that hospitals and payers can use to measure and compare cost and quality performance of their providers. All 6,048,895 patients with valid diagnoses and cost recorded in the US Veterans health care system in fiscal year 2016 were included in this study. The dependent variable was total cost at the patient level, and the explanatory variables were age, sex, and comorbidities represented by 762 clinically homogeneous groups, which were created by expanding the 283 categories from Clinical Classifications Software based on ICD-10-CM codes. The split-sample method was used to assess model overfitting and coefficient stability. The predictive power of the algorithms was ascertained by comparing the R2, mean absolute percentage error, root mean square error, predictive ratios, and c-statistics. The expansion of the Clinical Classifications Software categories resulted in higher predictive power. The R2 reached 0.72 and 0.52 for the transformed and raw scale cost, respectively. The case-mix algorithm we developed based on age, sex, and diagnoses outperformed the existing case-mix models reported in the literature. The method developed in this study can be used by other health systems to produce tailored risk models for their specific purpose.

A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data.

PubMed

Baur, Brittany; Bozdag, Serdar

2016-01-01

DNA methylation is an important epigenetic event that affects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
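Sequential forward selection, the search strategy named in this record, can be sketched generically: start from an empty set and repeatedly add the feature that most improves a score, stopping when nothing helps. In the study the score would come from a classifier's predictive power on expression levels; the toy scoring function, the probe names, and the stopping rule below are hypothetical stand-ins.

```python
def sequential_forward_selection(features, score, max_features):
    """Greedily add the feature that most improves `score` until no gain remains."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and len(selected) < max_features:
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no remaining feature improves the current subset
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


def toy_score(subset):
    """Hypothetical score: per-probe gain minus a penalty for subset size."""
    gains = {"probe_a": 0.5, "probe_b": 0.3, "probe_c": 0.05}
    return sum(gains[f] for f in subset) - 0.1 * len(subset)


chosen, final_score = sequential_forward_selection(
    ["probe_a", "probe_b", "probe_c"], toy_score, max_features=3
)
print(chosen, final_score)
```

The search evaluates each remaining feature once per round, so it is far cheaper than testing every subset, at the price of never revisiting an earlier choice.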
The energy-dependent electron loss model: backscattering and application to heterogeneous slab media.

PubMed

Lee, Tae Kyu; Sandison, George A

2003-01-21

Electron backscattering has been incorporated into the energy-dependent electron loss (EL) model and the resulting algorithm is applied to predict dose deposition in slab heterogeneous media. This algorithm utilizes a reflection coefficient from the interface that is computed on the basis of Goudsmit-Saunderson theory and an average energy for the backscattered electrons based on Everhart's theory. Predictions of dose deposition in slab heterogeneous media are compared to the Monte Carlo based dose planning method (DPM) and a numerical discrete ordinates method (DOM). The slab media studied comprised water/Pb, water/Al, water/bone, water/bone/water, and water/lung/water, and incident electron beam energies of 10 MeV and 18 MeV. The predicted dose enhancement due to backscattering is accurate to within 3% of dose maximum even for lead as the backscattering medium. Dose discrepancies at large depths beyond the interface were as high as 5% of dose maximum and we speculate that this error may be attributed to the EL model assuming a Gaussian energy distribution for the electrons at depth. The computational cost is low compared to Monte Carlo simulations making the EL model attractive as a fast dose engine for dose optimization algorithms. The predictive power of the algorithm demonstrates that the small angle scattering restriction on the EL model can be overcome while retaining dose calculation accuracy and requiring only one free variable, chi, in the algorithm to be determined in advance of calculation.
The energy-dependent electron loss model: backscattering and application to heterogeneous slab media

NASA Astrophysics Data System (ADS)

Lee, Tae Kyu; Sandison, George A.

2003-01-01

Electron backscattering has been incorporated into the energy-dependent electron loss (EL) model and the resulting algorithm is applied to predict dose deposition in slab heterogeneous media. This algorithm utilizes a reflection coefficient from the interface that is computed on the basis of Goudsmit-Saunderson theory and an average energy for the backscattered electrons based on Everhart's theory. Predictions of dose deposition in slab heterogeneous media are compared to the Monte Carlo based dose planning method (DPM) and a numerical discrete ordinates method (DOM). The slab media studied comprised water/Pb, water/Al, water/bone, water/bone/water, and water/lung/water, and incident electron beam energies of 10 MeV and 18 MeV. The predicted dose enhancement due to backscattering is accurate to within 3% of dose maximum even for lead as the backscattering medium. Dose discrepancies at large depths beyond the interface were as high as 5% of dose maximum and we speculate that this error may be attributed to the EL model assuming a Gaussian energy distribution for the electrons at depth. The computational cost is low compared to Monte Carlo simulations making the EL model attractive as a fast dose engine for dose optimization algorithms. The predictive power of the algorithm demonstrates that the small angle scattering restriction on the EL model can be overcome while retaining dose calculation accuracy and requiring only one free variable, χ, in the algorithm to be determined in advance of calculation.
Comparison of linear–stochastic and nonlinear–deterministic algorithms in the analysis of 15-minute clinical ECGs to predict risk of arrhythmic death

PubMed Central

Skinner, James E; Meyer, Michael; Nester, Brian A; Geary, Una; Taggart, Pamela; Mangione, Antoinette; Ramalanjaona, George; Terregino, Carol; Dalsey, William C

2009-01-01

Objective: Comparative algorithmic evaluation of heartbeat series in low-to-high risk cardiac patients for the prospective prediction of risk of arrhythmic death (AD). Background: Heartbeat variation reflects cardiac autonomic function and risk of AD. Indices based on linear stochastic models are independent risk factors for AD in post-myocardial infarction (post-MI) cohorts. Indices based on nonlinear deterministic models have superior predictability in retrospective data. Methods: Patients were enrolled (N = 397) in three emergency departments upon presenting with chest pain and were determined to be at low-to-high risk of acute MI (>7%). Brief ECGs were recorded (15 min) and R-R intervals assessed by three nonlinear algorithms (PD2i, DFA, and ApEn) and four conventional linear-stochastic measures (SDNN, MNN, 1/f-Slope, LF/HF). Out-of-hospital AD was determined by modified Hinkle–Thaler criteria. Results: All-cause mortality at one-year follow-up was 10.3%, with 7.7% adjudicated to be AD. The sensitivity and relative risk for predicting AD was highest at all time-points for the nonlinear PD2i algorithm (p ≤0.001). The sensitivity at 30 days was 100%, specificity 58%, and relative risk >100 (p ≤0.001); sensitivity at 360 days was 95%, specificity 58%, and relative risk >11.4 (p ≤0.001). Conclusions: Heartbeat analysis by the time-dependent nonlinear PD2i algorithm is comparatively the superior test. PMID:19707283
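The sensitivity, specificity, and relative risk figures reported in this abstract are all derived from a 2x2 table of test result against outcome. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, relative risk) from 2x2 counts.

    tp: test positive, event occurred   fp: test positive, no event
    fn: test negative, event occurred   tn: test negative, no event
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    risk_pos = tp / (tp + fp)  # event rate among test-positive patients
    risk_neg = fn / (fn + tn)  # event rate among test-negative patients
    # With no events in the test-negative group the ratio is undefined;
    # returning infinity here is a convention chosen for this sketch.
    relative_risk = risk_pos / risk_neg if risk_neg > 0 else float("inf")
    return sensitivity, specificity, relative_risk


# Hypothetical counts for illustration only.
sens, spec, rr = diagnostic_metrics(tp=9, fp=40, fn=1, tn=50)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} relative risk={rr:.2f}")
```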
GASP: Gapped Ancestral Sequence Prediction for proteins

PubMed Central

Edwards, Richard J; Shields, Denis C

2004-01-01

Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199
TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers.

PubMed

Cao, Han; Ng, Marcus C K; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W I

2017-09-01

α-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, therefore computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method TMDIM based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are introduction of the packing type classification, the multiple-condition decoy filtering, and the cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD <2.0 Å) and 78% of the cases are better predicted than the two other methods compared. Our method provides an alternative for modeling TM bitopic dimers of unknown structures for further computational studies. TMDIM is freely available on the web at https://cbbio.cis.umac.mo/TMDIM. Website is implemented in PHP, MySQL and Apache, with all major browsers supported.
TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers

NASA Astrophysics Data System (ADS)

Cao, Han; Ng, Marcus C. K.; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W. I.

2017-09-01

α-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, therefore computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method TMDIM based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are introduction of the packing type classification, the multiple-condition decoy filtering, and the cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD <2.0 Å) and 78% of the cases are better predicted than the two other methods compared. Our method provides an alternative for modeling TM bitopic dimers of unknown structures for further computational studies. TMDIM is freely available on the web at https://cbbio.cis.umac.mo/TMDIM. Website is implemented in PHP, MySQL and Apache, with all major browsers supported.
[Research on engine remaining useful life prediction based on oil spectrum analysis and particle filtering].

PubMed

Sun, Lei; Jia, Yun-xian; Cai, Li-ying; Lin, Guo-yu; Zhao, Jin-song

2013-09-01

Spectrometric oil analysis (SOA) is an important technique for machine state monitoring, fault diagnosis and prognosis, and SOA-based remaining useful life (RUL) prediction has the advantage of identifying the optimal maintenance strategy for a machine system. Because of the complexity of a machine system, its health state degradation process cannot be characterized simply by a linear model, whereas particle filtering (PF) has clear advantages over traditional Kalman filtering in dealing with nonlinear and non-Gaussian systems. The PF approach was therefore applied to state forecasting by SOA, and an RUL prediction technique based on SOA and the PF algorithm is proposed. In the prediction model, the prior probability distribution is obtained from the estimate of the system's posterior probability, and a multi-step-ahead prediction model based on the PF algorithm is established. Finally, actual SOA data from an engine were analyzed and forecast by the above method, and the forecasting result was compared with that of the traditional Kalman filtering method. The result fully shows the superiority and effectiveness of the
A parallel algorithm for the initial screening of space debris collisions prediction using the SGP4/SDP4 models and GPU acceleration

NASA Astrophysics Data System (ADS)

Lin, Mingpei; Xu, Ming; Fu, Xiaoyu

2017-05-01

Currently, a tremendous amount of space debris in Earth's orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, collision predictions for an enormous amount of space debris give rise to large-scale computations. In this paper, a parallel algorithm is established on the Compute Unified Device Architecture (CUDA) platform of NVIDIA Corporation for collision prediction. According to the parallel structure of NVIDIA graphics processors, a block decomposition strategy is adopted in the algorithm. Space debris is divided into batches, and the computation and data transfer operations of adjacent batches overlap. As a consequence, the latency to access shared memory during the entire computing process is significantly reduced, and a higher computing speed is reached. Theoretically, a simulation of collision prediction for space debris of any amount and for any time span can be executed. To verify this algorithm, a simulation example including 1382 pieces of debris, whose operational time scales vary from 1 min to 3 days, is conducted on Tesla C2075 of NVIDIA. The simulation results demonstrate that with the same computational accuracy as that of a CPU, the computing speed of the parallel algorithm on a GPU is 30 times that on a CPU. Based on this algorithm, collision prediction of over 150 Chinese spacecraft for a time span of 3 days can be completed in less than 3 h on a single computer, which meets the timeliness requirement of the initial screening task. Furthermore, the algorithm can be adapted for multiple tasks, including particle filtration, constellation design, and Monte-Carlo simulation of an orbital computation.
An algorithm for direct causal learning of influences on patient outcomes.

PubMed

Rathnam, Chandramouli; Lee, Sanghoon; Jiang, Xia

2017-01-01

This study aims at developing and introducing a new algorithm, called direct causal learner (DCL), for learning the direct causal influences of a single target. We applied it to both simulated and real clinical and genome wide association study (GWAS) datasets and compared its performance to classic causal learning algorithms. The DCL algorithm learns the causes of a single target from passive data using Bayesian-scoring, instead of using independence checks, and a novel deletion algorithm. We generate 14,400 simulated datasets and measure the number of datasets for which DCL correctly and partially predicts the direct causes. We then compare its performance with the constraint-based path consistency (PC) and conservative PC (CPC) algorithms, the Bayesian-score based fast greedy search (FGS) algorithm, and the partial ancestral graphs algorithm fast causal inference (FCI). In addition, we extend our comparison of all five algorithms to both a real GWAS dataset and real breast cancer datasets over various time-points in order to observe how effective they are at predicting the causal influences of Alzheimer's disease and breast cancer survival. DCL consistently outperforms FGS, PC, CPC, and FCI in discovering the parents of the target for the datasets simulated using a simple network. Overall, DCL predicts significantly more datasets correctly (McNemar's test significance: p<0.0001) than any of the other algorithms for these network types. For example, when assessing overall performance (simple and complex network results combined), DCL correctly predicts approximately 1400 more datasets than the top FGS method, 1600 more datasets than the top CPC method, 4500 more datasets than the top PC method, and 5600 more datasets than the top FCI method. 
Although FGS did correctly predict more datasets than DCL for the complex networks, and DCL correctly predicted only a few more datasets than CPC for these networks, there is no significant difference in performance between these three algorithms for this network type. However, when we use a more continuous measure of accuracy, we find that all the DCL methods are able to better partially predict more direct causes than FGS and CPC for the complex networks. In addition, DCL consistently had faster runtimes than the other algorithms. In the application to the real datasets, DCL identified rs6784615, located on the NISCH gene, and rs10824310, located on the PRKG1 gene, as direct causes of late onset Alzheimer's disease (LOAD) development. In addition, DCL identified ER category as a direct predictor of breast cancer mortality within 5 years, and HER2 status as a direct predictor of 10-year breast cancer mortality. These predictors have been identified in previous studies to have a direct causal relationship with their respective phenotypes, supporting the predictive power of DCL. When the other algorithms discovered predictors from the real datasets, these predictors were either also found by DCL or could not be supported by previous studies. Our results show that DCL outperforms FGS, PC, CPC, and FCI in almost every case, demonstrating its potential to advance causal learning. Furthermore, our DCL algorithm effectively identifies direct causes in the LOAD and Metabric GWAS datasets, which indicates its potential for clinical applications. Copyright © 2016 Elsevier B.V. All rights reserved.
Optimizing research in symptomatic uterine fibroids with development of a computable phenotype for use with electronic health records.

PubMed

Hoffman, Sarah R; Vines, Anissa I; Halladay, Jacqueline R; Pfaff, Emily; Schiff, Lauren; Westreich, Daniel; Sundaresan, Aditi; Johnson, La-Shell; Nicholson, Wanda K

2018-06-01

Women with symptomatic uterine fibroids can report a myriad of symptoms, including pain, bleeding, infertility, and psychosocial sequelae. Optimizing fibroid research requires the ability to enroll populations of women with image-confirmed symptomatic uterine fibroids. Our objective was to develop an electronic health record-based algorithm to identify women with symptomatic uterine fibroids for a comparative effectiveness study of medical or surgical treatments on quality-of-life measures. Using an iterative process and text-mining techniques, an effective computable phenotype algorithm, composed of demographics, and clinical and laboratory characteristics, was developed with reasonable performance. Such algorithms provide a feasible, efficient way to identify populations of women with symptomatic uterine fibroids for the conduct of large traditional or pragmatic trials and observational comparative effectiveness studies. Symptomatic uterine fibroids, due to menorrhagia, pelvic pain, bulk symptoms, or infertility, are a source of substantial morbidity for reproductive-age women. Comparing Treatment Options for Uterine Fibroids is a multisite registry study to compare the effectiveness of hormonal or surgical fibroid treatments on women's perceptions of their quality of life. Electronic health record-based algorithms are able to identify large numbers of women with fibroids, but additional work is needed to develop electronic health record algorithms that can identify women with symptomatic fibroids to optimize fibroid research. We sought to develop an efficient electronic health record-based algorithm that can identify women with symptomatic uterine fibroids in a large health care system for recruitment into large-scale observational and interventional research in fibroid management. We developed and assessed the accuracy of 3 algorithms to identify patients with symptomatic fibroids using an iterative approach. 
The data source was the Carolina Data Warehouse for Health, a repository for the health system's electronic health record data. In addition to International Classification of Diseases, Ninth Revision diagnosis and procedure codes and clinical characteristics, text data-mining software was used to derive information from imaging reports to confirm the presence of uterine fibroids. Results of each algorithm were compared with expert manual review to calculate the positive predictive values for each algorithm. Algorithm 1 was composed of the following criteria: (1) age 18-54 years; (2) either ≥1 International Classification of Diseases, Ninth Revision diagnosis codes for uterine fibroids or mention of fibroids using text-mined key words in imaging records or documents; and (3) no International Classification of Diseases, Ninth Revision or Current Procedural Terminology codes for hysterectomy and no reported history of hysterectomy. The positive predictive value was 47% (95% confidence interval 39-56%). Algorithm 2 required ≥2 International Classification of Diseases, Ninth Revision diagnosis codes for fibroids and positive text-mined key words and had a positive predictive value of 65% (95% confidence interval 50-79%). In algorithm 3, further refinements included ≥2 International Classification of Diseases, Ninth Revision diagnosis codes for fibroids on separate outpatient visit dates, the exclusion of women who had a positive pregnancy test within 3 months of their fibroid-related visit, and exclusion of incidentally detected fibroids during prenatal or emergency department visits. Algorithm 3 achieved a positive predictive value of 76% (95% confidence interval 71-81%). An electronic health record-based algorithm is capable of identifying cases of symptomatic uterine fibroids with moderate positive predictive value and may be an efficient approach for large-scale study recruitment. Copyright © 2018 Elsevier Inc. All rights reserved.
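Each algorithm in this abstract is summarized by a positive predictive value with a 95% confidence interval, computed from cases confirmed by manual review. The sketch below shows one way to obtain such an interval. It assumes the normal (Wald) approximation, which the abstract does not state was the method used, and the counts are hypothetical.

```python
import math


def ppv_with_ci(confirmed, flagged, z=1.96):
    """PPV = confirmed / flagged, with a Wald interval clipped to [0, 1].

    confirmed: flagged patients verified as true cases by manual review
    flagged:   all patients flagged by the algorithm and reviewed
    z:         normal quantile (1.96 for a 95% interval)
    """
    ppv = confirmed / flagged
    half_width = z * math.sqrt(ppv * (1 - ppv) / flagged)
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)


# Hypothetical review sample: 76 of 100 flagged charts confirmed.
ppv, low, high = ppv_with_ci(confirmed=76, flagged=100)
print(f"PPV {ppv:.0%} (95% CI {low:.0%}-{high:.0%})")
```

For small review samples or proportions near 0 or 1, an exact or Wilson interval is usually preferred over the Wald approximation.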
Foraging on the potential energy surface: a swarm intelligence-based optimizer for molecular geometry.

PubMed

Wehmeyer, Christoph; Falk von Rudorff, Guido; Wolf, Sebastian; Kabbe, Gabriel; Schärf, Daniel; Kühne, Thomas D; Sebastiani, Daniel

2012-11-21

We present a stochastic, swarm intelligence-based optimization algorithm for the prediction of global minima on potential energy surfaces of molecular cluster structures. Our optimization approach is a modification of the artificial bee colony (ABC) algorithm which is inspired by the foraging behavior of honey bees. We apply our modified ABC algorithm to the problem of global geometry optimization of molecular cluster structures and show its performance for clusters with 2-57 particles and different interatomic interaction potentials.
Foraging on the potential energy surface: A swarm intelligence-based optimizer for molecular geometry

NASA Astrophysics Data System (ADS)

Wehmeyer, Christoph; Falk von Rudorff, Guido; Wolf, Sebastian; Kabbe, Gabriel; Schärf, Daniel; Kühne, Thomas D.; Sebastiani, Daniel

2012-11-01

We present a stochastic, swarm intelligence-based optimization algorithm for the prediction of global minima on potential energy surfaces of molecular cluster structures. Our optimization approach is a modification of the artificial bee colony (ABC) algorithm which is inspired by the foraging behavior of honey bees. We apply our modified ABC algorithm to the problem of global geometry optimization of molecular cluster structures and show its performance for clusters with 2-57 particles and different interatomic interaction potentials.
Thermodynamic heuristics with case-based reasoning: combined insights for RNA pseudoknot secondary structure.

PubMed

Al-Khatib, Ra'ed M; Rashid, Nur'Aini Abdul; Abdullah, Rosni

2011-08-01

The secondary structure of RNA pseudoknots has been extensively inferred and scrutinized by computational approaches. Experimental methods for determining RNA structure are time consuming and tedious; therefore, predictive computational approaches are required. Predicting the most accurate and energy-stable pseudoknot RNA secondary structure has been proven to be an NP-hard problem. In this paper, a new RNA folding approach, termed MSeeker, is presented; it includes KnotSeeker (a heuristic method) and Mfold (a thermodynamic algorithm). The global optimization of this thermodynamic heuristic approach was further enhanced by using a case-based reasoning technique as a local optimization method. MSeeker is a proposed algorithm for predicting RNA pseudoknot structure from individual sequences, especially long ones. This research demonstrates that MSeeker improves the sensitivity and specificity of existing RNA pseudoknot structure predictions. The performance and structural results from this proposed method were evaluated against seven other state-of-the-art pseudoknot prediction methods. The MSeeker method had better sensitivity than the DotKnot, FlexStem, HotKnots, pknotsRG, ILM, NUPACK and pknotsRE methods, with 79% of the predicted pseudoknot base-pairs being correct.
Dinucleotide controlled null models for comparative RNA gene prediction.

PubMed

Gesell, Tanja; Washietl, Stefan

2008-05-27

Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance based approach, a tree is estimated under this model which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm giving a new variant of a thermodynamic structure based RNA gene finding program that is not biased by the dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. 
SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
Soil-pipe interaction modeling for pipe behavior prediction with super learning based methods

NASA Astrophysics Data System (ADS)

Shi, Fang; Peng, Xiang; Liu, Huan; Hu, Yafei; Liu, Zheng; Li, Eric

2018-03-01

Underground pipelines are subject to severe distress from the surrounding expansive soil. To investigate the structural response of water mains to varying soil movements, field data, including pipe wall strains, in situ soil water content, soil pressure and temperature, were collected. Research on monitoring data analysis has been reported, but the relationship between soil properties and pipe deformation has not been well interpreted. To characterize this relationship, this paper presents a super learning based approach combined with feature selection algorithms to predict the structural behavior of water mains in different soil environments. Furthermore, an automatic variable selection method, i.e., the recursive feature elimination algorithm, was used to identify the critical predictors contributing to the pipe deformations. To investigate the adaptability of super learning to different predictive models, this research applied super learning based methods to three different datasets. The predictive performance was evaluated by R-squared, root-mean-square error and mean absolute error. Based on the prediction performance evaluation, the superiority of super learning was validated and demonstrated by predicting three types of pipe deformations accurately. In addition, a comprehensive understanding of the working environments of water mains becomes possible.
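The three evaluation metrics named in this abstract have standard definitions, sketched below for reference. The function name and the sample values are illustrative and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R-squared, root-mean-square error, mean absolute error)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Illustrative values only.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```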
Traffic Noise Ground Attenuation Algorithm Evaluation

NASA Astrophysics Data System (ADS)

Herman, Lloyd Allen

The Federal Highway Administration traffic noise prediction program, STAMINA 2.0, was evaluated for its accuracy. In addition, the ground attenuation algorithm used in the Ontario ORNAMENT method was evaluated to determine its potential to improve these predictions. Field measurements of sound levels were made at 41 sites on I-440 in Nashville, Tennessee in order to both study noise barrier effectiveness and to evaluate STAMINA 2.0 and the performance of the ORNAMENT ground attenuation algorithm. The measurement sites, which contain large variations in terrain, included several cross sections. Further, all sites contain some type of barrier, natural or constructed, which could more fully expose the strengths and weaknesses of the ground attenuation algorithms. The noise barrier evaluation was accomplished in accordance with American National Standard Methods for Determination of Insertion Loss of Outdoor Noise Barriers, which resulted in an evaluation of this standard. The entire 7.2 mile length of I-440 was modeled using STAMINA 2.0. A multiple run procedure was developed to emulate the results that would be obtained if the ORNAMENT algorithm were incorporated into STAMINA 2.0. Finally, the predicted noise levels based on STAMINA 2.0 and STAMINA with the ORNAMENT ground attenuation algorithm were compared with each other and with the field measurements. It was found that STAMINA 2.0 overpredicted noise levels by an average of over 2 dB for the receivers on I-440, whereas STAMINA with the ORNAMENT ground attenuation algorithm overpredicted noise levels by an average of less than 0.5 dB. The mean errors for the two predictions were found to be statistically different from each other, and the mean error for the prediction with the ORNAMENT ground attenuation algorithm was not found to be statistically different from zero. 
The STAMINA 2.0 program predicts little, if any, ground attenuation for receivers at typical first-row distances from highways where noise barriers are used. The ORNAMENT ground attenuation algorithm, which recognizes and better compensates for the presence of obstacles in the propagation path of a sound wave, predicted significant amounts of ground attenuation for most sites.
The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets.

PubMed

González-Recio, O; Jiménez-Montero, J A; Alenda, R

2013-01-01

In the next few years, with the advent of high-density single nucleotide polymorphism (SNP) arrays and genome sequencing, genomic evaluation methods will need to deal with a large number of genetic variants and an increasing sample size. The boosting algorithm is a machine-learning technique that may alleviate the drawbacks of dealing with such large data sets. This algorithm combines different predictors in a sequential manner with some shrinkage on them; each predictor is applied consecutively to the residuals from the committee formed by the previous ones to form a final prediction based on a subset of covariates. Here, a detailed description is provided and examples using a toy data set are included. A modification of the algorithm called "random boosting" was proposed to increase predictive ability and decrease computation time of genome-assisted evaluation in large data sets. Random boosting uses a random selection of markers to add a subsequent weak learner to the predictive model. These modifications were applied to a real data set composed of 1,797 bulls genotyped for 39,714 SNP. Deregressed proofs of 4 yield traits and 1 type trait from January 2009 routine evaluations were used as dependent variables. A 2-fold cross-validation scenario was implemented. Sires born before 2005 were used as a training sample (1,576 and 1,562 for production and type traits, respectively), whereas younger sires were used as a testing sample to evaluate predictive ability of the algorithm on yet-to-be-observed phenotypes. Comparison with the original algorithm was provided. The predictive ability of the algorithm was measured as Pearson correlations between observed and predicted responses. Further, estimated bias was computed as the average difference between observed and predicted phenotypes. 
The results showed that the modification of the original boosting algorithm could be run in 1% of the time used with the original algorithm and with negligible differences in accuracy and bias. This modification may be used to speed up the computation of genome-assisted evaluation in large data sets such as those obtained from consortiums. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
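The boosting procedure described in this abstract (weak learners fitted sequentially to residuals with shrinkage, optionally on a random subset of markers) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' code: the weak learner is assumed to be a single-marker least-squares regression, and `marker_fraction`, `boost`, and `predict` are hypothetical names. Setting `marker_fraction` below 1 gives the random-subset variant.

```python
import random


def fit_single_marker(x, r):
    """Least-squares (slope, intercept) of residuals r on one marker column x."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0, mr  # monomorphic marker: intercept-only fit
    slope = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx
    return slope, mr - slope * mx


def boost(X, y, n_rounds=200, shrinkage=0.1, marker_fraction=1.0, seed=0):
    """Fit an additive model; each round regresses residuals on the best sampled marker."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    base = sum(y) / n
    pred = [base] * n
    model = []
    n_sampled = max(1, int(marker_fraction * p))
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        best = None
        for j in rng.sample(range(p), n_sampled):  # random subset of markers
            slope, intercept = fit_single_marker(columns[j], resid)
            sse = sum(
                (ri - (intercept + slope * xi)) ** 2
                for xi, ri in zip(columns[j], resid)
            )
            if best is None or sse < best[0]:
                best = (sse, j, slope, intercept)
        _, j, slope, intercept = best
        # Shrink the weak learner before adding it to the committee.
        model.append((j, shrinkage * slope, shrinkage * intercept))
        pred = [
            pi + shrinkage * (intercept + slope * xi)
            for pi, xi in zip(pred, columns[j])
        ]
    return base, model


def predict(base, model, row):
    """Sum the shrunken weak learners for one genotype row."""
    return base + sum(b0 + b1 * row[j] for j, b1, b0 in model)


# Hypothetical toy genotypes (0/1/2 allele counts); phenotype depends on marker 0 only.
X = [[0, 0, 2], [1, 0, 1], [2, 1, 0], [0, 1, 1], [1, 2, 0], [2, 2, 2]]
y = [1.0, 3.0, 5.0, 1.0, 3.0, 5.0]
base, model = boost(X, y)
print([round(predict(base, model, row), 3) for row in X])
```

Sampling fewer markers per round reduces the cost of each iteration roughly in proportion to `marker_fraction`, which is the intuition behind the run-time saving the abstract reports.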
Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests.

PubMed

Hu, Chen; Steingrimsson, Jon Arni

2018-01-01

A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
Rover Slip Validation and Prediction Algorithm

NASA Technical Reports Server (NTRS)

Yen, Jeng

2009-01-01

A physical-based simulation has been developed for the Mars Exploration Rover (MER) mission that applies a slope-induced wheel-slippage to the rover location estimator. Using the digital elevation map from the stereo images, the computational method resolves the quasi-dynamic equations of motion that incorporate the actual wheel-terrain speed to estimate the gross velocity of the vehicle. Based on the empirical slippage measured by the Visual Odometry software of the rover, this algorithm computes two factors for the slip model by minimizing the distance of the predicted and actual vehicle location, and then uses the model to predict the next drives. This technique, which has been deployed to operate the MER rovers in the extended mission periods, can accurately predict the rover position and attitude, mitigating the risk and uncertainties in the path planning on high-slope areas.

Improving the Interpretability of Classification Rules Discovered by an Ant Colony Algorithm: Extended Results.

PubMed

Otero, Fernando E B; Freitas, Alex A

2016-01-01

Most ant colony optimization (ACO) algorithms for inducing classification rules use an ACO-based procedure to create a rule in a one-at-a-time fashion. An improved search strategy has been proposed in the cAnt-Miner[Formula: see text] algorithm, where an ACO-based procedure is used to create a complete list of rules (ordered rules), i.e., the ACO search is guided by the quality of a list of rules instead of an individual rule. In this paper we propose an extension of the cAnt-Miner[Formula: see text] algorithm to discover a set of rules (unordered rules). The main motivations for this work are to improve the interpretation of individual rules by discovering a set of rules and to evaluate the impact on the predictive accuracy of the algorithm. We also propose a new measure to evaluate the interpretability of the discovered rules to mitigate the fact that the commonly used model size measure ignores how the rules are used to make a class prediction. Comparisons with state-of-the-art rule induction algorithms, support vector machines, and the cAnt-Miner[Formula: see text] producing ordered rules are also presented.
In Silico Screening Based on Predictive Algorithms as a Design Tool for Exon Skipping Oligonucleotides in Duchenne Muscular Dystrophy

PubMed Central

Echigoya, Yusuke; Mouly, Vincent; Garcia, Luis; Yokota, Toshifumi; Duddy, William

2015-01-01

The use of antisense ‘splice-switching’ oligonucleotides to induce exon skipping represents a potential therapeutic approach to various human genetic diseases. It has achieved greatest maturity in exon skipping of the dystrophin transcript in Duchenne muscular dystrophy (DMD), for which several clinical trials are completed or ongoing, and a large body of data exists describing tested oligonucleotides and their efficacy. The rational design of an exon skipping oligonucleotide involves the choice of an antisense sequence, usually between 15 and 32 nucleotides, targeting the exon that is to be skipped. Although parameters describing the target site can be computationally estimated and several have been identified to correlate with efficacy, methods to predict efficacy are limited. Here, an in silico pre-screening approach is proposed, based on predictive statistical modelling. Previous DMD data were compiled together and, for each oligonucleotide, some 60 descriptors were considered. Statistical modelling approaches were applied to derive algorithms that predict exon skipping for a given target site. We confirmed (1) the binding energetics of the oligonucleotide to the RNA, and (2) the distance in bases of the target site from the splice acceptor site, as the two most predictive parameters, and we included these and several other parameters (while discounting many) into an in silico screening process, based on their capacity to predict high or low efficacy in either phosphorodiamidate morpholino oligomers (89% correctly predicted) and/or 2’O Methyl RNA oligonucleotides (76% correctly predicted). Predictions correlated strongly with in vitro testing for sixteen de novo PMO sequences targeting various positions on DMD exons 44 (R2 0.89) and 53 (R2 0.89), one of which represents a potential novel candidate for clinical trials. 
We provide these algorithms together with a computational tool that facilitates screening to predict exon skipping efficacy at each position of a target exon. PMID:25816009
Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach.

PubMed

Zakov, Shay; Tsur, Dekel; Ziv-Ukelson, Michal

2011-08-18

RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. In more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a similar dynamic programming approach as that presented by Nussinov and Jacobson in the late 70's, which typically yields cubic worst case running time algorithms. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data. We study Valiant's classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant's approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant's technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars. The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms.
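
The cubic-time dynamic programming baseline that the abstract attributes to Nussinov and Jacobson can be sketched as follows. This is a minimal illustration of base-pair maximization, not the Valiant-based method of the paper; the function name, the allowed pair set (including G-U wobble pairs), and the `min_loop` parameter are assumptions made for the example.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    Illustrative Nussinov-style DP. `min_loop` (an assumption of this
    sketch) is the minimum number of unpaired bases between two paired
    positions.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):        # O(n) interval lengths
        for i in range(n - span):              # O(n) start positions
            j = i + span
            score = best[i][j - 1]             # case: position j unpaired
            for k in range(i, j - min_loop):   # O(n) partners for j
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(nussinov_max_pairs("GGGAAAUCC"))
```

The three nested loops over `span`, `i`, and `k` are what give the cubic worst-case running time that the paper sets out to improve.
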
Reducing the worst case running times of a family of RNA and CFG problems, using Valiant's approach

PubMed Central

2011-01-01

Background: RNA secondary structure prediction is a mainstream bioinformatic domain, and is key to computational analysis of functional RNA. In more than 30 years, much research has been devoted to defining different variants of RNA structure prediction problems, and to developing techniques for improving prediction quality. Nevertheless, most of the algorithms in this field follow a similar dynamic programming approach as that presented by Nussinov and Jacobson in the late 70's, which typically yields cubic worst case running time algorithms. Recently, some algorithmic approaches were applied to improve the complexity of these algorithms, motivated by new discoveries in the RNA domain and by the need to efficiently analyze the increasing amount of accumulated genome-wide data. Results: We study Valiant's classical algorithm for Context Free Grammar recognition in sub-cubic time, and extract features that are common to problems on which Valiant's approach can be applied. Based on this, we describe several problem templates, and formulate generic algorithms that use Valiant's technique and can be applied to all problems which abide by these templates, including many problems within the world of RNA Secondary Structures and Context Free Grammars. Conclusions: The algorithms presented in this paper improve the theoretical asymptotic worst case running time bounds for a large family of important problems. It is also possible that the suggested techniques could be applied to yield a practical speedup for these problems. For some of the problems (such as computing the RNA partition function and base-pair binding probabilities), the presented techniques are the only ones which are currently known for reducing the asymptotic running time bounds of the standard algorithms. PMID:21851589
Flood predictions using the parallel version of distributed numerical physical rainfall-runoff model TOPKAPI

NASA Astrophysics Data System (ADS)

Boyko, Oleksiy; Zheleznyak, Mark

2015-04-01

The original numerical code TOPKAPI-IMMS of the distributed rainfall-runoff model TOPKAPI (Todini et al., 1996-2014) was developed and implemented in Ukraine. A parallel version of the code has recently been developed for use on multiprocessor systems: multicore/multiprocessor PCs and clusters. The algorithm is based on a binary-tree decomposition of the watershed that balances the amount of computation across all processors/cores. The Message Passing Interface (MPI) protocol is used as the parallel computing framework. The numerical efficiency of the parallelization algorithms is demonstrated in case studies of flood prediction for mountain watersheds of the Ukrainian Carpathian region. The modeling results are compared with predictions based on lumped-parameter models.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.

PubMed

Bourobou, Serge Thomas Mickala; Yoo, Younghwan

2015-05-21

This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things) based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been proposed, their performance has been limited because each focused on only one of the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify such varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside the user's personal space by using an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities than other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment such as an IoT-based smart home.
Machine Learning to Improve the Effectiveness of ANRS in Predicting HIV Drug Resistance.

PubMed

Singh, Yashik

2017-10-01

Human immunodeficiency virus infection and acquired immune deficiency syndrome (HIV/AIDS) is one of the major burdens of disease in developing countries, and the standard-of-care treatment includes prescribing antiretroviral drugs. However, antiretroviral drug resistance is inevitable due to selective pressure associated with the high mutation rate of HIV. Determining antiretroviral resistance can be done by phenotypic laboratory tests or by computer-based interpretation algorithms. Computer-based algorithms have been shown to have many advantages over laboratory tests. The ANRS (Agence Nationale de Recherches sur le SIDA) is regarded as a gold standard in interpreting HIV drug resistance using mutations in genomes. The aim of this study was to improve the prediction of the ANRS gold standard in predicting HIV drug resistance. A genome sequence and HIV drug resistance measures were obtained from the Stanford HIV database (http://hivdb.stanford.edu/). Feature selection was used to determine the most important mutations associated with resistance prediction. These mutations were added to the ANRS rules, and the difference in the prediction ability was measured. This study uncovered important mutations that were not associated with the original ANRS rules. On average, the ANRS algorithm was improved by 79% ± 6.6%. The positive predictive value improved by 28%, and the negative predictive value improved by 10%. The study shows that there is a significant improvement in the prediction ability of the ANRS gold standard.
A Deep Learning Algorithm for Prediction of Age-Related Eye Disease Study Severity Scale for Age-Related Macular Degeneration from Color Fundus Photography.

PubMed

Grassmann, Felix; Mengelkamp, Judith; Brandl, Caroline; Harsch, Sebastian; Zimmermann, Martina E; Linkohr, Birgit; Peters, Annette; Heid, Iris M; Palm, Christoph; Weber, Bernhard H F

2018-04-10

Age-related macular degeneration (AMD) is a common threat to vision. While classification of disease stages is critical to understanding disease risk and progression, several systems based on color fundus photographs are known. Most of these require in-depth and time-consuming analysis of fundus images. Herein, we present an automated computer-based classification algorithm. Algorithm development for AMD classification based on a large collection of color fundus images. Validation is performed on a cross-sectional, population-based study. We included 120 656 manually graded color fundus images from 3654 Age-Related Eye Disease Study (AREDS) participants. AREDS participants were >55 years of age, and non-AMD sight-threatening diseases were excluded at recruitment. In addition, performance of our algorithm was evaluated in 5555 fundus images from the population-based Kooperative Gesundheitsforschung in der Region Augsburg (KORA; Cooperative Health Research in the Region of Augsburg) study. We defined 13 classes (9 AREDS steps, 3 late AMD stages, and 1 for ungradable images) and trained several convolution deep learning architectures. An ensemble of network architectures improved prediction accuracy. An independent dataset was used to evaluate the performance of our algorithm in a population-based study. κ Statistics and accuracy to evaluate the concordance between predicted and expert human grader classification. A network ensemble of 6 different neural net architectures predicted the 13 classes in the AREDS test set with a quadratic weighted κ of 92% (95% confidence interval, 89%-92%) and an overall accuracy of 63.3%. In the independent KORA dataset, images wrongly classified as AMD were mainly the result of a macular reflex observed in young individuals. By restricting the KORA analysis to individuals >55 years of age and prior exclusion of other retinopathies, the weighted and unweighted κ increased to 50% and 63%, respectively. 
Importantly, the algorithm detected 84.2% of all fundus images with definite signs of early or late AMD. Overall, 94.3% of healthy fundus images were classified correctly. Our deep learning algorithm revealed a weighted κ outperforming human graders in the AREDS study and is suitable to classify AMD fundus images in other datasets using individuals >55 years of age. Copyright © 2018 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
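
The agreement measure reported above, a quadratic weighted κ, can be illustrated with a short sketch. It assumes class labels encoded as integers 0..K-1 on an ordinal scale; the function name and the toy label lists are hypothetical and are not data from the study.

```python
def quadratic_weighted_kappa(rater_a, rater_b, num_classes):
    """Quadratic weighted kappa between two lists of integer labels.

    Disagreements are penalised by the squared distance between the two
    assigned classes, so near misses cost less than distant ones.
    """
    n = len(rater_a)
    if n == 0 or n != len(rater_b) or num_classes < 2:
        raise ValueError("need two equally long, non-empty label lists")
    # Observed confusion matrix.
    observed = [[0] * num_classes for _ in range(num_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    # Marginal histograms define the chance-expected matrix.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(num_classes))
              for j in range(num_classes)]
    max_dist = (num_classes - 1) ** 2
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / max_dist
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0.0:
        return 1.0  # both raters used a single identical class
    return 1.0 - weighted_observed / weighted_expected


# Toy example with 3 ordinal classes (illustrative labels only).
print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 1], 3))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.
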
MOlecular MAterials Property Prediction Package (MOMAP) 1.0: a software package for predicting the luminescent properties and mobility of organic functional materials

NASA Astrophysics Data System (ADS)

Niu, Yingli; Li, Wenqiang; Peng, Qian; Geng, Hua; Yi, Yuanping; Wang, Linjun; Nan, Guangjun; Wang, Dong; Shuai, Zhigang

2018-04-01

MOlecular MAterials Property Prediction Package (MOMAP) is a software toolkit for molecular materials property prediction. It focuses on luminescent properties and charge mobility properties. This article contains a brief descriptive introduction of key features, theoretical models and algorithms of the software, together with examples that illustrate the performance. First, we present the theoretical models and algorithms for molecular luminescent properties calculation, which includes the excited-state radiative/non-radiative decay rate constant and the optical spectra. Then, a multi-scale simulation approach and its algorithm for the molecular charge mobility are described. This approach is based on hopping model and combines with Kinetic Monte Carlo and molecular dynamics simulations, and it is especially applicable for describing a large category of organic semiconductors, whose inter-molecular electronic coupling is much smaller than intra-molecular charge reorganisation energy.
A diagnostic algorithm for atypical spitzoid tumors: guidelines for immunohistochemical and molecular assessment.

PubMed

Cho-Vega, Jeong Hee

2016-07-01

Atypical spitzoid tumors are a morphologically diverse group of rare melanocytic lesions most frequently seen in children and young adults. As atypical spitzoid tumors bear striking resemblance to Spitz nevus and spitzoid melanomas clinically and histopathologically, it is crucial to determine their malignant potential and predict their clinical behavior. To date, many researchers have attempted to differentiate atypical spitzoid tumors from unequivocal melanomas based on morphological, immunohistochemical, and molecular diagnostic differences. A diagnostic algorithm is proposed here to assess the malignant potential of atypical spitzoid tumors by using a combination of immunohistochemical and cytogenetic/molecular tests. Together with classical morphological evaluation, this algorithm includes a set of immunohistochemistry assays (p16(Ink4a), a dual-color Ki67/MART-1, and HMB45), fluorescence in situ hybridization (FISH) with five probes (6p25, 8q24, 11q13, CEN9, and 9p21), and an array-based comparative genomic hybridization. This review discusses details of the algorithm, the rationale of each test used in the algorithm, and utility of this algorithm in routine dermatopathology practice. This algorithmic approach will provide a comprehensive diagnostic tool that complements conventional histological criteria and will significantly contribute to improving the diagnosis and prediction of the clinical behavior of atypical spitzoid tumors.
Validation of an algorithm-based definition of treatment resistance in patients with schizophrenia.

PubMed

Ajnakina, Olesya; Horsdal, Henriette Thisted; Lally, John; MacCabe, James H; Murray, Robin M; Gasse, Christiane; Wimberley, Theresa

2018-02-19

Large-scale pharmacoepidemiological research on treatment resistance relies on accurate identification of people with treatment-resistant schizophrenia (TRS) based on data that are retrievable from administrative registers. This is usually approached by operationalising clinical treatment guidelines by using prescription and hospital admission information. We examined the accuracy of an algorithm-based definition of TRS based on clozapine prescription and/or meeting algorithm-based eligibility criteria for clozapine against a gold standard definition using case notes. We additionally validated a definition entirely based on clozapine prescription. 139 schizophrenia patients aged 18-65 years were followed for a mean of 5 years after first presentation to psychiatric services in South-London, UK. The diagnostic accuracy of the algorithm-based measure against the gold standard was measured with sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). A total of 45 (32.4%) schizophrenia patients met the criteria for the gold standard definition of TRS; applying the algorithm-based definition to the same cohort led to 44 (31.7%) patients fulfilling criteria for TRS with sensitivity, specificity, PPV and NPV of 62.2%, 83.0%, 63.6% and 82.1%, respectively. The definition based on lifetime clozapine prescription had sensitivity, specificity, PPV and NPV of 40.0%, 94.7%, 78.3% and 76.7%, respectively. Although a perfect definition of TRS cannot be derived from available prescription and hospital registers, these results indicate that researchers can confidently use registries to identify individuals with TRS for research and clinical practices. Copyright © 2018 Elsevier B.V. All rights reserved.
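
The four accuracy measures quoted above all follow from a 2x2 table of true/false positives and negatives. The sketch below shows the arithmetic. The abstract gives only the cohort size (139), the gold-standard TRS count (45), the algorithm-based TRS count (44), and the percentages; the individual cell counts used here are back-calculated from those figures and should be read as an assumption, not as values reported by the authors.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives / all gold-standard positives
        "specificity": tn / (tn + fp),  # true negatives / all gold-standard negatives
        "ppv": tp / (tp + fp),          # true positives / all test positives
        "npv": tn / (tn + fn),          # true negatives / all test negatives
    }


# Cell counts inferred from the reported percentages (assumption).
# Algorithm-based definition: 44 test positives, 45 gold-standard positives.
algorithm_based = diagnostic_accuracy(tp=28, fp=16, fn=17, tn=78)
# Lifetime clozapine prescription definition.
clozapine_only = diagnostic_accuracy(tp=18, fp=5, fn=27, tn=89)

for name, result in [("algorithm-based", algorithm_based),
                     ("clozapine-only", clozapine_only)]:
    print(name, {k: round(100 * v, 1) for k, v in result.items()})
```

Under these inferred counts the rounded percentages match the abstract: 62.2, 83.0, 63.6, 82.1 for the algorithm-based definition and 40.0, 94.7, 78.3, 76.7 for the prescription-only definition.
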
Enabling phenotypic big data with PheNorm.

PubMed

Yu, Sheng; Ma, Yumeng; Gronsbell, Jessica; Cai, Tianrun; Ananthakrishnan, Ashwin N; Gainer, Vivian S; Churchill, Susanne E; Szolovits, Peter; Murphy, Shawn N; Kohane, Isaac S; Liao, Katherine P; Cai, Tianxi

2018-01-01

Electronic health record (EHR)-based phenotyping infers whether a patient has a disease based on the information in his or her EHR. A human-annotated training set with gold-standard disease status labels is usually required to build an algorithm for phenotyping based on a set of predictive features. The time intensiveness of annotation and feature curation severely limits the ability to achieve high-throughput phenotyping. While previous studies have successfully automated feature curation, annotation remains a major bottleneck. In this paper, we present PheNorm, a phenotyping algorithm that does not require expert-labeled samples for training. The most predictive features, such as the number of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes or mentions of the target phenotype, are normalized to resemble a normal mixture distribution with high area under the receiver operating curve (AUC) for prediction. The transformed features are then denoised and combined into a score for accurate disease classification. We validated the accuracy of PheNorm with 4 phenotypes: coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis. The AUCs of the PheNorm score reached 0.90, 0.94, 0.95, and 0.94 for the 4 phenotypes, respectively, which were comparable to the accuracy of supervised algorithms trained with sample sizes of 100-300, with no statistically significant difference. The accuracy of the PheNorm algorithms is on par with algorithms trained with annotated samples. PheNorm fully automates the generation of accurate phenotyping algorithms and demonstrates the capacity for EHR-driven annotations to scale to the next level - phenotypic big data. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
VWPS: A Ventilator Weaning Prediction System with Artificial Intelligence

NASA Astrophysics Data System (ADS)

Chen, Austin H.; Chen, Guan-Ting

How to wean patients efficiently off mechanical ventilation continues to be a challenge for medical professionals. In this paper we describe a novel approach to the study of a ventilator weaning prediction system (VWPS). Firstly, we developed three Artificial Neural Network (ANN) algorithms to predict the weaning success rate based on clinical data. Secondly, we implemented two user-friendly weaning success rate prediction systems: the VWPS system and the BWAP system. Both systems can help doctors objectively and effectively predict whether weaning is appropriate for patients based on the patients' clinical data. Our system utilizes the processing abilities of MATLAB. Thirdly, we evaluated the performance of the three algorithms using measures such as sensitivity and accuracy. The results show high sensitivity (around 80%) and accuracy (around 70%). To our knowledge, this is the first design approach of its kind to be used in the study of ventilator weaning success rate prediction.
Use of Artificial Intelligence and Machine Learning Algorithms with Gene Expression Profiling to Predict Recurrent Nonmuscle Invasive Urothelial Carcinoma of the Bladder.

PubMed

Bartsch, Georg; Mitra, Anirban P; Mitra, Sheetal A; Almal, Arpit A; Steven, Kenneth E; Skinner, Donald G; Fry, David W; Lenehan, Peter F; Worzel, William P; Cote, Richard J

2016-02-01

Due to the high recurrence risk of nonmuscle invasive urothelial carcinoma it is crucial to distinguish patients at high risk from those with indolent disease. In this study we used a machine learning algorithm to identify the genes in patients with nonmuscle invasive urothelial carcinoma at initial presentation that were most predictive of recurrence. We used the genes in a molecular signature to predict recurrence risk within 5 years after transurethral resection of bladder tumor. Whole genome profiling was performed on 112 frozen nonmuscle invasive urothelial carcinoma specimens obtained at first presentation on Human WG-6 BeadChips (Illumina®). A genetic programming algorithm was applied to evolve classifier mathematical models for outcome prediction. Cross-validation based resampling and gene use frequencies were used to identify the most prognostic genes, which were combined into rules used in a voting algorithm to predict the sample target class. Key genes were validated by quantitative polymerase chain reaction. The classifier set included 21 genes that predicted recurrence. Quantitative polymerase chain reaction was done for these genes in a subset of 100 patients. A 5-gene combined rule incorporating a voting algorithm yielded 77% sensitivity and 85% specificity to predict recurrence in the training set, and 69% and 62%, respectively, in the test set. A singular 3-gene rule was constructed that predicted recurrence with 80% sensitivity and 90% specificity in the training set, and 71% and 67%, respectively, in the test set. Using primary nonmuscle invasive urothelial carcinoma from initial occurrences genetic programming identified transcripts in reproducible fashion, which were predictive of recurrence. These findings could potentially impact nonmuscle invasive urothelial carcinoma management. Copyright © 2016 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
Applied Distributed Model Predictive Control for Energy Efficient Buildings and Ramp Metering

NASA Astrophysics Data System (ADS)

Koehler, Sarah Muraoka

Industrial large-scale control problems present an interesting algorithmic design challenge. A number of controllers must cooperate in real-time on a network of embedded hardware with limited computing power in order to maximize system efficiency while respecting constraints and despite communication delays. Model predictive control (MPC) can automatically synthesize a centralized controller which optimizes an objective function subject to a system model, constraints, and predictions of disturbance. Unfortunately, the computations required by model predictive controllers for large-scale systems often limit its industrial implementation only to medium-scale slow processes. Distributed model predictive control (DMPC) enters the picture as a way to decentralize a large-scale model predictive control problem. The main idea of DMPC is to split the computations required by the MPC problem amongst distributed processors that can compute in parallel and communicate iteratively to find a solution. Some popularly proposed solutions are distributed optimization algorithms such as dual decomposition and the alternating direction method of multipliers (ADMM). However, these algorithms ignore two practical challenges: substantial communication delays present in control systems and also problem non-convexity. This thesis presents two novel and practically effective DMPC algorithms. The first DMPC algorithm is based on a primal-dual active-set method which achieves fast convergence, making it suitable for large-scale control applications which have a large communication delay across its communication network. In particular, this algorithm is suited for MPC problems with a quadratic cost, linear dynamics, forecasted demand, and box constraints. We measure the performance of this algorithm and show that it significantly outperforms both dual decomposition and ADMM in the presence of communication delay. 
The second DMPC algorithm is based on an inexact interior point method which is suited for nonlinear optimization problems. The parallel computation of the algorithm exploits iterative linear algebra methods for the main linear algebra computations in the algorithm. We show that the splitting of the algorithm is flexible and can thus be applied to various distributed platform configurations. The two proposed algorithms are applied to two main energy and transportation control problems. The first application is energy efficient building control. Buildings represent 40% of energy consumption in the United States. Thus, it is significant to improve the energy efficiency of buildings. The goal is to minimize energy consumption subject to the physics of the building (e.g. heat transfer laws), the constraints of the actuators as well as the desired operating constraints (thermal comfort of the occupants), and heat load on the system. In this thesis, we describe the control systems of forced air building systems in practice. We discuss the "Trim and Respond" algorithm which is a distributed control algorithm that is used in practice, and show that it performs similarly to a one-step explicit DMPC algorithm. Then, we apply the novel distributed primal-dual active-set method and provide extensive numerical results for the building MPC problem. The second main application is the control of ramp metering signals to optimize traffic flow through a freeway system. This application is particularly important since urban congestion has more than doubled in the past few decades. The ramp metering problem is to maximize freeway throughput subject to freeway dynamics (derived from mass conservation), actuation constraints, freeway capacity constraints, and predicted traffic demand. In this thesis, we develop a hybrid model predictive controller for ramp metering that is guaranteed to be persistently feasible and stable. 
This contrasts with previous work on MPC for ramp metering, where such guarantees are absent. We apply a smoothing method to the hybrid model predictive controller and apply the inexact interior point method to this nonlinear non-convex ramp metering problem.
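
The problem class named for the first DMPC algorithm (quadratic cost, linear dynamics, forecasted demand, box constraints) can be written in a generic form. The block below is a sketch under assumed notation: the horizon N, the weights Q, R, P, the matrices A, B, E, the forecast d_k, and the bounds are placeholders and are not taken from the thesis.

```latex
\begin{aligned}
\min_{x_0,\dots,x_N,\; u_0,\dots,u_{N-1}} \quad
  & \sum_{k=0}^{N-1} \left( x_k^\top Q\, x_k + u_k^\top R\, u_k \right)
    + x_N^\top P\, x_N \\
\text{subject to} \quad
  & x_{k+1} = A x_k + B u_k + E d_k, \qquad k = 0,\dots,N-1, \\
  & x_0 = \hat{x}, \\
  & u_{\min} \le u_k \le u_{\max}, \qquad k = 0,\dots,N-1, \\
  & x_{\min} \le x_k \le x_{\max}, \qquad k = 1,\dots,N.
\end{aligned}
```

Here x_k is the state, u_k the control input, d_k the forecasted disturbance or demand, and \hat{x} the current measured state. In a distributed setting the matrices typically have a block structure that follows the network of subsystems, which is what lets the computation be split across processors.
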
Think globally and solve locally: secondary memory-based network learning for automated multi-species function prediction

PubMed Central

2014-01-01

Background: Network-based learning algorithms for automated function prediction (AFP) are negatively affected by the limited coverage of experimental data and limited a priori known functional annotations. As a consequence their application to model organisms is often restricted to well characterized biological processes and pathways, and their effectiveness with poorly annotated species is relatively limited. A possible solution to this problem might consist in the construction of big networks including multiple species, but this in turn poses challenging computational problems, due to the scalability limitations of existing algorithms and the main memory requirements induced by the construction of big networks. Distributed computation or the usage of big computers could in principle respond to these issues, but they raise further algorithmic problems and require resources that simple off-the-shelf computers cannot provide. Results: We propose a novel framework for scalable network-based learning of multi-species protein functions based on both a local implementation of existing algorithms and the adoption of innovative technologies: we solve “locally” the AFP problem, by designing “vertex-centric” implementations of network-based algorithms, but we do not give up thinking “globally” by exploiting the overall topology of the network. This is made possible by the adoption of secondary memory-based technologies that allow the efficient use of the large memory available on disks, thus overcoming the main memory limitations of modern off-the-shelf computers. This approach has been applied to the analysis of a large multi-species network including more than 300 species of bacteria and to a network with more than 200,000 proteins belonging to 13 Eukaryotic species. To our knowledge this is the first work where secondary-memory based network analysis has been applied to multi-species function prediction using biological networks with hundreds of thousands of proteins. 
Conclusions: The combination of these algorithmic and technological approaches makes feasible the analysis of large multi-species networks using ordinary computers with limited speed and primary memory, and in perspective could enable the analysis of huge networks (e.g. the whole proteomes available in SwissProt), using well-equipped stand-alone machines. PMID:24843788
Global velocity constrained cloud motion prediction for short-term solar forecasting

NASA Astrophysics Data System (ADS)

Chen, Yanjun; Li, Wei; Zhang, Chongyang; Hu, Chuanping

2016-09-01

Cloud motion is the primary reason for short-term solar power output fluctuation. In this work, a new cloud motion estimation algorithm using a global velocity constraint is proposed. Compared to the most widely used Particle Image Velocimetry (PIV) algorithm, which assumes the homogeneity of motion vectors, the proposed method can capture the accurate motion vector for each cloud block, including both the motional tendency and morphological changes. Specifically, global velocity derived from PIV is first calculated, and then fine-grained cloud motion estimation can be achieved by global velocity based cloud block researching and multi-scale cloud block matching. Experimental results show that the proposed global velocity constrained cloud motion prediction achieves comparable performance to the existing PIV and filtered PIV algorithms, especially in a short prediction horizon.
Leveraging knowledge from physiological data: on-body heat stress risk prediction with sensor networks.

PubMed

Gaura, Elena; Kemp, John; Brusey, James

2013-12-01

The paper demonstrates that wearable sensor systems, coupled with real-time on-body processing and actuation, can enhance safety for wearers of heavy protective equipment who are subjected to harsh thermal environments by reducing risk of Uncompensable Heat Stress (UHS). The work focuses on Explosive Ordnance Disposal operatives and shows that predictions of UHS risk can be performed in real-time with sufficient accuracy for real-world use. Furthermore, it is shown that the required sensory input for such algorithms can be obtained with wearable, non-intrusive sensors. Two algorithms, one based on Bayesian nets and another on decision trees, are presented for determining the heat stress risk, considering the mean skin temperature prediction as a proxy. The algorithms are trained on empirical data and have accuracies of 92.1±2.9% and 94.4±2.1%, respectively when tested using leave-one-subject-out cross-validation. In applications such as Explosive Ordnance Disposal operative monitoring, such prediction algorithms can enable autonomous actuation of cooling systems and haptic alerts to minimize casualties.
High capacity reversible watermarking for audio by histogram shifting and predicted error expansion.

PubMed

Wang, Fei; Xie, Zhaoxin; Chen, Zuo

2014-01-01

Being reversible, the watermarking information embedded in audio signals can be extracted while the original audio data can achieve lossless recovery. Currently, the few reversible audio watermarking algorithms are confronted with following problems: relatively low SNR (signal-to-noise) of embedded audio; a large amount of auxiliary embedded location information; and the absence of accurate capacity control capability. In this paper, we present a novel reversible audio watermarking scheme based on improved prediction error expansion and histogram shifting. First, we use differential evolution algorithm to optimize prediction coefficients and then apply prediction error expansion to output stego data. Second, in order to reduce location map bits length, we introduced histogram shifting scheme. Meanwhile, the prediction error modification threshold according to a given embedding capacity can be computed by our proposed scheme. Experiments show that this algorithm improves the SNR of embedded audio signals and embedding capacity, drastically reduces location map bits length, and enhances capacity control capability.
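The prediction error expansion step named in this abstract can be illustrated with a minimal sketch. It assumes integer samples and a deliberately simple predictor (the previous sample), whereas the paper optimizes its prediction coefficients with differential evolution; the function names `embed` and `extract` are hypothetical, and the sketch omits the histogram shifting, thresholding and overflow handling the abstract mentions.

```python
def embed(samples, bits):
    """Hide one bit per sample (after the first) by expanding the prediction error."""
    assert len(bits) <= len(samples) - 1
    marked = [samples[0]]                      # first sample stays untouched so decoding can start
    for i, bit in enumerate(bits, start=1):
        pred = samples[i - 1]                  # ASSUMPTION: predictor is the previous original sample
        err = samples[i] - pred                # prediction error
        marked.append(pred + 2 * err + bit)    # expanded error carries the bit in its parity
    marked.extend(samples[len(bits) + 1:])     # samples without payload are copied unchanged
    return marked


def extract(marked, n_bits):
    """Recover both the payload bits and the original samples."""
    restored = [marked[0]]
    bits = []
    for i in range(1, n_bits + 1):
        pred = restored[i - 1]                 # same predictor, computed from restored samples
        expanded = marked[i] - pred
        bits.append(expanded % 2)              # parity of the expanded error is the hidden bit
        restored.append(pred + expanded // 2)  # floor division undoes the expansion
    restored.extend(marked[n_bits + 1:])
    return restored, bits


audio = [10, 12, 11, 15, 14]
payload = [1, 0, 1]
stego = embed(audio, payload)
recovered_audio, recovered_bits = extract(stego, len(payload))
```

Because the decoder rebuilds each original sample before predicting the next one, the round trip is lossless, which is the reversibility property the abstract describes.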
Prediction-Correction Algorithms for Time-Varying Constrained Optimization

DOE PAGES

Simonetto, Andrea; Dall'Anese, Emiliano

2017-07-26

This article develops online algorithms to track solutions of time-varying constrained optimization problems. Particularly, resembling workhorse Kalman filtering-based approaches for dynamical systems, the proposed methods involve prediction-correction steps to provably track the trajectory of the optimal solutions of time-varying convex problems. The merits of existing prediction-correction methods have been shown for unconstrained problems and for setups where computing the inverse of the Hessian of the cost function is computationally affordable. This paper addresses the limitations of existing methods by tackling constrained problems and by designing first-order prediction steps that rely on the Hessian of the cost function (and do not require the computation of its inverse). In addition, the proposed methods are shown to improve the convergence speed of existing prediction-correction methods when applied to unconstrained problems. Numerical simulations corroborate the analytical results and showcase performance and benefits of the proposed algorithms. A realistic application of the proposed method to real-time control of energy resources is presented.
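A rough sketch of a prediction-correction loop, in the spirit of the abstract, may help fix ideas. Everything specific here is an assumption made for illustration: a scalar cost 0.5*(x - r(t))^2 with a box constraint, a sinusoidal reference r(t), the step size, and the function name `track`. The prediction step takes projected gradient steps on a first-order Taylor model of the gradient at the next time instant, so it uses the Hessian but never inverts it.

```python
import math


def project(x, lo, hi):
    """Projection onto the box constraint [lo, hi]."""
    return min(max(x, lo), hi)


def track(reference, d_reference, h=0.1, steps=200, alpha=0.5,
          n_pred=1, n_corr=1, lo=-0.8, hi=0.8):
    """Track argmin_x 0.5*(x - r(t))^2 subject to lo <= x <= hi.

    Returns the mean absolute distance to the constrained optimum.
    """
    x, total_err = 0.0, 0.0
    for k in range(steps):
        t, t_next = k * h, (k + 1) * h
        # Prediction: uses only information available at time t.
        x_k = x
        grad_k = x_k - reference(t)      # gradient of the cost at (x_k, t)
        hess = 1.0                       # Hessian of this quadratic cost
        mixed = -d_reference(t)          # time derivative of the gradient
        for _ in range(n_pred):
            model_grad = grad_k + hess * (x - x_k) + h * mixed
            x = project(x - alpha * model_grad, lo, hi)
        # Correction: projected gradient steps on the cost revealed at t_next.
        for _ in range(n_corr):
            x = project(x - alpha * (x - reference(t_next)), lo, hi)
        total_err += abs(x - project(reference(t_next), lo, hi))
    return total_err / steps


with_prediction = track(math.sin, math.cos)
correction_only = track(math.sin, math.cos, n_pred=0)
```

On this toy problem the run with a prediction step ends up with a smaller mean tracking error than the correction-only run, which is the qualitative behavior the abstract claims; the toy example says nothing about the paper's actual convergence guarantees.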

Evaluation of the performance of existing non-laboratory based cardiovascular risk assessment algorithms

PubMed Central

2013-01-01

Background The high burden and rising incidence of cardiovascular disease (CVD) in resource constrained countries necessitates implementation of robust and pragmatic primary and secondary prevention strategies. Many current CVD management guidelines recommend absolute cardiovascular (CV) risk assessment as a clinically sound guide to preventive and treatment strategies. Development of non-laboratory based cardiovascular risk assessment algorithms enable absolute risk assessment in resource constrained countries. The objective of this review is to evaluate the performance of existing non-laboratory based CV risk assessment algorithms using the benchmarks for clinically useful CV risk assessment algorithms outlined by Cooney and colleagues. Methods A literature search to identify non-laboratory based risk prediction algorithms was performed in MEDLINE, CINAHL, Ovid Premier Nursing Journals Plus, and PubMed databases. The identified algorithms were evaluated using the benchmarks for clinically useful cardiovascular risk assessment algorithms outlined by Cooney and colleagues. Results Five non-laboratory based CV risk assessment algorithms were identified. The Gaziano and Framingham algorithms met the criteria for appropriateness of statistical methods used to derive the algorithms and endpoints. The Swedish Consultation, Framingham and Gaziano algorithms demonstrated good discrimination in derivation datasets. Only the Gaziano algorithm was externally validated where it had optimal discrimination. The Gaziano and WHO algorithms had chart formats which made them simple and user friendly for clinical application. Conclusion Both the Gaziano and Framingham non-laboratory based algorithms met most of the criteria outlined by Cooney and colleagues. External validation of the algorithms in diverse samples is needed to ascertain their performance and applicability to different populations and to enhance clinicians’ confidence in them. PMID:24373202
Analytical Algorithms to Quantify the Uncertainty in Remaining Useful Life Prediction

NASA Technical Reports Server (NTRS)

Sankararaman, Shankar; Saxena, Abhinav; Daigle, Matthew; Goebel, Kai

2013-01-01

This paper investigates the use of analytical algorithms to quantify the uncertainty in the remaining useful life (RUL) estimate of components used in aerospace applications. The prediction of RUL is affected by several sources of uncertainty and it is important to systematically quantify their combined effect by computing the uncertainty in the RUL prediction in order to aid risk assessment, risk mitigation, and decision-making. While sampling-based algorithms have been conventionally used for quantifying the uncertainty in RUL, analytical algorithms are computationally cheaper and are sometimes better suited for online decision-making. While exact analytical algorithms are available only for certain special cases (e.g., linear models with Gaussian variables), effective approximations can be made using the first-order second moment method (FOSM), the first-order reliability method (FORM), and the inverse first-order reliability method (Inverse FORM). These methods can be used not only to calculate the entire probability distribution of RUL but also to obtain probability bounds on RUL. This paper explains these three methods in detail and illustrates them using the state-space model of a lithium-ion battery.
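As a small illustration of the first of these methods, FOSM linearizes the model around the input means and propagates variances through the gradient. The sketch below assumes independent inputs and uses a central finite difference for the gradient; the function `fosm`, the toy model `rul_cycles`, and all numbers are hypothetical and are not taken from the paper's battery model.

```python
def fosm(g, means, variances, rel_step=1e-6):
    """First-order second-moment estimate of the mean and variance of g(X).

    ASSUMPTION: the inputs are independent, so only variances (no covariances) are needed.
    """
    mean_out = g(means)                      # first-order mean: model evaluated at the input means
    var_out = 0.0
    for i, (m, v) in enumerate(zip(means, variances)):
        step = rel_step * (abs(m) or 1.0)    # relative step, falling back to rel_step when m == 0
        hi = list(means)
        lo = list(means)
        hi[i] = m + step
        lo[i] = m - step
        grad_i = (g(hi) - g(lo)) / (2 * step)  # central-difference partial derivative
        var_out += grad_i ** 2 * v           # each input contributes gradient^2 * variance
    return mean_out, var_out


def rul_cycles(x):
    """Hypothetical RUL model: remaining capacity margin divided by fade per cycle."""
    capacity_margin, fade_per_cycle = x
    return capacity_margin / fade_per_cycle


rul_mean, rul_var = fosm(rul_cycles, means=[0.4, 0.002], variances=[0.0004, 1e-8])
```

For a linear model the estimate is exact; for a nonlinear one such as the ratio above it is only a first-order approximation, which is why the abstract treats FOSM as an approximation outside the linear-Gaussian special case.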
Ionosphere monitoring and forecast activities within the IAG working group "Ionosphere Prediction"

NASA Astrophysics Data System (ADS)

Hoque, Mainul; Garcia-Rigo, Alberto; Erdogan, Eren; Cueto Santamaría, Marta; Jakowski, Norbert; Berdermann, Jens; Hernandez-Pajares, Manuel; Schmidt, Michael; Wilken, Volker

2017-04-01

Ionospheric disturbances can affect technologies in space and on Earth disrupting satellite and airline operations, communications networks, navigation systems. As the world becomes ever more dependent on these technologies, ionospheric disturbances as part of space weather pose an increasing risk to the economic vitality and national security. Therefore, having the knowledge of ionospheric state in advance during space weather events is becoming more and more important. To promote scientific cooperation we recently formed a Working Group (WG) called "Ionosphere Predictions" within the International Association of Geodesy (IAG) under Sub-Commission 4.3 "Atmosphere Remote Sensing" of the Commission 4 "Positioning and Applications". The general objective of the WG is to promote the development of ionosphere prediction algorithm/models based on the dependence of ionospheric characteristics on solar and magnetic conditions combining data from different sensors to improve the spatial and temporal resolution and sensitivity taking advantage of different sounding geometries and latency. Our presented work enables the possibility to compare total electron content (TEC) prediction approaches/results from different centers contributing to this WG such as German Aerospace Center (DLR), Universitat Politècnica de Catalunya (UPC), Technische Universität München (TUM) and GMV. DLR developed a model-assisted TEC forecast algorithm taking benefit from actual trends of the TEC behavior at each grid point. Since during perturbations, characterized by large TEC fluctuations or ionization fronts, this approach may fail, the trend information is merged with the current background model which provides a stable climatological TEC behavior. The presented solution is a first step to regularly provide forecasted TEC services via SWACI/IMPC by DLR. UPC forecast model is based on applying linear regression to a temporal window of TEC maps in the Discrete Cosine Transform (DCT) domain. 
Performance tests are being conducted at the moment in order to improve UPC predicted products for 1-, 2-days ahead. In addition, UPC is working to enable short-term predictions based on UPC real-time GIMs (labelled URTG) and implementing an improved prediction approach. TUM developed a forecast method based on a time series analysis of TEC products which are either B-spline coefficients estimated by a Kalman filter or TEC grid maps derived from the B-spline coefficients. The forecast method uses a Fourier series expansion to extract the trend functions from the estimated TEC product. Then the trend functions are carried out to provide predicted TEC products. The forecast algorithm developed by GMV is based on the ionospheric delay estimation from previous epochs using GNSS data and the main dependence of ionospheric delays on solar and magnetic conditions. Since the ionospheric behavior is highly dependent on the region of the Earth, different region-based algorithmic modifications have been implemented in GMV's magicSBAS ionospheric algorithms to be able to estimate and forecast ionospheric delays worldwide. Different TEC prediction approaches outlined here will certainly help to learn about forecasting ionospheric ionization.
Emerging trend prediction in biomedical literature.

PubMed

Moerchen, Fabian; Fradkin, Dmitriy; Dejori, Mathaeus; Wachmann, Bernd

2008-11-06

We present a study on how to predict new emerging trends in the biomedical domain based on textual data. We thereby propose a way of anticipating the transformation of arbitrary information into ground truth knowledge by predicting the inclusion of new terms into the MeSH ontology. We also discuss the preparation of a dataset for the evaluation of emerging trend prediction algorithms that is based on PubMed abstracts and related MeSH terms. The results suggest that early prediction of emerging trends is possible.
Comparison between three algorithms for Dst predictions over the 2003-2005 period

NASA Astrophysics Data System (ADS)

Amata, E.; Pallocchia, G.; Consolini, G.; Marcucci, M. F.; Bertello, I.

2008-02-01

We compare, over a two-and-a-half-year period, the performance of a recent artificial neural network (ANN) algorithm for the Dst prediction called EDDA [Pallocchia, G., Amata, E., Consolini, G., Marcucci, M.F., Bertello, I., 2006. Geomagnetic Dst index forecast based on IMF data only. Annales Geophysicae 24, 989-999], based on IMF inputs only, with the performance of the ANN Lundstedt et al. [2002. Operational forecasts of the geomagnetic Dst index. Geophysical Research Letters 29, 341] algorithm and the Wang et al. [2003. Influence of the solar wind dynamic pressure on the decay and injection of the ring current. Journal of Geophysical Research 108, 51] algorithm based on differential equations, both of which make use of IMF and plasma inputs. We show that: (1) all three algorithms perform similarly for "small" and "moderate" storms; (2) the EDDA and Wang algorithms perform similarly and considerably better than the Lundstedt et al. [2002. Operational forecasts of the geomagnetic Dst index. Geophysical Research Letters 29, 341] algorithm for "intense" and for "severe" storms; (3) the EDDA algorithm has the clear advantage, for space weather operational applications, that it makes use of IMF inputs only. The advantage lies in the fact that plasma data are at times less reliable and display data gaps more often than IMF measurements, especially during large solar disturbances, i.e. during periods when space weather forecasts are most important. Some considerations are developed on the reasons why EDDA may forecast the Dst index without making use of solar wind density and velocity data.
A clinical decision-making algorithm for penicillin allergy.

PubMed

Soria, Angèle; Autegarden, Elodie; Amsler, Emmanuelle; Gaouar, Hafida; Vial, Amandine; Francès, Camille; Autegarden, Jean-Eric

2017-12-01

About 10% of subjects report suspected penicillin allergy, but 85-90% of these patients are not truly allergic and could safely receive beta-lactam antibiotics. Objective: To design and validate a clinical decision-making algorithm, based on anamnesis (chronology, severity, and duration of the suspected allergic reactions) and reaching a 100% sensitivity and negative predictive value, to assess allergy risk related to a penicillin prescription in general practice. All patients were included prospectively and explored based on ENDA/EAACI recommendations. Results of penicillin allergy work-up (gold standard) were compared with results of the algorithm. Allergological work-up diagnosed penicillin hypersensitivity in 41/259 patients (15.8%) [95% CI: 11.5-20.3]. Three of these patients were diagnosed as having immediate-type hypersensitivity to penicillin, but had been misdiagnosed as low risk patients using the clinical algorithm. Thus, the sensitivity and negative predictive value of the algorithm were 92.7% [95% CI: 80.1-98.5] and 96.3% [95% CI: 89.6-99.2], respectively, and the probability that a patient with true penicillin allergy had been misclassified was 3.7% [95% CI: 0.8-10.4]. Although the risk of misclassification is low, we cannot recommend the use of this algorithm in general practice. However, the algorithm can be useful in emergency situations in hospital settings. Key messages True penicillin allergy is considerably lower than alleged penicillin allergy (15.8%; 41 of the 259 patients with suspected penicillin allergy). A clinical algorithm based on the patient's clinical history of the supposed allergic event to penicillin misclassified 3/41 (3.7%) truly allergic patients.
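The reported sensitivity and negative predictive value can be re-derived with a few lines of arithmetic. The counts of 41 allergic patients and 3 misclassified patients come from the abstract; the number of true negatives is not stated there, so the value 78 below is an assumption, back-calculated so that the negative predictive value matches the reported 96.3%.

```python
allergic = 41         # patients with confirmed penicillin hypersensitivity (from the abstract)
false_negatives = 3   # allergic patients the algorithm rated as low risk (from the abstract)
true_negatives = 78   # ASSUMPTION: back-calculated from the reported NPV, not stated in the abstract

# Sensitivity: share of truly allergic patients the algorithm flagged.
sensitivity = (allergic - false_negatives) / allergic
# Negative predictive value: share of low-risk ratings that were correct.
npv = true_negatives / (true_negatives + false_negatives)
# Chance that a patient rated low risk is in fact allergic.
missed_among_low_risk = 1 - npv

print(f"sensitivity = {sensitivity:.1%}")
print(f"NPV = {npv:.1%}")
print(f"allergic among low-risk ratings = {missed_among_low_risk:.1%}")
```

Under that assumption the 3.7% figure corresponds to 3 of the 81 patients rated low risk (one minus the negative predictive value), whereas 3 of the 41 allergic patients would be about 7.3% (one minus the sensitivity).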
Predicting coronary artery disease using different artificial neural network models.

PubMed

Colak, M Cengiz; Colak, Cemil; Kocatürk, Hasan; Sağiroğlu, Seref; Barutçu, Irfan

2008-08-01

Eight different learning algorithms used for creating artificial neural network (ANN) models and the different ANN models in the prediction of coronary artery disease (CAD) are introduced. This work was carried out as a retrospective case-control study. Overall, 124 consecutive patients who had been diagnosed with CAD by coronary angiography (at least 1 coronary stenosis > 50% in major epicardial arteries) were enrolled in the work. Angiographically, the 113 people (group 2) with normal coronary arteries were taken as control subjects. A multi-layered perceptron ANN architecture was applied. The ANN models trained with different learning algorithms were applied to 237 records, divided into training (n=171) and testing (n=66) data sets. The performance of prediction was evaluated by sensitivity, specificity and accuracy values based on standard definitions. The results have demonstrated that ANN models trained with eight different learning algorithms are promising because of high (greater than 71%) sensitivity, specificity and accuracy values in the prediction of CAD. Accuracy, sensitivity and specificity values varied between 83.63%-100%, 86.46%-100% and 74.67%-100% for training, respectively. For testing, the values were more than 71% for sensitivity, 76% for specificity and 81% for accuracy. It may be proposed that the use of different learning algorithms other than backpropagation and larger sample sizes can improve the performance of prediction. The proposed ANN models trained with these learning algorithms could be used as a promising approach for predicting CAD without the need for invasive diagnostic methods and could help in the prognostic clinical decision.
Manual physical balance assistance of therapists during gait training of stroke survivors: characteristics and predicting the timing.

PubMed

Haarman, Juliet A M; Maartens, Erik; van der Kooij, Herman; Buurke, Jaap H; Reenalda, Jasper; Rietman, Johan S

2017-12-02

During gait training, physical therapists continuously supervise stroke survivors and provide physical support to their pelvis when they judge that the patient is unable to keep his balance. This paper is the first to provide quantitative data about the corrective forces that therapists use during gait training. It is assumed that changes in the acceleration of a patient's COM are a good predictor for therapeutic balance assistance during the training sessions. Therefore, this paper provides a method that predicts the timing of therapeutic balance assistance, based on acceleration data of the sacrum. Eight sub-acute stroke survivors and seven therapists were included in this study. Patients were asked to perform straight line walking as well as slalom walking in a conventional training setting. Acceleration of the sacrum was captured by an Inertial Magnetic Measurement Unit. Balance-assisting corrective forces applied by the therapist were collected from two force sensors positioned on both sides of the patient's hips. Measures to characterize the therapeutic balance assistance were the amount of force, duration, impulse and the anatomical plane in which the assistance took place. Based on the acceleration data of the sacrum, an algorithm was developed to predict therapeutic balance assistance. To validate the developed algorithm, the predicted events of balance assistance by the algorithm were compared with the actual provided therapeutic assistance. The algorithm was able to predict the actual therapeutic assistance with a Positive Predictive Value of 87% and a True Positive Rate of 81%. Assistance mainly took place over the medio-lateral axis and corrective forces of about 2% of the patient's body weight (15.9 N (11), median (IQR)) were provided by therapists in this plane. Median duration of balance assistance was 1.1 s (0.6) (median (IQR)) and median impulse was 9.4 Ns (8.2) (median (IQR)). 
Although therapists were specifically instructed to aim for the force sensors on the iliac crest, a different contact location was reported in 22% of the corrections. This paper presents insights into the behavior of therapists regarding their manual physical assistance during gait training. A quantitative dataset was presented, representing therapeutic balance-assisting force characteristics. Furthermore, an algorithm was developed that predicts events at which therapeutic balance assistance was provided. Prediction scores remain high when different therapists and patients were analyzed with the same algorithm settings. Both the quantitative dataset and the developed algorithm can serve as technical input in the development of (robot-controlled) balance supportive devices.
Development of a Dynamic Operational Scheduling Algorithm for an Independent Micro-Grid with Renewable Energy

NASA Astrophysics Data System (ADS)

Obara, Shin'ya

A micro-grid with the capacity for sustainable energy is expected to be a distributed energy system that exhibits quite a small environmental impact. In an independent micro-grid, “green energy,” which is typically thought of as unstable, can be utilized effectively by introducing a battery. In the past study, the production-of-electricity prediction algorithm (PAS) of the solar cell was developed. In PAS, a layered neural network is made to learn based on past weather data and the operation plan of the compound system of a solar cell and other energy systems was examined using this prediction algorithm. In this paper, a dynamic operational scheduling algorithm is developed using a neural network (PAS) and a genetic algorithm (GA) to provide predictions for solar cell power output. We also do a case study analysis in which we use this algorithm to plan the operation of a system that connects nine houses in Sapporo to a micro-grid composed of power equipment and a polycrystalline silicon solar cell. In this work, the relationship between the accuracy of output prediction of the solar cell and the operation plan of the micro-grid was clarified. Moreover, we found that operating the micro-grid according to the plan derived with PAS was far superior, in terms of equipment hours of operation, to that using past average weather data.
Efficient Prediction of Low-Visibility Events at Airports Using Machine-Learning Regression

NASA Astrophysics Data System (ADS)

Cornejo-Bueno, L.; Casanova-Mateo, C.; Sanz-Justo, J.; Cerro-Prada, E.; Salcedo-Sanz, S.

2017-11-01

We address the prediction of low-visibility events at airports using machine-learning regression. The proposed model successfully forecasts low-visibility events in terms of the runway visual range at the airport, with the use of support-vector regression, neural networks (multi-layer perceptrons and extreme-learning machines) and Gaussian-process algorithms. We assess the performance of these algorithms based on real data collected at the Valladolid airport, Spain. We also propose a study of the atmospheric variables measured at a nearby tower related to low-visibility atmospheric conditions, since they are considered as the inputs of the different regressors. A pre-processing procedure of these input variables with wavelet transforms is also described. The results show that the proposed machine-learning algorithms are able to predict low-visibility events well. The Gaussian process is the best algorithm among those analyzed, obtaining over 98% of the correct classification rate in low-visibility events when the runway visual range is >1000 m, and about 80% under this threshold. The performance of all the machine-learning algorithms tested is clearly affected in extreme low-visibility conditions (<500 m). However, we show improved results of all the methods when data from a neighbouring meteorological tower are included, and also with a pre-processing scheme using a wavelet transform. Also presented are results of the algorithm performance in daytime and nighttime conditions, and for different prediction time horizons.
A Network Selection Algorithm Considering Power Consumption in Hybrid Wireless Networks

NASA Astrophysics Data System (ADS)

Joe, Inwhee; Kim, Won-Tae; Hong, Seokjoon

In this paper, we propose a novel network selection algorithm considering power consumption in hybrid wireless networks for vertical handover. CDMA, WiBro, WLAN networks are candidate networks for this selection algorithm. This algorithm is composed of the power consumption prediction algorithm and the final network selection algorithm. The power consumption prediction algorithm estimates the expected lifetime of the mobile station based on the current battery level, traffic class and power consumption for each network interface card of the mobile station. If the expected lifetime of the mobile station in a certain network is not long enough compared to the handover delay, this particular network will be removed from the candidate network list, thereby preventing unnecessary handovers in the preprocessing procedure. On the other hand, the final network selection algorithm consists of AHP (Analytic Hierarchical Process) and GRA (Grey Relational Analysis). The global factors of the network selection structure are QoS, cost and lifetime. If user preference is lifetime, our selection algorithm selects the network that offers the longest service duration due to low power consumption. Also, we conduct some simulations using the OPNET simulation tool. The simulation results show that the proposed algorithm provides longer lifetime in the hybrid wireless network environment.
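The preprocessing idea, which drops a network when the expected lifetime on it is too short relative to the handover delay, can be sketched as follows. This is only an illustration under stated assumptions: lifetime is taken as battery energy divided by a constant power draw, the "long enough" test is a hypothetical multiple `min_ratio` of the handover delay, the power and delay figures are invented, and the final choice covers only the case where the user preference is lifetime rather than the paper's AHP and GRA ranking.

```python
def expected_lifetime_s(battery_j, power_w):
    """Expected lifetime in seconds for a given battery energy and power draw."""
    return battery_j / power_w


def select_network(battery_j, power_w_by_network, handover_delay_s, min_ratio=10.0):
    """Filter candidate networks by lifetime, then pick the longest-lived one.

    ASSUMPTION: a network is kept only if lifetime >= min_ratio * handover delay.
    """
    lifetimes = {name: expected_lifetime_s(battery_j, power)
                 for name, power in power_w_by_network.items()}
    candidates = {name: life for name, life in lifetimes.items()
                  if life >= min_ratio * handover_delay_s[name]}
    if not candidates:
        return None, candidates              # no handover is worthwhile
    best = max(candidates, key=candidates.get)
    return best, candidates


# Hypothetical per-interface power draw (watts) and handover delay (seconds).
power = {"CDMA": 1.2, "WiBro": 2.0, "WLAN": 0.8}
delay = {"CDMA": 1.0, "WiBro": 1.0, "WLAN": 1.0}

best_full, kept_full = select_network(3600.0, power, delay)
best_low, kept_low = select_network(10.0, power, delay)
```

With a nearly empty battery the filter removes the networks whose interfaces would drain it before a handover could pay off, which is how the abstract describes avoiding unnecessary handovers.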
A Novel Admixture-Based Pharmacogenetic Approach to Refine Warfarin Dosing in Caribbean Hispanics

PubMed Central

Claudio-Campos, Karla; Rivera-Miranda, Giselle; Bermúdez-Bosch, Luis; Renta, Jessicca Y.; Cadilla, Carmen L.; Cruz, Iadelisse; Feliu, Juan F.; Vergara, Cunegundo; Ruaño, Gualberto

2016-01-01

Aim This study is aimed at developing a novel admixture-adjusted pharmacogenomic approach to individually refine warfarin dosing in Caribbean Hispanic patients. Patients & Methods A multiple linear regression analysis of effective warfarin doses versus relevant genotypes, admixture, clinical and demographic factors was performed in 255 patients and further validated externally in another cohort of 55 individuals. Results The admixture-adjusted, genotype-guided warfarin dosing refinement algorithm developed in Caribbean Hispanics showed better predictability (R2 = 0.70, MAE = 0.72mg/day) than a clinical algorithm that excluded genotypes and admixture (R2 = 0.60, MAE = 0.99mg/day), and outperformed two prior pharmacogenetic algorithms in predicting effective dose in this population. For patients at the highest risk of adverse events, 45.5% of the dose predictions using the developed pharmacogenetic model resulted in ideal dose as compared with only 29% when using the clinical non-genetic algorithm (p<0.001). The admixture-driven pharmacogenetic algorithm predicted 58% of warfarin dose variance when externally validated in 55 individuals from an independent validation cohort (MAE = 0.89 mg/day, 24% mean bias). Conclusions Results supported our rationale to incorporate individual’s genotypes and unique admixture metrics into pharmacogenetic refinement models in order to increase predictability when expanding them to admixed populations like Caribbean Hispanics. Trial Registration ClinicalTrials.gov NCT01318057 PMID:26745506
A difference tracking algorithm based on discrete sine transform

NASA Astrophysics Data System (ADS)

Liu, HaoPeng; Yao, Yong; Lei, HeBing; Wu, HaoKun

2018-04-01

Target tracking is an important field of computer vision. The template matching tracking algorithm based on squared difference matching (SSD) and standard correlation coefficient (NCC) matching is very sensitive to gray-level changes in the image. When the brightness or gray level changes, the tracking algorithm is affected by high-frequency information. Tracking accuracy is reduced, resulting in loss of the tracking target. In this paper, a differential tracking algorithm based on the discrete sine transform is proposed to reduce the influence of image gray or brightness changes. The algorithm, which combines the discrete sine transform and the difference algorithm, maps the target image into an image digital sequence. The Kalman filter predicts the target position. The Hamming distance determines the degree of similarity between the target and the template. The window closest to the template is taken as the target to be tracked, and the tracked target then updates the template. Target tracking is achieved on this basis. The algorithm is tested in this paper. Compared with SSD and NCC template matching algorithms, the algorithm tracks the target stably when the image gray level or brightness changes, and the tracking speed can meet the real-time requirement.
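The abstract does not spell out how the image is mapped to a digital sequence, so the following is only one plausible reading, offered as a sketch: take a discrete sine transform of the pixel values, turn the differences between neighbouring coefficients into bits, and compare bit sequences by Hamming distance. The naive DST-I, the bit rule, and the names `signature`, `hamming` and `best_window` are all assumptions; the Kalman prediction and template update are left out.

```python
import math


def dst(signal):
    """Naive type-I discrete sine transform of a 1-D sequence."""
    n = len(signal)
    return [sum(x * math.sin(math.pi * (i + 1) * (k + 1) / (n + 1))
                for i, x in enumerate(signal))
            for k in range(n)]


def signature(pixels):
    """ASSUMED mapping: one bit per pair of neighbouring DST coefficients."""
    coeffs = dst(pixels)
    return [1 if a > b else 0 for a, b in zip(coeffs, coeffs[1:])]


def hamming(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_window(template, windows):
    """Return the index of the window closest to the template, plus all distances."""
    ref = signature(template)
    distances = [hamming(ref, signature(w)) for w in windows]
    return distances.index(min(distances)), distances


template = [10, 20, 30, 40, 50, 60, 70, 80]   # flattened gray levels, hypothetical
brighter = [2 * p for p in template]          # same pattern with doubled gain
index, distances = best_window(template, [template[::-1], brighter])
```

Because the bits depend only on the ordering of neighbouring coefficients, scaling every gray level by the same positive gain leaves the signature unchanged, which is one way such a scheme could tolerate brightness changes; a constant brightness offset is not covered by this particular sketch.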
An Extended Kalman Filter-Based Attitude Tracking Algorithm for Star Sensors

PubMed Central

Li, Jian; Wei, Xinguo; Zhang, Guangjun

2017-01-01

Efficiency and reliability are key issues when a star sensor operates in tracking mode. In the case of high attitude dynamics, the performance of existing attitude tracking algorithms degenerates rapidly. In this paper an extended Kalman filtering-based attitude tracking algorithm is presented. The star sensor is modeled as a nonlinear stochastic system with the state estimate providing the three degree-of-freedom attitude quaternion and angular velocity. The star positions in the star image are predicted and measured to estimate the optimal attitude. Furthermore, all the cataloged stars observed in the sensor field-of-view according to the predicted image motion are accessed using a catalog partition table to speed up the tracking, called star mapping. Software simulation and night-sky experiment are performed to validate the efficiency and reliability of the proposed method. PMID:28825684
An Extended Kalman Filter-Based Attitude Tracking Algorithm for Star Sensors.

PubMed

Li, Jian; Wei, Xinguo; Zhang, Guangjun

2017-08-21

Efficiency and reliability are key issues when a star sensor operates in tracking mode. In the case of high attitude dynamics, the performance of existing attitude tracking algorithms degenerates rapidly. In this paper an extended Kalman filtering-based attitude tracking algorithm is presented. The star sensor is modeled as a nonlinear stochastic system with the state estimate providing the three degree-of-freedom attitude quaternion and angular velocity. The star positions in the star image are predicted and measured to estimate the optimal attitude. Furthermore, all the cataloged stars observed in the sensor field-of-view according to the predicted image motion are accessed using a catalog partition table to speed up the tracking, called star mapping. Software simulation and night-sky experiment are performed to validate the efficiency and reliability of the proposed method.
Using Time Series Analysis to Predict Cardiac Arrest in a PICU.

PubMed

Kennedy, Curtis E; Aoki, Noriaki; Mariscalco, Michele; Turley, James P

2015-11-01

Objectives: To build and test cardiac arrest prediction models in a PICU, using time series analysis as input, and to measure changes in prediction accuracy attributable to different classes of time series data. Design: Retrospective cohort study. Setting: Thirty-one bed academic PICU that provides care for medical and general surgical (not congenital heart surgery) patients. Patients: Patients experiencing a cardiac arrest in the PICU and requiring external cardiac massage for at least 2 minutes. Interventions: None. Measurements and Main Results: One hundred three cases of cardiac arrest and 109 control cases were used to prepare a baseline dataset that consisted of 1,025 variables in four data classes: multivariate, raw time series, clinical calculations, and time series trend analysis. We trained 20 arrest prediction models using a matrix of five feature sets (combinations of data classes) with four modeling algorithms: linear regression, decision tree, neural network, and support vector machine. The reference model (multivariate data with regression algorithm) had an accuracy of 78% and 87% area under the receiver operating characteristic curve. The best model (multivariate + trend analysis data with support vector machine algorithm) had an accuracy of 94% and 98% area under the receiver operating characteristic curve. Cardiac arrest predictions based on a traditional model built with multivariate data and a regression algorithm misclassified cases 3.7 times more frequently than predictions that included time series trend analysis and were built with a support vector machine algorithm. Conclusions: Although the final model lacks the specificity necessary for clinical application, we have demonstrated how information from time series data can be used to increase the accuracy of clinical prediction models.
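The 3.7-fold figure in the abstract above appears to follow from the two reported accuracies, if "misclassified" is read as one minus accuracy. That reading is an assumption; the abstract does not state how the ratio was derived. A minimal Python check under that assumption:

```python
def misclassification_ratio(acc_reference: float, acc_best: float) -> float:
    """Ratio of the error rates (1 - accuracy) of two models."""
    return (1.0 - acc_reference) / (1.0 - acc_best)


# Accuracies reported in the abstract: reference model 78%, best model 94%.
ratio = misclassification_ratio(0.78, 0.94)
print(round(ratio, 1))  # 3.7
```

The reference model errs on 22% of cases and the best model on 6%, so the ratio is about 3.67.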
Classifying injury narratives of large administrative databases for surveillance-A practical approach combining machine learning ensembles and human review.

PubMed

Marucci-Wellman, Helen R; Corns, Helen L; Lehto, Mark R

2017-01-01

Injury narratives are now available in real time and include useful information for injury surveillance and prevention. However, manual classification of the cause or events leading to injury found in large batches of narratives, such as workers' compensation claims databases, can be prohibitive. In this study we compare the utility of four machine learning algorithms (Naïve Bayes single-word and bi-gram models, Support Vector Machine, and Logistic Regression) for classifying narratives into Bureau of Labor Statistics Occupational Injury and Illness event leading to injury classifications for a large workers' compensation database. These algorithms are known to do well classifying narrative text and are fairly easy to implement with off-the-shelf software packages such as Python. We propose human-machine learning ensemble approaches which maximize the power and accuracy of the algorithms for machine-assigned codes and allow for strategic filtering of rare, emerging or ambiguous narratives for manual review. We compare human-machine approaches based on filtering on the prediction strength of the classifier vs. agreement between algorithms. Regularized Logistic Regression (LR) was the best performing algorithm alone. Using this algorithm and filtering out the bottom 30% of predictions for manual review resulted in high accuracy (overall sensitivity/positive predictive value of 0.89) of the final machine-human coded dataset. The best pairings of algorithms included Naïve Bayes with Support Vector Machine, whereby the triple ensemble NB_SW = NB_BI-GRAM = SVM had very high performance (0.93 overall sensitivity/positive predictive value and high accuracy (i.e. high sensitivity and positive predictive values)) across both large and small categories, leaving 41% of the narratives for manual review. Integrating LR into this ensemble mix improved performance only slightly.
For large administrative datasets we propose incorporation of methods based on human-machine pairings such as we have done here, utilizing readily-available off-the-shelf machine learning techniques and resulting in only a fraction of narratives that require manual review. Human-machine ensemble methods are likely to improve performance over total manual coding. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
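The two filtering strategies compared in the abstract above (prediction strength vs. agreement between algorithms) can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the function names, the tuple layout, and the example event codes are all hypothetical.

```python
from typing import Dict, List, Tuple

# (narrative_id, predicted_code, prediction_strength); layout is hypothetical
Prediction = Tuple[str, str, float]


def filter_by_strength(predictions: List[Prediction],
                       review_fraction: float = 0.30):
    """Route the weakest `review_fraction` of predictions to manual review."""
    ranked = sorted(predictions, key=lambda p: p[2])
    n_review = round(len(ranked) * review_fraction)
    return ranked[n_review:], ranked[:n_review]  # (machine-coded, manual)


def filter_by_agreement(codes_by_model: Dict[str, Dict[str, str]]):
    """Keep a machine code only when every model assigns the same code."""
    models = list(codes_by_model.values())
    machine, manual = {}, []
    for narrative_id in models[0]:
        codes = {model[narrative_id] for model in models}
        if len(codes) == 1:
            machine[narrative_id] = codes.pop()
        else:
            manual.append(narrative_id)
    return machine, manual


# Hypothetical data: ten narratives with strengths 0.1 .. 1.0
preds = [(f"n{i}", "fall", i / 10) for i in range(1, 11)]
machine_coded, for_review = filter_by_strength(preds)
print([p[0] for p in for_review])

votes = {
    "nb_single_word": {"a": "fall", "b": "struck"},
    "nb_bigram": {"a": "fall", "b": "fall"},
    "svm": {"a": "fall", "b": "struck"},
}
print(filter_by_agreement(votes))
```

Strength filtering fixes the manual workload in advance, whereas agreement filtering lets the data decide how many narratives go to reviewers (41% in the reported triple ensemble).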
A novel stock forecasting model based on High-order-fuzzy-fluctuation Trends and Back Propagation Neural Network

PubMed Central

Dai, Zongli; Zhao, Aiwu; He, Jie

2018-01-01

In this paper, we propose a hybrid method to forecast stock prices called the High-order-fuzzy-fluctuation-Trends-based Back Propagation (HTBP) Neural Network model. First, we compare each value of the historical training data with the previous day's value to obtain a fluctuation trend time series (FTTS). On this basis, the FTTS is fuzzified into a fuzzy time series (FFTS) based on the amplitude and direction of each fluctuation (increasing, equal, or decreasing). Since the relationship between the FFTS and future wave trends is nonlinear, the HTBP neural network algorithm is used to find the mapping rules in the form of self-learning. Finally, the results of the algorithm output are used to predict future fluctuations. The proposed model provides some innovative features: (1) It combines fuzzy set theory and a neural network algorithm to avoid the overfitting problems that exist in traditional models. (2) The BP neural network algorithm can intelligently explore the internal rules that actually exist in sequential data, without the need to analyze the influence factors of specific rules and the path of action. (3) The hybrid model can reasonably remove noise from the internal rules by proper fuzzy treatment. This paper takes the TAIEX data set of the Taiwan stock exchange as an example, and compares and analyzes the prediction performance of the model. The experimental results show that this method can predict the stock market in a very simple way. At the same time, we use this method to predict the Shanghai stock exchange composite index, and further verify the effectiveness and universality of the method. PMID:29420584
A novel stock forecasting model based on High-order-fuzzy-fluctuation Trends and Back Propagation Neural Network.

PubMed

Guan, Hongjun; Dai, Zongli; Zhao, Aiwu; He, Jie

2018-01-01

In this paper, we propose a hybrid method to forecast stock prices called the High-order-fuzzy-fluctuation-Trends-based Back Propagation (HTBP) Neural Network model. First, we compare each value of the historical training data with the previous day's value to obtain a fluctuation trend time series (FTTS). On this basis, the FTTS is fuzzified into a fuzzy time series (FFTS) based on the amplitude and direction of each fluctuation (increasing, equal, or decreasing). Since the relationship between the FFTS and future wave trends is nonlinear, the HTBP neural network algorithm is used to find the mapping rules in the form of self-learning. Finally, the results of the algorithm output are used to predict future fluctuations. The proposed model provides some innovative features: (1) It combines fuzzy set theory and a neural network algorithm to avoid the overfitting problems that exist in traditional models. (2) The BP neural network algorithm can intelligently explore the internal rules that actually exist in sequential data, without the need to analyze the influence factors of specific rules and the path of action. (3) The hybrid model can reasonably remove noise from the internal rules by proper fuzzy treatment. This paper takes the TAIEX data set of the Taiwan stock exchange as an example, and compares and analyzes the prediction performance of the model. The experimental results show that this method can predict the stock market in a very simple way. At the same time, we use this method to predict the Shanghai stock exchange composite index, and further verify the effectiveness and universality of the method.
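The first two steps described in the abstract above (day-over-day fluctuations, then fuzzification by direction) can be illustrated with a short Python sketch. The three-label scheme and the threshold of half the mean absolute fluctuation are assumptions made for illustration; the abstract does not give the paper's actual fuzzy sets, and the neural network stage is omitted.

```python
from typing import List


def fluctuation_series(prices: List[float]) -> List[float]:
    """Difference between each value and the previous day's value (FTTS)."""
    return [today - prev for prev, today in zip(prices, prices[1:])]


def fuzzify(fluctuations: List[float]) -> List[str]:
    """Map each fluctuation to 'down', 'equal' or 'up'.

    Assumption: a move smaller than half the mean absolute fluctuation
    counts as 'equal'. This threshold is illustrative only.
    """
    mean_abs = sum(abs(f) for f in fluctuations) / len(fluctuations)
    threshold = 0.5 * mean_abs
    labels = []
    for f in fluctuations:
        if f > threshold:
            labels.append("up")
        elif f < -threshold:
            labels.append("down")
        else:
            labels.append("equal")
    return labels


prices = [100.0, 102.0, 101.0, 101.0, 105.0]  # hypothetical closing values
ftts = fluctuation_series(prices)
print(ftts)
print(fuzzify(ftts))
```

A high-order model would then feed the last several labels, rather than only the most recent one, to the network as input.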
A complete diet-based algorithm for predicting nonheme iron absorption in adults.

PubMed

Armah, Seth M; Carriquiry, Alicia; Sullivan, Debra; Cook, James D; Reddy, Manju B

2013-07-01

Many algorithms have been developed in the past few decades to estimate nonheme iron absorption from the diet based on single meal absorption studies. Yet single meal studies exaggerate the effect of diet and other factors on absorption. Here, we propose a new algorithm based on complete diets for estimating nonheme iron absorption. We used data from 4 complete diet studies each with 12-14 participants for a total of 53 individuals (19 men and 34 women) aged 19-38 y. In each study, each participant was observed during three 1-wk periods during which they consumed different diets. The diets were typical, high, or low in meat, tea, calcium, or vitamin C. The total sample size was 159 (53 × 3) observations. We used multiple linear regression to quantify the effect of different factors on iron absorption. Serum ferritin was the most important factor in explaining differences in nonheme iron absorption, whereas the effect of dietary factors was small. When our algorithm was validated with single meal and complete diet data, the respective R(2) values were 0.57 (P < 0.001) and 0.84 (P < 0.0001). The results also suggest that between-person variations explain a large proportion of the differences in nonheme iron absorption. The algorithm based on complete diets we propose is useful for predicting nonheme iron absorption from the diets of different populations.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Jing, E-mail: jing.zhang2@duke.edu; Ghate, Sujata V.; Yoon, Sora C.

Purpose: Mammography is the most widely accepted and utilized screening modality for early breast cancer detection. Providing high quality mammography education to radiology trainees is essential, since excellent interpretation skills are needed to ensure the highest benefit of screening mammography for patients. The authors have previously proposed a computer-aided education system based on trainee models. Those models relate human-assessed image characteristics to trainee error. In this study, the authors propose to build trainee models that utilize features automatically extracted from images using computer vision algorithms to predict likelihood of missing each mass by the trainee. This computer vision-based approach to trainee modeling will allow for automatically searching large databases of mammograms in order to identify challenging cases for each trainee. Methods: The authors’ algorithm for predicting the likelihood of missing a mass consists of three steps. First, a mammogram is segmented into air, pectoral muscle, fatty tissue, dense tissue, and mass using automated segmentation algorithms. Second, 43 features are extracted using computer vision algorithms for each abnormality identified by experts. Third, error-making models (classifiers) are applied to predict the likelihood of trainees missing the abnormality based on the extracted features. The models are developed individually for each trainee using his/her previous reading data. The authors evaluated the predictive performance of the proposed algorithm using data from a reader study in which 10 subjects (7 residents and 3 novices) and 3 experts read 100 mammographic cases. Receiver operating characteristic (ROC) methodology was applied for the evaluation. Results: The average area under the ROC curve (AUC) of the error-making models for the task of predicting which masses will be detected and which will be missed was 0.607 (95% CI, 0.564-0.650).
This value was statistically significantly different from 0.5 (p < 0.0001). For the 7 residents only, the AUC performance of the models was 0.590 (95% CI, 0.537-0.642) and was also significantly higher than 0.5 (p = 0.0009). Therefore, generally the authors’ models were able to predict which masses were detected and which were missed better than chance. Conclusions: The authors proposed an algorithm that was able to predict which masses will be detected and which will be missed by each individual trainee. This confirms existence of error-making patterns in the detection of masses among radiology trainees. Furthermore, the proposed methodology will allow for the optimized selection of difficult cases for the trainees in an automatic and efficient manner.
Use of sexually transmitted disease risk assessment algorithms for selection of intrauterine device candidates.

PubMed

Morrison, C S; Sekadde-Kigondu, C; Miller, W C; Weiner, D H; Sinei, S K

1999-02-01

Sexually transmitted diseases (STD) are an important contraindication for intrauterine device (IUD) insertion. Nevertheless, laboratory testing for STD is not possible in many settings. The objective of this study is to evaluate the use of risk assessment algorithms to predict STD and subsequent IUD-related complications among IUD candidates. Among 615 IUD users in Kenya, the following algorithms were evaluated: 1) an STD algorithm based on US Agency for International Development (USAID) Technical Working Group guidelines; 2) a Centers for Disease Control and Prevention (CDC) algorithm for management of chlamydia; and 3) a data-derived algorithm modeled from study data. Algorithms were evaluated for prediction of chlamydial and gonococcal infection at 1 month and complications (pelvic inflammatory disease [PID], IUD removals, and IUD expulsions) over 4 months. Women with STD were more likely to develop complications than women without STD (19% vs 6%; risk ratio = 2.9; 95% CI 1.3-6.5). For STD prediction, the USAID algorithm was 75% sensitive and 48% specific, with a positive likelihood ratio (LR+) of 1.4. The CDC algorithm was 44% sensitive and 72% specific, LR+ = 1.6. The data-derived algorithm was 91% sensitive and 56% specific, with LR+ = 2.0 and LR- = 0.2. Category-specific LR for this algorithm identified women with very low (< 1%) and very high (29%) infection probabilities. The data-derived algorithm was also the best predictor of IUD-related complications. These results suggest that use of STD algorithms may improve selection of IUD users. Women at high risk for STD could be counseled to avoid IUD, whereas women at moderate risk should be monitored closely and counseled to use condoms.
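The likelihood ratios quoted in the abstract above can be recomputed from the reported sensitivity and specificity using the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The Python sketch below works from the rounded percentages in the abstract, so small discrepancies with the published values are to be expected.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# (sensitivity, specificity) as reported in the abstract
algorithms = {
    "USAID": (0.75, 0.48),
    "CDC": (0.44, 0.72),
    "data-derived": (0.91, 0.56),
}

for name, (sens, spec) in algorithms.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

The USAID and CDC values round to the reported 1.4 and 1.6. The data-derived LR+ comes out near 2.07 rather than the reported 2.0, presumably because the published percentages are themselves rounded.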
Adaptive adjustment of interval predictive control based on combined model and application in shell brand petroleum distillation tower

NASA Astrophysics Data System (ADS)

Sun, Chao; Zhang, Chunran; Gu, Xinfeng; Liu, Bin

2017-10-01

Constraints on the optimization objective often cannot be met when predictive control is applied to industrial production processes, in which case the online predictive controller fails to find a feasible solution or a globally optimal solution. To solve this problem, based on a Back Propagation-Auto Regressive with exogenous inputs (BP-ARX) combined control model, a nonlinear programming method is used to analyze the feasibility of constrained predictive control, a feasibility decision theorem for the optimization objective is proposed, and a solution method for the soft-constraint slack variables is given for the case in which the optimization objective is not feasible. On this basis, for the interval control requirements of the controlled variables, the solved slack variables are introduced and an adaptive weighted interval predictive control algorithm is proposed, which achieves adaptive regulation of the optimization objective and automatic adjustment of the infeasible interval range, expands the feasible region, and ensures the feasibility of the interval optimization objective. Finally, the feasibility and effectiveness of the algorithm are validated through comparative simulation experiments.
Predicting Energy Consumption for Potential Effective Use in Hybrid Vehicle Powertrain Management Using Driver Prediction

NASA Astrophysics Data System (ADS)

Magnuson, Brian

A proof-of-concept software-in-the-loop study is performed to assess the accuracy of predicted net and charge-gaining energy consumption for potential effective use in optimizing powertrain management of hybrid vehicles. With promising results of improving fuel efficiency of a thermostatic control strategy for a series, plug-in, hybrid-electric vehicle by 8.24%, the route and speed prediction machine learning algorithms are redesigned and implemented for real-world testing in a stand-alone C++ code-base to ingest map data, learn and predict driver habits, and store driver data for fast startup and shutdown of the controller or computer used to execute the compiled algorithm. Speed prediction is performed using a multi-layer, multi-input, multi-output neural network using feed-forward prediction and gradient descent through back-propagation training. Route prediction utilizes a Hidden Markov Model with a recurrent forward algorithm for prediction and multi-dimensional hash maps to store state and state distribution constraining associations between atomic road segments and end destinations. Predicted energy is calculated using the predicted time-series speed and elevation profile over the predicted route and the road-load equation. Testing of the code-base is performed over a known road network spanning 24x35 blocks on the south hill of Spokane, Washington. A large set of training routes are traversed once to add randomness to the route prediction algorithm, and a subset of the training routes, the testing routes, is traversed to assess the accuracy of the net and charge-gaining predicted energy consumption. Each test route is traveled a random number of times with varying speed conditions from traffic and pedestrians to add randomness to speed prediction. Prediction data is stored and analyzed in a post-process Matlab script. The aggregated results and analysis of all traversals of all test routes reflect the performance of the Driver Prediction algorithm.
The error of average energy gained through charge-gaining events is 31.3% and the error of average net energy consumed is 27.3%. The average delta and average standard deviation of the delta of predicted energy gained through charge-gaining events is 0.639 and 0.601 Wh respectively for individual time-series calculations. Similarly, the average delta and average standard deviation of the delta of the predicted net energy consumed is 0.567 and 0.580 Wh respectively for individual time-series calculations. The average delta and standard deviation of the delta of the predicted speed is 1.60 and 1.15 respectively also for the individual time-series measurements. The percentage of accuracy of route prediction is 91%. Overall, test routes are traversed 151 times for a total test distance of 276.4 km.
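The energy calculation mentioned in the abstract above (predicted speed and elevation profiles combined through the road-load equation) can be sketched in Python as follows. This uses the textbook form of the road-load equation (inertial, grade, rolling-resistance, and aerodynamic terms); the vehicle parameters and the function name are hypothetical, and the thesis's exact formulation and drivetrain losses are not given in the abstract.

```python
import math
from typing import Sequence


def road_load_energy_wh(speeds: Sequence[float], elevations: Sequence[float],
                        dt: float = 1.0, mass: float = 1500.0,
                        c_rr: float = 0.01, cd_a: float = 0.6,
                        rho: float = 1.2, g: float = 9.81) -> float:
    """Net energy at the wheels in Wh for a sampled speed/elevation profile.

    speeds in m/s and elevations in m, both sampled every `dt` seconds.
    All default vehicle parameters are illustrative assumptions.
    Negative results indicate energy available for charge gaining.
    """
    energy_j = 0.0
    for i in range(1, len(speeds)):
        v = 0.5 * (speeds[i - 1] + speeds[i])        # mean speed in the step
        dist = v * dt                                # distance travelled
        accel = (speeds[i] - speeds[i - 1]) / dt
        rise = elevations[i] - elevations[i - 1]
        slope = math.atan2(rise, dist) if dist > 0 else 0.0
        force = (mass * accel                        # inertial term
                 + mass * g * math.sin(slope)        # grade term
                 + c_rr * mass * g * math.cos(slope) # rolling resistance
                 + 0.5 * rho * cd_a * v * v)         # aerodynamic drag
        energy_j += force * dist
    return energy_j / 3600.0


cruise = road_load_energy_wh([10.0, 10.0], [0.0, 0.0])   # steady, flat road
braking = road_load_energy_wh([10.0, 8.0], [0.0, 0.0])   # decelerating
print(cruise, braking)
```

Summing only the negative-force steps separately from the total would give the charge-gaining and net figures that the study reports as distinct quantities.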
Processing LiDAR Data to Predict Natural Hazards

NASA Technical Reports Server (NTRS)

Fairweather, Ian; Crabtree, Robert; Hager, Stacey

2008-01-01

ELF-Base and ELF-Hazards (wherein 'ELF' signifies 'Extract LiDAR Features' and 'LiDAR' signifies 'light detection and ranging') are developmental software modules for processing remote-sensing LiDAR data to identify past natural hazards (principally, landslides) and predict future ones. ELF-Base processes raw LiDAR data, including LiDAR intensity data that are often ignored in other software, to create digital terrain models (DTMs) and digital feature models (DFMs) with sub-meter accuracy. ELF-Hazards fuses raw LiDAR data, data from multispectral and hyperspectral optical images, and DTMs and DFMs generated by ELF-Base to generate hazard risk maps. Advanced algorithms in these software modules include line-enhancement and edge-detection algorithms, surface-characterization algorithms, and algorithms that implement innovative data-fusion techniques. The line-extraction and edge-detection algorithms enable users to locate such features as faults and landslide headwall scarps. Also implemented in this software are improved methodologies for identification and mapping of past landslide events by use of (1) accurate, ELF-derived surface characterizations and (2) three LiDAR/optical-data-fusion techniques: post-classification data fusion, maximum-likelihood estimation modeling, and hierarchical within-class discrimination. This software is expected to enable faster, more accurate forecasting of natural hazards than has previously been possible.
Visual saliency-based fast intracoding algorithm for high efficiency video coding

NASA Astrophysics Data System (ADS)

Zhou, Xin; Shi, Guangming; Zhou, Wei; Duan, Zhemin

2017-01-01

Intraprediction has been significantly improved in high efficiency video coding over H.264/AVC with quad-tree-based coding unit (CU) structure from size 64×64 to 8×8 and more prediction modes. However, these techniques cause a dramatic increase in computational complexity. An intracoding algorithm is proposed that consists of perceptual fast CU size decision algorithm and fast intraprediction mode decision algorithm. First, based on the visual saliency detection, an adaptive and fast CU size decision method is proposed to alleviate intraencoding complexity. Furthermore, a fast intraprediction mode decision algorithm with step halving rough mode decision method and early modes pruning algorithm is presented to selectively check the potential modes and effectively reduce the complexity of computation. Experimental results show that our proposed fast method reduces the computational complexity of the current HM to about 57% in encoding time with only 0.37% increases in BD rate. Meanwhile, the proposed fast algorithm has reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality.
Switching algorithm for maglev train double-modular redundant positioning sensors.

PubMed

He, Ning; Long, Zhiqiang; Xue, Song

2012-01-01

High-resolution positioning for maglev trains is implemented by detecting the tooth-slot structure of the long stator installed along the rail, but there are large joint gaps between long stator sections. When a positioning sensor is below a large joint gap, its positioning signal is invalidated, thus double-modular redundant positioning sensors are introduced into the system. This paper studies switching algorithms for these redundant positioning sensors. At first, adaptive prediction is applied to the sensor signals. The prediction errors are used to trigger sensor switching. In order to enhance the reliability of the switching algorithm, wavelet analysis is introduced to suppress measuring disturbances without weakening the signal characteristics reflecting the stator joint gap based on the correlation between the wavelet coefficients of adjacent scales. The time delay characteristics of the method are analyzed to guide the algorithm simplification. Finally, the effectiveness of the simplified switching algorithm is verified through experiments.
Switching Algorithm for Maglev Train Double-Modular Redundant Positioning Sensors

PubMed Central

He, Ning; Long, Zhiqiang; Xue, Song

2012-01-01

High-resolution positioning for maglev trains is implemented by detecting the tooth-slot structure of the long stator installed along the rail, but there are large joint gaps between long stator sections. When a positioning sensor is below a large joint gap, its positioning signal is invalidated, thus double-modular redundant positioning sensors are introduced into the system. This paper studies switching algorithms for these redundant positioning sensors. At first, adaptive prediction is applied to the sensor signals. The prediction errors are used to trigger sensor switching. In order to enhance the reliability of the switching algorithm, wavelet analysis is introduced to suppress measuring disturbances without weakening the signal characteristics reflecting the stator joint gap based on the correlation between the wavelet coefficients of adjacent scales. The time delay characteristics of the method are analyzed to guide the algorithm simplification. Finally, the effectiveness of the simplified switching algorithm is verified through experiments. PMID:23112657
Multiple objects tracking with HOGs matching in circular windows

NASA Astrophysics Data System (ADS)

Miramontes-Jaramillo, Daniel; Kober, Vitaly; Díaz-Ramírez, Víctor H.

2014-09-01

In recent years, tracking applications have become very important with the development of new technologies such as smart TVs, Kinect, Google Glass and Oculus Rift. When tracking uses a matching algorithm, a good prediction algorithm is required to reduce the search area for each object to be tracked as well as the processing time. In this work, we analyze the performance of different tracking algorithms based on prediction and matching for real-time tracking of multiple objects. The matching algorithm uses histograms of oriented gradients. It carries out matching in circular windows, and possesses rotation invariance and tolerance to viewpoint and scale changes. The proposed algorithm is implemented on a personal computer with a GPU, and its performance is analyzed in terms of processing time in real scenarios. Such an implementation takes advantage of current technologies and helps to process video sequences in real time for tracking several objects at the same time.
Can we predict failure in couple therapy early enough to enhance outcome?

PubMed

Pepping, Christopher A; Halford, W Kim; Doss, Brian D

2015-02-01

Feedback to therapists based on systematic monitoring of individual therapy progress reliably enhances therapy outcome. An implicit assumption of therapy progress feedback is that clients unlikely to benefit from therapy can be detected early enough in the course of therapy for corrective action to be taken. To explore the possibility of using feedback of therapy progress to enhance couple therapy outcome, the current study tested whether weekly therapy progress could detect off-track clients early in couple therapy. In an effectiveness trial of couple therapy, 136 couples were monitored weekly on relationship satisfaction and an expert derived algorithm was used to attempt to predict eventual therapy outcome. As expected, the algorithm detected a significant proportion of couples who did not benefit from couple therapy at Session 3, but prediction was substantially improved at Session 4 so that eventual outcome was accurately predicted for 70% of couples, with little improvement of prediction thereafter. More sophisticated algorithms might enhance prediction accuracy, and a trial of the effects of therapy progress feedback on couple therapy outcome is needed. Copyright © 2015 Elsevier Ltd. All rights reserved.
Using experimental data to test an n -body dynamical model coupled with an energy-based clusterization algorithm at low incident energies

NASA Astrophysics Data System (ADS)

Kumar, Rohit; Puri, Rajeev K.

2018-03-01

Employing the quantum molecular dynamics (QMD) approach for nucleus-nucleus collisions, we test the predictive power of the energy-based clusterization algorithm, i.e., the simulated annealing clusterization algorithm (SACA), to describe the experimental data of charge distribution and various event-by-event correlations among fragments. The calculations are restricted to the Fermi-energy domain and/or mildly excited nuclear matter. Our detailed study, which spans different system masses and system-mass asymmetries of the colliding partners, shows the importance of the energy-based clusterization algorithm for understanding multifragmentation. The present calculations are also compared with the other available calculations, which use one-body models, statistical models, and/or hybrid models.
Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

PubMed Central

2010-01-01

Background: The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results: In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion: High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245
Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships.

PubMed

Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong

2010-01-18

The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.
An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier.

PubMed

Xia, Jiaqi; Peng, Zhenling; Qi, Dawei; Mu, Hongbo; Yang, Jianyi

2017-03-15

Protein fold classification is a critical step in protein structure prediction. There are two possible ways to classify protein folds. One is through template-based fold assignment and the other is ab-initio prediction using machine learning algorithms. Combining both solutions to improve the prediction accuracy has not been explored before. We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm, in which a comprehensive set of features are extracted from three complementary sequence profiles. These two algorithms are then combined, resulting in the ensemble approach TA-fold. We performed a comprehensive assessment for the proposed methods by comparing with ab-initio methods and template-based threading methods on six benchmark datasets. An accuracy of 0.799 was achieved by TA-fold on the DD dataset that consists of proteins from 27 folds. This represents improvement of 5.4-11.7% over ab-initio methods. After updating this dataset to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved >0.9 accuracy on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. The success of TA-fold is attributed to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolution information. Availability: http://yanglab.nankai.edu.cn/TA-fold/. Contact: yangjy@nankai.edu.cn or mhb-506@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
Validation of Medicaid claims-based diagnosis of myocardial infarction using an HIV clinical cohort

PubMed Central

Brouwer, Emily S.; Napravnik, Sonia; Eron, Joseph J; Simpson, Ross J; Brookhart, M. Alan; Stalzer, Brant; Vinikoor, Michael; Floris-Moore, Michelle; Stürmer, Til

2014-01-01

Background: In non-experimental comparative effectiveness research using healthcare databases, outcome measurements must be validated to evaluate and potentially adjust for misclassification bias. We aimed to validate claims-based myocardial infarction algorithms in a Medicaid population using an HIV clinical cohort as the gold standard. Methods: Medicaid administrative data were obtained for the years 2002–2008 and linked to the UNC CFAR HIV Clinical Cohort based on social security number, first name and last name, and myocardial infarctions were adjudicated. Sensitivity, specificity, positive predictive value, and negative predictive value were calculated. Results: There were 1,063 individuals included. Over a median observed time of 2.5 years, 17 had a myocardial infarction. Specificity ranged from 0.979–0.993, with the highest specificity obtained using criteria with the ICD-9 code in the primary and secondary position and a length of stay ≥ 3 days. Sensitivity of myocardial infarction ascertainment varied from 0.588–0.824 depending on algorithm. Conclusion: Specificities of varying claims-based myocardial infarction ascertainment criteria are high, but small changes impact positive predictive value in a cohort with low incidence. Sensitivities vary based on ascertainment criteria. Type of algorithm used should be prioritized based on study question and maximization of specific validation parameters that will minimize bias while also considering precision. PMID:23604043
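The four validation measures named in the record above follow directly from a 2x2 table of claims-based versus adjudicated outcomes. The sketch below is illustrative only: the cohort size (1,063) and event count (17) come from the abstract, but the true-positive and false-positive counts are hypothetical values chosen to fall inside the reported sensitivity and specificity ranges, not figures reported by the authors.

```python
def validation_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Cohort size and number of events are taken from the abstract.
n_total, n_events = 1063, 17
n_nonevents = n_total - n_events

# HYPOTHETICAL splits: a strict claims definition and a broad one.
strict = validation_metrics(tp=10, fp=7, fn=7, tn=n_nonevents - 7)
broad = validation_metrics(tp=14, fp=22, fn=3, tn=n_nonevents - 22)

for name, m in (("strict", strict), ("broad", broad)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With so few events, a handful of extra false positives barely moves specificity but changes the positive predictive value substantially, which is the effect the conclusion describes.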
Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

PubMed

Li, Hongdong; Zhang, Yang; Guan, Yuanfang; Menon, Rajasree; Omenn, Gilbert S

2017-01-01

Tens of thousands of splice isoforms of proteins have been catalogued as predicted sequences from transcripts in humans and other species. Relatively few have been characterized biochemically or structurally. With the extensive development of protein bioinformatics, the characterization and modeling of isoform features, isoform functions, and isoform-level networks have advanced notably. Here we present applications of the I-TASSER family of algorithms for folding and functional predictions and the IsoFunc, MIsoMine, and Hisonet data resources for isoform-level analyses of network and pathway-based functional predictions and protein-protein interactions. Hopefully, predictions and insights from protein bioinformatics will stimulate many experimental validation studies.
Automatic measurement of voice onset time using discriminative structured prediction.

PubMed

Sonderegger, Morgan; Keshet, Joseph

2012-12-01

A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectral and temporal cues used by human VOT annotators. The algorithm is applied to initial voiceless stops from four corpora, representing different types of speech. Using several evaluation methods, the algorithm's performance is near human intertranscriber reliability, and compares favorably with previous work. Furthermore, the algorithm's performance is minimally affected by training and testing on different corpora, and remains essentially constant as the amount of training data is reduced to 50-250 manually labeled examples, demonstrating the method's practical applicability to new datasets.
An Efficient Deterministic Approach to Model-based Prediction Uncertainty Estimation

DTIC Science & Technology

2012-09-01

94035, USA abhinav.saxena@nasa.gov ABSTRACT Prognostics deals with the prediction of the end of life (EOL) of a system. EOL is a random variable, due...future evolution of the system, accumulating additional uncertainty into the predicted EOL. Prediction algorithms that do not account for these sources of...uncertainty are misrepresenting the EOL and can lead to poor decisions based on their results. In this paper, we explore the impact of uncertainty in
Metaheuristic optimization approaches to predict shear-wave velocity from conventional well logs in sandstone and carbonate case studies

NASA Astrophysics Data System (ADS)

Emami Niri, Mohammad; Amiri Kolajoobi, Rasool; Khodaiy Arbat, Mohammad; Shahbazi Raz, Mahdi

2018-06-01

Seismic wave velocities, along with petrophysical data, provide valuable information during the exploration and development stages of oil and gas fields. The compressional-wave velocity (VP) is acquired using conventional acoustic logging tools in many drilled wells. But the shear-wave velocity (VS) is recorded using advanced logging tools only in a limited number of wells, mainly because of the high operational costs. In addition, laboratory measurements of seismic velocities on core samples are expensive and time consuming. So, alternative methods are often used to estimate VS. Several empirical correlations that predict VS from well logging measurements and petrophysical data such as VP, porosity and density have been proposed. However, these empirical relations can only be used in limited cases. Intelligent systems and optimization algorithms are inexpensive, fast and efficient approaches for predicting VS. In this study, in addition to the widely used Greenberg–Castagna empirical method, we implement three relatively recently developed metaheuristic algorithms to construct linear and nonlinear models for predicting VS: teaching–learning based optimization, imperialist competitive and artificial bee colony algorithms. We demonstrate the applicability and performance of these algorithms to predict VS using conventional well logs in two field data examples, a sandstone formation from an offshore oil field and a carbonate formation from an onshore oil field. We compared the VS estimated by each of the employed metaheuristic approaches with the observed VS and also with the values predicted by the Greenberg–Castagna relations. The results indicate that, for both sandstone and carbonate case studies, all three implemented metaheuristic algorithms are more efficient and reliable than the empirical correlation to predict VS. The results also demonstrate that in both sandstone and carbonate case studies, the performance of the artificial bee colony algorithm in VS prediction is slightly better than that of the two other employed approaches.
Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

PubMed Central

Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

2016-01-01

Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

Effect of window length on performance of the elbow-joint angle prediction based on electromyography

NASA Astrophysics Data System (ADS)

Triwiyanto; Wahyunggoro, Oyas; Adi Nugroho, Hanung; Herianto

2017-05-01

High performance in elbow-joint angle prediction is essential to the development of devices based on electromyography (EMG) control. The performance of the prediction depends on feature extraction parameters such as window length. In this paper, we evaluated the effect of the window length on the performance of the elbow-joint angle prediction. The prediction algorithm consists of zero-crossing feature extraction and a second-order Butterworth low-pass filter. The feature was extracted from the EMG signal using varying window lengths. The EMG signal was collected from the biceps muscle while the elbow was moved in flexion and extension. The subject performed the elbow motion while holding a 1-kg load and moved the elbow over different periods (12 seconds, 8 seconds and 6 seconds). The results indicated that the window length affected the performance of the prediction. A window length of 250 yielded the best performance of the prediction algorithm, with (mean±SD) root mean square error = 5.68%±1.53% and Pearson's correlation = 0.99±0.0059.
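A windowed zero-crossing feature of the kind this record describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `threshold` noise guard, the non-overlapping default step, and the synthetic 50 Hz test signal are all assumptions, and the Butterworth smoothing stage is omitted.

```python
import math


def zero_crossings(window, threshold=0.0):
    """Count sign changes between consecutive samples.

    `threshold` is a hypothetical noise guard: a crossing only counts
    when the two samples differ by more than this amount.
    """
    count = 0
    for a, b in zip(window, window[1:]):
        if a * b < 0 and abs(a - b) > threshold:
            count += 1
    return count


def windowed_feature(signal, window_length, step=None):
    """Zero-crossing count for each window of `window_length` samples."""
    step = step or window_length  # default: non-overlapping windows
    return [
        zero_crossings(signal[i:i + window_length])
        for i in range(0, len(signal) - window_length + 1, step)
    ]


# Synthetic stand-in for an EMG recording: 1 s of a 50 Hz sine at 1 kHz.
fs = 1000
signal = [math.sin(2 * math.pi * 50 * (n + 0.5) / fs) for n in range(fs)]
features = windowed_feature(signal, window_length=250)
print(features)
```

Shorter windows give a more responsive but noisier feature and longer windows a smoother but slower one, which is why the window length matters for the downstream angle estimate.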
Multi-model blending

DOEpatents

Hamann, Hendrik F.; Hwang, Youngdeok; van Kessel, Theodore G.; Khabibrakhmanov, Ildar K.; Muralidhar, Ramachandran

2016-10-18

A method and a system to perform multi-model blending are described. The method includes obtaining one or more sets of predictions of historical conditions, the historical conditions corresponding with a time T that is historical in reference to current time, and the one or more sets of predictions of the historical conditions being output by one or more models. The method also includes obtaining actual historical conditions, the actual historical conditions being measured conditions at the time T, assembling a training data set including designating the two or more sets of predictions of historical conditions as predictor variables and the actual historical conditions as response variables, and training a machine learning algorithm based on the training data set. The method further includes obtaining a blended model based on the machine learning algorithm.
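The training setup described in this record (model outputs as predictor variables, measured conditions as the response) can be illustrated with a very small example. The record only says "a machine learning algorithm"; ordinary least squares over two models is an assumption made here for brevity, and the numbers are synthetic.

```python
def fit_blend_weights(pred_a, pred_b, actual):
    """Least-squares weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - y)^2)."""
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, actual))
    sby = sum(b * y for b, y in zip(pred_b, actual))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("model predictions are collinear")
    # Solve the 2x2 normal equations by Cramer's rule.
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


# Synthetic historical predictions from two hypothetical models,
# and the conditions actually measured at the same times.
model_a = [8.0, 14.0, 12.0, 10.0, 18.0]
model_b = [12.0, 12.0, 16.0, 12.0, 12.0]
measured = [11.0, 12.5, 15.0, 11.5, 13.5]

w_a, w_b = fit_blend_weights(model_a, model_b, measured)


def blended(a, b):
    """Blended prediction for new outputs of the two models."""
    return w_a * a + w_b * b


print(round(w_a, 3), round(w_b, 3), round(blended(10.0, 14.0), 3))
```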
Refined genetic algorithm -- Economic dispatch example

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sheble, G.B.; Brittig, K.

1995-02-01

A genetic-based algorithm is used to solve an economic dispatch (ED) problem. The algorithm utilizes payoff information of prospective solutions to evaluate optimality. Thus, the constraints of classical Lagrangian techniques on unit curves are eliminated. Using an economic dispatch problem as a basis for comparison, several different techniques which enhance program efficiency and accuracy, such as mutation prediction, elitism, interval approximation and penalty factors, are explored. Two unique genetic algorithms are also compared. The results are verified for a sample problem using a classical technique.
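A toy genetic algorithm for economic dispatch, showing two of the techniques the record names (elitism and a penalty factor), might look like the sketch below. Everything numeric is hypothetical: the three-unit cost curves, limits, demand, penalty weight and GA settings are invented for illustration, and mutation prediction and interval approximation are not implemented.

```python
import random

# HYPOTHETICAL units: cost(P) = a + b*P + c*P^2 with output limits (MW).
UNITS = [  # (a, b, c, p_min, p_max)
    (500.0, 5.3, 0.004, 100.0, 450.0),
    (400.0, 5.5, 0.006, 50.0, 350.0),
    (200.0, 5.8, 0.009, 50.0, 225.0),
]
DEMAND = 800.0    # MW to be served (assumed)
PENALTY = 1000.0  # cost per MW of demand mismatch (assumed)


def fitness(outputs):
    """Fuel cost plus a penalty for violating the power balance."""
    fuel = sum(a + b * p + c * p * p for (a, b, c, _, _), p in zip(UNITS, outputs))
    return fuel + PENALTY * abs(sum(outputs) - DEMAND)


def random_solution(rng):
    return [rng.uniform(lo, hi) for (_, _, _, lo, hi) in UNITS]


def crossover(p1, p2, rng):
    """Uniform crossover: each unit's output comes from either parent."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]


def mutate(sol, rng, rate=0.2, scale=10.0):
    """Gaussian perturbation, clamped to each unit's limits."""
    out = []
    for p, (_, _, _, lo, hi) in zip(sol, UNITS):
        if rng.random() < rate:
            p = min(hi, max(lo, p + rng.gauss(0.0, scale)))
        out.append(p)
    return out


def run_ga(generations=200, pop_size=40, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_solution(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        pop = pop[:elite] + children  # elitism: best solutions survive unchanged
    pop.sort(key=fitness)
    history.append(fitness(pop[0]))
    return pop[0], history


best, history = run_ga(generations=50)
print([round(p, 1) for p in best], round(history[-1], 1))
```

Because the elite solutions are copied unchanged into the next generation, the best fitness can never get worse from one generation to the next.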
On several aspects and applications of the multigrid method for solving partial differential equations

NASA Technical Reports Server (NTRS)

Dinar, N.

1978-01-01

Several aspects of multigrid methods are briefly described. The main subjects include the development of very efficient multigrid algorithms for systems of elliptic equations (Cauchy-Riemann, Stokes, Navier-Stokes), as well as the development of control and prediction tools (based on local mode Fourier analysis), used to analyze, check and improve these algorithms. Preliminary research on multigrid algorithms for time dependent parabolic equations is also described. Improvements in existing multigrid processes and algorithms for elliptic equations were studied.
Effect of symptom-based risk stratification on the costs of managing patients with chronic rhinosinusitis symptoms.

PubMed

Tan, Bruce K; Lu, Guanning; Kwasny, Mary J; Hsueh, Wayne D; Shintani-Smith, Stephanie; Conley, David B; Chandra, Rakesh K; Kern, Robert C; Leung, Randy

2013-11-01

Current symptom criteria poorly predict a diagnosis of chronic rhinosinusitis (CRS), resulting in excessive treatment of patients with presumed CRS. The objective of this study was to analyze the positive predictive value of individual symptoms, or symptoms in combination, in patients with CRS symptoms and examine the costs of the subsequent diagnostic algorithm using a decision tree-based cost analysis. We analyzed previously collected patient-reported symptoms from a cross-sectional study of patients who had received a computed tomography (CT) scan of their sinuses at a tertiary care otolaryngology clinic for evaluation of CRS symptoms to calculate the positive predictive value of individual symptoms. Classification and regression tree (CART) analysis then optimized combinations of symptoms and thresholds to identify CRS patients. The calculated positive predictive values were applied to a previously developed decision tree that compared an upfront CT (uCT) algorithm against an empiric medical therapy (EMT) algorithm with further analysis that considered the availability of point of care (POC) imaging. The positive predictive value of individual symptoms ranged from 0.21 for patients reporting forehead pain to 0.69 for patients reporting hyposmia. The CART model constructed a dichotomous model based on forehead pain, maxillary pain, hyposmia, nasal discharge, and facial pain (C-statistic 0.83). If POC CT were available, median costs ($64-$415) favored using the upfront CT for all individual symptoms. If POC CT was unavailable, median costs favored uCT for most symptoms except intercanthal pain (-$15), hyposmia (-$100), and discolored nasal discharge (-$24), although these symptoms became equivocal on cost sensitivity analysis. The three-tiered CART model could subcategorize patients into tiers where uCT was always favorable (median costs: $332-$504) and others for which EMT was always favorable (median costs -$121 to -$275). The uCT algorithm was always more costly if the nasal endoscopy was positive. Among patients with classic CRS symptoms, the frequency of individual symptoms varied the likelihood of a CRS diagnosis marginally. Only hyposmia, the absence of facial pain, and discolored discharge sufficiently increased the likelihood of diagnosis to potentially make EMT less costly. The development of an evidence-based, multisymptom-based risk stratification model could substantially affect the management costs of the subsequent diagnostic algorithm. © 2013 ARS-AAOA, LLC.
A TCAS-II Resolution Advisory Detection Algorithm

NASA Technical Reports Server (NTRS)

Munoz, Cesar; Narkawicz, Anthony; Chamberlain, James

2013-01-01

The Traffic Alert and Collision Avoidance System (TCAS) is a family of airborne systems designed to reduce the risk of mid-air collisions between aircraft. TCASII, the current generation of TCAS devices, provides resolution advisories that direct pilots to maintain or increase vertical separation when aircraft distance and time parameters are beyond designed system thresholds. This paper presents a mathematical model of the TCASII Resolution Advisory (RA) logic that assumes accurate aircraft state information. Based on this model, an algorithm for RA detection is also presented. This algorithm is analogous to a conflict detection algorithm, but instead of predicting loss of separation, it predicts resolution advisories. It has been formally verified that for a kinematic model of aircraft trajectories, this algorithm completely and correctly characterizes all encounter geometries between two aircraft that lead to a resolution advisory within a given lookahead time interval. The RA detection algorithm proposed in this paper is a fundamental component of a NASA sense and avoid concept for the integration of Unmanned Aircraft Systems in civil airspace.
Testing mapping algorithms of the cancer-specific EORTC QLQ-C30 onto EQ-5D in malignant mesothelioma.

PubMed

Arnold, David T; Rowen, Donna; Versteegh, Matthijs M; Morley, Anna; Hooper, Clare E; Maskell, Nicholas A

2015-01-23

In order to estimate utilities for cancer studies where the EQ-5D was not used, the EORTC QLQ-C30 can be used to estimate EQ-5D using existing mapping algorithms. Several mapping algorithms exist for this transformation; however, these algorithms tend to lose accuracy in patients in poor health states. The aim of this study was to test all existing mapping algorithms of QLQ-C30 onto EQ-5D in a dataset of patients with malignant pleural mesothelioma, an invariably fatal malignancy for which no previous mapping estimation has been published. Health related quality of life (HRQoL) data where both the EQ-5D and QLQ-C30 were used simultaneously were obtained from the UK-based prospective observational SWAMP (South West Area Mesothelioma and Pemetrexed) trial. In the original trial, 73 patients with pleural mesothelioma were offered palliative chemotherapy and their HRQoL was assessed across five time points. These data were used to test the nine available mapping algorithms found in the literature, comparing predicted against observed EQ-5D values. The ability of algorithms to predict the mean, minimise error and detect clinically significant differences was assessed. The dataset had a total of 250 observations across 5 timepoints. The linear regression mapping algorithms tested generally performed poorly, over-estimating the predicted compared to observed EQ-5D values, especially when observed EQ-5D was below 0.5. The best performing algorithm used a response mapping method and predicted the mean EQ-5D accurately, with an average root mean squared error of 0.17 (standard deviation 0.22). This algorithm reliably discriminated between clinically distinct subgroups seen in the primary dataset. This study tested mapping algorithms in a population with poor health states, where they have been previously shown to perform poorly. Further research into EQ-5D estimation should be directed at response mapping methods given their superior performance in this study.
Predicting CD4 count changes among patients on antiretroviral treatment: Application of data mining techniques.

PubMed

Kebede, Mihiretu; Zegeye, Desalegn Tigabu; Zeleke, Berihun Megabiaw

2017-12-01

To monitor the progress of therapy and disease progression, periodic CD4 counts are required throughout the course of HIV/AIDS care and support. The demand for CD4 count measurement has increased as ART programs have expanded over the last decade. This study aimed to predict CD4 count changes and to identify the predictors of CD4 count changes among patients on ART. A cross-sectional study was conducted at the University of Gondar Hospital from 3,104 adult patients on ART with CD4 counts measured at least twice (baseline and most recent). Data were retrieved from the HIV care clinic electronic database and patients' charts. Descriptive data were analyzed by SPSS version 20. The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was followed to undertake the study. WEKA version 3.8 was used to conduct predictive data mining. Before building the predictive data mining models, information gain values and correlation-based Feature Selection methods were used for attribute selection. Variables were ranked according to their relevance based on their information gain values. J48, Neural Network, and Random Forest algorithms were tested to assess model accuracies. The median duration of ART was 191.5 weeks. The mean CD4 count change was 243 (SD 191.14) cells per microliter. Overall, 2427 (78.2%) patients had their CD4 counts increased by at least 100 cells per microliter, while 4% had a decline from the baseline CD4 value. Baseline variables including age, educational status, CD8 count, ART regimen, and hemoglobin levels predicted CD4 count changes with predictive accuracies of J48, Neural Network, and Random Forest being 87.1%, 83.5%, and 99.8%, respectively. The Random Forest algorithm achieved higher accuracy than both J48 and the Artificial Neural Network. The precision, sensitivity and recall values of Random Forest were also more than 99%. Nearly accurate prediction results were obtained using the Random Forest algorithm. This algorithm could be used in a low-resource setting to build a web-based prediction model for CD4 count changes. Copyright © 2017 Elsevier B.V. All rights reserved.
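Ranking attributes by information gain, as this record describes, means measuring how much knowing an attribute reduces the entropy of the class label. The sketch below shows the calculation for categorical attributes; the attribute names and the eight toy records are hypothetical, and WEKA's own implementation (which also discretizes numeric attributes) is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# HYPOTHETICAL toy records: did the CD4 count rise by at least 100 cells?
cd4_increased = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
attributes = {
    "regimen": ["A", "A", "A", "A", "B", "B", "B", "B"],          # fully informative
    "age_group": ["<40", "<40", "40+", "40+", "<40", "40+", "40+", "40+"],
    "sex": ["F", "M", "F", "M", "F", "M", "F", "M"],              # uninformative
}

ranking = sorted(
    attributes,
    key=lambda name: information_gain(attributes[name], cd4_increased),
    reverse=True,
)
print(ranking)
```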
Lossless Video Sequence Compression Using Adaptive Prediction

NASA Technical Reports Server (NTRS)

Li, Ying; Sayood, Khalid

2007-01-01

We present an adaptive lossless video compression algorithm based on predictive coding. The proposed algorithm exploits temporal, spatial, and spectral redundancies in a backward adaptive fashion with extremely low side information. The computational complexity is further reduced by using a caching strategy. We also study the relationship between the operational domain for the coder (wavelet or spatial) and the amount of temporal and spatial redundancy in the sequence being encoded. Experimental results show that the proposed scheme provides significant improvements in compression efficiencies.
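The idea of backward-adaptive predictive coding can be shown on tiny integer "frames": encoder and decoder both pick a predictor using only samples that are already known, so no side information is needed and the residuals reconstruct the input exactly. This is a simplified sketch with two assumed predictors (left neighbour and co-located pixel in the previous frame); the paper's actual predictors, spectral component, caching strategy and entropy coder are not modeled.

```python
def _candidates(prev_frame, cur, i):
    """Spatial and temporal predictions for pixel i from already-known samples."""
    spatial = cur[i - 1] if i > 0 else 0
    temporal = prev_frame[i] if prev_frame is not None else 0
    return spatial, temporal


def _predict(prev_frame, cur, i):
    """Backward adaptation: use whichever predictor was better on pixel i-1."""
    spatial, temporal = _candidates(prev_frame, cur, i)
    if i == 0:
        return temporal
    s_prev, t_prev = _candidates(prev_frame, cur, i - 1)
    if abs(cur[i - 1] - s_prev) <= abs(cur[i - 1] - t_prev):
        return spatial
    return temporal


def encode(frames):
    """Replace every pixel by its prediction residual."""
    residuals, prev = [], None
    for frame in frames:
        residuals.append([frame[i] - _predict(prev, frame, i) for i in range(len(frame))])
        prev = frame
    return residuals


def decode(residuals):
    """Rebuild the frames by mirroring the encoder's predictor choices."""
    frames, prev = [], None
    for res in residuals:
        cur = []
        for i, r in enumerate(res):
            cur.append(r + _predict(prev, cur, i))
        frames.append(cur)
        prev = cur
    return frames


# Synthetic three-frame sequence of one-row images.
video = [[10, 12, 13, 13], [10, 12, 14, 13], [11, 12, 14, 14]]
coded = encode(video)
print(coded)
print(decode(coded) == video)
```

In a real coder the residuals would then be entropy coded; the gain comes from the residuals being smaller and more predictable than the raw pixel values.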
Association between split selection instability and predictive error in survival trees.

PubMed

Radespiel-Tröger, M; Gefeller, O; Rabenstein, T; Hothorn, T

2006-01-01

To evaluate split selection instability in six survival tree algorithms and its relationship with predictive error by means of a bootstrap study. We study the following algorithms: logrank statistic with multivariate p-value adjustment without pruning (LR), Kaplan-Meier distance of survival curves (KM), martingale residuals (MR), Poisson regression for censored data (PR), within-node impurity (WI), and exponential log-likelihood loss (XL). With the exception of LR, initial trees are pruned by using split-complexity, and final trees are selected by means of cross-validation. We employ a real dataset from a clinical study of patients with gallbladder stones. The predictive error is evaluated using the integrated Brier score for censored data. The relationship between split selection instability and predictive error is evaluated by means of box-percentile plots, covariate and cutpoint selection entropy, and cutpoint selection coefficients of variation, respectively, in the root node. We found a positive association between covariate selection instability and predictive error in the root node. LR yields the lowest predictive error, while KM and MR yield the highest predictive error. The predictive error of survival trees is related to split selection instability. Based on the low predictive error of LR, we recommend the use of this algorithm for the construction of survival trees. Unpruned survival trees with multivariate p-value adjustment can perform equally well compared to pruned trees. The analysis of split selection instability can be used to communicate the results of tree-based analyses to clinicians and to support the application of survival trees.
Motion prediction of a non-cooperative space target

NASA Astrophysics Data System (ADS)

Zhou, Bang-Zhao; Cai, Guo-Ping; Liu, Yun-Meng; Liu, Pan

2018-01-01

Capturing a non-cooperative space target is a tremendously challenging research topic. Effective acquisition of motion information of the space target is the premise for realizing target capture. In this paper, motion prediction of a free-floating non-cooperative target in space is studied and a motion prediction algorithm is proposed. In order to predict the motion of the free-floating non-cooperative target, dynamic parameters of the target, such as inertia, angular momentum and kinetic energy, must first be identified (estimated); the predicted motion of the target can then be acquired by substituting these identified parameters into the Euler's equations of the target. Accurate prediction needs precise identification. This paper presents an effective method to identify these dynamic parameters of a free-floating non-cooperative target. This method is based on two steps: (1) a rough estimate of the parameters is computed using the motion observation data of the target, and (2) the best estimate of the parameters is found by an optimization method. In the optimization problem, the objective function is based on the difference between the observed and the predicted motion, and the interior-point method (IPM) is chosen as the optimization algorithm, which starts at the rough estimate obtained in the first step and finds a global minimum of the objective function with the guidance of the objective function's gradient. The IPM search for the global minimum is therefore fast, and an accurate identification can be obtained in time. The numerical results show that the proposed motion prediction algorithm is able to predict the motion of the target.
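The prediction step, substituting identified parameters into Euler's equations, can be sketched for the torque-free case as below. The principal moments of inertia, the initial angular velocity, and the choice of a fixed-step fourth-order Runge-Kutta integrator are all assumptions for illustration; the identification and optimization stages are not shown.

```python
def euler_rates(inertia, omega):
    """Torque-free Euler equations in principal axes: I_i * dw_i/dt = (I_j - I_k) * w_j * w_k."""
    i1, i2, i3 = inertia
    w1, w2, w3 = omega
    return (
        (i2 - i3) * w2 * w3 / i1,
        (i3 - i1) * w3 * w1 / i2,
        (i1 - i2) * w1 * w2 / i3,
    )


def rk4_step(inertia, omega, dt):
    """Advance the body angular velocity by one Runge-Kutta (RK4) step."""
    def shifted(base, rate, h):
        return tuple(b + h * r for b, r in zip(base, rate))

    k1 = euler_rates(inertia, omega)
    k2 = euler_rates(inertia, shifted(omega, k1, dt / 2))
    k3 = euler_rates(inertia, shifted(omega, k2, dt / 2))
    k4 = euler_rates(inertia, shifted(omega, k3, dt))
    return tuple(
        w + dt / 6 * (a + 2 * b + 2 * c + d)
        for w, a, b, c, d in zip(omega, k1, k2, k3, k4)
    )


def kinetic_energy(inertia, omega):
    return 0.5 * sum(i * w * w for i, w in zip(inertia, omega))


def momentum_squared(inertia, omega):
    return sum((i * w) ** 2 for i, w in zip(inertia, omega))


# HYPOTHETICAL identified parameters (kg m^2) and observed rate (rad/s).
inertia = (10.0, 15.0, 20.0)
omega0 = (0.1, 0.2, 0.3)

omega = omega0
for _ in range(1000):  # predict 10 s ahead with dt = 0.01 s
    omega = rk4_step(inertia, omega, 0.01)

print(tuple(round(w, 4) for w in omega))
```

Kinetic energy and the magnitude of angular momentum are conserved in torque-free motion, so checking them along the predicted trajectory is a cheap sanity test of both the integrator and the identified parameters.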
MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.

PubMed

Zhu, Huaiqiu; Hu, Gang-Qing; Yang, Yi-Fan; Wang, Jin; She, Zhen-Su

2007-03-16

Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure before making the prediction of genes. Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.
Reducing patient mortality, length of stay and readmissions through machine learning-based sepsis prediction in the emergency department, intensive care unit and hospital floor units

PubMed Central

McCoy, Andrea

2017-01-01

Introduction: Sepsis management is a challenge for hospitals nationwide, as severe sepsis carries high mortality rates and costs the US healthcare system billions of dollars each year. It has been shown that early intervention for patients with severe sepsis and septic shock is associated with higher rates of survival. The Cape Regional Medical Center (CRMC) aimed to improve sepsis-related patient outcomes through a revised sepsis management approach. Methods: In collaboration with Dascena, CRMC formed a quality improvement team to implement a machine learning-based sepsis prediction algorithm to identify patients with sepsis earlier. Previously, CRMC assessed all patients for sepsis using twice-daily systemic inflammatory response syndrome screenings, but desired improvements. The quality improvement team worked to implement a machine learning-based algorithm, collect and incorporate feedback, and tailor the system to current hospital workflow. Results: Relative to the pre-implementation period, the post-implementation period sepsis-related in-hospital mortality rate decreased by 60.24%, sepsis-related hospital length of stay decreased by 9.55% and sepsis-related 30-day readmission rate decreased by 50.14%. Conclusion: The machine learning-based sepsis prediction algorithm improved patient outcomes at CRMC. PMID:29450295
Design of a fuzzy differential evolution algorithm to predict non-deposition sediment transport

NASA Astrophysics Data System (ADS)

Ebtehaj, Isa; Bonakdari, Hossein

2017-12-01

Since the flow entering a sewer contains solid matter, deposition at the bottom of the channel is inevitable. It is difficult to understand the complex, three-dimensional mechanism of sediment transport in sewer pipelines. Therefore, a method to estimate the limiting velocity is necessary for optimal designs. Due to the inability of gradient-based algorithms to train Adaptive Neuro-Fuzzy Inference Systems (ANFIS) for non-deposition sediment transport prediction, a new hybrid ANFIS method based on a differential evolutionary algorithm (ANFIS-DE) is developed. The training and testing performance of ANFIS-DE is evaluated using a wide range of dimensionless parameters gathered from the literature. The input combination used to estimate the densimetric Froude number (Fr) includes the volumetric sediment concentration (C_V), ratio of median particle diameter to hydraulic radius (d/R), ratio of median particle diameter to pipe diameter (d/D) and overall friction factor of sediment (λ_s). The testing results are compared with the ANFIS model and regression-based equation results. The ANFIS-DE technique predicted sediment transport at the limit of deposition with lower root mean square error (RMSE = 0.323) and mean absolute percentage error (MAPE = 0.065) and higher accuracy (R² = 0.965) than the ANFIS model and regression-based equations.
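The three error measures quoted in this record can be computed as in the sketch below. Two points are assumptions, since the abstract does not define them: MAPE is returned as a fraction rather than a percentage, and R² is taken as the coefficient of determination rather than a squared correlation. The observed and predicted values are synthetic.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, as a fraction (observed values must be nonzero)."""
    n = len(observed)
    return sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Synthetic densimetric Froude numbers (observed vs. model output).
fr_observed = [2.0, 4.0, 5.0, 8.0]
fr_predicted = [2.5, 3.5, 5.0, 8.0]

print(
    round(rmse(fr_observed, fr_predicted), 4),
    round(mape(fr_observed, fr_predicted), 4),
    round(r_squared(fr_observed, fr_predicted), 4),
)
```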
Predicting the survival of diabetes using neural network

NASA Astrophysics Data System (ADS)

Mamuda, Mamman; Sathasivam, Saratha

2017-08-01

Data mining techniques are currently used to predict diseases in the health care industry. Neural networks are among the prevailing data mining methods for predicting diseases in health care. This paper presents a study on the prediction of the survival of diabetes using different supervised learning algorithms for neural networks. Three learning algorithms are considered in this study: (i) the Levenberg-Marquardt learning algorithm, (ii) the Bayesian regularization learning algorithm and (iii) the scaled conjugate gradient learning algorithm. The network is trained using the Pima Indian Diabetes Dataset with the help of MATLAB R2014(a) software. The performance of each algorithm is further discussed through regression analysis. The prediction accuracy of the best algorithm is further computed to validate the accurate prediction.
Network Community Detection based on the Physarum-inspired Computational Framework.

PubMed

Gao, Chao; Liang, Mingxin; Li, Xianghua; Zhang, Zili; Wang, Zhen; Zhou, Zhili

2016-12-13

Community detection is a crucial and essential problem in the structure analytics of complex networks, which can help us understand and predict the characteristics and functions of complex networks. Many methods, ranging from the optimization-based algorithms to the heuristic-based algorithms, have been proposed for solving such a problem. Due to the inherent complexity of identifying network structure, how to design an effective algorithm with a higher accuracy and a lower computational cost still remains an open problem. Inspired by the computational capability and positive feedback mechanism in the wake of foraging process of Physarum, which is a large amoeba-like cell consisting of a dendritic network of tube-like pseudopodia, a general Physarum-based computational framework for community detection is proposed in this paper. Based on the proposed framework, the inter-community edges can be identified from the intra-community edges in a network and the positive feedback of solving process in an algorithm can be further enhanced, which are used to improve the efficiency of original optimization-based and heuristic-based community detection algorithms, respectively. Some typical algorithms (e.g., genetic algorithm, ant colony optimization algorithm, and Markov clustering algorithm) and real-world datasets have been used to estimate the efficiency of our proposed computational framework. Experiments show that the algorithms optimized by Physarum-inspired computational framework perform better than the original ones, in terms of accuracy and computational cost. Moreover, a computational complexity analysis verifies the scalability of our framework.
Predicting distant failure in early stage NSCLC treated with SBRT using clinical parameters.

PubMed

Zhou, Zhiguo; Folkert, Michael; Cannon, Nathan; Iyengar, Puneeth; Westover, Kenneth; Zhang, Yuanyuan; Choy, Hak; Timmerman, Robert; Yan, Jingsheng; Xie, Xian-J; Jiang, Steve; Wang, Jing

2016-06-01

The aim of this study is to predict early distant failure in early stage non-small cell lung cancer (NSCLC) treated with stereotactic body radiation therapy (SBRT) using clinical parameters by machine learning algorithms. The dataset used in this work includes 81 early stage NSCLC patients with at least 6 months of follow-up who underwent SBRT between 2006 and 2012 at a single institution. The clinical parameters (n=18) for each patient include demographic parameters, tumor characteristics, treatment fraction schemes, and pretreatment medications. Three predictive models were constructed based on different machine learning algorithms: (1) artificial neural network (ANN), (2) logistic regression (LR) and (3) support vector machine (SVM). Furthermore, to select an optimal clinical parameter set for the model construction, three strategies were adopted: (1) clonal selection algorithm (CSA) based selection strategy; (2) sequential forward selection (SFS) method; and (3) statistical analysis (SA) based strategy. Five-fold cross-validation is used to validate the performance of each predictive model. The accuracy was assessed by the area under the receiver operating characteristic (ROC) curve (AUC); the sensitivity and specificity of the system were also evaluated. The AUCs for ANN, LR and SVM were 0.75, 0.73, and 0.80, respectively. The sensitivity values for ANN, LR and SVM were 71.2%, 72.9% and 83.1%, while the specificity values for ANN, LR and SVM were 59.1%, 63.6% and 63.6%, respectively. Meanwhile, the CSA based strategy outperformed SFS and SA in terms of AUC, sensitivity and specificity. Based on clinical parameters, the SVM with the CSA optimal parameter set selection strategy achieves better performance than other strategies for predicting distant failure in lung SBRT patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
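The three figures of merit reported above (AUC, sensitivity, specificity) can be computed from labels and model scores as in the minimal sketch below. It is not the authors' code: the function names are hypothetical, labels are assumed to be 1 for distant failure and 0 otherwise, and the AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a cross-validated setting these functions would typically be applied to the held-out predictions of each fold and then averaged.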
Glucose Prediction Algorithms from Continuous Monitoring Data: Assessment of Accuracy via Continuous Glucose Error-Grid Analysis.

PubMed

Zanderigo, Francesca; Sparacino, Giovanni; Kovatchev, Boris; Cobelli, Claudio

2007-09-01

The aim of this article was to use continuous glucose error-grid analysis (CG-EGA) to assess the accuracy of two time-series modeling methodologies recently developed to predict glucose levels ahead of time using continuous glucose monitoring (CGM) data. We considered subcutaneous time series of glucose concentration monitored every 3 minutes for 48 hours by the minimally invasive CGM sensor Glucoday® (Menarini Diagnostics, Florence, Italy) in 28 type 1 diabetic volunteers. Two prediction algorithms, based on first-order polynomial and autoregressive (AR) models, respectively, were considered with prediction horizons of 30 and 45 minutes and forgetting factors (ff) of 0.2, 0.5, and 0.8. CG-EGA was used on the predicted profiles to assess their point and dynamic accuracies using original CGM profiles as reference. Continuous glucose error-grid analysis showed that the accuracy of both prediction algorithms is overall very good and that their performance is similar from a clinical point of view. However, the AR model seems preferable for hypoglycemia prevention. CG-EGA also suggests that, irrespective of the time-series model, the use of ff = 0.8 yields the most accurate readings in all glucose ranges. For the first time, CG-EGA is proposed as a tool to assess clinically relevant performance of a prediction method separately at hypoglycemia, euglycemia, and hyperglycemia. In particular, we have shown that CG-EGA can be helpful in comparing different prediction algorithms, as well as in optimizing their parameters.
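A first-order polynomial predictor with a forgetting factor can be sketched as a weighted least-squares line fit that is extrapolated to the prediction horizon. The abstract does not spell out the weighting scheme, so the sketch below assumes that a sample k steps in the past receives weight ff**k; the function name and arguments are hypothetical.

```python
def predict_linear_ff(times, values, horizon, ff):
    """Fit value = a * time + b by weighted least squares and extrapolate.

    times, values: past CGM samples (oldest first); horizon: minutes ahead;
    ff: forgetting factor in (0, 1], assumed to weight a sample k steps
    in the past by ff**k.
    """
    n = len(times)
    weights = [ff ** (n - 1 - i) for i in range(n)]
    sw = sum(weights)
    mean_t = sum(w * t for w, t in zip(weights, times)) / sw
    mean_v = sum(w * v for w, v in zip(weights, values)) / sw
    sxx = sum(w * (t - mean_t) ** 2 for w, t in zip(weights, times))
    if sxx == 0:
        raise ValueError("need at least two distinct sample times")
    sxy = sum(w * (t - mean_t) * (v - mean_v)
              for w, t, v in zip(weights, times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return intercept + slope * (times[-1] + horizon)
```

With samples every 3 minutes, a call such as `predict_linear_ff(times, values, horizon=30, ff=0.8)` would give a 30-minute-ahead estimate. A smaller ff discounts older samples more strongly, which makes the fit more reactive but also noisier.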
In Situ Measurement of Some Soil Properties in Paddy Soil Using Visible and Near-Infrared Spectroscopy

PubMed Central

Wenjun, Ji; Zhou, Shi; Jingyi, Huang; Shuo, Li

2014-01-01

In situ measurements with visible and near-infrared spectroscopy (vis-NIR) provide an efficient way for acquiring soil information of paddy soils in the short time gap between the harvest and following rotation. The aim of this study was to evaluate its feasibility to predict a series of soil properties including organic matter (OM), organic carbon (OC), total nitrogen (TN), available nitrogen (AN), available phosphorus (AP), available potassium (AK) and pH of paddy soils in Zhejiang province, China. Firstly, the linear partial least squares regression (PLSR) was performed on the in situ spectra and the predictions were compared to those with laboratory-based recorded spectra. Then, the non-linear least-square support vector machine (LS-SVM) algorithm was carried out aiming to extract more useful information from the in situ spectra and improve predictions. Results show that, for OC, OM, TN, AN and pH, (i) predictions with the PLSR algorithm were worse using in situ spectra than using laboratory-based spectra; (ii) the prediction accuracy using LS-SVM (R²>0.75, RPD>1.90) with in situ vis-NIR spectra was clearly improved compared to the PLSR algorithm, and comparable to or even better than results generated using laboratory-based spectra with PLSR; and (iii) for AP and AK, poor predictions were obtained with in situ spectra (R²<0.5, RPD<1.50) using either PLSR or LS-SVM. The results highlight the use of LS-SVM for in situ vis-NIR spectroscopic estimation of soil properties of paddy soils. PMID:25153132
Symbolic Processing Combined with Model-Based Reasoning

NASA Technical Reports Server (NTRS)

James, Mark

2009-01-01

A computer program for the detection of present and prediction of future discrete states of a complex, real-time engineering system utilizes a combination of symbolic processing and numerical model-based reasoning. One of the biggest weaknesses of a purely symbolic approach is that it enables prediction of only future discrete states while missing all unmodeled states or leading to incorrect identification of an unmodeled state as a modeled one. A purely numerical approach is based on a combination of statistical methods and mathematical models of the applicable physics and necessitates development of a complete model to the level of fidelity required for prediction. In addition, a purely numerical approach does not afford the ability to qualify its results without some form of symbolic processing. The present software implements numerical algorithms to detect unmodeled events and symbolic algorithms to predict expected behavior, correlate the expected behavior with the unmodeled events, and interpret the results in order to predict future discrete states. The approach embodied in this software differs from that of the BEAM methodology (aspects of which have been discussed in several prior NASA Tech Briefs articles), which provides for prediction of future measurements in the continuous-data domain.

sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides

DOE Office of Scientific and Technical Information (OSTI.GOV)

Luo, Heng; Ye, Hao; Ng, Hui Wen

Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. Furthermore, this algorithm can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides

DOE PAGES

Luo, Heng; Ye, Hao; Ng, Hui Wen; ...

2016-08-25

Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. Furthermore, this algorithm can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.
A machine learning approach to triaging patients with chronic obstructive pulmonary disease

PubMed Central

Qirko, Klajdi; Smith, Ted; Corcoran, Ethan; Wysham, Nicholas G.; Bazaz, Gaurav; Kappel, George; Gerber, Anthony N.

2017-01-01

COPD patients are burdened with a daily risk of acute exacerbation and loss of control, which could be mitigated by effective, on-demand decision support tools. In this study, we present a machine learning-based strategy for early detection of exacerbations and subsequent triage. Our application uses physician opinion in a statistically and clinically comprehensive set of patient cases to train a supervised prediction algorithm. The accuracy of the model is assessed against a panel of physicians each triaging identical cases in a representative patient validation set. Our results show that algorithm accuracy and safety indicators surpass all individual pulmonologists in both identifying exacerbations and predicting the consensus triage in a 101 case validation set. The algorithm is also the top performer in sensitivity, specificity, and ppv when predicting a patient’s need for emergency care. PMID:29166411
Combined rule extraction and feature elimination in supervised classification.

PubMed

Liu, Sheng; Patel, Ronak Y; Daga, Pankaj R; Liu, Haining; Fu, Gang; Doerksen, Robert J; Chen, Yixin; Wilkins, Dawn E

2012-09-01

There are a vast number of biology related research problems involving a combination of multiple sources of data to achieve a better understanding of the underlying problems. It is important to select and interpret the most important information from these sources. Thus it will be beneficial to have a good algorithm to simultaneously extract rules and select features for better interpretation of the predictive model. We propose an efficient algorithm, Combined Rule Extraction and Feature Elimination (CRF), based on 1-norm regularized random forests. CRF simultaneously extracts a small number of rules generated by random forests and selects important features. We applied CRF to several drug activity prediction and microarray data sets. CRF is capable of producing performance comparable with state-of-the-art prediction algorithms using a small number of decision rules. Some of the decision rules are biologically significant.
Potassium-based algorithm allows correction for the hematocrit bias in quantitative analysis of caffeine and its major metabolite in dried blood spots.

PubMed

De Kesel, Pieter M M; Capiau, Sara; Stove, Veronique V; Lambert, Willy E; Stove, Christophe P

2014-10-01

Although dried blood spot (DBS) sampling is increasingly receiving interest as a potential alternative to traditional blood sampling, the impact of hematocrit (Hct) on DBS results is limiting its final breakthrough in routine bioanalysis. To predict the Hct of a given DBS, potassium (K(+)) proved to be a reliable marker. The aim of this study was to evaluate whether application of an algorithm, based upon predicted Hct or K(+) concentrations as such, allowed correction for the Hct bias. Using validated LC-MS/MS methods, caffeine, chosen as a model compound, was determined in whole blood and corresponding DBS samples with a broad Hct range (0.18-0.47). A reference subset (n = 50) was used to generate an algorithm based on K(+) concentrations in DBS. Application of the developed algorithm on an independent test set (n = 50) alleviated the assay bias, especially at lower Hct values. Before correction, differences between DBS and whole blood concentrations ranged from -29.1 to 21.1%. The mean difference, as obtained by Bland-Altman comparison, was -6.6% (95% confidence interval (CI), -9.7 to -3.4%). After application of the algorithm, differences between corrected and whole blood concentrations lay between -19.9 and 13.9% with a mean difference of -2.1% (95% CI, -4.5 to 0.3%). The same algorithm was applied to a separate compound, paraxanthine, which was determined in 103 samples (Hct range, 0.17-0.47), yielding similar results. In conclusion, a K(+)-based algorithm allows correction for the Hct bias in the quantitative analysis of caffeine and its metabolite paraxanthine.
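The Bland-Altman summary quoted above (mean percentage difference with a 95% confidence interval) can be illustrated with the short sketch below. The abstract does not state how the percentage difference is normalised or how the interval is derived, so two assumptions are made here: the difference is expressed relative to the mean of the paired measurements, and the interval uses a normal approximation (1.96 standard errors). The function name is hypothetical.

```python
import math
import statistics


def bland_altman_percent(dbs, blood):
    """Return (mean % difference, (ci_low, ci_high)) for paired measurements.

    Assumption: % difference = 100 * (dbs - blood) / mean(dbs, blood),
    with a normal-approximation 95% confidence interval for the mean.
    """
    diffs = [100.0 * (d - b) / ((d + b) / 2.0) for d, b in zip(dbs, blood)]
    mean_diff = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, (mean_diff - 1.96 * sem, mean_diff + 1.96 * sem)
```

Applying the same function before and after a correction step gives a like-for-like view of how much of the bias the correction removes.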
Automated assessment of cognitive health using smart home technologies.

PubMed

Dawadi, Prafulla N; Cook, Diane J; Schmitter-Edgecombe, Maureen; Parsey, Carolyn

2013-01-01

The goal of this work is to develop intelligent systems to monitor the wellbeing of individuals in their home environments. This paper introduces a machine learning-based method to automatically predict activity quality in smart homes and automatically assess cognitive health based on activity quality. This paper describes an automated framework to extract a set of features from smart home sensor data that reflects the activity performance or ability of an individual to complete an activity, which can be input to machine learning algorithms. Outputs from learning algorithms including principal component analysis, support vector machine, and logistic regression algorithms are used to quantify activity quality for a complex set of smart home activities and predict the cognitive health of participants. Smart home activity data was gathered from volunteer participants (n=263) who performed a complex set of activities in our smart home testbed. We compare our automated activity quality prediction and cognitive health prediction with direct observation scores and health assessment obtained from neuropsychologists. With all samples included, we obtained a statistically significant correlation (r=0.54) between direct observation scores and predicted activity quality. Similarly, using a support vector machine classifier, we obtained reasonable classification accuracy (area under the ROC curve=0.80, g-mean=0.73) in classifying participants into two different cognitive classes, dementia and cognitively healthy. The results suggest that it is possible to automatically quantify the task quality of smart home activities and perform limited assessment of the cognitive health of an individual if smart home activities are properly chosen and learning algorithms are appropriately trained.
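The g-mean reported alongside the AUC is the geometric mean of sensitivity and specificity, which penalises a classifier that does well on only one of the two classes. A minimal sketch, with a hypothetical function name and confusion-matrix counts as inputs:

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)
```

Because it multiplies the two rates, the g-mean drops to zero whenever one class is never recognised, which makes it a common companion to the AUC on imbalanced data.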
Automated Assessment of Cognitive Health Using Smart Home Technologies

PubMed Central

Dawadi, Prafulla N.; Cook, Diane J.; Schmitter-Edgecombe, Maureen; Parsey, Carolyn

2014-01-01

BACKGROUND: The goal of this work is to develop intelligent systems to monitor the well being of individuals in their home environments. OBJECTIVE: This paper introduces a machine learning-based method to automatically predict activity quality in smart homes and automatically assess cognitive health based on activity quality. METHODS: This paper describes an automated framework to extract a set of features from smart home sensor data that reflects the activity performance or ability of an individual to complete an activity, which can be input to machine learning algorithms. Outputs from learning algorithms including principal component analysis, support vector machine, and logistic regression algorithms are used to quantify activity quality for a complex set of smart home activities and predict the cognitive health of participants. RESULTS: Smart home activity data was gathered from volunteer participants (n=263) who performed a complex set of activities in our smart home testbed. We compare our automated activity quality prediction and cognitive health prediction with direct observation scores and health assessment obtained from neuropsychologists. With all samples included, we obtained a statistically significant correlation (r=0.54) between direct observation scores and predicted activity quality. Similarly, using a support vector machine classifier, we obtained reasonable classification accuracy (area under the ROC curve = 0.80, g-mean = 0.73) in classifying participants into two different cognitive classes, dementia and cognitively healthy. CONCLUSIONS: The results suggest that it is possible to automatically quantify the task quality of smart home activities and perform limited assessment of the cognitive health of an individual if smart home activities are properly chosen and learning algorithms are appropriately trained. PMID:23949177
Sepsis mortality prediction with the Quotient Basis Kernel.

PubMed

Ribas Ripoll, Vicent J; Vellido, Alfredo; Romero, Enrique; Ruiz-Rodríguez, Juan Carlos

2014-05-01

This paper presents an algorithm to assess the risk of death in patients with sepsis. Sepsis is a common clinical syndrome in the intensive care unit (ICU) that can lead to severe sepsis, a severe state of septic shock or multi-organ failure. The proposed algorithm may be implemented as part of a clinical decision support system that can be used in combination with the scores deployed in the ICU to improve the accuracy, sensitivity and specificity of mortality prediction for patients with sepsis. In this paper, we used the Simplified Acute Physiology Score (SAPS) for ICU patients and the Sequential Organ Failure Assessment (SOFA) to build our kernels and algorithms. In the proposed method, we embed the available data in a suitable feature space and use algorithms based on linear algebra, geometry and statistics for inference. We present a simplified version of the Fisher kernel (practical Fisher kernel for multinomial distributions), as well as a novel kernel that we named the Quotient Basis Kernel (QBK). These kernels are used as the basis for mortality prediction using soft-margin support vector machines. The two new kernels presented are compared against other generative kernels based on the Jensen-Shannon metric (centred, exponential and inverse) and other widely used kernels (linear, polynomial and Gaussian). Clinical relevance is also evaluated by comparing these results with logistic regression and the standard clinical prediction method based on the initial SAPS score. As described in this paper, we tested the new methods via cross-validation with a cohort of 400 test patients. The results obtained using our methods compare favourably with those obtained using alternative kernels (80.18% accuracy for the QBK) and the standard clinical prediction method, which are based on the basal SAPS score or logistic regression (71.32% and 71.55%, respectively). 
The QBK presented a sensitivity and specificity of 79.34% and 83.24%, which outperformed the other kernels analysed, logistic regression and the standard clinical prediction method based on the basal SAPS score. Several scoring systems for patients with sepsis have been introduced and developed over the last 30 years. They allow for the assessment of the severity of disease and provide an estimate of in-hospital mortality. Physiology-based scoring systems are applied to critically ill patients and have a number of advantages over diagnosis-based systems. Severity score systems are often used to stratify critically ill patients for possible inclusion in clinical trials. In this paper, we present an effective algorithm that combines both scoring methodologies for the assessment of death in patients with sepsis that can be used to improve the sensitivity and specificity of the currently available methods. Copyright © 2014 Elsevier B.V. All rights reserved.
Molecular beacon sequence design algorithm.

PubMed

Monroe, W Todd; Haselton, Frederick R

2003-01-01

A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.
Subjective audio quality evaluation of embedded-optimization-based distortion precompensation algorithms.

PubMed

Defraene, Bruno; van Waterschoot, Toon; Diehl, Moritz; Moonen, Marc

2016-07-01

Subjective audio quality evaluation experiments have been conducted to assess the performance of embedded-optimization-based precompensation algorithms for mitigating perceptible linear and nonlinear distortion in audio signals. It is concluded with statistical significance that the perceived audio quality is improved by applying an embedded-optimization-based precompensation algorithm, both in case (i) nonlinear distortion and (ii) a combination of linear and nonlinear distortion is present. Moreover, a significant positive correlation is reported between the collected subjective and objective PEAQ audio quality scores, supporting the validity of using PEAQ to predict the impact of linear and nonlinear distortion on the perceived audio quality.
Artificial Neural Network and Genetic Algorithm Hybrid Intelligence for Predicting Thai Stock Price Index Trend

PubMed Central

Boonjing, Veera; Intakosum, Sarun

2016-01-01

This study investigated the use of Artificial Neural Network (ANN) and Genetic Algorithm (GA) for prediction of Thailand's SET50 index trend. ANN is a widely accepted machine learning method that uses past data to predict future trend, while GA is an algorithm that can find better subsets of input variables for importing into ANN, hence enabling more accurate prediction by its efficient feature selection. The imported data were chosen technical indicators highly regarded by stock analysts, each represented by 4 input variables that were based on past time spans of 4 different lengths: 3-, 5-, 10-, and 15-day spans before the day of prediction. This import undertaking generated a big set of diverse input variables with an exponentially higher number of possible subsets that GA culled down to a manageable number of more effective ones. SET50 index data of the past 6 years, from 2009 to 2014, were used to evaluate this hybrid intelligence prediction accuracy, and the hybrid's prediction results were found to be more accurate than those made by a method using only one input variable for one fixed length of past time span. PMID:27974883
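The idea of representing each indicator by four input variables, one per look-back span, can be sketched as follows. The abstract does not name the technical indicators used, so a simple moving average of closing prices stands in as an illustrative indicator; the function and feature names are hypothetical.

```python
SPANS = (3, 5, 10, 15)  # look-back lengths in trading days, as in the abstract


def span_features(closes, day, spans=SPANS):
    """Return one moving-average feature per span for the prediction day.

    closes: list of closing prices; day: index of the day to predict.
    Only data strictly before `day` is used, to avoid look-ahead.
    """
    features = {}
    for span in spans:
        if day < span:
            raise ValueError(f"not enough history for a {span}-day span")
        window = closes[day - span:day]
        features[f"sma_{span}"] = sum(window) / span
    return features
```

Repeating this for every indicator yields the large pool of candidate inputs from which a feature selection step, such as the GA described above, would pick a subset.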
Artificial Neural Network and Genetic Algorithm Hybrid Intelligence for Predicting Thai Stock Price Index Trend.

PubMed

Inthachot, Montri; Boonjing, Veera; Intakosum, Sarun

2016-01-01

This study investigated the use of Artificial Neural Network (ANN) and Genetic Algorithm (GA) for prediction of Thailand's SET50 index trend. ANN is a widely accepted machine learning method that uses past data to predict future trend, while GA is an algorithm that can find better subsets of input variables for importing into ANN, hence enabling more accurate prediction by its efficient feature selection. The imported data were chosen technical indicators highly regarded by stock analysts, each represented by 4 input variables that were based on past time spans of 4 different lengths: 3-, 5-, 10-, and 15-day spans before the day of prediction. This import undertaking generated a big set of diverse input variables with an exponentially higher number of possible subsets that GA culled down to a manageable number of more effective ones. SET50 index data of the past 6 years, from 2009 to 2014, were used to evaluate this hybrid intelligence prediction accuracy, and the hybrid's prediction results were found to be more accurate than those made by a method using only one input variable for one fixed length of past time span.
Robust and Adaptive Online Time Series Prediction with Long Short-Term Memory

PubMed Central

Tao, Qing

2017-01-01

Online time series prediction is the mainstream method in a wide range of fields, ranging from speech analysis and noise cancelation to stock market analysis. However, real-world data often contain many outliers as the length of the time series increases. These outliers can mislead the learned model if treated as normal points in the process of prediction. To address this issue, in this paper, we propose a robust and adaptive online gradient learning method, RoAdam (Robust Adam), for long short-term memory (LSTM) to predict time series with outliers. This method tunes the learning rate of the stochastic gradient algorithm adaptively in the process of prediction, which reduces the adverse effect of outliers. It tracks the relative prediction error of the loss function with a weighted average through modifying Adam, a popular stochastic gradient algorithm for training deep neural networks. In our algorithm, a large value of the relative prediction error corresponds to a small learning rate, and vice versa. The experiments on both synthetic data and real time series show that our method achieves better performance compared to the existing methods based on LSTM. PMID:29391864
Robust and Adaptive Online Time Series Prediction with Long Short-Term Memory.

PubMed

Yang, Haimin; Pan, Zhisong; Tao, Qing

2017-01-01

Online time series prediction is the mainstream method in a wide range of fields, ranging from speech analysis and noise cancelation to stock market analysis. However, real-world data often contain many outliers as the length of the time series increases. These outliers can mislead the learned model if treated as normal points in the process of prediction. To address this issue, in this paper, we propose a robust and adaptive online gradient learning method, RoAdam (Robust Adam), for long short-term memory (LSTM) to predict time series with outliers. This method tunes the learning rate of the stochastic gradient algorithm adaptively in the process of prediction, which reduces the adverse effect of outliers. It tracks the relative prediction error of the loss function with a weighted average through modifying Adam, a popular stochastic gradient algorithm for training deep neural networks. In our algorithm, a large value of the relative prediction error corresponds to a small learning rate, and vice versa. The experiments on both synthetic data and real time series show that our method achieves better performance compared to the existing methods based on LSTM.
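The mechanism described above, an Adam-style update whose step shrinks when the loss jumps relative to the previous step, can be illustrated with the single-parameter sketch below. This is not the published RoAdam algorithm: the class name, the extra decay rate beta3, and the exact form of the scaling term are assumptions made for illustration, and any bounding of the loss ratio that the published method may apply is omitted.

```python
import math


class RobustAdamSketch:
    """Adam update for one scalar parameter, damped by a running loss ratio."""

    def __init__(self, lr=0.01, beta1=0.9, beta2=0.999, beta3=0.9, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.beta3, self.eps = lr, beta1, beta2, beta3, eps
        self.m = 0.0           # first-moment estimate of the gradient
        self.v = 0.0           # second-moment estimate of the gradient
        self.d = 1.0           # weighted average of the relative prediction error
        self.t = 0             # step counter for bias correction
        self.prev_loss = None  # loss observed at the previous step

    def step(self, param, grad, loss):
        """Return the updated parameter given the current gradient and loss."""
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        if self.prev_loss is not None and self.prev_loss > 0:
            ratio = abs(loss / self.prev_loss)  # relative prediction error
            self.d = self.beta3 * self.d + (1 - self.beta3) * ratio
        self.prev_loss = loss
        # A large d (loss spike, e.g. an outlier) yields a smaller effective step.
        return param - self.lr * m_hat / (self.d * (math.sqrt(v_hat) + self.eps))
```

When the loss is stable the ratio stays near 1 and the update reduces to ordinary Adam, so the damping only takes effect around abrupt changes in the loss.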
VDA, a Method of Choosing a Better Algorithm with Fewer Validations

PubMed Central

Kluger, Yuval

2011-01-01

The multitude of bioinformatics algorithms designed for performing a particular computational task presents end-users with the problem of selecting the most appropriate computational tool for analyzing their biological data. The choice of the best available method is often based on expensive experimental validation of the results. We propose an approach to design validation sets for method comparison and performance assessment that are effective in terms of cost and discrimination power. Validation Discriminant Analysis (VDA) is a method for designing a minimal validation dataset to allow reliable comparisons between the performances of different algorithms. Implementation of our VDA approach achieves this reduction by selecting predictions that maximize the minimum Hamming distance between algorithmic predictions in the validation set. We show that VDA can be used to correctly rank algorithms according to their performances. These results are further supported by simulations and by realistic algorithmic comparisons in silico. VDA is a novel, cost-efficient method for minimizing the number of validation experiments necessary for reliable performance estimation and fair comparison between algorithms. Our VDA software is available at http://sourceforge.net/projects/klugerlab/files/VDA/ PMID:22046256
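The selection criterion described above, choosing validation items that maximize the minimum Hamming distance between the algorithms' predictions, can be sketched with a simple greedy search. The abstract does not say how the optimization is solved, so the greedy strategy, the tie-breaking rule and the function names are assumptions for illustration; they are not taken from the VDA software.

```python
from itertools import combinations


def min_pairwise_hamming(predictions, chosen):
    """Smallest Hamming distance between any two algorithms on the chosen items."""
    return min(
        sum(a[i] != b[i] for i in chosen)
        for a, b in combinations(predictions, 2)
    )


def greedy_validation_set(predictions, k):
    """Greedily pick k item indices that keep all algorithms distinguishable.

    predictions: one equal-length list of binary calls per algorithm.
    Ties are broken in favour of the lowest item index.
    """
    n_items = len(predictions[0])
    chosen = []
    for _ in range(k):
        remaining = [i for i in range(n_items) if i not in chosen]
        best = max(
            remaining,
            key=lambda i: (min_pairwise_hamming(predictions, chosen + [i]), -i),
        )
        chosen.append(best)
    return chosen
```

Items on which all algorithms agree add nothing to any pairwise distance, so a search of this kind naturally spends the validation budget on the predictions that discriminate between methods.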
Automatic burst detection for the EEG of the preterm infant.

PubMed

Jennekens, Ward; Ruijs, Loes S; Lommen, Charlotte M L; Niemarkt, Hendrik J; Pasman, Jaco W; van Kranen-Mastenbroek, Vivianne H J M; Wijn, Pieter F F; van Pul, Carola; Andriessen, Peter

2011-10-01

To aid with prognosis and stratification of clinical treatment for preterm infants, a method for automated detection of bursts, interburst-intervals (IBIs) and continuous patterns in the electroencephalogram (EEG) is developed. Results are evaluated for preterm infants with normal neurological follow-up at 2 years. The detection algorithm (MATLAB®) for burst, IBI and continuous pattern is based on selection by amplitude, time span, number of channels and numbers of active electrodes. Annotations of two neurophysiologists were used to determine threshold values. The training set consisted of EEG recordings of four preterm infants with postmenstrual age (PMA, gestational age + postnatal age) of 29-34 weeks. Optimal threshold values were based on overall highest sensitivity. For evaluation, both observers verified detections in an independent dataset of four EEG recordings with comparable PMA. Algorithm performance was assessed by calculation of sensitivity and positive predictive value. The results of algorithm evaluation are as follows: sensitivity values of 90% ± 6%, 80% ± 9% and 97% ± 5% for burst, IBI and continuous patterns, respectively. Corresponding positive predictive values were 88% ± 8%, 96% ± 3% and 85% ± 15%, respectively. In conclusion, the algorithm showed high sensitivity and positive predictive values for bursts, IBIs and continuous patterns in preterm EEG. Computer-assisted analysis of EEG may allow objective and reproducible analysis for clinical treatment.
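Selection by amplitude and time span, two of the criteria listed above, can be illustrated on a single channel with the sketch below. The original algorithm was implemented in MATLAB and also uses the number of channels and active electrodes, which are not modelled here; the function name and threshold arguments are hypothetical.

```python
def detect_bursts(signal, amp_threshold, min_samples):
    """Return (start, end) index pairs of runs where |signal| stays high.

    A run counts as a burst when the absolute amplitude is at least
    amp_threshold for min_samples or more consecutive samples.
    The end index is exclusive.
    """
    bursts = []
    start = None
    for i, x in enumerate(signal):
        if abs(x) >= amp_threshold:
            if start is None:
                start = i  # a new candidate run begins
        else:
            if start is not None and i - start >= min_samples:
                bursts.append((start, i))
            start = None
    # Close a run that extends to the end of the recording.
    if start is not None and len(signal) - start >= min_samples:
        bursts.append((start, len(signal)))
    return bursts
```

The gaps between consecutive detected bursts would then be candidates for interburst intervals, subject to their own duration criteria.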
Application of the extreme learning machine algorithm for the prediction of monthly Effective Drought Index in eastern Australia

NASA Astrophysics Data System (ADS)

Deo, Ravinesh C.; Şahin, Mehmet

2015-02-01

The prediction of future drought is an effective mitigation tool for assessing adverse consequences of drought events on vital water resources, agriculture, ecosystems and hydrology. Data-driven model predictions using machine learning algorithms are promising tenets for these purposes as they require less developmental time, minimal inputs and are relatively less complex than the dynamic or physical model. This paper authenticates a computationally simple, fast and efficient non-linear algorithm known as extreme learning machine (ELM) for the prediction of Effective Drought Index (EDI) in eastern Australia using input data trained from 1957-2008 and the monthly EDI predicted over the period 2009-2011. The predictive variables for the ELM model were the rainfall and mean, minimum and maximum air temperatures, supplemented by the large-scale climate mode indices of interest as regression covariates, namely the Southern Oscillation Index, Pacific Decadal Oscillation, Southern Annular Mode and the Indian Ocean Dipole moment. To demonstrate the effectiveness of the proposed data-driven model a performance comparison in terms of the prediction capabilities and learning speeds was conducted between the proposed ELM algorithm and the conventional artificial neural network (ANN) algorithm trained with Levenberg-Marquardt back propagation. The prediction metrics certified an excellent performance of the ELM over the ANN model for the overall test sites, thus yielding Mean Absolute Errors, Root-Mean Square Errors, Coefficients of Determination and Willmott's Indices of Agreement of 0.277, 0.008, 0.892 and 0.93 (for ELM) and 0.602, 0.172, 0.578 and 0.92 (for ANN) models. Moreover, the ELM model was executed with learning speed 32 times faster and training speed 6.1 times faster than the ANN model. An improvement in the prediction capability of the drought duration and severity by the ELM model was achieved. 
Based on these results we aver that out of the two machine learning algorithms tested, the ELM was the more expeditious tool for prediction of drought and its related properties.
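The speed advantage of an extreme learning machine comes from its training rule: hidden-layer weights are drawn at random and left fixed, and only the output weights are solved for in one linear least-squares step. The following is a minimal sketch of that idea, not the authors' model. The hidden-layer size, ridge term, tanh activation, and toy sine data are all assumptions chosen for illustration.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def train_elm(xs, ys, n_hidden=10, ridge=1e-3, seed=0):
    """Fit a single-hidden-layer ELM; returns a predict(x) function."""
    rng = random.Random(seed)
    n_in = len(xs[0])
    # Hidden weights and biases are random and never trained.
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]

    def hidden(x):
        return [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
                for ws, b in zip(weights, biases)]

    h = [hidden(x) for x in xs]
    # Output weights from the ridge normal equations: (H^T H + ridge I) beta = H^T y
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n_hidden)]
    beta = solve_linear_system(gram, rhs)

    def predict(x):
        return sum(b * hv for b, hv in zip(beta, hidden(x)))

    return predict


# Toy data (an assumption): learn y = sin(x) on [-3, 3].
xs = [[i / 10.0] for i in range(-30, 31)]
ys = [math.sin(x[0]) for x in xs]
predict = train_elm(xs, ys)
mse = sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print("training MSE:", mse)
```

Because no iterative back propagation is involved, the cost of training is dominated by one small linear solve, which is consistent with the learning-speed comparison reported in the abstract.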
A novel neural-inspired learning algorithm with application to clinical risk prediction.

PubMed

Tay, Darwin; Poh, Chueh Loo; Kitney, Richard I

2015-04-01

Clinical risk prediction - the estimation of the likelihood an individual is at risk of a disease - is a coveted and exigent clinical task, and a cornerstone to the recommendation of life saving management strategies. This is especially important for individuals at risk of cardiovascular disease (CVD) given the fact that it is the leading cause of death in many developed countries. To this end, we introduce a novel learning algorithm - a key factor that influences the performance of machine learning-based prediction models - and utilise it to develop a CVD risk prediction tool. This novel neural-inspired algorithm, called the Artificial Neural Cell System for classification (ANCSc), is inspired by mechanisms that develop the brain and empower it with capabilities such as information processing/storage and recall, decision making and initiating actions on the external environment. Specifically, we exploit 3 natural neural mechanisms responsible for developing and enriching the brain - namely neurogenesis, neuroplasticity via nurturing and apoptosis - when implementing the ANCSc algorithm. Benchmark testing was conducted using the Honolulu Heart Program (HHP) dataset and results are juxtaposed with 2 other algorithms - i.e. Support Vector Machine (SVM) and Evolutionary Data-Conscious Artificial Immune Recognition System (EDC-AIRS). Empirical experiments indicate that the ANCSc algorithm (statistically) outperforms both the SVM and EDC-AIRS algorithms. Key clinical markers identified by the ANCSc algorithm include risk factors related to diet/lifestyle, pulmonary function, personal/family/medical history, blood data, blood pressure, and electrocardiography. These clinical markers, in general, are also found to be clinically significant - providing a promising avenue for identifying potential cardiovascular risk factors to be evaluated in clinical trials. Copyright © 2015 Elsevier Inc. All rights reserved.
Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction

PubMed Central

Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian

2017-01-01

Motivation: Loops are often vital for protein function; however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations, and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. Results: We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Availability and Implementation: Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28453681
Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.

PubMed

Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian; Deane, Charlotte M

2017-05-01

Loops are often vital for protein function; however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations, and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

High performance transcription factor-DNA docking with GPU computing

PubMed Central

2012-01-01

Background: Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is computationally very demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods: In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results: The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improve the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions: We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
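The search strategy named in this abstract, Monte Carlo sampling combined with simulated annealing, can be illustrated with a generic CPU sketch. This is not the authors' GPU implementation or energy function: the one-dimensional toy energy, the proposal move, and the geometric cooling schedule below are assumptions chosen only to show the Metropolis acceptance rule.

```python
import math
import random


def simulated_annealing(energy, propose, start,
                        t_start=1.0, t_end=1e-3, steps=2000, seed=0):
    """Minimise `energy` with Metropolis moves under a cooling temperature."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        candidate = propose(current, rng)
        cand_e = energy(candidate)
        delta = cand_e - current_e
        # Metropolis rule: always accept downhill moves, and accept
        # uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_e = candidate, cand_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


def toy_energy(x):
    """Hypothetical stand-in for a docking energy; minimum at x = 2."""
    return (x - 2.0) ** 2


def toy_move(x, rng):
    """Hypothetical proposal: a small random perturbation."""
    return x + rng.uniform(-0.5, 0.5)


best_x, best_energy = simulated_annealing(toy_energy, toy_move, start=10.0)
print(best_x, best_energy)
```

Each energy evaluation is independent of the others within a batch of candidates, which is the kind of workload that maps naturally onto GPU threads.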
Open-source chemogenomic data-driven algorithms for predicting drug-target interactions.

PubMed

Hao, Ming; Bryant, Stephen H; Wang, Yanli

2018-02-06

While novel technologies such as high-throughput screening have advanced together with significant investment by pharmaceutical companies during the past decades, the success rate for drug development has not yet improved, prompting researchers to look for new strategies of drug discovery. Drug repositioning is a potential approach to solve this dilemma. However, experimental identification and validation of potential drug targets encoded by the human genome is both costly and time-consuming. Therefore, effective computational approaches have been proposed to facilitate drug repositioning, which have proved to be successful in drug discovery. Doubtlessly, the availability of openly accessible data from basic chemical biology research and the success of human genome sequencing are crucial to developing effective in silico drug repositioning methods that allow the identification of potential targets for existing drugs. In this work, we review several chemogenomic data-driven computational algorithms with publicly accessible source code for predicting drug-target interactions (DTIs). We organize these algorithms by model properties and model evolutionary relationships. We re-implemented five representative algorithms in the R programming language, and compared these algorithms by means of mean percentile ranking, a new recall-based evaluation metric in the DTI prediction research field. We anticipate that this review will be objective and helpful to researchers who would like to further improve existing algorithms or need to choose appropriate algorithms to infer potential DTIs in their projects. The source code for DTI predictions is available at: https://github.com/minghao2016/chemogenomicAlg4DTIpred. Published by Oxford University Press 2018. This work is written by US Government employees and is in the public domain in the US.
A novel method for landslide displacement prediction by integrating advanced computational intelligence algorithms.

PubMed

Zhou, Chao; Yin, Kunlong; Cao, Ying; Ahmed, Bayes; Fu, Xiaolin

2018-05-08

Landslide displacement prediction is considered an essential component for developing early warning systems. Conventional forecast methods require enormous amounts of monitoring data, which limits their application. To conduct accurate displacement prediction with limited data, a novel method is proposed and applied by integrating three computational intelligence algorithms, namely the wavelet transform (WT), the artificial bee colony (ABC), and the kernel-based extreme learning machine (KELM). At first, the total displacement was decomposed into several sub-sequences with different frequencies using the WT. Next, each sub-sequence was predicted separately by the KELM, whose parameters were optimized by the ABC. Finally, the predicted total displacement was obtained by adding all the predicted sub-sequences. The Shuping landslide in the Three Gorges Reservoir area in China was taken as a case study. The performance of the new method was compared with the WT-ELM, ABC-KELM, ELM, and the support vector machine (SVM) methods. Results show that the prediction accuracy can be improved by decomposing the total displacement into sub-sequences with various frequencies and by predicting them separately. The ABC-KELM algorithm shows the highest prediction capacity, followed by the ELM and SVM. Overall, the proposed method achieved excellent performance both in terms of accuracy and stability.
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm

PubMed Central

Bourobou, Serge Thomas Mickala; Yoo, Younghwan

2015-01-01

This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things) based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, their performance was limited because they focused on only one of the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify such varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside his/her personal space by utilizing an artificial neural network based on Allen’s temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home. PMID:26007738
A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs

DOE PAGES

Azad, Ariful; Buluç, Aydın

2016-05-16

We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, the cardinality of the matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations, these algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. We report high-performance implementations of three maximal matching algorithms using hybrid OpenMP-MPI and evaluate the performance of these algorithms using more than 35 real and randomly generated graphs. On real instances, our algorithms achieve up to 200× speedup on 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 16,384 cores.
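For contrast with the matrix-algebraic approach, the traditional serial baseline that the abstract mentions, matching one vertex at a time, can be sketched in a few lines. This is a generic greedy maximal matching and not any of the paper's three algorithms. The graph and vertex names are hypothetical.

```python
def greedy_maximal_matching(adjacency):
    """Greedy maximal matching in a bipartite graph.

    adjacency maps each row vertex to an iterable of column vertices.
    Returns a dict row -> matched column. The result is maximal (no edge
    can be added) but not necessarily of maximum cardinality.
    """
    mate_of_row = {}
    mate_of_col = {}
    for row, cols in adjacency.items():
        for col in cols:
            if col not in mate_of_col:
                # Take the first free neighbour and move to the next row.
                mate_of_row[row] = col
                mate_of_col[col] = row
                break
    return mate_of_row


# Hypothetical bipartite graph: rows r0..r2, columns c0..c2.
graph = {"r0": ["c0", "c1"], "r1": ["c0"], "r2": ["c1", "c2"]}
matching = greedy_maximal_matching(graph)
print(matching)  # {'r0': 'c0', 'r2': 'c1'}
```

In this toy graph the greedy result has two edges although a matching of three exists (r0-c1, r1-c0, r2-c2), which illustrates the difference between a maximal matching and a maximum one.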
Non-destructive in-situ method and apparatus for determining radionuclide depth in media

DOEpatents

Xu, X. George; Naessens, Edward P.

2003-01-01

A non-destructive method and apparatus based on in-situ gamma spectroscopy is used to determine the depth of radiological contamination in media such as concrete. An algorithm, the Gamma Penetration Depth Unfolding Algorithm (GPDUA), uses point kernel techniques to predict the depth of contamination based on the results of uncollided peak information from the in-situ gamma spectroscopy. The invention is better, faster, safer, and/or cheaper than the current practice in decontamination and decommissioning of facilities, which is slow, rough and unsafe. The invention uses a priori knowledge of the contaminant source distribution. The applicable radiological contaminants of interest are any isotopes that emit two or more gamma rays per disintegration or isotopes that emit a single gamma ray but have gamma-emitting progeny in secular equilibrium with their parent (e.g., Co-60, U-235, and Cs-137 to name a few). The GPDUA algorithm, using Monte Carlo N-Particle Transport Code (MCNP) simulations and laboratory experiments using Co-60, has consistently produced predicted depths within 20% of the actual or known depth.
MicroRNAfold: pre-microRNA secondary structure prediction based on modified NCM model with thermodynamics-based scoring strategy.

PubMed

Han, Dianwei; Zhang, Jun; Tang, Guiliang

2012-01-01

An accurate prediction of the pre-microRNA secondary structure is important in miRNA informatics. Based on a recently proposed model, nucleotide cyclic motifs (NCM), to predict RNA secondary structure, we propose and implement a Modified NCM (MNCM) model with a physics-based scoring strategy to tackle the problem of pre-microRNA folding. Our microRNAfold is implemented using a global optimal algorithm based on the bottom-up local optimal solutions. Our experimental results show that microRNAfold outperforms the current leading prediction tools in terms of True Negative rate, False Negative rate, Specificity, and Matthews coefficient ratio.
Route Prediction on Tracking Data to Location-Based Services

NASA Astrophysics Data System (ADS)

Petróczi, Attila István; Gáspár-Papanek, Csaba

Wireless networks have become so widespread, it is beneficial to determine the ability of cellular networks for localization. This property enables the development of location-based services, providing useful information. These services can be improved by route prediction under the condition of using simple algorithms, because of the limited capabilities of mobile stations. This study gives alternative solutions for this problem of route prediction based on a specific graph model. Our models provide the opportunity to reach our destinations with less effort.
SU-E-T-629: Prediction of the ViewRay Radiotherapy Treatment Time for Clinical Logistics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, S; Wooten, H; Wu, Y

Purpose: An algorithm is developed in our clinic, given a new treatment plan, to predict treatment delivery time for radiation therapy (RT) treatments of patients on ViewRay magnetic resonance-image guided radiation therapy (MR-IGRT) delivery system. This algorithm is necessary for managing patient treatment appointments, and is useful as an indicator to assess the treatment plan complexity. Methods: A patient’s total treatment delivery time, not including time required for localization, may be described as the sum of four components: (1) the treatment initialization time; (2) the total beam-on time; (3) the gantry rotation time; and (4) the multileaf collimator (MLC) motion time. Each of the four components is predicted separately. The total beam-on time can be calculated using both the planned beam-on time and the decay-corrected delivery dose rate. To predict the remaining components, we quantitatively analyze the patient treatment delivery record files. The initialization time is demonstrated to be random since it depends on the final gantry angle and MLC leaf positions of the previous treatment. Based on modeling the relationships between the gantry rotation angles and the corresponding rotation time, and between the furthest MLC leaf moving distance and the corresponding MLC motion time, the total delivery time is predicted using linear regression. Results: The proposed algorithm has demonstrated the feasibility of predicting the ViewRay treatment delivery time for any treatment plan of any patient. The average prediction error is 0.89 minutes or 5.34%, and the maximal prediction error is 2.09 minutes or 13.87%. Conclusion: We have developed a treatment delivery time prediction algorithm based on the analysis of previous patients’ treatment delivery records. The accuracy of our prediction is sufficient for guiding and arranging patient treatment appointments on a daily basis. The predicted delivery time could also be used as an indicator to assess the treatment plan complexity. This work was supported by a research grant from Viewray Inc.
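The four-component decomposition in the Methods can be sketched as follows. This is only an illustration under stated assumptions: the record values, units, and function names are hypothetical, the beam-on time is taken as an already decay-corrected input, and the initialization time is passed in as a fixed estimate even though the abstract describes it as random.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_delivery_time(init_time, beam_on_time, gantry_angles,
                          mlc_distances, gantry_fit, mlc_fit):
    """Sum of initialization, beam-on, gantry rotation and MLC motion times."""
    gantry_slope, gantry_intercept = gantry_fit
    mlc_slope, mlc_intercept = mlc_fit
    gantry_time = sum(gantry_slope * a + gantry_intercept for a in gantry_angles)
    mlc_time = sum(mlc_slope * d + mlc_intercept for d in mlc_distances)
    return init_time + beam_on_time + gantry_time + mlc_time


# Hypothetical delivery records: rotation angle (degrees) vs. time (seconds),
# and furthest leaf travel (cm) vs. time (seconds).
gantry_fit = fit_line([0.0, 90.0, 180.0], [2.0, 11.0, 20.0])
mlc_fit = fit_line([0.0, 5.0, 10.0], [1.0, 3.5, 6.0])

# Hypothetical plan: two gantry moves and two MLC moves.
total_seconds = predict_delivery_time(
    init_time=30.0, beam_on_time=240.0,
    gantry_angles=[90.0, 180.0], mlc_distances=[5.0, 10.0],
    gantry_fit=gantry_fit, mlc_fit=mlc_fit)
print(total_seconds)
```

Fitting the two regressions separately keeps each component interpretable, so a large predicted time can be traced back to beam-on, gantry, or MLC contributions.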
The alliance relationship analysis of international terrorist organizations with link prediction

NASA Astrophysics Data System (ADS)

Fang, Ling; Fang, Haiyang; Tian, Yanfang; Yang, Tinghong; Zhao, Jing

2017-09-01

Terrorism is a huge public hazard to the international community. Alliances of terrorist organizations may pose a more serious threat to national security and world peace. Understanding alliances between global terrorist organizations will facilitate more effective anti-terrorism collaboration between governments. Based on publicly available data, this study constructed an alliance network between terrorist organizations and analyzed the alliance relationships with link prediction. We proposed a novel index based on optimal weighted fusion of six similarity indices, in which the optimal weight is calculated by a genetic algorithm. Our experimental results showed that this algorithm could achieve better results on the networks than other algorithms. Using this method, we successfully dug out 21 real terrorist organization alliances from current data. Our experiment shows that this approach to mining terrorist organization alliances is effective, and this study is expected to benefit the formation of a more powerful anti-terrorism strategy.
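The idea of fusing several similarity indices into one link-prediction score can be sketched as below. The abstract does not list its six indices or its optimised weights, so this example assumes three standard ones (common neighbours, Jaccard, resource allocation) with fixed, hand-picked weights. In the study the weights come from a genetic algorithm, which is not reproduced here. The toy network is hypothetical.

```python
def common_neighbors(adj, u, v):
    return float(len(adj[u] & adj[v]))


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def resource_allocation(adj, u, v):
    # Each shared neighbour contributes the inverse of its degree.
    return sum(1.0 / len(adj[w]) for w in adj[u] & adj[v])


def fused_score(adj, u, v, weights):
    """Weighted sum of the three similarity indices (weights are assumptions)."""
    indices = (common_neighbors, jaccard, resource_allocation)
    return sum(w * index(adj, u, v) for w, index in zip(weights, indices))


def rank_missing_links(adj, weights):
    """Rank all currently unlinked node pairs by fused score, best first."""
    nodes = sorted(adj)
    pairs = [(u, v) for i, u in enumerate(nodes)
             for v in nodes[i + 1:] if v not in adj[u]]
    return sorted(pairs, key=lambda p: fused_score(adj, p[0], p[1], weights),
                  reverse=True)


# Hypothetical undirected network stored as neighbour sets.
network = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
ranking = rank_missing_links(network, weights=(0.5, 0.3, 0.2))
print(ranking[0])  # ('B', 'C')
```

Pairs near the top of the ranking are the candidate links a study of this kind would then check against known alliances.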
Modeling of spectral signatures of littoral waters

NASA Astrophysics Data System (ADS)

Haltrin, Vladimir I.

1997-12-01

The spectral values of remotely obtained radiance reflectance coefficient (RRC) are compared with the values of RRC computed from inherent optical properties measured during the shipborne experiment near the West Florida coast. The model calculations are based on the algorithm developed at the Naval Research Laboratory at Stennis Space Center and presented here. The algorithm is based on the radiation transfer theory and uses regression relationships derived from experimental data. Overall comparison of derived and measured RRCs shows that this algorithm is suitable for processing ground truth data for the purposes of remote data calibration. The second part of this work consists of the evaluation of the predictive visibility model (PVM). The simulated three-dimensional values of optical properties are compared with the measured ones. Preliminary results of comparison are encouraging and show that the PVM can qualitatively predict the evolution of inherent optical properties in littoral waters.
Research on electricity consumption forecast based on mutual information and random forests algorithm

NASA Astrophysics Data System (ADS)

Shi, Jing; Shi, Yunli; Tan, Jian; Zhu, Lei; Li, Hu

2018-02-01

Traditional power forecasting models cannot efficiently take various factors into account, nor can they identify the relevant factors. In this paper, mutual information from information theory and the random forests algorithm from artificial intelligence are introduced into medium- and long-term electricity demand prediction. Mutual information can identify the highly related factors based on the value of average mutual information between a variety of variables and electricity demand; different industries may be highly associated with different variables. The random forests algorithm was used to build forecasting models for the different industries according to their different correlation factors. The data of electricity consumption in Jiangsu Province is taken as a practical example, and the above methods are compared with methods that disregard mutual information and the industries. The simulation results show that the above method is scientific, effective, and can provide higher prediction accuracy.
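The factor-screening step described here relies on mutual information between candidate variables and electricity demand. A minimal sketch for discretised data follows. The variable names and the binned values are hypothetical, and the random forests stage is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        total += p_xy * math.log(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return total


# Hypothetical binned observations (0 = low, 1 = high).
demand = [0, 0, 1, 1]
candidates = {
    "industrial_output": [0, 0, 1, 1],  # moves with demand
    "day_parity": [0, 1, 0, 1],         # unrelated to demand
}

# Rank candidate factors by their mutual information with demand.
scores = {name: mutual_information(values, demand)
          for name, values in candidates.items()}
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 4))
```

Factors with scores near zero would be dropped before the forecasting models are built for each industry.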
Ares I-X Best Estimated Trajectory and Comparison with Pre-Flight Predictions

NASA Technical Reports Server (NTRS)

Karlgaard, Christopher D.; Beck, Roger E.; Derry, Stephen D.; Brandon, Jay M.; Starr, Brett R.; Tartabini, Paul V.; Olds, Aaron D.

2011-01-01

The Ares I-X trajectory reconstruction produced best estimated trajectories of the flight test vehicle ascent through stage separation, and of the first and upper stage entries after separation. The trajectory reconstruction process combines on-board, ground-based, and atmospheric measurements to produce the trajectory estimates. The Ares I-X vehicle had a number of on-board and ground-based sensors that were available, including inertial measurement units, radar, air-data, and weather balloons. However, due to problems with calibrations and/or data, not all of the sensor data were used. The trajectory estimate was generated using an Iterative Extended Kalman Filter algorithm, which is an industry standard processing algorithm for filtering and estimation applications. This paper describes the methodology and results of the trajectory reconstruction process, including flight data preprocessing and input uncertainties, trajectory estimation algorithms, output transformations, and comparisons with preflight predictions.
Adaptive DIT-Based Fringe Tracking and Prediction at IOTA

NASA Technical Reports Server (NTRS)

Wilson, Edward; Pedretti, Ettore; Bregman, Jesse; Mah, Robert W.; Traub, Wesley A.

2004-01-01

An automatic fringe tracking system has been developed and implemented at the Infrared Optical Telescope Array (IOTA). In testing during May 2002, the system successfully minimized the optical path differences (OPDs) for all three baselines at IOTA. Based on sliding window discrete Fourier transform (DFT) calculations that were optimized for computational efficiency and robustness to atmospheric disturbances, the algorithm has also been tested extensively on off-line data. Implemented in ANSI C on the 266 MHz PowerPC processor running the VxWorks real-time operating system, the algorithm runs in approximately 2.0 milliseconds per scan (including all three interferograms), using the science camera and piezo scanners to measure and correct the OPDs. Preliminary analysis on an extension of this algorithm indicates a potential for predictive tracking, although at present, real-time implementation of this extension would require significantly more computational capacity.
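A sliding window DFT is cheap because each new window can be updated from the previous one instead of being recomputed. The sketch below shows the textbook recurrence for a single frequency bin. It is written in Python for illustration rather than the ANSI C of the real system, and it is not the IOTA implementation. The test signal and bin index are assumptions.

```python
import cmath
import math


def dft_bin(window, k):
    """Direct DFT of one window at bin k (cost proportional to window length)."""
    n = len(window)
    return sum(x * cmath.exp(-2j * cmath.pi * k * m / n)
               for m, x in enumerate(window))


def sliding_dft(signal, window_len, k):
    """Bin k of the DFT for every window position, updated in constant time.

    Recurrence: X_next = (X - oldest_sample + newest_sample) * exp(2j*pi*k/N).
    """
    twiddle = cmath.exp(2j * cmath.pi * k / window_len)
    value = dft_bin(signal[:window_len], k)
    values = [value]
    for i in range(window_len, len(signal)):
        value = (value - signal[i - window_len] + signal[i]) * twiddle
        values.append(value)
    return values


# Hypothetical fringe-like signal: a cosine with 3 cycles per 16 samples.
signal = [math.cos(2 * math.pi * 3 * t / 16) for t in range(64)]
bins = sliding_dft(signal, window_len=16, k=3)
print(abs(bins[0]), abs(bins[-1]))
```

Over long runs the recurrence accumulates rounding error, so practical implementations typically recompute the bin directly from time to time.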
A Model-based Prognostics Methodology for Electrolytic Capacitors Based on Electrical Overstress Accelerated Aging

NASA Technical Reports Server (NTRS)

Celaya, Jose; Kulkarni, Chetan; Biswas, Gautam; Saha, Sankalita; Goebel, Kai

2011-01-01

A remaining useful life prediction methodology for electrolytic capacitors is presented. This methodology is based on the Kalman filter framework and an empirical degradation model. Electrolytic capacitors are used in several applications ranging from power supplies on critical avionics equipment to power drivers for electro-mechanical actuators. These devices are known for their comparatively low reliability and, given their criticality in electronics subsystems, they are a good candidate for component level prognostics and health management. Prognostics provides a way to assess the remaining useful life of a capacitor based on its current state of health and its anticipated future usage and operational conditions. We also present experimental results of an accelerated aging test under electrical stresses. The data obtained in this test form the basis for a remaining life prediction algorithm where a model of the degradation process is suggested. This preliminary remaining life prediction algorithm serves as a demonstration of how prognostics methodologies could be used for electrolytic capacitors. In addition, the use of degradation progression data from accelerated aging provides an avenue for validation of applications of the Kalman filter based prognostics methods typically used for remaining useful life predictions in other applications.
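The combination of a Kalman filter with a degradation model can be sketched with a scalar filter that tracks a health indicator and extrapolates it to a failure threshold. Everything specific below is an assumption for illustration: the linear degradation rate, the noise variances, the measurements, and the 20% threshold are hypothetical and are not taken from the paper's empirical model.

```python
def kalman_step(x, p, z, rate, dt, q, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : current state estimate (e.g. capacitance loss in %) and variance
    z    : new noisy measurement of the state
    rate : assumed degradation rate per unit time (hypothetical linear model)
    q, r : process and measurement noise variances
    """
    # Predict with the degradation model.
    x_pred = x + rate * dt
    p_pred = p + q
    # Update with the measurement.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def remaining_useful_life(x, rate, threshold):
    """Time until the estimated state reaches the failure threshold."""
    return max(0.0, (threshold - x) / rate)


# Hypothetical measurements of capacitance loss (%) taken every 10 hours.
measurements = [0.6, 1.1, 1.4, 2.1, 2.4]
state, variance = 0.0, 1.0
for z in measurements:
    state, variance = kalman_step(state, variance, z,
                                  rate=0.05, dt=10.0, q=0.01, r=0.04)
rul_hours = remaining_useful_life(state, rate=0.05, threshold=20.0)
print(state, variance, rul_hours)
```

The filter variance gives a rough measure of confidence in the state estimate, which a fuller prognostics scheme would propagate into an uncertainty band on the remaining life.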
Towards A Model-Based Prognostics Methodology for Electrolytic Capacitors: A Case Study Based on Electrical Overstress Accelerated Aging

NASA Technical Reports Server (NTRS)

Celaya, Jose R.; Kulkarni, Chetan S.; Biswas, Gautam; Goebel, Kai

2012-01-01

A remaining useful life prediction methodology for electrolytic capacitors is presented. This methodology is based on the Kalman filter framework and an empirical degradation model. Electrolytic capacitors are used in several applications ranging from power supplies on critical avionics equipment to power drivers for electro-mechanical actuators. These devices are known for their comparatively low reliability and, given their criticality in electronics subsystems, they are a good candidate for component level prognostics and health management. Prognostics provides a way to assess the remaining useful life of a capacitor based on its current state of health and its anticipated future usage and operational conditions. We also present experimental results of an accelerated aging test under electrical stresses. The data obtained in this test form the basis for a remaining life prediction algorithm where a model of the degradation process is suggested. This preliminary remaining life prediction algorithm serves as a demonstration of how prognostics methodologies could be used for electrolytic capacitors. In addition, the use of degradation progression data from accelerated aging provides an avenue for validation of applications of the Kalman filter based prognostics methods typically used for remaining useful life predictions in other applications.
Improving efficacy of metastatic tumor segmentation to facilitate early prediction of ovarian cancer patients' response to chemotherapy

NASA Astrophysics Data System (ADS)

Danala, Gopichandh; Wang, Yunzhi; Thai, Theresa; Gunderson, Camille C.; Moxley, Katherine M.; Moore, Kathleen; Mannel, Robert S.; Cheng, Samuel; Liu, Hong; Zheng, Bin; Qiu, Yuchen

2017-02-01

Accurate tumor segmentation is a critical step in the development of the computer-aided detection (CAD) based quantitative image analysis scheme for early stage prognostic evaluation of ovarian cancer patients. The purpose of this investigation is to assess the efficacy of several different methods to segment the metastatic tumors occurring in different organs of ovarian cancer patients. In this study, we developed a segmentation scheme consisting of eight different algorithms, which can be divided into three groups: 1) Region growth based methods; 2) Canny operator based methods; and 3) Partial differential equation (PDE) based methods. A total of 138 tumors acquired from 30 ovarian cancer patients were used to test the performance of these eight segmentation algorithms. The results demonstrate that each of the tested tumors can be successfully segmented by at least one of the eight algorithms without manual boundary correction. Furthermore, the modified region growth, classical Canny detector, fast marching, and threshold level set algorithms are suggested for the future development of ovarian cancer related CAD schemes. This study may provide a meaningful reference for developing novel quantitative image feature analysis schemes to more accurately predict the response of ovarian cancer patients to chemotherapy at an early stage.
Hyperspectral Imaging for Predicting the Internal Quality of Kiwifruits Based on Variable Selection Algorithms and Chemometric Models.

PubMed

Zhu, Hongyan; Chu, Bingquan; Fan, Yangyang; Tao, Xiaoya; Yin, Wenxin; He, Yong

2017-08-10

We investigated the feasibility and potentiality of determining firmness, soluble solids content (SSC), and pH in kiwifruits using hyperspectral imaging, combined with variable selection methods and calibration models. The images were acquired by a push-broom hyperspectral reflectance imaging system covering two spectral ranges. Weighted regression coefficients (BW), successive projections algorithm (SPA) and genetic algorithm-partial least square (GAPLS) were compared and evaluated for the selection of effective wavelengths. Moreover, multiple linear regression (MLR), partial least squares regression and least squares support vector machine (LS-SVM) were developed to predict quality attributes quantitatively using effective wavelengths. The established models, particularly SPA-MLR, SPA-LS-SVM and GAPLS-LS-SVM, performed well. The SPA-MLR models for firmness (R pre = 0.9812, RPD = 5.17) and SSC (R pre = 0.9523, RPD = 3.26) at 380-1023 nm showed excellent performance, whereas GAPLS-LS-SVM was the optimal model at 874-1734 nm for predicting pH (R pre = 0.9070, RPD = 2.60). Image processing algorithms were developed to transfer the predictive model in every pixel to generate prediction maps that visualize the spatial distribution of firmness and SSC. Hence, the results clearly demonstrated that hyperspectral imaging has the potential as a fast and non-invasive method to predict the quality attributes of kiwifruits.
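The abstract reports each model through a correlation coefficient of prediction and an RPD value without defining them. As a hedged reading, RPD is commonly defined in chemometrics as the standard deviation of the reference values divided by the root-mean-square error of prediction. The sketch below assumes that definition, uses the sample standard deviation, and runs on hypothetical measurements.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))


def rpd(y_true, y_pred):
    """Ratio of the reference standard deviation to the prediction RMSE."""
    n = len(y_true)
    mean = sum(y_true) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in y_true) / (n - 1))
    return sd / rmse(y_true, y_pred)


def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical reference and predicted firmness values.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]
print(pearson_r(reference, predicted), rpd(reference, predicted))
```

Under this definition a larger RPD means the prediction error is small relative to the natural spread of the attribute being measured.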
An improved multi-domain convolution tracking algorithm

NASA Astrophysics Data System (ADS)

Sun, Xin; Wang, Haiying; Zeng, Yingsen

2018-04-01

With the wide application of deep learning in computer vision, deep learning has become a mainstream direction in object tracking. The tracking algorithm in this paper is based on an improved multi-domain convolutional neural network, and the network is pre-trained on the VOT video set using a multi-domain training strategy. During online tracking, the network evaluates candidate targets sampled with a Gaussian distribution from the vicinity of the target predicted in the previous frame, and the candidate with the highest score is taken as the predicted target of the current frame. A bounding box regression model is introduced to bring the predicted target closer to the ground-truth target box of the test set. A grouping-update strategy is used to extract and select useful update samples in each frame, which effectively prevents overfitting and adapts to changes in both the target and the environment. To improve the speed of the algorithm while maintaining performance, the number of candidate targets is adjusted dynamically by a self-adaptive parameter strategy. Finally, the algorithm is tested on the OTB set and compared with other high-performance tracking algorithms; the success-rate and accuracy plots illustrate the outstanding performance of the proposed tracking algorithm.
A Novel Segment-Based Approach for Improving Classification Performance of Transport Mode Detection.

PubMed

Guvensan, M Amac; Dusun, Burak; Can, Baris; Turkmen, H Irem

2017-12-30

Transportation planning and solutions have an enormous impact on city life. To minimize transport duration, urban planners should understand and elaborate the mobility of a city. Thus, researchers look toward monitoring people's daily activities, including transportation types and duration, by taking advantage of individuals' smartphones. This paper introduces a novel segment-based transport mode detection architecture in order to improve the results of traditional classification algorithms in the literature. The proposed post-processing algorithm, namely the Healing algorithm, aims to correct the misclassification results of machine learning-based solutions. Our real-life test results show that the Healing algorithm could achieve up to 40% improvement in the classification results. As a result, the implemented mobile application could predict eight classes, including stationary, walking, car, bus, tram, train, metro and ferry, with a success rate of 95% thanks to the proposed multi-tier architecture and Healing algorithm.

Particle swarm optimization based space debris surveillance network scheduling

NASA Astrophysics Data System (ADS)

Jiang, Hai; Liu, Jing; Cheng, Hao-Wen; Zhang, Yao

2017-02-01

The growing amount of space debris has created an orbital debris environment that poses increasing impact risks to existing space systems and human space flights. For the safety of in-orbit spacecraft, we should optimally schedule surveillance tasks for the existing facilities to allocate resources in a manner that most significantly improves the ability to predict and detect events involving affected spacecraft. This paper analyzes two criteria that mainly affect the performance of a scheduling scheme and introduces an artificial intelligence algorithm into the scheduling of tasks of the space debris surveillance network. A new scheduling algorithm based on the particle swarm optimization algorithm is proposed, which can be implemented in two different ways: individual optimization and joint optimization. Numerical experiments with multiple facilities and objects are conducted based on the proposed algorithm, and simulation results have demonstrated the effectiveness of the proposed algorithm.
Prediction of Hematopoietic Stem Cell Transplantation Related Mortality- Lessons Learned from the In-Silico Approach: A European Society for Blood and Marrow Transplantation Acute Leukemia Working Party Data Mining Study.

PubMed

Shouval, Roni; Labopin, Myriam; Unger, Ron; Giebel, Sebastian; Ciceri, Fabio; Schmid, Christoph; Esteve, Jordi; Baron, Frederic; Gorin, Norbert Claude; Savani, Bipin; Shimoni, Avichai; Mohty, Mohamad; Nagler, Arnon

2016-01-01

Models for prediction of allogeneic hematopoietic stem cell transplantation (HSCT) related mortality partially account for transplant risk. Improving predictive accuracy requires understanding of prediction limiting factors, such as the statistical methodology used, number and quality of features collected, or simply the population size. Using an in-silico approach (i.e., iterative computerized simulations), based on machine learning (ML) algorithms, we set out to analyze these factors. A cohort of 25,923 adult acute leukemia patients from the European Society for Blood and Marrow Transplantation (EBMT) registry was analyzed. The predictive objective was non-relapse mortality (NRM) 100 days following HSCT. Thousands of prediction models were developed under varying conditions: increasing sample size, specific subpopulations and an increasing number of variables, which were selected and ranked by separate feature selection algorithms. Depending on the algorithm, predictive performance plateaued at a population size of 6,611-8,814 patients, reaching a maximal area under the receiver operator characteristic curve (AUC) of 0.67. AUCs of models developed on specific subpopulations ranged from 0.59 to 0.67 for patients in second complete remission and receiving reduced intensity conditioning, respectively. Only 3-5 variables were necessary to achieve near maximal AUCs. The top 3 ranking variables, shared by all algorithms, were disease stage, donor type, and conditioning regimen. Our findings empirically demonstrate that with regard to NRM prediction, few variables "carry the weight" and that traditional HSCT data has been "worn out". "Breaking through" the predictive boundaries will likely require additional types of inputs.
Predicting hospitalization due to worsening heart failure using daily weight measurement: analysis of the Trans-European Network-Home-Care Management System (TEN-HMS) study.

PubMed

Zhang, Jufen; Goode, Kevin M; Cuddihy, Paul E; Cleland, John G F

2009-04-01

We sought to test the utility of weight gain algorithms to predict episodes of worsening heart failure (WHF) using home-telemonitoring data collected as part of the TEN-HMS study. Simple rule-of-thumb (RoT) algorithms (i.e. 3 lbs in 1 day and 5 lbs in 3 days) and a moving average convergence divergence (MACD) algorithm were compared. WHF was defined as hospitalization for WHF or worsening of breathlessness or leg oedema. Of 168 patients, 45 were hospitalized with WHF and 76 were hospitalized for other reasons. On average, weight gain occurred in the 14 days prior to WHF hospitalizations but not in the 14 days prior to non-WHF hospitalizations [1.9 +/- 4.7 lbs (0.9 +/- 2.1 kg) vs. -0.4 +/- 2.5 lbs (-0.2 +/- 1.1 kg), P < 0.0001]. The true alerts rate was higher for the RoT algorithms compared with the MACD (58 and 65% vs. 20%). However, the RoT algorithms had much higher false alert rates (54 and 58% vs. 9%) rendering them of little practical use for predicting WHF events. A MACD algorithm is more specific but less sensitive than RoT when trying to predict episodes of WHF based on daily weight measurements. However, many episodes of WHF do not appear to be associated with weight gain and therefore telemonitoring of weight alone may not have great value for heart failure management.
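To make the two alert styles in this record concrete, the following is a minimal Python sketch, not the TEN-HMS implementation. The rule-of-thumb thresholds (3 lbs in 1 day, 5 lbs in 3 days) come from the abstract; the MACD spans (3 and 10 days), the 0.5 lb alert threshold, the use of exponential moving averages, and the sample weights are all assumptions chosen for illustration, since the abstract does not state them.

```python
from typing import List


def rule_of_thumb_alerts(weights_lbs: List[float], gain_lbs: float, days: int) -> List[bool]:
    """Flag day i if weight rose by at least `gain_lbs` compared with `days` days earlier."""
    alerts = []
    for i, weight in enumerate(weights_lbs):
        if i < days:
            alerts.append(False)  # not enough history yet
        else:
            alerts.append(weight - weights_lbs[i - days] >= gain_lbs)
    return alerts


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    if not values:
        return []
    alpha = 2.0 / (span + 1)
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


def macd_alerts(weights_lbs: List[float], short_span: int = 3, long_span: int = 10,
                threshold_lbs: float = 0.5) -> List[bool]:
    """Flag days where the short-term average exceeds the long-term average.

    Spans and threshold are assumed values, not taken from the study.
    """
    short = ema(weights_lbs, short_span)
    long_ = ema(weights_lbs, long_span)
    return [s - l > threshold_lbs for s, l in zip(short, long_)]


# Invented example: five stable days followed by a 4 lb jump.
weights = [180.0] * 5 + [184.0, 184.0]
one_day = rule_of_thumb_alerts(weights, gain_lbs=3.0, days=1)
three_day = rule_of_thumb_alerts(weights, gain_lbs=5.0, days=3)
trend = macd_alerts(weights)
```

In this toy series the 1-day rule fires only on the day of the jump, the 3-day rule never fires, and the trend-based alert stays on while the short-term average remains above the long-term one.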
Prediction of road traffic death rate using neural networks optimised by genetic algorithm.

PubMed

Jafari, Seyed Ali; Jahandideh, Sepideh; Jahandideh, Mina; Asadabadi, Ebrahim Barzegari

2015-01-01

Road traffic injuries (RTIs) are recognised as a main cause of public health problems at global, regional and national levels. Therefore, prediction of the road traffic death rate will be helpful in its management. Based on this fact, we used an artificial neural network model optimised through a genetic algorithm to predict mortality. In this study, a five-fold cross-validation procedure on a data set containing a total of 178 countries was used to verify the performance of the models. The best-fit model was selected according to the root mean square error (RMSE). The genetic algorithm, a powerful method that has not been used to this extent for mortality prediction in previous studies, showed high performance. The lowest RMSE obtained was 0.0808. Such satisfactory results could be attributed to the use of the genetic algorithm as a powerful optimiser which selects the best input feature set to be fed into the neural networks. Seven factors were identified with high accuracy as the most influential on the road traffic mortality rate. The results show that our model is very promising and may play a useful role in developing a better method for assessing the influence of road traffic mortality risk factors.
DenguePredict: An Integrated Drug Repositioning Approach towards Drug Discovery for Dengue.

PubMed

Wang, QuanQiu; Xu, Rong

2015-01-01

Dengue is a viral disease of expanding global incidence without cures. Here we present a drug repositioning system (DenguePredict) leveraging upon a unique drug treatment database and vast amounts of disease- and drug-related data. We first constructed a large-scale genetic disease network with enriched dengue genetics data curated from biomedical literature. We applied a network-based ranking algorithm to find dengue-related diseases from the disease network. We then developed a novel algorithm to prioritize FDA-approved drugs from dengue-related diseases to treat dengue. When tested in a de-novo validation setting, DenguePredict found the only two drugs tested in clinical trials for treating dengue and ranked them highly: chloroquine ranked at top 0.96% and ivermectin at top 22.75%. We showed that drugs targeting immune systems and arachidonic acid metabolism-related apoptotic pathways might represent innovative drugs to treat dengue. In summary, DenguePredict, by combining comprehensive disease- and drug-related data and novel algorithms, may greatly facilitate drug discovery for dengue.
Predicting termination of atrial fibrillation based on the structure and quantification of the recurrence plot.

PubMed

Sun, Rongrong; Wang, Yuanyuan

2008-11-01

Predicting the spontaneous termination of atrial fibrillation (AF) leads not only to a better understanding of the mechanisms of the arrhythmia but also to improved treatment of sustained AF. A novel method is proposed to characterize AF based on the structure and quantification of the recurrence plot (RP) in order to predict the termination of AF. The RP of the electrocardiogram (ECG) signal is first obtained, and eleven features are extracted to characterize its three basic patterns. Then the sequential forward search (SFS) algorithm and the Davies-Bouldin criterion are used to select the feature subset that can predict AF termination effectively. Finally, a multilayer perceptron (MLP) neural network is applied to predict AF termination. An AF database which includes one training set and two testing sets (A and B) of Holter ECG recordings is studied. Experimental results show that 97% of testing set A and 95% of testing set B are correctly classified, demonstrating that this algorithm can predict the spontaneous termination of AF effectively.
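The feature selection step in this record can be sketched in Python as a greedy sequential forward search that keeps adding the feature giving the lowest Davies-Bouldin index. This is an illustrative reading only: the abstract does not say exactly how the two are combined, and the function names, the stopping rule (a fixed number of features) and the toy data are assumptions.

```python
import math
from typing import Dict, List, Sequence


def davies_bouldin(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Davies-Bouldin index: mean over classes of the worst (s_i + s_j) / d_ij ratio."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    centroids = {y: [sum(col) / len(g) for col in zip(*g)] for y, g in groups.items()}
    scatter = {y: sum(math.dist(p, centroids[y]) for p in g) / len(g)
               for y, g in groups.items()}
    total = 0.0
    for i in groups:
        worst = 0.0
        for j in groups:
            if i == j:
                continue
            separation = math.dist(centroids[i], centroids[j])
            if separation == 0.0:
                return math.inf  # coincident centroids: classes not separable
            worst = max(worst, (scatter[i] + scatter[j]) / separation)
        total += worst
    return total / len(groups)


def sequential_forward_search(rows: Sequence[Sequence[float]], labels: Sequence[int],
                              n_select: int) -> List[int]:
    """Greedily add the feature index that minimises the Davies-Bouldin index."""
    selected: List[int] = []
    n_features = len(rows[0])
    while len(selected) < min(n_select, n_features):
        best_feature, best_score = None, math.inf
        for feature in range(n_features):
            if feature in selected:
                continue
            columns = selected + [feature]
            projected = [[row[c] for c in columns] for row in rows]
            score = davies_bouldin(projected, labels)
            if score < best_score:
                best_feature, best_score = feature, score
        if best_feature is None:
            break  # no remaining feature gives a finite score
        selected.append(best_feature)
    return selected


# Invented data: feature 1 separates the two classes, features 0 and 2 mostly do not.
rows = [[0.1, 0.0, 5.0], [0.9, 0.1, 1.0], [0.5, 0.2, 3.0],
        [0.2, 5.0, 4.0], [0.8, 5.1, 2.0], [0.4, 5.2, 6.0]]
labels = [0, 0, 0, 1, 1, 1]
chosen = sequential_forward_search(rows, labels, n_select=2)
```

A lower Davies-Bouldin index means tighter, better separated classes, which is why the search minimises it; the selected subset would then be passed to a classifier such as the MLP mentioned in the abstract.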
Comparison of the accuracy of three algorithms in predicting accessory pathways among adult Wolff-Parkinson-White syndrome patients.

PubMed

Maden, Orhan; Balci, Kevser Gülcihan; Selcuk, Mehmet Timur; Balci, Mustafa Mücahit; Açar, Burak; Unal, Sefa; Kara, Meryem; Selcuk, Hatice

2015-12-01

The aim of this study was to investigate the accuracy of three algorithms in predicting accessory pathway locations in adult patients with Wolff-Parkinson-White syndrome in a Turkish population. A total of 207 adult patients with Wolff-Parkinson-White syndrome were retrospectively analyzed. The most preexcited 12-lead electrocardiogram in sinus rhythm was used for analysis. Two investigators blinded to the patient data used three algorithms for prediction of accessory pathway location. Among all locations, 48.5% were left-sided, 44% were right-sided, and 7.5% were located in the midseptum or anteroseptum. When only exact locations were accepted as a match, predictive accuracy was 71.5% for Chiang, 72.4% for d'Avila, and 71.5% for Arruda. The predictive accuracy did not differ between the algorithms (p = 1.000; p = 0.875; p = 0.885, respectively). The best algorithm for prediction of right-sided, left-sided, and anteroseptal and midseptal accessory pathways was Arruda (p < 0.001). Arruda was significantly better than d'Avila in predicting adjacent sites (p = 0.035), and the percentage of contralateral site predictions was higher with d'Avila than with Arruda (p = 0.013). All algorithms were similar in predicting accessory pathway location, and the predictive accuracy was lower than previously reported by their authors. However, according to the accessory pathway site, the algorithm designed by Arruda et al. showed better predictions than the other algorithms, and using this algorithm may provide advantages before a planned ablation.
Comparison of rule induction, decision trees and formal concept analysis approaches for classification

NASA Astrophysics Data System (ADS)

Kotelnikov, E. V.; Milov, V. R.

2018-05-01

Rule-based learning algorithms are more transparent and easier to interpret than neural networks and deep learning algorithms. These properties make it possible to use such algorithms effectively for descriptive data mining tasks. The choice of an algorithm also depends on its ability to solve predictive tasks. The article compares the quality of solutions to binary and multiclass classification problems based on experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), and In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best classification quality in comparison with Ripper and C4.5; however, the latter two generate more compact rule sets.
Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR

PubMed Central

MotieGhader, Habib; Gharaghani, Sajjad; Masoudi-Sobhanzadeh, Yosef; Masoudi-Nejad, Ali

2017-01-01

Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been addressed using meta-heuristic algorithms such as GA, PSO and ACO. In this work, two novel hybrid meta-heuristic algorithms, Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on the genetic algorithm and learning automata, are proposed for QSAR feature selection. The SGALA algorithm uses the advantages of the genetic algorithm and learning automata sequentially, and the MGALA algorithm uses them simultaneously. We applied the proposed algorithms to select the minimum possible number of features from three different datasets, and observed that the MGALA and SGALA algorithms had the best outcome, both independently and on average, compared to other feature selection algorithms. Through comparison of our proposed algorithms, we deduced that the rate of convergence to the optimal result of the MGALA and SGALA algorithms was better than that of the GA, ACO, PSO and LA algorithms. Finally, the results of the GA, ACO, PSO, LA, SGALA, and MGALA algorithms were used as the input of an LS-SVR model, and the results showed that the LS-SVR model had more predictive ability with the input from SGALA and MGALA than with the input from all the other algorithms. Therefore, the results corroborate that not only is the predictive efficiency of the proposed algorithms better, but their rate of convergence is also superior to that of all the other algorithms mentioned. PMID:28979308
Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR.

PubMed

MotieGhader, Habib; Gharaghani, Sajjad; Masoudi-Sobhanzadeh, Yosef; Masoudi-Nejad, Ali

2017-01-01

Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been addressed using meta-heuristic algorithms such as GA, PSO and ACO. In this work, two novel hybrid meta-heuristic algorithms, Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on the genetic algorithm and learning automata, are proposed for QSAR feature selection. The SGALA algorithm uses the advantages of the genetic algorithm and learning automata sequentially, and the MGALA algorithm uses them simultaneously. We applied the proposed algorithms to select the minimum possible number of features from three different datasets, and observed that the MGALA and SGALA algorithms had the best outcome, both independently and on average, compared to other feature selection algorithms. Through comparison of our proposed algorithms, we deduced that the rate of convergence to the optimal result of the MGALA and SGALA algorithms was better than that of the GA, ACO, PSO and LA algorithms. Finally, the results of the GA, ACO, PSO, LA, SGALA, and MGALA algorithms were used as the input of an LS-SVR model, and the results showed that the LS-SVR model had more predictive ability with the input from SGALA and MGALA than with the input from all the other algorithms. Therefore, the results corroborate that not only is the predictive efficiency of the proposed algorithms better, but their rate of convergence is also superior to that of all the other algorithms mentioned.
Comparison of Algorithm-based Estimates of Occupational Diesel Exhaust Exposure to Those of Multiple Independent Raters in a Population-based Case–Control Study

PubMed Central

Friesen, Melissa C.

2013-01-01

Objectives: Algorithm-based exposure assessments based on patterns in questionnaire responses and professional judgment can readily apply transparent exposure decision rules to thousands of jobs quickly. However, we need to better understand how algorithms compare to a one-by-one job review by an exposure assessor. We compared algorithm-based estimates of diesel exhaust exposure to those of three independent raters within the New England Bladder Cancer Study, a population-based case–control study, and identified conditions under which disparities occurred in the assessments of the algorithm and the raters. Methods: Occupational diesel exhaust exposure was assessed previously using an algorithm and a single rater for all 14 983 jobs reported by 2631 study participants during personal interviews conducted from 2001 to 2004. Two additional raters independently assessed a random subset of 324 jobs that were selected based on strata defined by the cross-tabulations of the algorithm and the first rater’s probability assessments for each job, oversampling their disagreements. The algorithm and each rater assessed the probability, intensity and frequency of occupational diesel exhaust exposure, as well as a confidence rating for each metric. Agreement among the raters, their aggregate rating (average of the three raters’ ratings) and the algorithm were evaluated using proportion of agreement, kappa and weighted kappa (κw). Agreement analyses on the subset used inverse probability weighting to extrapolate the subset to estimate agreement for all jobs. Classification and Regression Tree (CART) models were used to identify patterns in questionnaire responses that predicted disparities in exposure status (i.e., unexposed versus exposed) between the first rater and the algorithm-based estimates. 
Results: For the probability, intensity and frequency exposure metrics, moderate to moderately high agreement was observed among raters (κw = 0.50–0.76) and between the algorithm and the individual raters (κw = 0.58–0.81). For these metrics, the algorithm estimates had consistently higher agreement with the aggregate rating (κw = 0.82) than with the individual raters. For all metrics, the agreement between the algorithm and the aggregate ratings was highest for the unexposed category (90–93%) and was poor to moderate for the exposed categories (9–64%). Lower agreement was observed for jobs with a start year <1965 versus ≥1965. For the confidence metrics, the agreement was poor to moderate among raters (κw = 0.17–0.45) and between the algorithm and the individual raters (κw = 0.24–0.61). CART models identified patterns in the questionnaire responses that predicted a fair-to-moderate (33–89%) proportion of the disagreements between the raters’ and the algorithm estimates. Discussion: The agreement between any two raters was similar to the agreement between an algorithm-based approach and individual raters, providing additional support for using the more efficient and transparent algorithm-based approach. CART models identified some patterns in disagreements between the first rater and the algorithm. Given the absence of a gold standard for estimating exposure, these patterns can be reviewed by a team of exposure assessors to determine whether the algorithm should be revised for future studies. PMID:23184256
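The weighted kappa (κw) statistic used throughout this record can be illustrated with a short Python sketch. It assumes linear disagreement weights |i − j| / (k − 1) on ordinal categories coded 0..k−1; the abstract does not state which weighting scheme the study used, and the ratings in the example are invented.

```python
from typing import Sequence


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int], n_categories: int) -> float:
    """Linear-weighted Cohen's kappa for two raters on categories 0..n_categories-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must supply the same non-zero number of ratings")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    k, n = n_categories, len(rater_a)

    # Observed joint proportions of (rating by A, rating by B).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for maximal disagreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row[i] * col[j]  # chance agreement from marginals

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - weighted_observed / weighted_expected


# Invented ratings on a 4-level ordinal scale.
identical = weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_categories=4)
chance_level = weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], n_categories=2)
```

Values near 1 indicate near-perfect agreement and values near 0 indicate agreement no better than chance, which is the scale on which the reported κw ranges (for example 0.58–0.81) are read.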
Winter Precipitation Forecast in the European and Mediterranean Regions Using Cluster Analysis

NASA Astrophysics Data System (ADS)

Totz, Sonja; Tziperman, Eli; Coumou, Dim; Pfeiffer, Karl; Cohen, Judah

2017-12-01

The European climate is changing under global warming, and especially the Mediterranean region has been identified as a hot spot for climate change with climate models projecting a reduction in winter rainfall and a very pronounced increase in summertime heat waves. These trends are already detectable over the historic period. Hence, it is beneficial to forecast seasonal droughts well in advance so that water managers and stakeholders can prepare to mitigate deleterious impacts. We developed a new cluster-based empirical forecast method to predict precipitation anomalies in winter. This algorithm considers not only the strength but also the pattern of the precursors. We compare our algorithm with dynamic forecast models and a canonical correlation analysis-based prediction method demonstrating that our prediction method performs better in terms of time and pattern correlation in the Mediterranean and European regions.
Transcriptomics-based strain optimization tool for designing secondary metabolite overproducing strains of Streptomyces coelicolor.

PubMed

Kim, Minsuk; Yi, Jeong Sang; Lakshmanan, Meiyappan; Lee, Dong-Yup; Kim, Byung-Gee

2016-03-01

In silico model-driven analysis using genome-scale model of metabolism (GEM) has been recognized as a promising method for microbial strain improvement. However, most of the current GEM-based strain design algorithms based on flux balance analysis (FBA) heavily rely on the steady-state and optimality assumptions without considering any regulatory information. Thus, their practical usage is quite limited, especially in its application to secondary metabolites overproduction. In this study, we developed a transcriptomics-based strain optimization tool (tSOT) in order to overcome such limitations by integrating transcriptomic data into GEM. Initially, we evaluated existing algorithms for integrating transcriptomic data into GEM using Streptomyces coelicolor dataset, and identified iMAT algorithm as the only and the best algorithm for characterizing the secondary metabolism of S. coelicolor. Subsequently, we developed tSOT platform where iMAT is adopted to predict the reaction states, and successfully demonstrated its applicability to secondary metabolites overproduction by designing actinorhodin (ACT), a polyketide antibiotic, overproducing strain of S. coelicolor. Mutants overexpressing tSOT targets such as ribulose 5-phosphate 3-epimerase and NADP-dependent malic enzyme showed 2 and 1.8-fold increase in ACT production, thereby validating the tSOT prediction. It is expected that tSOT can be used for solving other metabolic engineering problems which could not be addressed by current strain design algorithms, especially for the secondary metabolite overproductions. © 2015 Wiley Periodicals, Inc.
Advanced methods in NDE using machine learning approaches

NASA Astrophysics Data System (ADS)

Wunderlich, Christian; Tschöpe, Constanze; Duckhorn, Frank

2018-04-01

Machine learning (ML) methods and algorithms have recently been applied with great success in quality control and predictive maintenance. The goal of ML, to build new algorithms and/or leverage existing ones that learn from training data and give accurate predictions or find patterns, particularly in new and unseen similar data, fits Non-Destructive Evaluation (NDE) perfectly. The advantages of ML in NDE are obvious in tasks such as pattern recognition in acoustic signals or automated processing of images from X-ray, ultrasonic or optical methods. Fraunhofer IKTS uses machine learning algorithms in acoustic signal analysis, and the approach has been applied to a variety of tasks in quality assessment. The principal approach is based on acoustic signal processing with a primary and a secondary analysis step, followed by a cognitive system that creates model data. Already in the secondary analysis step, unsupervised learning algorithms such as principal component analysis are used to simplify data structures. In the cognitive part of the software, further unsupervised and supervised learning algorithms are trained. The sensor signals from unknown samples can then be recognized and classified automatically by the previously trained algorithms. Recently the IKTS team was able to transfer the software for signal processing and pattern recognition to a small printed circuit board (PCB). The algorithms are still trained on an ordinary PC; however, the trained algorithms run on the digital signal processor and the FPGA chip. The identical approach will be used for pattern recognition in image analysis of OCT pictures. Some key requirements have to be fulfilled, however: a sufficiently large set of training data, a high signal-to-noise ratio, and an optimized and exact fixation of components. The automated testing can then be done by the machine.
By integrating the test data of many components along the value chain, further optimization, including lifetime and durability prediction based on big data, becomes possible, even if components are used in different versions or configurations. This is the promise behind German Industry 4.0.
NASA GPM GV Science Implementation

NASA Technical Reports Server (NTRS)

Petersen, W. A.

2009-01-01

Pre-launch algorithm development & post-launch product evaluation: The GPM GV paradigm moves beyond traditional direct validation/comparison activities by incorporating improved algorithm physics & model applications (end-to-end validation) in the validation process. Three approaches: 1) National Network (surface): Operational networks to identify and resolve first order discrepancies (e.g., bias) between satellite and ground-based precipitation estimates. 2) Physical Process (vertical column): Cloud system and microphysical studies geared toward testing and refinement of physically-based retrieval algorithms. 3) Integrated (4-dimensional): Integration of satellite precipitation products into coupled prediction models to evaluate strengths/limitations of satellite precipitation producers.
THE USE OF BOX MODELS TO DESCRIBE THE PERSONAL CLOUD EFFECT ON HUMAN EXPOSURE TO PARTICULATE MATTER

EPA Science Inventory

An algorithm has been developed to describe particle transport into and out of the breathing zone in an effort to predict the effects of the personal cloud phenomenon (Eisner and Heist, 2000). The algorithm was developed based on the principle of mass balance between a system ...
Multi-label literature classification based on the Gene Ontology graph.

PubMed

Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua

2008-12-08

The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
Clinical algorithms for the diagnosis and prognosis of interstitial lung disease in systemic sclerosis.

PubMed

Hax, Vanessa; Bredemeier, Markus; Didonet Moro, Ana Laura; Pavan, Thaís Rohde; Vieira, Marcelo Vasconcellos; Pitrez, Eduardo Hennemann; da Silva Chakr, Rafael Mendonça; Xavier, Ricardo Machado

2017-10-01

Interstitial lung disease (ILD) is currently the primary cause of death in systemic sclerosis (SSc). Thoracic high-resolution computed tomography (HRCT) is considered the gold standard for diagnosis. Recent studies have proposed several clinical algorithms to predict the diagnosis and prognosis of SSc-ILD. The objectives were to test these clinical algorithms for predicting the presence and prognosis of SSc-ILD and to evaluate the association of the extent of ILD with mortality in a cohort of SSc patients. This was a retrospective cohort study of 177 SSc patients assessed by clinical evaluation, laboratory tests, pulmonary function tests, and HRCT. Three clinical algorithms, combining lung auscultation, chest radiography, and percentage predicted forced vital capacity (FVC), were applied for the diagnosis of different extents of ILD on HRCT. Univariate and multivariate Cox proportional hazards models were used to analyze the association of the algorithms and the extent of ILD on HRCT with the risk of death using hazard ratios (HR). The prevalence of ILD on HRCT was 57.1% and 79 patients died (44.6%) in a median follow-up of 11.1 years. For identification of ILD with extent ≥10% and ≥20% on HRCT, all algorithms presented a high sensitivity (>89%) and a very low negative likelihood ratio (<0.16). For prognosis, survival was decreased for all algorithms, especially algorithm C (HR = 3.47, 95% CI: 1.62-7.42), which identified the presence of ILD based on crackles on lung auscultation, findings on chest X-ray, or FVC <80%. Extensive disease as proposed by Goh et al. (extent of ILD > 20% on HRCT or, in indeterminate cases, FVC < 70%) was associated with a significantly higher risk of death (HR = 3.42, 95% CI: 2.12-5.52). Survival was not different between patients with extent of 10% or 20% of ILD on HRCT, and analysis of 10-year mortality suggested that a threshold of 10% may also have a good predictive value for mortality. However, there is no clear cutoff above which mortality is sharply increased. Clinical algorithms had a good diagnostic performance for extents of SSc-ILD on HRCT with clinical and prognostic relevance (≥10% and ≥20%), and were also strongly related to mortality. Non-HRCT-based algorithms could be useful when HRCT is not available. This is the first study to replicate the prognostic algorithm proposed by Goh et al. in a developing country. Copyright © 2017 Elsevier Inc. All rights reserved.
Comparison of Body Weight Trend Algorithms for Prediction of Heart Failure Related Events in Home Care Setting.

PubMed

Eggerth, Alphons; Modre-Osprian, Robert; Hayn, Dieter; Kastner, Peter; Pölzl, Gerhard; Schreier, Günter

2017-01-01

Automatic event detection is used in telemedicine-based heart failure (HF) disease management programs to support physicians and nurses in monitoring patients' health data. This study analyzed the performance of automatic event detection algorithms for predicting HF-related hospitalisations or diuretic dose increases. The Rule-of-Thumb (RoT) and Moving Average Convergence Divergence (MACD) algorithms were applied to body weight data from 106 heart failure patients of the HerzMobil-Tirol disease management program. The evaluation criteria were based on the Youden index and ROC curves. Analysis of data from 1460 monitoring weeks with 54 events showed a maximum Youden index of 0.19 for MACD and RoT with a specificity > 0.90. Comparison of the two algorithms on real-world monitoring data showed similar results regarding total and limited AUC. The sensitivity might be improved by including additional health data (e.g. vital signs and self-reported well-being), because body weight variations are not the only cause of HF-related hospitalisations or diuretic dose increases.
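The two ingredients named in this abstract, a MACD trend signal on body weight and the Youden index, can be illustrated with a minimal Python sketch. The EMA spans (3 and 10 days), the 0.5 kg alert threshold, and the function names are illustrative assumptions; the abstract does not state the parameters used in the study.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(weights, fast=3, slow=10):
    """MACD line: fast EMA minus slow EMA of the daily body weights.

    The spans are assumed values, not the study's settings.
    """
    fast_ema = ema(weights, fast)
    slow_ema = ema(weights, slow)
    return [f - s for f, s in zip(fast_ema, slow_ema)]


def macd_alerts(weights, threshold=0.5, fast=3, slow=10):
    """Flag days on which the MACD line exceeds an assumed threshold (kg)."""
    return [m > threshold for m in macd(weights, fast, slow)]


def youden_index(tp, fn, tn, fp):
    """Youden's J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Hypothetical weight series (kg): one stable, one gaining 0.4 kg per day.
stable = [80.0] * 14
gaining = [80.0] * 7 + [80.0 + 0.4 * d for d in range(1, 8)]

print(any(macd_alerts(stable)))
print(any(macd_alerts(gaining)))
print(youden_index(tp=9, fn=1, tn=90, fp=10))
```

A stable series keeps the MACD line at zero, while a sustained gain pushes the fast average above the slow one, which is the kind of trend such an alert is meant to catch.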
Stata Modules for Calculating Novel Predictive Performance Indices for Logistic Models.

PubMed

Barkhordari, Mahnaz; Padyab, Mojgan; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza

2016-01-01

Prediction is a fundamental part of the prevention of cardiovascular diseases (CVD). Prediction algorithms based on multivariate regression models were first developed several decades ago. In parallel with the development of predictive models, biomarker research expanded on an impressive scale. The key question is how best to assess and quantify the improvement in risk prediction offered by new biomarkers or, more basically, how to assess the performance of a risk prediction model. Discrimination, calibration, and added predictive value have recently been suggested for comparing the predictive performance of models with and without novel biomarkers. A lack of user-friendly statistical software has restricted the implementation of novel model assessment methods when examining novel biomarkers. We therefore intended to develop user-friendly software that could be used by researchers with few programming skills. We have written a Stata command that is intended to help researchers obtain the cut point-free and cut point-based net reclassification improvement index (NRI) and the relative and absolute integrated discriminatory improvement index (IDI) for logistic-based regression analyses. We applied the command to real data on women participating in the Tehran Lipid and Glucose Study (TLGS) to examine whether information on a family history of premature CVD, waist circumference, and fasting plasma glucose can improve the predictive performance of the Framingham "general CVD risk" algorithm. The command is addpred for logistic regression models. The Stata package provided herein can encourage the use of novel methods in examining the predictive capacity of the ever-emerging plethora of novel biomarkers.
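To make the two indices concrete, the following Python sketch computes a cut point-free NRI and the absolute and relative IDI from predicted probabilities of an old and a new model. It follows the commonly used textbook definitions and is not a port of the addpred command; the function names and the toy data are assumptions for illustration.

```python
def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)


def continuous_nri(p_old, p_new, y):
    """Cut point-free NRI: net upward movement among events
    plus net downward movement among non-events."""
    events = [(o, n) for o, n, yi in zip(p_old, p_new, y) if yi == 1]
    nonevents = [(o, n) for o, n, yi in zip(p_old, p_new, y) if yi == 0]
    up_e = mean(n > o for o, n in events)
    down_e = mean(n < o for o, n in events)
    up_n = mean(n > o for o, n in nonevents)
    down_n = mean(n < o for o, n in nonevents)
    return (up_e - down_e) + (down_n - up_n)


def idi(p_old, p_new, y):
    """Absolute and relative IDI from the discrimination slopes
    (mean predicted risk in events minus mean in non-events)."""
    def slope(p):
        ev = mean(pi for pi, yi in zip(p, y) if yi == 1)
        ne = mean(pi for pi, yi in zip(p, y) if yi == 0)
        return ev - ne
    slope_old, slope_new = slope(p_old), slope(p_new)
    return slope_new - slope_old, slope_new / slope_old - 1


# Hypothetical predicted risks for three events and three non-events.
y = [1, 1, 1, 0, 0, 0]
p_old = [0.6, 0.5, 0.4, 0.5, 0.3, 0.2]
p_new = [0.7, 0.6, 0.3, 0.4, 0.2, 0.3]

print(continuous_nri(p_old, p_new, y))
print(idi(p_old, p_new, y))
```

In this toy data two of three events move up and two of three non-events move down, so both components of the NRI are positive.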

Towards Prognostics of Electrolytic Capacitors

NASA Technical Reports Server (NTRS)

Celaya, Jose R.; Kulkarni, Chetan; Biswas, Gautam; Goegel, Kai

2011-01-01

A remaining useful life prediction algorithm and degradation model for electrolytic capacitors is presented. Electrolytic capacitors are used in several applications ranging from power supplies on critical avionics equipment to power drivers for electro-mechanical actuators. These devices are known for their low reliability and, given their criticality in electronics subsystems, they are a good candidate for component-level prognostics and health management research. Prognostics provides a way to assess the remaining useful life of a capacitor based on its current state of health and its anticipated future usage and operational conditions. In particular, experimental results of an accelerated aging test under electrical stresses are presented. The capacitors used in this test form the basis for a remaining life prediction algorithm where a model of the degradation process is suggested. This preliminary remaining life prediction algorithm serves as a demonstration of how prognostics methodologies could be used for electrolytic capacitors.
Determination of stores pointing error due to wing flexibility under flight load

NASA Technical Reports Server (NTRS)

Lokos, William A.; Bahm, Catherine M.; Heinle, Robert A.

1995-01-01

The in-flight elastic wing twist of a fighter-type aircraft was studied to provide for an improved on-board real-time computed prediction of pointing variations of three wing store stations. This is an important capability to correct sensor pod alignment variation or to establish initial conditions of iron bombs or smart weapons prior to release. The original algorithm was based upon coarse measurements. The electro-optical Flight Deflection Measurement System measured the deformed wing shape in flight under maneuver loads to provide a higher resolution database from which an improved twist prediction algorithm could be developed. The FDMS produced excellent repeatable data. In addition, a NASTRAN finite-element analysis was performed to provide additional elastic deformation data. The FDMS data combined with the NASTRAN analysis indicated that an improved prediction algorithm could be derived by using a different set of aircraft parameters, namely normal acceleration, stores configuration, Mach number, and gross weight.
A Family of Well-Clear Boundary Models for the Integration of UAS in the NAS

NASA Technical Reports Server (NTRS)

Munoz, Cesar A.; Narkawicz, Anthony; Chamberlain, James; Consiglio, Maria; Upchurch, Jason

2014-01-01

The FAA-sponsored Sense and Avoid Workshop for Unmanned Aircraft Systems (UAS) defines the concept of sense and avoid for remote pilots as "the capability of a UAS to remain well clear from and avoid collisions with other airborne traffic." Hence, a rigorous definition of well clear is fundamental to any separation assurance concept for the integration of UAS into civil airspace. This paper presents a family of well-clear boundary models based on the TCAS II Resolution Advisory logic. For these models, algorithms that predict well-clear violations along aircraft current trajectories are provided. These algorithms are analogous to conflict detection algorithms but instead of predicting loss of separation, they predict whether well-clear violations will occur during a given lookahead time interval. Analytical techniques are used to study the properties and relationships satisfied by the models.
Cyclic coordinate descent: A robotics algorithm for protein loop closure.

PubMed

Canutescu, Adrian A; Dunbrack, Roland L

2003-05-01

In protein structure prediction, it is often the case that a protein segment must be adjusted to connect two fixed segments. This occurs during loop structure prediction in homology modeling as well as in ab initio structure prediction. Several algorithms for this purpose are based on the inverse Jacobian of the distance constraints with respect to dihedral angle degrees of freedom. These algorithms are sometimes unstable and fail to converge. We present an algorithm developed originally for inverse kinematics applications in robotics. In robotics, an end effector in the form of a robot hand must reach for an object in space by altering adjustable joint angles and arm lengths. In loop prediction, dihedral angles must be adjusted to move the C-terminal residue of a segment to superimpose on a fixed anchor residue in the protein structure. The algorithm, referred to as cyclic coordinate descent or CCD, involves adjusting one dihedral angle at a time to minimize the sum of the squared distances between three backbone atoms of the moving C-terminal anchor and the corresponding atoms in the fixed C-terminal anchor. The result is an equation in one variable for the proposed change in each dihedral. The algorithm proceeds iteratively through all of the adjustable dihedral angles from the N-terminal to the C-terminal end of the loop. CCD is suitable as a component of loop prediction methods that generate large numbers of trial structures. It succeeds in closing loops in a large test set 99.79% of the time, and fails occasionally only for short, highly extended loops. It is very fast, closing loops of length 8 in 0.037 sec on average.
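The one-angle-at-a-time idea can be illustrated with the robotics form of CCD on a planar chain. This Python sketch is a simplified 2D analogue, not the authors' dihedral-angle implementation: it moves a single end point onto a target rather than superimposing three backbone atoms, and all names, link lengths, and tolerances are assumptions.

```python
import math


def forward(angles, lengths):
    """Joint positions of a planar chain; each angle is relative to the previous link."""
    x = y = theta = 0.0
    points = [(x, y)]
    for a, length in zip(angles, lengths):
        theta += a
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        points.append((x, y))
    return points


def ccd(angles, lengths, target, tol=1e-6, max_sweeps=1000):
    """Cyclic coordinate descent: adjust one joint at a time, base to tip.

    For each joint, the rotation that minimizes the end-to-target distance
    is the one that lines up the end point with the target as seen from
    that joint, so every update has a closed form in one variable.
    """
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in range(len(angles)):
            pts = forward(angles, lengths)
            jx, jy = pts[i]
            ex, ey = pts[-1]
            to_end = math.atan2(ey - jy, ex - jx)
            to_target = math.atan2(target[1] - jy, target[0] - jx)
            angles[i] += to_target - to_end
        ex, ey = forward(angles, lengths)[-1]
        if math.hypot(ex - target[0], ey - target[1]) < tol:
            break
    return angles


# Hypothetical three-link chain with unit links and a reachable target.
lengths = [1.0, 1.0, 1.0]
target = (1.5, 1.0)
closed = ccd([0.3, 0.3, 0.3], lengths, target)
end = forward(closed, lengths)[-1]
print(math.hypot(end[0] - target[0], end[1] - target[1]))
```

Because each update is a closed-form single-variable minimization, no Jacobian inversion is needed, which is the property the abstract contrasts with inverse-Jacobian methods.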
Linear genetic programming application for successive-station monthly streamflow prediction

NASA Astrophysics Data System (ADS)

Danandeh Mehr, Ali; Kahya, Ercan; Yerdelen, Cahit

2014-09-01

In recent decades, artificial intelligence (AI) techniques, a branch of computer science, have been widely used to model a wide range of hydrological phenomena. A number of studies are still comparing these techniques in order to find more effective approaches in terms of accuracy and applicability. In this study, we examined the ability of the linear genetic programming (LGP) technique to model the successive-station monthly streamflow process, as an applied alternative for streamflow prediction. A comparative efficiency study between LGP and three different artificial neural network (ANN) algorithms, namely feed forward back propagation (FFBP), generalized regression neural networks (GRNN), and radial basis function (RBF), is also presented. To this end, we first put forward six different successive-station monthly streamflow prediction scenarios, which were trained by LGP and FFBP using the field data recorded at two gauging stations on the Çoruh River, Turkey. Based on Nash-Sutcliffe and root mean squared error measures, we then compared the efficiency of these techniques and selected the best prediction scenario. Finally, the GRNN and RBF algorithms were used to restructure the selected scenario and to compare with the corresponding FFBP and LGP models. Our results indicated the promising role of LGP for successive-station monthly streamflow prediction, providing more accurate results than all of the ANN algorithms. We found an explicit LGP-based expression, evolved using only the basic arithmetic functions, to be the best prediction model for the river; it uses the records of both the target and upstream stations.
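The two efficiency measures mentioned above have simple closed forms. A minimal Python sketch follows; the function names and the sample flows are illustrative assumptions, not data from the study.

```python
import math


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean.

    1.0 is a perfect fit; 0.0 means the model is no better than the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def rmse(observed, simulated):
    """Root mean squared error between observed and simulated flows."""
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(sse / len(observed))


# Hypothetical monthly streamflows (m^3/s).
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 39.0]

print(nash_sutcliffe(observed, simulated))
print(rmse(observed, simulated))
```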
Design and Evaluation of a Dynamic Programming Flight Routing Algorithm Using the Convective Weather Avoidance Model

NASA Technical Reports Server (NTRS)

Ng, Hok K.; Grabbe, Shon; Mukherjee, Avijit

2010-01-01

The optimization of traffic flows in congested airspace with varying convective weather is a challenging problem. One approach is to generate shortest routes between origins and destinations while meeting airspace capacity constraints in the presence of uncertainties, such as weather and airspace demand. This study focuses on the development of an optimal flight path search algorithm that optimizes national airspace system throughput and efficiency in the presence of uncertainties. The algorithm is based on dynamic programming and utilizes the predicted probability that an aircraft will deviate around convective weather. It is shown that the running time of the algorithm increases linearly with the total number of links between all stages. The optimal routes minimize a combination of fuel cost and expected cost of route deviation due to convective weather. They are considered as alternatives to the set of coded departure routes, which are predefined by the FAA to reroute pre-departure flights around weather or air traffic constraints. A formula that calculates the predicted probability of deviation from a given flight path is also derived. The predicted probability of deviation is calculated for all path candidates. Routes with the best probability are selected as optimal. The predicted probability of deviation serves as a computable measure of reliability in pre-departure rerouting. The algorithm can also be extended to automatically adjust its design parameters to satisfy the desired level of reliability.
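The linear running time follows from relaxing each link exactly once in a stage-by-stage sweep. The Python sketch below shows that structure on a tiny hypothetical network; the cost model (fuel plus deviation probability times a deviation penalty), the node names, and all numbers are assumptions and do not reproduce the paper's formula.

```python
def expected_cost(fuel, p_deviation, deviation_penalty):
    """Assumed link cost: fuel cost plus the expected cost of deviating."""
    return fuel + p_deviation * deviation_penalty


def optimal_route(origin, destination, stage_links):
    """Dynamic programming over a staged graph.

    stage_links[k] lists (u, v, cost) links from stage k to stage k + 1.
    Node names must be unique across stages, and the destination must be
    reachable. Every link is examined once, so the work is linear in the
    total number of links.
    """
    best = {origin: 0.0}
    parent = {}
    for links in stage_links:
        nxt = {}
        for u, v, cost in links:
            if u in best and best[u] + cost < nxt.get(v, float("inf")):
                nxt[v] = best[u] + cost
                parent[v] = u
        best = nxt
    route = [destination]
    while route[-1] != origin:
        route.append(parent[route[-1]])
    return best[destination], route[::-1]


# Hypothetical two-stage network: the shorter path via B crosses weather.
stage_links = [
    [("A", "B", expected_cost(10.0, 0.9, 20.0)),
     ("A", "C", expected_cost(14.0, 0.1, 20.0))],
    [("B", "D", expected_cost(10.0, 0.0, 20.0)),
     ("C", "D", expected_cost(12.0, 0.1, 20.0))],
]

cost, route = optimal_route("A", "D", stage_links)
print(route, cost)
```

In the example the route through C is chosen even though it burns more fuel, because the high deviation probability on the link to B dominates the combined cost.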
Small angle X-ray scattering and cross-linking for data assisted protein structure prediction in CASP 12 with prospects for improved accuracy.

PubMed

Ogorzalek, Tadeusz L; Hura, Greg L; Belsom, Adam; Burnett, Kathryn H; Kryshtafovych, Andriy; Tainer, John A; Rappsilber, Juri; Tsutakawa, Susan E; Fidelis, Krzysztof

2018-03-01

Experimental data offers empowering constraints for structure prediction. These constraints can be used to filter equivalently scored models or more powerfully within optimization functions toward prediction. In CASP12, Small Angle X-ray Scattering (SAXS) and Cross-Linking Mass Spectrometry (CLMS) data, measured on an exemplary set of novel fold targets, were provided to the CASP community of protein structure predictors. As solution-based techniques, SAXS and CLMS can efficiently measure states of the full-length sequence in its native solution conformation and assembly. However, this experimental data did not substantially improve prediction accuracy judged by fits to crystallographic models. One issue, beyond intrinsic limitations of the algorithms, was a disconnect between crystal structures and solution-based measurements. Our analyses show that many targets had substantial percentages of disordered regions (up to 40%) or were multimeric or both. Thus, solution measurements of flexibility and assembly support variations that may confound prediction algorithms trained on crystallographic data and expecting globular fully-folded monomeric proteins. Here, we consider the CLMS and SAXS data collected, the information in these solution measurements, and the challenges in incorporating them into computational prediction. As improvement opportunities were only partly realized in CASP12, we provide guidance on how data from the full-length biological unit and the solution state can better aid prediction of the folded monomer or subunit. We furthermore describe strategic integrations of solution measurements with computational prediction programs with the aim of substantially improving foundational knowledge and the accuracy of computational algorithms for biologically-relevant structure predictions for proteins in solution. © 2018 Wiley Periodicals, Inc.
Identification of informative features for predicting proinflammatory potentials of engine exhausts.

PubMed

Wang, Chia-Chi; Lin, Ying-Chi; Lin, Yuan-Chung; Jhang, Syu-Ruei; Tung, Chun-Wei

2017-08-18

The immunotoxicity of engine exhausts is of high concern to human health due to the increasing prevalence of immune-related diseases. However, the evaluation of immunotoxicity of engine exhausts is currently based on expensive and time-consuming experiments. It is desirable to develop efficient methods for immunotoxicity assessment. To accelerate the development of safe alternative fuels, this study proposed a computational method for identifying informative features for predicting proinflammatory potentials of engine exhausts. A principal component regression (PCR) algorithm was applied to develop prediction models. The informative features were identified by a sequential backward feature elimination (SBFE) algorithm, which selected a total of 19 informative chemical and biological features. The informative features were utilized to develop a computational method named FS-CBM for predicting proinflammatory potentials of engine exhausts. The FS-CBM model achieved high performance, with correlation coefficients of 0.997 and 0.943 on the training and independent test sets, respectively, a large improvement in prediction performance compared with our previous CBM model. The proposed method could be further applied to construct models for bioactivities of mixtures.
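Sequential backward elimination itself is model-agnostic: start from all features and repeatedly drop the one whose removal gives the best score. The Python sketch below shows only that loop; the scoring function is a toy stand-in for a cross-validated PCR model, and the feature names, gains, and stopping rule are assumptions rather than details from the study.

```python
def backward_elimination(features, score):
    """Greedy sequential backward feature elimination.

    `score` maps a list of features to a number (higher is better).
    Elimination stops when every possible removal would lower the score.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        candidates = [
            (score([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        candidate_score, drop = max(candidates)
        if candidate_score < best:
            break
        selected.remove(drop)
        best = candidate_score
    return selected, best


# Toy score (assumption): two informative features, small penalty per feature.
GAIN = {"a": 0.5, "c": 0.3}


def toy_score(subset):
    return sum(GAIN.get(f, 0.0) for f in subset) - 0.01 * len(subset)


selected, best = backward_elimination(["a", "b", "c", "d"], toy_score)
print(selected, best)
```

In practice `score` would wrap model fitting and validation, so each elimination step costs one model fit per remaining feature.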
Blood glucose level prediction based on support vector regression using mobile platforms.

PubMed

Reymann, Maximilian P; Dorschky, Eva; Groh, Benjamin H; Martindale, Christine; Blank, Peter; Eskofier, Bjoern M

2016-08-01

The correct treatment of diabetes is vital to a patient's health: staying within defined blood glucose levels prevents dangerous short- and long-term effects on the body. Mobile devices informing patients about their future blood glucose levels could enable them to take counter-measures to prevent hypoglycemic or hyperglycemic periods. Previous work addressed this challenge by predicting blood glucose levels using regression models. However, these approaches required a physiological model representing the human body's response to insulin and glucose intake, or were not directly applicable to mobile platforms (smartphones, tablets). In this paper, we propose an algorithm for mobile platforms to predict blood glucose levels without the need for a physiological model. Using an online software simulator program, we trained a Support Vector Regression (SVR) model and exported the parameter settings to our mobile platform. The prediction accuracy of our mobile platform was evaluated with pre-recorded data of a type 1 diabetes patient. The blood glucose level was predicted with an error of 19% compared to the true value. Considering that the permitted error of commercially used devices is 15%, our algorithm is a basis for further development of mobile prediction algorithms.
Shear wave prediction using committee fuzzy model constrained by lithofacies, Zagros basin, SW Iran

NASA Astrophysics Data System (ADS)

Shiroodi, Sadjad Kazem; Ghafoori, Mohammad; Ansari, Hamid Reza; Lashkaripour, Golamreza; Ghanadian, Mostafa

2017-02-01

The main purpose of this study is to introduce geological controlling factors to improve an intelligence-based model for estimating shear wave velocity from seismic attributes. The proposed method includes three main steps in the framework of geological events in a complex sedimentary succession located in the Persian Gulf. First, the best attributes were selected from extracted seismic data. Second, these attributes were transformed into shear wave velocity using fuzzy inference systems (FIS) such as Sugeno's fuzzy inference (SFIS), adaptive neuro-fuzzy inference (ANFIS) and optimized fuzzy inference (OFIS). Finally, a committee fuzzy machine (CFM) based on bat-inspired algorithm (BA) optimization was applied to combine the previous predictions into an enhanced solution. In order to show the geological effect on improving the prediction, the main classes of predominant lithofacies in the reservoir of interest, including shale, sand, and carbonate, were selected, and the proposed algorithm was then run with and without the lithofacies constraint. The results showed better agreement between real and predicted shear wave velocity in the lithofacies-based model than in the model without lithofacies, especially in sand and carbonate.
Novel Modeling of Combinatorial miRNA Targeting Identifies SNP with Potential Role in Bone Density

PubMed Central

Coronnello, Claudia; Hartmaier, Ryan; Arora, Arshi; Huleihel, Luai; Pandit, Kusum V.; Bais, Abha S.; Butterworth, Michael; Kaminski, Naftali; Stormo, Gary D.; Oesterreich, Steffi; Benos, Panayiotis V.

2012-01-01

MicroRNAs (miRNAs) are post-transcriptional regulators that bind to their target mRNAs through base complementarity. Predicting miRNA targets is a challenging task, and various studies showed that existing algorithms suffer from a high number of false predictions and low to moderate overlap in their predictions. Until recently, very few algorithms considered the dynamic nature of the interactions, including the effect of less specific interactions, the miRNA expression level, and the effect of combinatorial miRNA binding. Addressing these issues can result in more accurate miRNA:mRNA modeling with many applications, including efficient miRNA-related SNP evaluation. We present a novel thermodynamic model based on the Fermi-Dirac equation that incorporates miRNA expression in the prediction of target occupancy, and we show that it improves the performance of two popular single miRNA target finders. Modeling combinatorial miRNA targeting is a natural extension of this model. Two other algorithms show improved prediction efficiency when combinatorial binding models were considered. ComiR (Combinatorial miRNA targeting), a novel algorithm we developed, incorporates the improved predictions of the four target finders into a single probabilistic score using ensemble learning. Combining target scores of multiple miRNAs using ComiR improves predictions over the naïve method for target combination. The ComiR scoring scheme can be used for identification of SNPs affecting miRNA binding. As proof of principle, ComiR identified rs17737058 as disruptive to the miR-488-5p:NCOA1 interaction, which we confirmed in vitro. We also found rs17737058 to be significantly associated with decreased bone mineral density (BMD) in two independent cohorts, indicating that the miR-488-5p/NCOA1 regulatory axis is likely critical in maintaining BMD in women. With the increasing availability of comprehensive high-throughput datasets from patients, ComiR is expected to become an essential tool for miRNA-related studies. PMID:23284279
Short-term Power Load Forecasting Based on Balanced KNN

NASA Astrophysics Data System (ADS)

Lv, Xianlong; Cheng, Xingong; YanShuang; Tang, Yan-mei

2018-03-01

To improve the accuracy of load forecasting, a short-term load forecasting model based on a balanced KNN algorithm is proposed. According to the load characteristics, massive historical power load data are divided into scenes by the K-means algorithm. Because the load scenes are unbalanced, the balanced KNN algorithm is proposed to classify the scenes accurately. A locally weighted linear regression algorithm is used to fit and predict the load. Using the Apache Hadoop programming framework for cloud computing, the proposed model is parallelized and improved to enhance its ability to deal with massive, high-dimensional data. Household electricity consumption data for a residential district were analyzed on a 23-node cloud computing cluster, and experimental results show that the load forecasting accuracy and execution time of the proposed model are better than those of the traditional forecasting algorithm.
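The fitting step named in this abstract, locally weighted linear regression, can be sketched in one dimension with a closed-form weighted least-squares solution. The Gaussian kernel, the bandwidth `tau`, the function name, and the sample data are assumptions; the abstract does not give the kernel or features used in the paper.

```python
import math


def lwlr_predict(x_query, xs, ys, tau=1.0):
    """Locally weighted linear regression at a single query point.

    Each training point gets a Gaussian weight based on its distance to
    the query; a weighted straight line is then fitted in closed form.
    """
    w = [math.exp(-((x - x_query) ** 2) / (2.0 * tau ** 2)) for x in xs]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept + slope * x_query


# Hypothetical hourly load (kW) with a midday peak.
hours = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
load = [3.1, 3.8, 4.6, 5.2, 5.5, 5.1, 4.4]

print(lwlr_predict(12.5, hours, load, tau=1.0))
```

A smaller `tau` makes the fit follow local structure more closely, at the price of higher variance; in a scene-based model such as the one described, the fit would presumably be run within the scene selected by the classifier.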
Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development.

PubMed

Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R

2016-12-01

Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different from maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms (supervised principal components, regularization, and boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
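For orientation, expected prediction error and its cross-validation estimate can be written as follows. This assumes squared-error loss, which the abstract does not specify, and the notation is introduced here for illustration: f is the prediction rule, N the sample size, κ(i) the fold containing case i, and the superscript −κ(i) marks a model fitted with that fold left out.

```latex
\mathrm{EPE}(f) = \mathbb{E}\left[ \bigl( Y - f(X) \bigr)^{2} \right],
\qquad
\widehat{\mathrm{EPE}}_{\mathrm{CV}}
  = \frac{1}{N} \sum_{i=1}^{N}
    \bigl( y_i - \hat{f}^{-\kappa(i)}(x_i) \bigr)^{2}
```

Because each case is predicted by a model that never saw it, the estimate penalizes overfitting in a way that the within-sample likelihood does not.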
TH-A-19A-06: Site-Specific Comparison of Analytical and Monte Carlo Based Dose Calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schuemann, J; Grassberger, C; Paganetti, H

2014-06-15

Purpose: To investigate the impact of complex patient geometries on the capability of analytical dose calculation algorithms to accurately predict dose distributions and to verify currently used uncertainty margins in proton therapy. Methods: Dose distributions predicted by an analytical pencil beam algorithm were compared with Monte Carlo simulations (MCS) using TOPAS. 79 complete patient treatment plans were investigated for 7 disease sites (liver, prostate, breast, medulloblastoma spine and whole brain, lung and head and neck). A total of 508 individual passively scattered treatment fields were analyzed for field specific properties. Comparisons based on target coverage indices (EUD, D95, D90 and D50) were performed. Range differences were estimated for the distal position of the 90% dose level (R90) and the 50% dose level (R50). Two-dimensional distal dose surfaces were calculated and the root mean square differences (RMSD), average range difference (ARD) and average distal dose degradation (ADD), the distance between the distal position of the 80% and 20% dose levels (R80-R20), were analyzed. Results: We found target coverage indices calculated by TOPAS to generally be around 1–2% lower than predicted by the analytical algorithm. Differences in R90 predicted by TOPAS and the planning system can be larger than currently applied range margins in proton therapy for small regions distal to the target volume. We estimate new site-specific range margins (R90) for analytical dose calculations considering total range uncertainties and uncertainties from dose calculation alone based on the RMSD. Our results demonstrate that a reduction of currently used uncertainty margins is feasible for liver, prostate and whole brain fields even without introducing MC dose calculations. Conclusion: Analytical dose calculation algorithms predict dose distributions within clinical limits for more homogeneous patient sites (liver, prostate, whole brain). However, we recommend treatment plan verification using Monte Carlo simulations for patients with complex geometries.
Prediction of gene-phenotype associations in humans, mice, and plants using phenologs.

PubMed

Woods, John O; Singh-Blom, Ulf Martin; Laurent, Jon M; McGary, Kriston L; Marcotte, Edward M

2013-06-21

Phenotypes and diseases may be related to seemingly dissimilar phenotypes in other species by means of the orthology of underlying genes. Such "orthologous phenotypes," or "phenologs," are examples of deep homology, and may be used to predict additional candidate disease genes. In this work, we develop an unsupervised algorithm for ranking phenolog-based candidate disease genes through the integration of predictions from the k nearest neighbor phenologs, comparing classifiers and weighting functions by cross-validation. We also improve upon the original method by extending the theory to paralogous phenotypes. Our algorithm makes use of additional phenotype data--from chicken, zebrafish, and E. coli, as well as new datasets for C. elegans--establishing that several types of annotations may be treated as phenotypes. We demonstrate the use of our algorithm to predict novel candidate genes for human atrial fibrillation (such as HRH2, ATP4A, ATP4B, and HOPX) and epilepsy (e.g., PAX6 and NKX2-1). We suggest gene candidates for pharmacologically-induced seizures in mouse, solely based on orthologous phenotypes from E. coli. We also explore the prediction of plant gene-phenotype associations, as for the Arabidopsis response to vernalization phenotype. We are able to rank gene predictions for a significant portion of the diseases in the Online Mendelian Inheritance in Man database. Additionally, our method suggests candidate genes for mammalian seizures based only on bacterial phenotypes and gene orthology. We demonstrate that phenotype information may come from diverse sources, including drug sensitivities, gene ontology biological processes, and in situ hybridization annotations. Finally, we offer testable candidates for a variety of human diseases, plant traits, and other classes of phenotypes across a wide array of species.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Stoiber, Marcus H.; Brown, James B.

This software implements the first base caller for nanopore data that calls bases directly from raw data. The basecRAWller algorithm has two major advantages over current nanopore base calling software: (1) streaming base calling and (2) base calling from information-rich raw signal. The ability to perform truly streaming base calling as signal is received from the sequencer can be very powerful, as this is one of the major advantages of this technology compared to other sequencing technologies. As such, enabling as much streaming potential as possible will be incredibly important as this technology continues to become more widely applied in biosciences. All other base callers currently employ the Viterbi algorithm, which requires the whole sequence to carry out the complete base calling procedure and thus precludes a natural streaming base calling procedure. The other major advantage of the basecRAWller algorithm is the prediction of bases from raw signal, which contains much richer information than the segmented chunks that current algorithms employ. This leads to the potential for much more accurate base calls, which would make this technology much more valuable to its growing user base.
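The reason Viterbi decoding does not stream naturally is visible in a textbook implementation: the best path is recovered by a backtrace that starts at the final observation. The Python sketch below is a generic hidden Markov model example with made-up states and probabilities; it is not taken from basecRAWller or from any nanopore base caller.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state path for an observation sequence."""
    scores = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        prev = scores[-1]
        column, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: prev[r] * trans_p[r][s])
            column[s] = prev[best_prev] * trans_p[best_prev][s] * emit_p[s][obs]
            pointers[s] = best_prev
        scores.append(column)
        back.append(pointers)
    # The backtrace starts from the LAST time step, so the full sequence
    # must have been observed before any state can be emitted.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1], scores[-1][last]


# Toy two-state model (all names and numbers are illustrative assumptions).
states = ["S1", "S2"]
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {
    "S1": {"x": 0.5, "y": 0.4, "z": 0.1},
    "S2": {"x": 0.1, "y": 0.3, "z": 0.6},
}

path, prob = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
print(path, prob)
```

Windowed or truncated variants of the backtrace exist, but they trade exactness for latency, which fits the abstract's point about streaming.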
Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm.

PubMed

Heidari, Morteza; Khuzani, Abolfazl Zargari; Hollingsworth, Alan B; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qiu, Yuchen; Liu, Hong; Zheng, Bin

2018-01-30

In order to automatically identify a set of effective mammographic image features and build an optimal breast cancer risk stratification model, this study aims to investigate advantages of applying a machine learning approach embedded with a locality preserving projection (LPP) based feature combination and regeneration algorithm to predict short-term breast cancer risk. A dataset involving negative mammograms acquired from 500 women was assembled. This dataset was divided into two age-matched classes of 250 high risk cases in which cancer was detected in the subsequent mammography screening and 250 low risk cases, which remained negative. First, a computer-aided image processing scheme was applied to segment fibro-glandular tissue depicted on mammograms and initially compute 44 features related to the bilateral asymmetry of mammographic tissue density distribution between left and right breasts. Next, a multi-feature fusion based machine learning classifier was built to predict the risk of cancer detection in the next mammography screening. A leave-one-case-out (LOCO) cross-validation method was applied to train and test the machine learning classifier embedded with an LPP algorithm, which generated a new operational vector with 4 features using a maximal variance approach in each LOCO process. Results showed a 9.7% increase in risk prediction accuracy when using this LPP-embedded machine learning approach. An increasing trend of adjusted odds ratios was also detected, in which odds ratios increased from 1.0 to 11.2. This study demonstrated that applying the LPP algorithm effectively reduced feature dimensionality, and yielded higher and potentially more robust performance in predicting short-term breast cancer risk.
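The leave-one-case-out protocol described above can be sketched as follows. Every training-time step, which in the study includes regenerating the LPP feature vector, is repeated inside each fold so that the held-out case never influences the model. The nearest-centroid classifier, the function names, and the toy data are stand-ins chosen for brevity; they are not the classifier or features used in the study.

```python
def fit_centroids(cases):
    """Training step: mean feature vector per class label.

    In a fuller pipeline, feature reduction would also be fitted here,
    on the training cases only.
    """
    sums, counts = {}, {}
    for x, label in cases:
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, x):
    """Assign the label of the nearest class centroid."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def leave_one_case_out(cases):
    """Train on all cases but one, test on the held-out case, repeat."""
    correct = 0
    for i, (x, label) in enumerate(cases):
        training = cases[:i] + cases[i + 1:]
        model = fit_centroids(training)
        correct += predict(model, x) == label
    return correct / len(cases)


# Hypothetical two-feature cases for two risk classes.
cases = [
    ((0.9, 0.8), "high"), ((1.0, 1.1), "high"), ((1.2, 0.9), "high"),
    ((0.1, 0.2), "low"), ((0.0, -0.1), "low"), ((0.2, 0.1), "low"),
]

print(leave_one_case_out(cases))
```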
Prediction of breast cancer risk using a machine learning approach embedded with a locality preserving projection algorithm

NASA Astrophysics Data System (ADS)

Heidari, Morteza; Zargari Khuzani, Abolfazl; Hollingsworth, Alan B.; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qiu, Yuchen; Liu, Hong; Zheng, Bin

2018-02-01

In order to automatically identify a set of effective mammographic image features and build an optimal breast cancer risk stratification model, this study aims to investigate advantages of applying a machine learning approach embedded with a locality preserving projection (LPP) based feature combination and regeneration algorithm to predict short-term breast cancer risk. A dataset involving negative mammograms acquired from 500 women was assembled. This dataset was divided into two age-matched classes of 250 high risk cases in which cancer was detected in the next subsequent mammography screening and 250 low risk cases, which remained negative. First, a computer-aided image processing scheme was applied to segment fibro-glandular tissue depicted on mammograms and initially compute 44 features related to the bilateral asymmetry of mammographic tissue density distribution between left and right breasts. Next, a multi-feature fusion based machine learning classifier was built to predict the risk of cancer detection in the next mammography screening. A leave-one-case-out (LOCO) cross-validation method was applied to train and test the machine learning classifier embedded with an LPP algorithm, which generated a new operational vector with 4 features using a maximal variance approach in each LOCO process. Results showed a 9.7% increase in risk prediction accuracy when using this LPP-embedded machine learning approach. An increasing trend of adjusted odds ratios was also detected in which odds ratios increased from 1.0 to 11.2. This study demonstrated that applying the LPP algorithm effectively reduced feature dimensionality, and yielded higher and potentially more robust performance in predicting short-term breast cancer risk.
BRCA-Monet: a breast cancer specific drug treatment mode-of-action network for treatment effective prediction using large scale microarray database

PubMed Central

2013-01-01

Background: Connectivity map (cMap) is a recently developed dataset and algorithm for uncovering and understanding the treatment effect of small molecules on different cancer cell lines. It is widely used, but challenges remain for accurate predictions. Method: Here, we propose BRCA-MoNet, a network of drug mode of action (MoA) specific to breast cancer, which is constructed based on the cMap dataset. A drug signature selection algorithm fitting the characteristics of cMap data, a quality control scheme, and a novel query algorithm based on BRCA-MoNet are developed for more effective prediction of drug effects. Result: BRCA-MoNet was applied to three independent data sets obtained from the GEO database: an estradiol-treated MCF7 cell line, a BMS-754807-treated MCF7 cell line, and a breast cancer patient microarray dataset. In the first case, BRCA-MoNet could identify drug MoAs likely to share the same and reverse treatment effects. In the second case, the result demonstrated the potential of BRCA-MoNet to reposition drugs and predict treatment effects for drugs not in cMap data. In the third case, a possible procedure of personalized drug selection is showcased. Conclusions: The results clearly demonstrated that the proposed BRCA-MoNet approach can provide increased prediction power to cMap and thus will be useful for identification of new therapeutic candidates. Website: The web-based application can be accessed through the following link: http://compgenomics.utsa.edu/BRCAMoNet/ PMID:24564956
Geodetic Finite-Fault-based Earthquake Early Warning Performance for Great Earthquakes Worldwide

NASA Astrophysics Data System (ADS)

Ruhl, C. J.; Melgar, D.; Grapenthin, R.; Allen, R. M.

2017-12-01

GNSS-based earthquake early warning (EEW) algorithms estimate fault-finiteness and unsaturated moment magnitude for the largest, most damaging earthquakes. Because large events are infrequent, algorithms are not regularly exercised and insufficiently tested on few available datasets. The Geodetic Alarm System (G-larmS) is a GNSS-based finite-fault algorithm developed as part of the ShakeAlert EEW system in the western US. Performance evaluations using synthetic earthquakes offshore Cascadia showed that G-larmS satisfactorily recovers magnitude and fault length, providing useful alerts 30-40 s after origin time and timely warnings of ground motion for onshore urban areas. An end-to-end test of the ShakeAlert system demonstrated the need for GNSS data to accurately estimate ground motions in real-time. We replay real data from several subduction-zone earthquakes worldwide to demonstrate the value of GNSS-based EEW for the largest, most damaging events. We compare predicted ground acceleration (PGA) from first-alert-solutions with those recorded in major urban areas. In addition, where applicable, we compare observed tsunami heights to those predicted from the G-larmS solutions. We show that finite-fault inversion based on GNSS-data is essential to achieving the goals of EEW.

Billing code algorithms to identify cases of peripheral artery disease from administrative data

PubMed Central

Fan, Jin; Arruda-Olson, Adelaide M; Leibson, Cynthia L; Smith, Carin; Liu, Guanghui; Bailey, Kent R; Kullo, Iftikhar J

2013-01-01

Objective: To construct and validate billing code algorithms for identifying patients with peripheral arterial disease (PAD). Methods: We extracted all encounters and line item details including PAD-related billing codes at Mayo Clinic Rochester, Minnesota, between July 1, 1997 and June 30, 2008; 22 712 patients evaluated in the vascular laboratory were divided into training and validation sets. Multiple logistic regression analysis was used to create an integer code score from the training dataset, and this was tested in the validation set. We applied a model-based code algorithm to patients evaluated in the vascular laboratory and compared this with a simpler algorithm (presence of at least one of the ICD-9 PAD codes 440.20–440.29). We also applied both algorithms to a community-based sample (n=4420), followed by a manual review. Results: The logistic regression model performed well in both training and validation datasets (c statistic=0.91). In patients evaluated in the vascular laboratory, the model-based code algorithm provided better negative predictive value. The simpler algorithm was reasonably accurate for identification of PAD status, with lesser sensitivity and greater specificity. In the community-based sample, the sensitivity (38.7% vs 68.0%) of the simpler algorithm was much lower, whereas the specificity (92.0% vs 87.6%) was higher than the model-based algorithm. Conclusions: A model-based billing code algorithm had reasonable accuracy in identifying PAD cases from the community, and in patients referred to the non-invasive vascular laboratory. The simpler algorithm had reasonable accuracy for identification of PAD in patients referred to the vascular laboratory but was significantly less sensitive in a community-based sample. PMID:24166724
The Prediction of the Gas Utilization Ratio Based on TS Fuzzy Neural Network and Particle Swarm Optimization

PubMed Central

Jiang, Haihe; Yin, Yixin; Xiao, Wendong; Zhao, Baoyong

2018-01-01

Gas utilization ratio (GUR) is an important indicator used to evaluate the energy consumption of blast furnaces (BFs). Existing methods cannot predict the GUR accurately. In this paper, we present a novel data-driven model for predicting the GUR. The proposed approach uses both the TS fuzzy neural network (TS-FNN) and the particle swarm optimization (PSO) algorithm. PSO is applied to optimize the parameters of the TS-FNN in order to decrease the error caused by inaccurate initial parameters. This paper also applies the box-plot method to eliminate abnormal values from the raw data during preprocessing. This method can handle data that do not follow a normal distribution because of the complex industrial environment. The prediction results demonstrate that the optimized model based on PSO and the TS-FNN achieves higher prediction accuracy than the TS-FNN model and the SVM model, and that the proposed approach can accurately predict the GUR of the blast furnace, providing an effective way for on-line blast furnace distribution control. PMID:29461469
The Prediction of the Gas Utilization Ratio based on TS Fuzzy Neural Network and Particle Swarm Optimization.

PubMed

Zhang, Sen; Jiang, Haihe; Yin, Yixin; Xiao, Wendong; Zhao, Baoyong

2018-02-20

Gas utilization ratio (GUR) is an important indicator used to evaluate the energy consumption of blast furnaces (BFs). Existing methods cannot predict the GUR accurately. In this paper, we present a novel data-driven model for predicting the GUR. The proposed approach uses both the TS fuzzy neural network (TS-FNN) and the particle swarm optimization (PSO) algorithm. PSO is applied to optimize the parameters of the TS-FNN in order to decrease the error caused by inaccurate initial parameters. This paper also applies the box-plot method to eliminate abnormal values from the raw data during preprocessing. This method can handle data that do not follow a normal distribution because of the complex industrial environment. The prediction results demonstrate that the optimized model based on PSO and the TS-FNN achieves higher prediction accuracy than the TS-FNN model and the SVM model, and that the proposed approach can accurately predict the GUR of the blast furnace, providing an effective way for on-line blast furnace distribution control.
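As an illustration of how particle swarm optimization searches a parameter space, the following is a minimal sketch only. It does not reproduce the TS-FNN model of the abstract: the objective here is a hypothetical stand-in (a simple sphere function) for a network's training error, and the inertia weight `w`, acceleration coefficients `c1` and `c2`, swarm size, bounds and seed are all assumed values.

```python
import random


def pso(objective, dim, n_particles=20, iters=100, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and value.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Hypothetical stand-in objective; a real use would return model error."""
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=3)
print(best, best_val)
```

In a setting like the one described, `objective` would take a candidate parameter vector, load it into the model, and return the prediction error on training data.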
Impact of Noise on a Dynamical System: Prediction and Uncertainties from a Swarm-Optimized Neural Network

PubMed Central

López-Caraballo, C. H.; Lazzús, J. A.; Salfate, I.; Rojas, P.; Rivera, M.; Palma-Chilla, L.

2015-01-01

An artificial neural network (ANN) based on particle swarm optimization (PSO) was developed for time series prediction. The hybrid ANN+PSO algorithm was applied to the Mackey-Glass chaotic time series for the short-term prediction x(t + 6). The prediction performance was evaluated and compared with other studies available in the literature. Also, we presented properties of the dynamical system via the study of chaotic behaviour obtained from the predicted time series. Next, the hybrid ANN+PSO algorithm was complemented with a Gaussian stochastic procedure (called stochastic hybrid ANN+PSO) in order to obtain a new estimator of the predictions, which also allowed us to compute the uncertainties of predictions for noisy Mackey-Glass chaotic time series. Thus, we studied the impact of noise for several cases with a white noise level (σ_N) from 0.01 to 0.1. PMID:26351449
Impact of Noise on a Dynamical System: Prediction and Uncertainties from a Swarm-Optimized Neural Network.

PubMed

López-Caraballo, C H; Lazzús, J A; Salfate, I; Rojas, P; Rivera, M; Palma-Chilla, L

2015-01-01

An artificial neural network (ANN) based on particle swarm optimization (PSO) was developed for time series prediction. The hybrid ANN+PSO algorithm was applied to the Mackey-Glass chaotic time series for the short-term prediction x(t + 6). The prediction performance was evaluated and compared with other studies available in the literature. Also, we presented properties of the dynamical system via the study of chaotic behaviour obtained from the predicted time series. Next, the hybrid ANN+PSO algorithm was complemented with a Gaussian stochastic procedure (called stochastic hybrid ANN+PSO) in order to obtain a new estimator of the predictions, which also allowed us to compute the uncertainties of predictions for noisy Mackey-Glass chaotic time series. Thus, we studied the impact of noise for several cases with a white noise level (σ_N) from 0.01 to 0.1.
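For readers unfamiliar with the benchmark, the sketch below generates a Mackey-Glass series and builds input/target pairs for the x(t + 6) horizon mentioned in the abstract. The abstract does not give the equation parameters or the integration scheme, so the commonly used values (beta = 0.2, gamma = 0.1, n = 10, tau = 17), the constant initial history and the forward Euler step are assumptions made for illustration.

```python
def mackey_glass(n_steps, beta=0.2, gamma=0.1, n=10, tau=17, dt=1.0, x0=1.2):
    """Forward-Euler integration of
    dx/dt = beta * x(t - tau) / (1 + x(t - tau)**n) - gamma * x(t).
    All parameter defaults are assumed, commonly quoted benchmark values."""
    delay = int(round(tau / dt))
    # Constant history on [-tau, 0] (an assumption).
    x = [x0] * (delay + 1)
    for _ in range(n_steps):
        x_tau = x[-delay - 1]   # delayed state x(t - tau)
        x_now = x[-1]           # current state x(t)
        dx = beta * x_tau / (1.0 + x_tau ** n) - gamma * x_now
        x.append(x_now + dt * dx)
    return x[delay + 1:]        # drop the artificial history


series = mackey_glass(500)

# Supervised pairs for a 6-step-ahead predictor: input x(t), target x(t + 6).
horizon = 6
pairs = [(series[t], series[t + horizon]) for t in range(len(series) - horizon)]
print(len(series), len(pairs))
```

A noisy variant, as studied in the abstract, could be obtained by adding zero-mean Gaussian noise with standard deviation σ_N to each sample; that step is left out here.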
A community effort to assess and improve drug sensitivity prediction algorithms

PubMed Central

Costello, James C; Heiser, Laura M; Georgii, Elisabeth; Gönen, Mehmet; Menden, Michael P; Wang, Nicholas J; Bansal, Mukesh; Ammad-ud-din, Muhammad; Hintsanen, Petteri; Khan, Suleiman A; Mpindi, John-Patrick; Kallioniemi, Olli; Honkela, Antti; Aittokallio, Tero; Wennerberg, Krister; Collins, James J; Gallahan, Dan; Singer, Dinah; Saez-Rodriguez, Julio; Kaski, Samuel; Gray, Joe W; Stolovitzky, Gustavo

2015-01-01

Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask MKL, and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods. PMID:24880487
A community effort to assess and improve drug sensitivity prediction algorithms.

PubMed

Costello, James C; Heiser, Laura M; Georgii, Elisabeth; Gönen, Mehmet; Menden, Michael P; Wang, Nicholas J; Bansal, Mukesh; Ammad-ud-din, Muhammad; Hintsanen, Petteri; Khan, Suleiman A; Mpindi, John-Patrick; Kallioniemi, Olli; Honkela, Antti; Aittokallio, Tero; Wennerberg, Krister; Collins, James J; Gallahan, Dan; Singer, Dinah; Saez-Rodriguez, Julio; Kaski, Samuel; Gray, Joe W; Stolovitzky, Gustavo

2014-12-01

Predicting the best treatment strategy from genomic information is a core goal of precision medicine. Here we focus on predicting drug response based on a cohort of genomic, epigenomic and proteomic profiling data sets measured in human breast cancer cell lines. Through a collaborative effort between the National Cancer Institute (NCI) and the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we analyzed a total of 44 drug sensitivity prediction algorithms. The top-performing approaches modeled nonlinear relationships and incorporated biological pathway information. We found that gene expression microarrays consistently provided the best predictive power of the individual profiling data sets; however, performance was increased by including multiple, independent data sets. We discuss the innovations underlying the top-performing methodology, Bayesian multitask MKL, and we provide detailed descriptions of all methods. This study establishes benchmarks for drug sensitivity prediction and identifies approaches that can be leveraged for the development of new methods.
Discrete Event-based Performance Prediction for Temperature Accelerated Dynamics

NASA Astrophysics Data System (ADS)

Junghans, Christoph; Mniszewski, Susan; Voter, Arthur; Perez, Danny; Eidenbenz, Stephan

2014-03-01

We present an example of a new class of tools that we call application simulators, parameterized fast-running proxies of large-scale scientific applications using parallel discrete event simulation (PDES). We demonstrate our approach with a TADSim application simulator that models the Temperature Accelerated Dynamics (TAD) method, which is an algorithmically complex member of the Accelerated Molecular Dynamics (AMD) family. The essence of the TAD application is captured without the computational expense and resource usage of the full code. We use TADSim to quickly characterize the runtime performance and algorithmic behavior for the otherwise long-running simulation code. We further extend TADSim to model algorithm extensions to standard TAD, such as speculative spawning of the compute-bound stages of the algorithm, and predict performance improvements without having to implement such a method. Focused parameter scans have allowed us to study algorithm parameter choices over far more scenarios than would be possible with the actual simulation. This has led to interesting performance-related insights into the TAD algorithm behavior and suggested extensions to the TAD method.
Metaphor Identification in Large Texts Corpora

PubMed Central

Neuman, Yair; Assaf, Dan; Cohen, Yohai; Last, Mark; Argamon, Shlomo; Howard, Newton; Frieder, Ophir

2013-01-01

Identifying metaphorical language-use (e.g., sweet child) is one of the challenges facing natural language processing. This paper describes three novel algorithms for automatic metaphor identification. The algorithms are variations of the same core algorithm. We evaluate the algorithms on two corpora of Reuters and the New York Times articles. The paper presents the most comprehensive study of metaphor identification in terms of scope of metaphorical phrases and annotated corpora size. Algorithms’ performance in identifying linguistic phrases as metaphorical or literal has been compared to human judgment. Overall, the algorithms outperform the state-of-the-art algorithm with 71% precision and 27% averaged improvement in prediction over the base-rate of metaphors in the corpus. PMID:23658625
Motion prediction in MRI-guided radiotherapy based on interleaved orthogonal cine-MRI

NASA Astrophysics Data System (ADS)

Seregni, M.; Paganelli, C.; Lee, D.; Greer, P. B.; Baroni, G.; Keall, P. J.; Riboldi, M.

2016-01-01

In-room cine-MRI guidance can provide non-invasive target localization during radiotherapy treatment. However, in order to cope with finite imaging frequency and system latencies between target localization and dose delivery, tumour motion prediction is required. This work proposes a framework for motion prediction dedicated to cine-MRI guidance, aiming at quantifying the geometric uncertainties introduced by this process for both tumour tracking and beam gating. The tumour position, identified through scale invariant features detected in cine-MRI slices, is estimated at high-frequency (25 Hz) using three independent predictors, one for each anatomical coordinate. Linear extrapolation, auto-regressive and support vector machine algorithms are compared against systems that use no prediction or surrogate-based motion estimation. Geometric uncertainties are reported as a function of image acquisition period and system latency. Average results show that the tracking error RMS can be decreased down to a [0.2; 1.2] mm range, for acquisition periods between 250 and 750 ms and system latencies between 50 and 300 ms. Except for the linear extrapolator, tracking and gating prediction errors were, on average, lower than those measured for surrogate-based motion estimation. This finding suggests that cine-MRI guidance, combined with appropriate prediction algorithms, could relevantly decrease geometric uncertainties in motion compensated treatments.
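The simplest of the predictors compared above, linear extrapolation, can be sketched as follows. This is an illustrative one-coordinate version under assumed inputs: the function name, the two-sample window and the example numbers are hypothetical and are not taken from the study.

```python
def linear_extrapolate(t_prev, x_prev, t_last, x_last, t_target):
    """Predict position at t_target from the two most recent samples,
    assuming constant velocity between and beyond them."""
    if t_last == t_prev:
        raise ValueError("samples must have distinct timestamps")
    velocity = (x_last - x_prev) / (t_last - t_prev)
    return x_last + velocity * (t_target - t_last)


# Hypothetical example: samples 250 ms apart, prediction 300 ms ahead
# (times in seconds, positions in mm).
predicted = linear_extrapolate(0.00, 10.0, 0.25, 10.5, 0.55)
print(predicted)
```

One such predictor per anatomical coordinate would match the "three independent predictors" arrangement described in the abstract.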
Firefly algorithm versus genetic algorithm as powerful variable selection tools and their effect on different multivariate calibration models in spectroscopy: A comparative study

NASA Astrophysics Data System (ADS)

Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed

2017-01-01

For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network and support vector regression, applied to UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out. The discussion revealed the superiority of this new algorithm over the well-known genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found between the models regarding their predictive abilities. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration.
RNA secondary structure prediction using soft computing.

PubMed

Ray, Shubhra Sankar; Pal, Sankar K

2013-01-01

Prediction of RNA structure is invaluable in creating new drugs and understanding genetic diseases. Several deterministic algorithms and soft computing-based techniques have been developed for more than a decade to determine the structure from a known RNA sequence. Soft computing gained importance with the need to get approximate solutions for RNA sequences by considering the issues related with kinetic effects, cotranscriptional folding, and estimation of certain energy parameters. A brief description of some of the soft computing-based techniques, developed for RNA secondary structure prediction, is presented along with their relevance. The basic concepts of RNA and its different structural elements like helix, bulge, hairpin loop, internal loop, and multiloop are described. These are followed by different methodologies, employing genetic algorithms, artificial neural networks, and fuzzy logic. The role of various metaheuristics, like simulated annealing, particle swarm optimization, ant colony optimization, and tabu search is also discussed. A relative comparison among different techniques, in predicting 12 known RNA secondary structures, is presented, as an example. Future challenging issues are then mentioned.
Predicting Human Preferences Using the Block Structure of Complex Social Networks

PubMed Central

Guimerà, Roger; Llorente, Alejandro; Moro, Esteban; Sales-Pardo, Marta

2012-01-01

With ever-increasing available data, predicting individuals' preferences and helping them locate the most relevant information has become a pressing need. Understanding and predicting preferences is also important from a fundamental point of view, as part of what has been called a “new” computational social science. Here, we propose a novel approach based on stochastic block models, which have been developed by sociologists as plausible models of complex networks of social interactions. Our model is in the spirit of predicting individuals' preferences based on the preferences of others but, rather than fitting a particular model, we rely on a Bayesian approach that samples over the ensemble of all possible models. We show that our approach is considerably more accurate than leading recommender algorithms, with major relative improvements between 38% and 99% over industry-level algorithms. Besides, our approach sheds light on decision-making processes by identifying groups of individuals that have consistently similar preferences, and enabling the analysis of the characteristics of those groups. PMID:22984533
GRID: a high-resolution protein structure refinement algorithm.

PubMed

Chitsaz, Mohsen; Mayo, Stephen L

2013-03-05

The energy-based refinement of protein structures generated by fold prediction algorithms to atomic-level accuracy remains a major challenge in structural biology. Energy-based refinement is mainly dependent on two components: (1) sufficiently accurate force fields, and (2) efficient conformational space search algorithms. Focusing on the latter, we developed a high-resolution refinement algorithm called GRID. It takes a three-dimensional protein structure as input and, using an all-atom force field, attempts to improve the energy of the structure by systematically perturbing backbone dihedrals and side-chain rotamer conformations. We compare GRID to Backrub, a stochastic algorithm that has been shown to predict a significant fraction of the conformational changes that occur with point mutations. We applied GRID and Backrub to 10 high-resolution (≤ 2.8 Å) crystal structures from the Protein Data Bank and measured the energy improvements obtained and the computation times required to achieve them. GRID resulted in energy improvements that were significantly better than those attained by Backrub while expending about the same amount of computational resources. GRID resulted in relaxed structures that had slightly higher backbone RMSDs compared to Backrub relative to the starting crystal structures. The average RMSD was 0.25 ± 0.02 Å for GRID versus 0.14 ± 0.04 Å for Backrub. These relatively minor deviations indicate that both algorithms generate structures that retain their original topologies, as expected given the nature of the algorithms. Copyright © 2012 Wiley Periodicals, Inc.
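The RMSD values quoted above measure how far a refined structure has moved from its starting coordinates. A minimal sketch of the calculation follows, assuming both structures list the same atoms in the same order and are already in the same coordinate frame (no superposition step is performed); the coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) points, without any alignment."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords_a, coords_b)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))


# Hypothetical two-atom example, coordinates in angstroms.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
refined = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
value = rmsd(start, refined)
print(round(value, 4))
```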
Electronic Thermometer Readings

NASA Technical Reports Server (NTRS)

2001-01-01

NASA Stennis' adaptive predictive algorithm for electronic thermometers uses sample readings during the initial rise in temperature and applies an algorithm that accurately and rapidly predicts the steady state temperature. The final steady state temperature of an object can be calculated based on the second-order logarithm of the temperature signals acquired by the sensor and predetermined variables from the sensor characteristics. These variables are calculated during tests of the sensor. Once the variables are determined, relatively little data acquisition and data processing time by the algorithm is required to provide a near-accurate approximation of the final temperature. This reduces the delay in the steady state response time of a temperature sensor. This advanced algorithm can be implemented in existing software or hardware with an erasable programmable read-only memory (EPROM). The capability for easy integration eliminates the expense of developing a whole new system that offers the benefits provided by NASA Stennis' technology.
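The idea of predicting a final temperature from early readings can be illustrated with a much simpler model than the one described. The sketch below is not the NASA Stennis algorithm: it assumes a first-order exponential sensor response, T(t) = T_inf + (T_0 - T_inf) exp(-t / tau), for which three equally spaced readings determine T_inf in closed form. The function name and the synthetic sensor values are hypothetical.

```python
import math


def predict_steady_state(r1, r2, r3):
    """Estimate the asymptote of a first-order exponential response from
    three readings taken at equal time intervals (assumed model)."""
    denom = r1 + r3 - 2.0 * r2
    if denom == 0.0:
        raise ValueError("readings are collinear; no exponential trend to fit")
    # For r_k = T_inf + a * q**k this ratio simplifies exactly to T_inf.
    return (r1 * r3 - r2 * r2) / denom


# Synthetic sensor: true final temperature 37.0, start 20.0, time constant 8 s.
t_inf, t_start, tau = 37.0, 20.0, 8.0
readings = [t_inf + (t_start - t_inf) * math.exp(-t / tau) for t in (1.0, 2.0, 3.0)]
estimate = predict_steady_state(*readings)
print(readings, estimate)
```

With noise-free data the estimate recovers the asymptote after only three seconds of a response whose time constant is eight seconds; real readings are noisy, so a practical method would need filtering or a fit over more samples.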
Evaluation of an automated spike-and-wave complex detection algorithm in the EEG from a rat model of absence epilepsy.

PubMed

Bauquier, Sebastien H; Lai, Alan; Jiang, Jonathan L; Sui, Yi; Cook, Mark J

2015-10-01

The aim of this prospective blinded study was to evaluate an automated algorithm for spike-and-wave discharge (SWD) detection applied to EEGs from genetic absence epilepsy rats from Strasbourg (GAERS). Five GAERS underwent four sessions of 20-min EEG recording. Each EEG was manually analyzed for SWDs longer than one second by two investigators and automatically using an algorithm developed in MATLAB®. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for the manual (reference) versus the automatic (test) methods. The results showed that the algorithm had specificity, sensitivity, PPV and NPV >94%, comparable to published methods that are based on analyzing EEG changes in the frequency domain. This provides a good alternative as a method designed to mimic human manual marking in the time domain.
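The four measures named above follow directly from the counts of true and false positives and negatives. The sketch below shows the standard definitions; the counts are invented for illustration and are not data from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts,
    with the manual marking treated as the reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # detected events / all true events
        "specificity": tn / (tn + fp),  # correct rejections / all true non-events
        "ppv": tp / (tp + fp),          # true detections / all detections
        "npv": tn / (tn + fn),          # true rejections / all rejections
    }


# Hypothetical counts.
metrics = diagnostic_metrics(tp=95, fp=5, tn=190, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```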
Toward detecting deception in intelligent systems

NASA Astrophysics Data System (ADS)

Santos, Eugene, Jr.; Johnson, Gregory, Jr.

2004-08-01

Contemporary decision makers often must choose a course of action using knowledge from several sources. Knowledge may be provided from many diverse sources including electronic sources such as knowledge-based diagnostic or decision support systems or through data mining techniques. As the decision maker becomes more dependent on these electronic information sources, detecting deceptive information from these sources becomes vital to making a correct, or at least more informed, decision. This applies to unintentional disinformation as well as intentional misinformation. Our ongoing research focuses on employing models of deception and deception detection from the fields of psychology and cognitive science to these systems as well as implementing deception detection algorithms for probabilistic intelligent systems. The deception detection algorithms are used to detect, classify and correct attempts at deception. Algorithms for detecting unexpected information rely upon a prediction algorithm from the collaborative filtering domain to predict agent responses in a multi-agent system.
Machine Learning Algorithm Predicts Cardiac Resynchronization Therapy Outcomes: Lessons From the COMPANION Trial.

PubMed

Kalscheur, Matthew M; Kipp, Ryan T; Tattersall, Matthew C; Mei, Chaoqun; Buhr, Kevin A; DeMets, David L; Field, Michael E; Eckhardt, Lee L; Page, C David

2018-01-01

Cardiac resynchronization therapy (CRT) reduces morbidity and mortality in heart failure patients with reduced left ventricular function and intraventricular conduction delay. However, individual outcomes vary significantly. This study sought to use a machine learning algorithm to develop a model to predict outcomes after CRT. Models were developed with machine learning algorithms to predict all-cause mortality or heart failure hospitalization at 12 months post-CRT in the COMPANION trial (Comparison of Medical Therapy, Pacing, and Defibrillation in Heart Failure). The best performing model was developed with the random forest algorithm. The ability of this model to predict all-cause mortality or heart failure hospitalization and all-cause mortality alone was compared with discrimination obtained using a combination of bundle branch block morphology and QRS duration. In the 595 patients with CRT-defibrillator in the COMPANION trial, 105 deaths occurred (median follow-up, 15.7 months). The survival difference across subgroups differentiated by bundle branch block morphology and QRS duration did not reach significance (P = 0.08). The random forest model produced quartiles of patients with an 8-fold difference in survival between those with the highest and lowest predicted probability for events (hazard ratio, 7.96; P < 0.0001). The model also discriminated the risk of the composite end point of all-cause mortality or heart failure hospitalization better than subgroups based on bundle branch block morphology and QRS duration. In the COMPANION trial, a machine learning algorithm produced a model that predicted clinical outcomes after CRT. Applied before device implant, this model may better differentiate outcomes over current clinical discriminators and improve shared decision-making with patients. © 2018 American Heart Association, Inc.
The circadian profile of epilepsy improves seizure forecasting.

PubMed

Karoly, Philippa J; Ung, Hoameng; Grayden, David B; Kuhlmann, Levin; Leyde, Kent; Cook, Mark J; Freestone, Dean R

2017-08-01

It is now established that epilepsy is characterized by periodic dynamics that increase seizure likelihood at certain times of day, and which are highly patient-specific. However, these dynamics are not typically incorporated into seizure prediction algorithms due to the difficulty of estimating patient-specific rhythms from relatively short-term or unreliable data sources. This work outlines a novel framework to develop and assess seizure forecasts, and demonstrates that the predictive power of forecasting models is improved by circadian information. The analyses used long-term, continuous electrocorticography from nine subjects, recorded for an average of 320 days each. We used a large amount of out-of-sample data (a total of 900 days for algorithm training, and 2879 days for testing), enabling the most extensive post hoc investigation into seizure forecasting. We compared the results of an electrocorticography-based logistic regression model, a circadian probability, and a combined electrocorticography and circadian model. For all subjects, clinically relevant seizure prediction results were significant, and the addition of circadian information (combined model) maximized performance across a range of outcome measures. These results represent a proof-of-concept for implementing a circadian forecasting framework, and provide insight into new approaches for improving seizure prediction algorithms. The circadian framework adds very little computational complexity to existing prediction algorithms, and can be implemented using current-generation implant devices, or even non-invasively via surface electrodes using a wearable application. The ability to improve seizure prediction algorithms through straightforward, patient-specific modifications provides promise for increased quality of life and improved safety for patients with epilepsy. © The Author (2017). Published by Oxford University Press on behalf of the Guarantors of Brain. All rights reserved. 
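A minimal sketch of the idea of a circadian probability and a combined forecast. The abstract does not state how the circadian estimate was computed or how it was combined with the electrocorticography-based model, so everything here is an assumption for illustration: the hourly binning, the additive smoothing, the function names `circadian_probability` and `combine_forecasts`, and the naive-Bayes style combination that treats the two sources as independent evidence.

```python
from collections import Counter


def circadian_probability(seizure_hours, n_days, smoothing=1.0):
    """Estimate P(seizure during hour h) for h = 0..23 from past seizure hours.

    seizure_hours: hour of day (0..23) of each past seizure
    n_days: number of recorded training days
    smoothing: additive smoothing count (an assumption of this sketch)
    At most one seizure per hour per day is counted, so each estimate stays in (0, 1).
    """
    counts = Counter(seizure_hours)
    return [
        (min(counts.get(h, 0), n_days) + smoothing) / (n_days + 2 * smoothing)
        for h in range(24)
    ]


def combine_forecasts(p_signal, p_circadian, p_base):
    """Combine two probabilities that share the prior p_base.

    Assumes the two sources are conditionally independent, so the posterior
    odds are the product of the individual odds divided by the prior odds.
    """
    def odds(p):
        return p / (1.0 - p)

    combined = odds(p_signal) * odds(p_circadian) / odds(p_base)
    return combined / (1.0 + combined)
```

With a neutral signal forecast equal to the base rate, the combined value reduces to the circadian probability, which is a quick sanity check on the combination rule.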
CUFID-query: accurate network querying through random walk based network flow estimation.

PubMed

Jeong, Hyundoo; Qian, Xiaoning; Yoon, Byung-Jun

2017-12-28

Functional modules in biological networks consist of numerous biomolecules and their complicated interactions. Recent studies have shown that biomolecules in a functional module tend to have similar interaction patterns and that such modules are often conserved across biological networks of different species. As a result, such conserved functional modules can be identified through comparative analysis of biological networks. In this work, we propose a novel network querying algorithm based on the CUFID (Comparative network analysis Using the steady-state network Flow to IDentify orthologous proteins) framework combined with an efficient seed-and-extension approach. The proposed algorithm, CUFID-query, can accurately detect conserved functional modules as small subnetworks in the target network that are expected to perform similar functions to the given query functional module. The CUFID framework was recently developed for probabilistic pairwise global comparison of biological networks, and it has been applied to pairwise global network alignment, where the framework was shown to yield accurate network alignment results. In the proposed CUFID-query algorithm, we adopt the CUFID framework and extend it for local network alignment, specifically to solve network querying problems. First, in the seed selection phase, the proposed method utilizes the CUFID framework to compare the query and the target networks and to predict the probabilistic node-to-node correspondence between the networks. Next, the algorithm selects and greedily extends the seed in the target network by iteratively adding nodes that have frequent interactions with other nodes in the seed network, in a way that the conductance of the extended network is maximally reduced. Finally, CUFID-query removes irrelevant nodes from the querying results based on the personalized PageRank vector for the induced network that includes the fully extended network and its neighboring nodes. 
Through extensive performance evaluation based on biological networks with known functional modules, we show that CUFID-query outperforms the existing state-of-the-art algorithms in terms of prediction accuracy and biological significance of the predictions.
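The greedy extension step described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the CUFID-query implementation: the graph is unweighted and stored as a plain adjacency dictionary, the helper names `conductance` and `greedy_extend` are hypothetical, and the seed selection and personalized PageRank pruning phases are left out.

```python
def conductance(adj, nodes):
    """Conductance of a node set: cut edges / min(volume inside, volume outside)."""
    nodes = set(nodes)
    cut = sum(1 for u in nodes for v in adj[u] if v not in nodes)
    vol_in = sum(len(adj[u]) for u in nodes)
    vol_out = sum(len(adj[u]) for u in adj) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 1.0


def greedy_extend(adj, seed, max_size):
    """Grow the seed by repeatedly adding the neighbor that lowers conductance most."""
    current = set(seed)
    while len(current) < max_size:
        frontier = {v for u in current for v in adj[u]} - current
        if not frontier:
            break
        # sorted() makes tie-breaking deterministic
        best = min(sorted(frontier), key=lambda v: conductance(adj, current | {v}))
        if conductance(adj, current | {best}) >= conductance(adj, current):
            break  # no neighbor reduces conductance any further
        current.add(best)
    return current


# Toy network: two triangles joined by the single edge c-d.
toy = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
module = greedy_extend(toy, {"a"}, max_size=6)
```

On the toy network the extension stops at the first triangle, because adding node d would raise the conductance again.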

Active control strategy for the running attitude of high-speed train under strong crosswind condition

NASA Astrophysics Data System (ADS)

Li, Decang; Meng, Jianjun; Bai, Huan; Xu, Ruxun

2018-07-01

This paper focuses on the safety of high-speed trains under strong crosswind conditions. A new active control strategy is proposed based on adaptive predictive control theory. The strategy adjusts the attitude of a train by controlling a new type of intelligent giant magnetostrictive actuator (GMA). It combines adaptive control with dynamic matrix control; the parameters of the predictive controller are adjusted in real time through online identification to enhance the robustness of the control algorithm. On this basis, a correction control algorithm is also designed to regulate the parameters of the predictive controller based on the step response of the controlled object. Finally, simulation results show that the proposed control strategy can adjust the running attitude of high-speed trains under strong crosswind conditions; they also indicate that the new active control strategy is effective and applicable in improving the safety performance of a train, based on a host-target computer technology provided by Matlab/Simulink.
Predictive control and estimation algorithms for the NASA/JPL 70-meter antennas

NASA Technical Reports Server (NTRS)

Gawronski, W.

1991-01-01

A modified output prediction procedure and a new controller design is presented based on the predictive control law. Also, a new predictive estimator is developed to complement the controller and to enhance system performance. The predictive controller is designed and applied to the tracking control of the Deep Space Network 70 m antennas. Simulation results show significant improvement in tracking performance over the linear quadratic controller and estimator presently in use.
Short-term prediction of solar energy in Saudi Arabia using automated-design fuzzy logic systems

PubMed Central

2017-01-01

Solar energy is considered one of the main sources of renewable energy in the near future. However, solar energy and other renewable energy sources have a drawback related to the difficulty in predicting their availability in the near future. This problem affects optimal exploitation of solar energy, especially in connection with other resources. Therefore, reliable solar energy prediction models are essential to solar energy management and economics. This paper presents work aimed at designing reliable models to predict the global horizontal irradiance (GHI) for the next day in 8 stations in Saudi Arabia. The designed models are based on computational intelligence methods of automated-design fuzzy logic systems. The fuzzy logic systems are designed and optimized with two models using fuzzy c-means clustering (FCM) and simulated annealing (SA) algorithms. The first model uses FCM based on the subtractive clustering algorithm to automatically design the predictor fuzzy rules from data. The second model uses FCM followed by a simulated annealing algorithm to enhance the prediction accuracy of the fuzzy logic system. The objective of the predictor is to accurately predict next-day global horizontal irradiance (GHI) using previous-day meteorological and solar radiation observations. The proposed models use observations of 10 variables of measured meteorological and solar radiation data to build the model. The experimentation and results of the prediction are detailed where the root mean square error of the prediction was approximately 88% for the second model tuned by simulated annealing compared to 79.75% accuracy using the first model. These results demonstrate a good modeling accuracy of the second model even though the training and testing of the proposed models were carried out using spatially and temporally independent data. PMID:28806754
Short-term prediction of solar energy in Saudi Arabia using automated-design fuzzy logic systems.

PubMed

Almaraashi, Majid

2017-01-01

Solar energy is considered one of the main sources of renewable energy in the near future. However, solar energy and other renewable energy sources have a drawback related to the difficulty in predicting their availability in the near future. This problem affects optimal exploitation of solar energy, especially in connection with other resources. Therefore, reliable solar energy prediction models are essential to solar energy management and economics. This paper presents work aimed at designing reliable models to predict the global horizontal irradiance (GHI) for the next day in 8 stations in Saudi Arabia. The designed models are based on computational intelligence methods of automated-design fuzzy logic systems. The fuzzy logic systems are designed and optimized with two models using fuzzy c-means clustering (FCM) and simulated annealing (SA) algorithms. The first model uses FCM based on the subtractive clustering algorithm to automatically design the predictor fuzzy rules from data. The second model uses FCM followed by a simulated annealing algorithm to enhance the prediction accuracy of the fuzzy logic system. The objective of the predictor is to accurately predict next-day global horizontal irradiance (GHI) using previous-day meteorological and solar radiation observations. The proposed models use observations of 10 variables of measured meteorological and solar radiation data to build the model. The experimentation and results of the prediction are detailed where the root mean square error of the prediction was approximately 88% for the second model tuned by simulated annealing compared to 79.75% accuracy using the first model. These results demonstrate a good modeling accuracy of the second model even though the training and testing of the proposed models were carried out using spatially and temporally independent data.
Predicting chroma from luma with frequency domain intra prediction

NASA Astrophysics Data System (ADS)

Egge, Nathan E.; Valin, Jean-Marc

2015-03-01

This paper describes a technique for performing intra prediction of the chroma planes based on the reconstructed luma plane in the frequency domain. This prediction exploits the fact that while RGB to YUV color conversion has the property that it decorrelates the color planes globally across an image, there is still some correlation locally at the block level. Previous proposals compute a linear model of the spatial relationship between the luma plane (Y) and the two chroma planes (U and V). In codecs that use lapped transforms this is not possible since transform support extends across the block boundaries and thus neighboring blocks are unavailable during intra prediction. We design a frequency domain intra predictor for chroma that exploits the same local correlation with lower complexity than the spatial predictor and which works with lapped transforms. We then describe a low-complexity algorithm that directly uses luma coefficients as a chroma predictor based on gain-shape quantization and band partitioning. An experiment is performed that compares these two techniques inside the experimental Daala video codec and shows the lower complexity algorithm to be a better chroma predictor.
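For orientation, the spatial linear model attributed above to previous proposals can be sketched as an ordinary least-squares fit of chroma against luma over neighboring reconstructed pixels. This is not the frequency domain predictor the paper introduces; the function names `fit_linear_model` and `predict_chroma` and the sample values are assumptions made for illustration.

```python
def fit_linear_model(luma, chroma):
    """Least-squares fit of chroma ~ alpha * luma + beta over neighboring pixels."""
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    var_l = sum((l - mean_l) ** 2 for l in luma)
    if var_l == 0:
        # Flat luma neighborhood: fall back to predicting the mean chroma.
        return 0.0, mean_c
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    alpha = cov / var_l
    beta = mean_c - alpha * mean_l
    return alpha, beta


def predict_chroma(luma_block, alpha, beta):
    """Predict chroma samples for a block from its reconstructed luma samples."""
    return [alpha * l + beta for l in luma_block]


# Made-up neighbor samples in which chroma is exactly 0.5 * luma + 100.
alpha, beta = fit_linear_model([10, 20, 30, 40], [105, 110, 115, 120])
predicted = predict_chroma([50, 60], alpha, beta)
```

The fit needs reconstructed neighboring pixels, which is exactly what the abstract says is unavailable when lapped transforms are used.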
Basophile: Accurate Fragment Charge State Prediction Improves Peptide Identification Rates

DOE PAGES

Wang, Dong; Dasari, Surendra; Chambers, Matthew C.; ...

2013-03-07

In shotgun proteomics, database search algorithms rely on fragmentation models to predict fragment ions that should be observed for a given peptide sequence. The most widely used strategy (Naive model) is oversimplified, cleaving all peptide bonds with equal probability to produce fragments of all charges below that of the precursor ion. More accurate models, based on fragmentation simulation, are too computationally intensive for on-the-fly use in database search algorithms. We have created an ordinal-regression-based model called Basophile that takes fragment size and basic residue distribution into account when determining the charge retention during CID/higher-energy collision induced dissociation (HCD) of charged peptides. This model improves the accuracy of predictions by reducing the number of unnecessary fragments that are routinely predicted for highly-charged precursors. Basophile increased the identification rates by 26% (on average) over the Naive model, when analyzing triply-charged precursors from ion trap data. Basophile achieves simplicity and speed by solving the prediction problem with an ordinal regression equation, which can be incorporated into any database search software for shotgun proteomic identification.
Real-time Upstream Monitoring System (RUMS): Forecasting arrival times of interplanetary shocks using energetic particle data from ACE

NASA Astrophysics Data System (ADS)

Ho, G.; Donegan, M.; Vandegriff, J.; Wagstaff, K.

We have created a system for predicting the arrival times at Earth of interplanetary (IP) shocks that originate at the Sun. This system is currently available on the web (http://sd-www.jhuapl.edu/UPOS/RISP/index.html) and runs in real-time. Input data to our prediction algorithm is energetic particle data from the Electron, Proton, and Alpha Monitor (EPAM) instrument on NASA's Advanced Composition Explorer (ACE) spacecraft. Real-time EPAM data is obtained from the National Oceanic and Atmospheric Administration (NOAA) Space Environment Center (SEC). Our algorithm operates in two stages. First it watches for a velocity dispersion signature (energetic ions show flux enhancement followed by subsequent enhancements in lower energies), which is commonly seen upstream of a large IP shock. Once a precursor signature has been detected, a pattern recognition algorithm is used to analyze the time series profile of the particle data and generate an estimate for the shock arrival time. Tests on the algorithm show an average error of roughly 9 hours for predictions made 24 hours before the shock arrival and roughly 5 hours when the shock is 12 hours away. This can provide significant lead-time and deliver critical information to mission planners, satellite operations controllers, and scientists. As of February 4, 2004, the ACE real-time stream has been switched to include data from another detector on EPAM. We are now processing the new real-time data stream and have made improvements to our algorithm based on this data. In this paper, we report prediction results from the updated algorithm.
Context Relevant Prediction Model for COPD Domain Using Bayesian Belief Network

PubMed Central

Saleh, Lokman; Ajami, Hicham; Mili, Hafedh

2017-01-01

In the last three decades, researchers have examined extensively how context-aware systems can assist people, specifically those suffering from incurable diseases, to help them cope with their medical illness. Over the years, a huge number of studies on Chronic Obstructive Pulmonary Disease (COPD) have been published. However, deriving relevant attributes and detecting COPD exacerbations early remain a challenge. In this research work, we use an efficient algorithm to select relevant attributes, for which there is no proper approach in this domain. This algorithm predicts exacerbations with high accuracy by adding a discretization process, and organizes the pertinent attributes in priority order based on their impact to facilitate emergency medical treatment. In this paper, we propose an extension of our existing Helper Context-Aware Engine System (HCES) for COPD. This project uses a Bayesian network algorithm to depict the dependency between the COPD symptoms (attributes) in order to overcome the insufficiency and the independence hypothesis of naïve Bayes. In addition, the dependency in the Bayesian network is realized using the TAN algorithm rather than by consulting pneumologists. All these combined algorithms (discretization, selection, dependency, and the ordering of the relevant attributes) constitute an effective prediction model compared with other effective ones. Moreover, different scenarios of these algorithms are investigated and compared to verify which sequence of steps of the prediction model gives more accurate results. Finally, we designed and validated a computer-aided support application to integrate the different steps of this model. Our system HCES has shown promising results, as measured by the area under the receiver operating characteristic curve (AUC = 81.5%). PMID:28644419
A Novel Local Learning based Approach With Application to Breast Cancer Diagnosis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Songhua; Tourassi, Georgia

2012-01-01

The purpose of this study is to develop and evaluate a novel local learning-based approach for computer-assisted diagnosis of breast cancer. Our new local learning based algorithm using the linear logistic regression method as its base learner is described. Overall, our algorithm will perform its stochastic searching process until the total allowed computing time is used up by our random walk process in identifying the most suitable population subdivision scheme and their corresponding individual base learners. The proposed local learning-based approach was applied for the prediction of breast cancer given 11 mammographic and clinical findings reported by physicians using the BI-RADS lexicon. Our database consisted of 850 patients with biopsy confirmed diagnosis (290 malignant and 560 benign). We also compared the performance of our method with a collection of publicly available state-of-the-art machine learning methods. Predictive performance for all classifiers was evaluated using 10-fold cross validation and Receiver Operating Characteristics (ROC) analysis. Figure 1 reports the performance of 54 machine learning methods implemented in the machine learning toolkit Weka (version 3.0). We introduced a novel local learning-based classifier and compared it with an extensive list of other classifiers for the problem of breast cancer diagnosis. Our experiments show that the algorithm has superior prediction performance, outperforming a wide range of other well-established machine learning techniques. Our conclusion complements the existing understanding in the machine learning field that local learning may capture complicated, non-linear relationships exhibited by real-world datasets.
Programmable logic controller implementation of an auto-tuned predictive control based on minimal plant information.

PubMed

Valencia-Palomo, G; Rossiter, J A

2011-01-01

This paper makes two key contributions. First, it tackles the issue of the availability of constrained predictive control for low-level control loops. Hence, it describes how the constrained control algorithm is embedded in an industrial programmable logic controller (PLC) using the IEC 61131-3 programming standard. Second, there is a definition and implementation of a novel auto-tuned predictive controller; the key novelty is that the modelling is based on relatively crude but pragmatic plant information. Laboratory experiment tests were carried out in two bench-scale laboratory systems to prove the effectiveness of the combined algorithm and hardware solution. For completeness, the results are compared with a commercial proportional-integral-derivative (PID) controller (also embedded in the PLC) using the most up to date auto-tuning rules. Copyright © 2010 ISA. Published by Elsevier Ltd. All rights reserved.
Earthquake prediction in California using regression algorithms and cloud-based big data infrastructure

NASA Astrophysics Data System (ADS)

Asencio-Cortés, G.; Morales-Esteban, A.; Shang, X.; Martínez-Álvarez, F.

2018-06-01

Earthquake magnitude prediction is a challenging problem that has been widely studied during the last decades. Statistical, geophysical and machine learning approaches can be found in the literature, with no particularly satisfactory results. In recent years, powerful computational techniques to analyze big data have emerged, making possible the analysis of massive datasets. These new methods make use of physical resources like cloud based architectures. California is known for being one of the regions with the highest seismic activity in the world and many data are available. In this work, the use of several regression algorithms combined with ensemble learning is explored in the context of big data (a 1 GB catalog is used), in order to predict earthquake magnitude within the next seven days. The Apache Spark framework, the H2O library in the R language and Amazon cloud infrastructure were used, reporting very promising results.
Real-Time Detection of Rupture Development: Earthquake Early Warning Using P Waves From Growing Ruptures

NASA Astrophysics Data System (ADS)

Kodera, Yuki

2018-01-01

Large earthquakes with long rupture durations emit P wave energy throughout the rupture period. Incorporating late-onset P waves into earthquake early warning (EEW) algorithms could contribute to robust predictions of strong ground motion. Here I describe a technique to detect in real time P waves from growing ruptures to improve the timeliness of an EEW algorithm based on seismic wavefield estimation. The proposed P wave detector, which employs a simple polarization analysis, successfully detected P waves from strong motion generation areas of the 2011 Mw 9.0 Tohoku-oki earthquake rupture. An analysis using 23 large (M ≥ 7) events from Japan confirmed that seismic intensity predictions based on the P wave detector significantly increased lead times without appreciably decreasing the prediction accuracy. P waves from growing ruptures, being one of the fastest carriers of information on ongoing rupture development, have the potential to improve the performance of EEW systems.
Wind power prediction based on genetic neural network

NASA Astrophysics Data System (ADS)

Zhang, Suhan

2017-04-01

The scale of grid-connected wind farms keeps increasing. To ensure the stability of power system operation, make a reasonable scheduling scheme and improve the competitiveness of wind farms in the electricity generation market, it is important to accurately forecast short-term wind power. To reduce the influence of the nonlinear relationship between the disturbance factor and the wind power, an improved prediction model based on a genetic algorithm and a neural network method is established. To overcome the long training time of the BP neural network and its tendency to fall into local minima, and to improve the accuracy of the neural network, a genetic algorithm is adopted to optimize the parameters and topology of the neural network. Historical data are used as input to predict short-term wind power. The effectiveness and feasibility of the method are verified using the actual data of a certain wind farm as an example.
Visible Light Image-Based Method for Sugar Content Classification of Citrus

PubMed Central

Wang, Xuefeng; Wu, Chunyan; Hirafuji, Masayuki

2016-01-01

Visible light imaging of citrus fruit from Mie Prefecture of Japan was performed to determine whether an algorithm could be developed to predict the sugar content. This nondestructive classification showed that the accurate segmentation of different images can be realized by a correlation analysis based on the threshold value of the coefficient of determination. There is an obvious correlation between the sugar content of citrus fruit and certain parameters of the color images. The selected image parameters were connected by addition algorithm. The sugar content of citrus fruit can be predicted by the dummy variable method. The results showed that the small but orange citrus fruits often have a high sugar content. The study shows that it is possible to predict the sugar content of citrus fruit and to perform a classification of the sugar content using light in the visible spectrum and without the need for an additional light source. PMID:26811935
A comparative study of machine learning methods for time-to-event survival data for radiomics risk modelling.

PubMed

Leger, Stefan; Zwanenburg, Alex; Pilz, Karoline; Lohaus, Fabian; Linge, Annett; Zöphel, Klaus; Kotzerke, Jörg; Schreiber, Andreas; Tinhofer, Inge; Budach, Volker; Sak, Ali; Stuschke, Martin; Balermpas, Panagiotis; Rödel, Claus; Ganswindt, Ute; Belka, Claus; Pigorsch, Steffi; Combs, Stephanie E; Mönnich, David; Zips, Daniel; Krause, Mechthild; Baumann, Michael; Troost, Esther G C; Löck, Steffen; Richter, Christian

2017-10-16

Radiomics applies machine learning algorithms to quantitative imaging data to characterise the tumour phenotype and predict clinical outcome. For the development of radiomics risk models, a variety of different algorithms is available and it is not clear which one gives optimal results. Therefore, we assessed the performance of 11 machine learning algorithms combined with 12 feature selection methods by the concordance index (C-Index), to predict loco-regional tumour control (LRC) and overall survival for patients with head and neck squamous cell carcinoma. The considered algorithms are able to deal with continuous time-to-event survival data. Feature selection and model building were performed on a multicentre cohort (213 patients) and validated using an independent cohort (80 patients). We found several combinations of machine learning algorithms and feature selection methods which achieve similar results, e.g. C-Index = 0.71 and BT-COX: C-Index = 0.70 in combination with Spearman feature selection. Using the best performing models, patients were stratified into groups of low and high risk of recurrence. Significant differences in LRC were obtained between both groups on the validation cohort. Based on the presented analysis, we identified a subset of algorithms which should be considered in future radiomics studies to develop stable and clinically relevant predictive models for time-to-event endpoints.
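The concordance index used above as the performance measure can be sketched in a few lines. This is a plain implementation of Harrell's C-index for right-censored data, written for illustration only; the function name and the toy values are assumptions, and the study's own tooling is not described in the abstract.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed time for each patient (event or censoring time)
    events: 1 if the event was observed, 0 if the patient was censored
    risks:  predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before the observed time of patient j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: the third patient is censored at time 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
c_perfect = concordance_index(toy_times, toy_events, [0.9, 0.7, 0.5, 0.1])
c_mixed = concordance_index(toy_times, toy_events, [0.9, 0.1, 0.5, 0.7])
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which puts the reported values of about 0.70 in context.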
Applying a machine learning model using a locally preserving projection based feature regeneration algorithm to predict breast cancer risk

NASA Astrophysics Data System (ADS)

Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qian, Wei; Zheng, Bin

2018-03-01

Both conventional and deep machine learning have been used to develop decision-support tools applied in medical imaging informatics. In order to take advantage of both conventional and deep learning approaches, this study aims to investigate the feasibility of applying a locally preserving projection (LPP) based feature regeneration algorithm to build a new machine learning classifier model to predict short-term breast cancer risk. First, a computer-aided image processing scheme was used to segment and quantify breast fibro-glandular tissue volume. Next, 44 initially computed image features related to the bilateral mammographic tissue density asymmetry were extracted. Then, an LPP-based feature combination method was applied to regenerate a new operational feature vector using a maximal variance approach. Last, a k-nearest neighbor (KNN) algorithm based machine learning classifier using the LPP-generated new feature vectors was developed to predict breast cancer risk. A testing dataset involving negative mammograms acquired from 500 women was used. Among them, 250 were positive and 250 remained negative in the next subsequent mammography screening. Applied to this dataset, the LPP-generated feature vector reduced the number of features from 44 to 4. Using a leave-one-case-out validation method, the area under the ROC curve produced by the KNN classifier significantly increased from 0.62 to 0.68 (p < 0.05) and the odds ratio was 4.60 with a 95% confidence interval of [3.16, 6.70]. The study demonstrated that this new LPP-based feature regeneration approach was able to produce an optimal feature vector and yield improved performance in assisting to predict the risk of women having breast cancer detected in the next subsequent mammography screening.
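The evaluation protocol mentioned above, a KNN classifier scored with leave-one-case-out validation, can be sketched as follows. The two-dimensional toy points stand in for the regenerated feature vectors and are made up; the Euclidean distance, the majority vote, and the function names are assumptions of this sketch, and the LPP step itself is not shown.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training cases closest to the query."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k=3):
    """Hold out each case in turn, train on the rest, and score the held-out case."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(rest_x, rest_y, xs[i], k) == ys[i]:
            correct += 1
    return correct / len(xs)


# Two well-separated toy clusters (label 0 and label 1).
toy_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
toy_y = [0, 0, 0, 1, 1, 1]
accuracy = leave_one_out_accuracy(toy_x, toy_y, k=3)
```

Because each case is classified by a model that never saw it, the resulting score is not inflated by testing on training data.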
Estimating Western U.S. Reservoir Sedimentation

NASA Astrophysics Data System (ADS)

Bensching, L.; Livneh, B.; Greimann, B. P.

2017-12-01

Reservoir sedimentation is a long-term problem for water management across the Western U.S. Observations of sedimentation are limited to reservoir surveys that are costly and infrequent, with many reservoirs having only two or fewer surveys. This work aims to apply a recently developed ensemble of sediment algorithms to estimate reservoir sedimentation over several western U.S. reservoirs. The sediment algorithms include empirical, conceptual, stochastic, and process-based approaches and are coupled with a hydrologic modeling framework. Preliminary results showed that the more complex and process-based algorithms performed better in predicting high sediment flux values and in a basin transferability experiment. However, more testing and validation is required to confirm sediment model skill. This work is carried out in partnership with the Bureau of Reclamation with the goal of evaluating the viability of reservoir sediment yield prediction across the western U.S. using a multi-algorithm approach. Simulations of streamflow and sediment fluxes are validated against observed discharges, as well as a Reservoir Sedimentation Information database that is being developed by the US Army Corps of Engineers. Specific goals of this research include (i) quantifying whether inter-algorithm differences consistently capture observational variability; (ii) identifying whether certain categories of models consistently produce the best results; and (iii) assessing the expected sedimentation life-span of several western U.S. reservoirs through long-term simulations.
MR fingerprinting reconstruction with Kalman filter.

PubMed

Zhang, Xiaodi; Zhou, Zechen; Chen, Shiyang; Chen, Shuo; Li, Rui; Hu, Xiaoping

2017-09-01

Magnetic resonance fingerprinting (MR fingerprinting or MRF) is a newly introduced quantitative magnetic resonance imaging technique, which enables simultaneous multi-parameter mapping in a single acquisition with improved time efficiency. The current MRF reconstruction method is based on dictionary matching, which may be limited by the discrete and finite nature of the dictionary and the computational cost associated with dictionary construction, storage and matching. In this paper, we describe a reconstruction method based on the Kalman filter for MRF, which avoids the use of a dictionary to obtain continuous MR parameter measurements. With this Kalman filter framework, the Bloch equation of the inversion-recovery balanced steady state free-precession (IR-bSSFP) MRF sequence was derived to predict signal evolution, and the acquired signal was entered to update the prediction. The algorithm can gradually estimate the accurate MR parameters during the recursive calculation. Single pixel and numeric brain phantom simulations were implemented with the Kalman filter and the results were compared with those from the dictionary matching reconstruction algorithm to demonstrate the feasibility and assess the performance of the Kalman filter algorithm. The results demonstrated that the Kalman filter algorithm is applicable for MRF reconstruction, eliminating the need for a pre-defined dictionary and obtaining continuous MR parameters in contrast to the dictionary matching algorithm. Copyright © 2017 Elsevier Inc. All rights reserved.
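The predict-then-update recursion that the abstract relies on can be illustrated with a scalar linear Kalman filter. This is a deliberately simplified sketch: the paper predicts the signal with the Bloch equation of the IR-bSSFP sequence, whereas here the model is a hypothetical linear one with scalar coefficients `a` and `h`, noise variances `q` and `r`, and made-up measurements.

```python
def kalman_step(x, p, z, a=1.0, h=1.0, q=0.0, r=1.0):
    """One predict/update cycle of a scalar linear Kalman filter.

    x, p: current state estimate and its variance
    z:    new measurement
    a, h: state transition and measurement coefficients (assumed linear model)
    q, r: process and measurement noise variances
    """
    # Predict: propagate the estimate and its uncertainty through the model.
    x_pred = a * x
    p_pred = a * p * a + q
    # Update: blend the prediction with the measurement using the Kalman gain.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


# Estimate a constant parameter from two noisy readings of about 2.0.
x1, p1 = kalman_step(0.0, 1.0, 2.0)
x2, p2 = kalman_step(x1, p1, 2.0)
```

Each new measurement pulls the estimate toward the data while the variance shrinks, which mirrors the gradual convergence of the MR parameter estimates described above.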
Efficient and accurate Greedy Search Methods for mining functional modules in protein interaction networks.

PubMed

He, Jieyue; Li, Chaojun; Ye, Baoliu; Zhong, Wei

2012-06-25

Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain Core/attachment structures. In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches; however, it is computationally expensive. Many module detection approaches are based on the traditional hierarchical methods, which are also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge weight based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into the suitable set of modules. The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate as compared to other competing algorithms. Based on the new edge weight definition, the proposed algorithm takes advantage of the greedy search procedure to separate the network into the suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. The algorithm can reduce the computational time significantly while keeping high prediction accuracy.
Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

PubMed

Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

2017-10-25

Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.

Rational application of adenosine deaminase activity in cerebrospinal fluid for the diagnosis of tuberculous meningitis.

PubMed

Parra-Ruiz, Jorge; Ramos, V; Dueñas, C; Coronado-Álvarez, N M; Cabo-Magadán, R; Portillo-Tuñón, V; Vinuesa, D; Muñoz-Medina, L; Hernández-Quero, J

2015-10-01

Tuberculous meningitis (TBM) is one of the most serious and difficult to diagnose manifestations of TB. An ADA value >9.5 IU/L has great sensitivity and specificity. However, all available studies have been conducted in areas of high endemicity, so we sought to determine the accuracy of ADA in a low endemicity area. This retrospective study included 190 patients (105 men) who had ADA tested in CSF for some reason. Patients were classified as probable/certain TBM or non-TBM based on clinical and Thwaite's criteria. Optimal ADA cutoff was established by ROC curves and a predictive algorithm based on ADA and other CSF biochemical parameters was generated. Eleven patients were classified as probable/certain TBM. In a low endemicity area, the best ADA cutoff was 11.5 IU/L with 91 % sensitivity and 77.7 % specificity. We also developed a predictive algorithm based on the combination of ADA (>11.5 IU/L), glucose (<65 mg/dL) and leukocytes (≥13.5 cell/mm(3)) with increased accuracy (Se: 91 % Sp: 88 %). Optimal ADA cutoff value in areas of low TB endemicity is higher than previously reported. Our algorithm is more accurate than ADA activity alone with better sensitivity and specificity than previously reported algorithms.
High performance of chlorophyll-a prediction algorithms based on simulated OLCI Sentinel-3A bands in cyanobacteria-dominated inland waters

NASA Astrophysics Data System (ADS)

Watanabe, Fernanda Sayuri Yoshino; Alcântara, Enner; Stech, José Luiz

2018-07-01

In this research, we have investigated whether the chlorophyll-a (chl a) retrieval algorithms based on OLCI Sentinel-3A bands are suitable for cyanobacteria-dominated waters. Phytoplankton assemblages model optical properties of the water, influencing the performance of bio-optical algorithms. Understanding these processes is important to improve the prediction of photoactive pigments in order to use them as a proxy for trophic state and harmful algal blooms. Therefore, both empirical and semi-analytical approaches designed for different inland waters were tested. In addition, empirical models were tuned based on a dataset collected in situ. The study was conducted in the Funil hydroelectric reservoir, where chl a ranged from 2.33 to 208.68 mg m-3 in May 2012 (austral fall) and 4.37 to 306.03 mg m-3 in October 2012 (austral spring). OLCI Sentinel-3A bands were tested in existing algorithms developed for other sensors and new band combinations were compared to analyze the errors produced. Normalized Difference Chlorophyll Index (NDCI) exhibited the best performance, with a Normalized Root Mean Square Error (NRMSE) of 9.30%. Results showed that the wavelength at 665 nm is adequate to estimate chl a, although the maximum pigment absorption band is shifted due to phycocyanin fluorescence at approximately 650 nm.
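
The two quantities named in this abstract can be illustrated with a minimal sketch. It assumes the commonly published form of NDCI, (R708 - R665) / (R708 + R665), and an NRMSE normalized by the range of the observations; the abstract does not say which normalization the authors used, so that choice is an assumption. The function names and sample values are hypothetical.

```python
import math


def ndci(r708, r665):
    """Normalized Difference Chlorophyll Index from two band reflectances."""
    return (r708 - r665) / (r708 + r665)


def nrmse_percent(observed, predicted):
    """RMSE divided by the observed range, in percent (assumed normalization)."""
    n = len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    rmse = math.sqrt(squared_error / n)
    return 100.0 * rmse / (max(observed) - min(observed))


# Hypothetical reflectances and chl a values, for illustration only.
index = ndci(r708=0.03, r665=0.01)
error = nrmse_percent([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
```

Other normalizations (by the mean of the observations, for example) give different percentages for the same errors, so the reported 9.30% should only be compared with values computed the same way.
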
Detection of Nitrogen Content in Rubber Leaves Using Near-Infrared (NIR) Spectroscopy with Correlation-Based Successive Projections Algorithm (SPA).

PubMed

Tang, Rongnian; Chen, Xupeng; Li, Chuang

2018-05-01

Near-infrared spectroscopy is an efficient, low-cost technology that has potential as an accurate method in detecting the nitrogen content of natural rubber leaves. Successive projections algorithm (SPA) is a widely used variable selection method for multivariate calibration, which uses projection operations to select a variable subset with minimum multi-collinearity. However, due to the fluctuation of correlation between variables, high collinearity may still exist in non-adjacent variables of subset obtained by basic SPA. Based on analysis to the correlation matrix of the spectra data, this paper proposed a correlation-based SPA (CB-SPA) to apply the successive projections algorithm in regions with consistent correlation. The result shows that CB-SPA can select variable subsets with more valuable variables and less multi-collinearity. Meanwhile, models established by the CB-SPA subset outperform basic SPA subsets in predicting nitrogen content in terms of both cross-validation and external prediction. Moreover, CB-SPA is assured to be more efficient, for the time cost in its selection procedure is one-twelfth that of the basic SPA.
PANDA: Protein function prediction using domain architecture and affinity propagation.

PubMed

Wang, Zheng; Zhao, Chenguang; Wang, Yiheng; Sun, Zheng; Wang, Nan

2018-02-22

We developed PANDA (Propagation of Affinity and Domain Architecture) to predict protein functions in the format of Gene Ontology (GO) terms. PANDA at first executes profile-profile alignment algorithm to search against PfamA, KOG, COG, and SwissProt databases, and then launches PSI-BLAST against UniProt for homologue search. PANDA integrates a domain architecture inference algorithm based on the Bayesian statistics that calculates the probability of having a GO term. All the candidate GO terms are pooled and filtered based on Z-score. After that, the remaining GO terms are clustered using an affinity propagation algorithm based on the GO directed acyclic graph, followed by a second round of filtering on the clusters of GO terms. We benchmarked the performance of all the baseline predictors PANDA integrates and also for every pooling and filtering step of PANDA. It can be found that PANDA achieves better performances in terms of area under the curve for precision and recall compared to the baseline predictors. PANDA can be accessed from http://dna.cs.miami.edu/PANDA/ .
[Development and validation of an algorithm to identify cancer recurrences from hospital data bases].

PubMed

Manzanares-Laya, S; Burón, A; Murta-Nascimento, C; Servitja, S; Castells, X; Macià, F

2014-01-01

Hospital cancer registries and hospital databases are valuable and efficient sources of information for research into cancer recurrences. The aim of this study was to develop and validate algorithms for the detection of breast cancer recurrence. A retrospective observational study was conducted on breast cancer cases from the cancer registry of a third level university hospital diagnosed between 2003 and 2009. Different probable cancer recurrence algorithms were obtained by linking the hospital databases and constructing several operational definitions, with their corresponding sensitivity, specificity, positive predictive value and negative predictive value. A total of 1,523 patients were diagnosed with breast cancer between 2003 and 2009. A request for bone gammagraphy more than 6 months after the first oncological treatment showed the highest sensitivity (53.8%) and negative predictive value (93.8%), and a pathology test more than 6 months after the diagnosis showed the highest specificity (93.8%) and negative predictive value (92.6%). The combination of different definitions increased the specificity and the positive predictive value, but decreased the sensitivity. Several diagnostic algorithms were obtained, and the different definitions could be useful depending on the interest and resources of the researcher. A higher positive predictive value could be interesting for a quick estimation of the number of cases, and a higher negative predictive value for a more exact estimation if more resources are available. The approach is versatile and can be adapted to other types of tumors, as well as to the needs of the researcher. Copyright © 2014 SECA. Published by Elsevier Espana. All rights reserved.
Validation of algorithms to determine incidence of Hirschsprung disease in Ontario, Canada: a population-based study using health administrative data

PubMed Central

Nasr, Ahmed; Sullivan, Katrina J; Chan, Emily W; Wong, Coralie A; Benchimol, Eric I

2017-01-01

Objective: Incidence rates of Hirschsprung disease (HD) vary by geographical region, yet no recent population-based estimate exists for Canada. The objective of our study was to validate and use health administrative data from Ontario, Canada to describe trends in incidence of HD between 1991 and 2013. Study design: To identify children with HD we tested algorithms consisting of a combination of diagnostic, procedural, and intervention codes against the reference standard of abstracted clinical charts from a tertiary pediatric hospital. The algorithm with the highest positive predictive value (PPV) that could maintain high sensitivity was applied to health administrative data from April 1, 1991 to March 31, 2014 (fiscal years 1991–2013) to determine annual incidence. Temporal trends were evaluated using Poisson regression, controlling for sex as a covariate. Results: The selected algorithm was highly sensitive (93.5%) and specific (>99.9%) with excellent predictive abilities (PPV 89.6% and negative predictive value >99.9%). Using the algorithm, a total of 679 patients diagnosed with HD were identified in Ontario between 1991 and 2013. The overall incidence during this time was 2.05 per 10,000 live births (or 1 in 4,868 live births). The incidence did not change significantly over time (odds ratio 0.998, 95% confidence interval 0.983–1.013, p = 0.80). Conclusion: Ontario health administrative data can be used to accurately identify cases of HD and describe trends in incidence. There has not been a significant change in HD incidence over time in Ontario between 1991 and 2013. PMID:29180902
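
The validation measures quoted in this abstract follow from a 2x2 table of algorithm result against chart review. The sketch below shows the standard definitions and the conversion between the two ways the incidence is reported. The counts are hypothetical (the abstract does not give the underlying table), and the function names are illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # cases correctly flagged
        "specificity": tn / (tn + fp),  # non-cases correctly not flagged
        "ppv": tp / (tp + fp),          # flagged records that are real cases
        "npv": tn / (tn + fn),          # unflagged records that are non-cases
    }


def per_10000(one_in_n):
    """Convert '1 in N live births' to a rate per 10,000 live births."""
    return 10000 / one_in_n


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=90, fp=10, fn=10, tn=890)

# The abstract's two incidence figures agree after rounding.
rate = per_10000(4868)
```

Because HD is rare, PPV depends strongly on prevalence in the validation sample, which is one reason the authors select on PPV and sensitivity together rather than on specificity alone.
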
Application of XGBoost algorithm in hourly PM2.5 concentration prediction

NASA Astrophysics Data System (ADS)

Pan, Bingyue

2018-02-01

In view of prediction techniques of hourly PM2.5 concentration in China, this paper applied the XGBoost (Extreme Gradient Boosting) algorithm to predict hourly PM2.5 concentration. The air quality monitoring data of Tianjin city were analyzed using the XGBoost algorithm. The prediction performance of the XGBoost method is evaluated by comparing observed and predicted PM2.5 concentration using three measures of forecast accuracy. The XGBoost method is also compared with the random forest algorithm, multiple linear regression, decision tree regression and support vector machines for regression models using computational results. The results demonstrate that the XGBoost algorithm outperforms other data mining methods.
Prediction of microRNA target genes using an efficient genetic algorithm-based decision tree.

PubMed

Rabiee-Ghahfarrokhi, Behzad; Rafiei, Fariba; Niknafs, Ali Akbar; Zamani, Behzad

2015-01-01

MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression in almost all plants and animals. They play an important role in key processes, such as proliferation, apoptosis, and pathogen-host interactions. Nevertheless, the mechanisms by which miRNAs act are not fully understood. The first step toward unraveling the function of a particular miRNA is the identification of its direct targets. This step has proven to be quite challenging in animals, primarily because of incomplete complementarity between miRNAs and target mRNAs. In recent years, the use of machine-learning techniques has greatly improved the prediction of miRNA targets, avoiding the need for costly and time-consuming experiments to identify miRNA targets experimentally. Among the most important machine-learning algorithms are decision trees, which classify data based on extracted rules. In the present work, we used a genetic algorithm in combination with a C4.5 decision tree for prediction of miRNA targets. We applied our proposed method to a validated human dataset. We achieved nearly 93.9% classification accuracy, which could be related to the selection of the best rules.
Simulating polarized light scattering in terrestrial snow based on bicontinuous random medium and Monte Carlo ray tracing

NASA Astrophysics Data System (ADS)

Xiong, Chuan; Shi, Jiancheng

2014-01-01

To date, the light scattering models of snow consider very little about the real snow microstructures. The ideal spherical or other single shaped particle assumptions in previous snow light scattering models can cause error in light scattering modeling of snow and further cause errors in remote sensing inversion algorithms. This paper tries to build up a snow polarized reflectance model based on bicontinuous medium, with which the real snow microstructure is considered. The accurate specific surface area of bicontinuous medium can be analytically derived. The polarized Monte Carlo ray tracing technique is applied to the computer generated bicontinuous medium. With proper algorithms, the snow surface albedo, bidirectional reflectance distribution function (BRDF) and polarized BRDF can be simulated. The validation of model predicted spectral albedo and bidirectional reflectance factor (BRF) using experiment data shows good results. The relationship between snow surface albedo and snow specific surface area (SSA) were predicted, and this relationship can be used for future improvement of snow specific surface area (SSA) inversion algorithms. The model predicted polarized reflectance is validated and proved accurate, which can be further applied in polarized remote sensing.
Prediction of microRNA target genes using an efficient genetic algorithm-based decision tree

PubMed Central

Rabiee-Ghahfarrokhi, Behzad; Rafiei, Fariba; Niknafs, Ali Akbar; Zamani, Behzad

2015-01-01

MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression in almost all plants and animals. They play an important role in key processes, such as proliferation, apoptosis, and pathogen–host interactions. Nevertheless, the mechanisms by which miRNAs act are not fully understood. The first step toward unraveling the function of a particular miRNA is the identification of its direct targets. This step has proven to be quite challenging in animals, primarily because of incomplete complementarity between miRNAs and target mRNAs. In recent years, the use of machine-learning techniques has greatly improved the prediction of miRNA targets, avoiding the need for costly and time-consuming experiments to identify miRNA targets experimentally. Among the most important machine-learning algorithms are decision trees, which classify data based on extracted rules. In the present work, we used a genetic algorithm in combination with a C4.5 decision tree for prediction of miRNA targets. We applied our proposed method to a validated human dataset. We achieved nearly 93.9% classification accuracy, which could be related to the selection of the best rules. PMID:26649272
Development of Artificial Neural Network Model for Diesel Fuel Properties Prediction using Vibrational Spectroscopy.

PubMed

Bolanča, Tomislav; Marinović, Slavica; Ukić, Sime; Jukić, Ante; Rukavina, Vinko

2012-06-01

This paper describes the development of artificial neural network models which can be used to correlate and predict diesel fuel properties from several FTIR-ATR absorbances and Raman intensities as input variables. Multilayer feed forward and radial basis function neural networks have been used for rapid and simultaneous prediction of cetane number, cetane index, density, viscosity, distillation temperatures at 10% (T10), 50% (T50) and 90% (T90) recovery, and contents of total aromatics and polycyclic aromatic hydrocarbons of commercial diesel fuels. In this study, two-phase training procedures for multilayer feed forward networks were applied. While the first phase training algorithm was always back propagation, two second phase training algorithms were varied and compared, namely conjugate gradient and quasi Newton. In the case of the radial basis function network, the radial layer was trained using the K-means radial assignment algorithm and three different radial spread algorithms: explicit, isotropic and K-nearest neighbour. The number of hidden layer neurons and experimental data points used for the training set have been optimized for both neural networks in order to ensure good predictive ability while reducing unnecessary experimental work. This work shows that the developed artificial neural network models can determine the main properties of diesel fuels simultaneously based on a single and fast IR or Raman measurement.
Identification of chronic rhinosinusitis phenotypes using cluster analysis.

PubMed

Soler, Zachary M; Hyer, J Madison; Ramakrishnan, Viswanathan; Smith, Timothy L; Mace, Jess; Rudmik, Luke; Schlosser, Rodney J

2015-05-01

Current clinical classifications of chronic rhinosinusitis (CRS) have been largely defined based upon preconceived notions of factors thought to be important, such as polyp or eosinophil status. Unfortunately, these classification systems have little correlation with symptom severity or treatment outcomes. Unsupervised clustering can be used to identify phenotypic subgroups of CRS patients, describe clinical differences in these clusters and define simple algorithms for classification. A multi-institutional, prospective study of 382 patients with CRS who had failed initial medical therapy completed the Sino-Nasal Outcome Test (SNOT-22), Rhinosinusitis Disability Index (RSDI), Medical Outcomes Study Short Form-12 (SF-12), Pittsburgh Sleep Quality Index (PSQI), and Patient Health Questionnaire (PHQ-2). Objective measures of CRS severity included Brief Smell Identification Test (B-SIT), CT, and endoscopy scoring. All variables were reduced and unsupervised hierarchical clustering was performed. After clusters were defined, variations in medication usage were analyzed. Discriminant analysis was performed to develop a simplified, clinically useful algorithm for clustering. Clustering was largely determined by age, severity of patient reported outcome measures, depression, and fibromyalgia. CT and endoscopy varied somewhat among clusters. Traditional clinical measures, including polyp/atopic status, prior surgery, B-SIT and asthma, did not vary among clusters. A simplified algorithm based upon productivity loss, SNOT-22 score, and age predicted clustering with 89% accuracy. Medication usage among clusters did vary significantly. A simplified algorithm based upon hierarchical clustering is able to classify CRS patients and predict medication usage. Further studies are warranted to determine if such clustering predicts treatment outcomes. © 2015 ARS-AAOA, LLC.
SU-E-T-516: Dosimetric Validation of AcurosXB Algorithm in Comparison with AAA & CCC Algorithms for VMAT Technique.

PubMed

Kathirvel, M; Subramanian, V Sai; Arun, G; Thirumalaiswamy, S; Ramalingam, K; Kumar, S Ashok; Jagadeesh, K

2012-06-01

To dosimetrically validate the AcurosXB algorithm for Volumetric Modulated Arc Therapy (VMAT) in comparison with the standard clinical Anisotropic Analytic Algorithm (AAA) and Collapsed Cone Convolution (CCC) dose calculation algorithms. The AcurosXB dose calculation algorithm is available with the Varian Eclipse treatment planning system (V10). It uses a grid-based Boltzmann equation solver to predict dose precisely in less time. This study was made to assess the algorithm's ability to predict dose accurately as delivered, for which five clinical cases each of Brain, Head&Neck, Thoracic, Pelvic and SBRT were taken. Verification plans were created on a multicube phantom with an iMatrixx-2D detector array, dose prediction was done with the AcurosXB, AAA & CCC (COMPASS System) algorithms, and the plans were delivered on a CLINAC-iX treatment machine. Delivered dose was captured in the iMatrixx plane for all 25 plans. Measured dose was taken as reference to quantify the agreement of the AcurosXB calculation algorithm with the previously validated AAA and CCC algorithms. Gamma evaluation was performed with clinical criteria of distance-to-agreement 3&2mm and dose difference 3&2% in omnipro-I'MRT software. Plans were evaluated in terms of correlation coefficient, quantitative area gamma and average gamma. The study shows good agreement, with mean correlation 0.9979±0.0012, 0.9984±0.0009 & 0.9979±0.0011 for AAA, CCC & Acuros respectively. Mean area gamma for criteria 3mm/3% was found to be 98.80±1.04, 98.14±2.31, 98.08±2.01 and for 2mm/2% was found to be 93.94±3.83, 87.17±10.54 & 92.36±5.46 for AAA, CCC & Acuros respectively. Mean average gamma for 3mm/3% was 0.26±0.07, 0.42±0.08, 0.28±0.09 and for 2mm/2% was found to be 0.39±0.10, 0.64±0.11, 0.42±0.13 for AAA, CCC & Acuros respectively. This study demonstrated that the AcurosXB algorithm had a good agreement with the AAA & CCC in terms of dose prediction.
In conclusion AcurosXB algorithm provides a valid, accurate and speedy alternative to AAA and CCC algorithms in a busy clinical environment. © 2012 American Association of Physicists in Medicine.
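
The gamma evaluation used in this study combines a distance-to-agreement (DTA) criterion and a dose-difference criterion into one number per measurement point. The following is a minimal one-dimensional sketch of that idea, not the omnipro-I'MRT implementation: it assumes global normalization to the maximum reference dose, searches only the sampled grid points without interpolation, and reads "area gamma" as the percentage of points with gamma at most 1. All names and dose values are hypothetical.

```python
import math


def gamma_1d(positions_mm, reference, evaluated, dta_mm, dose_fraction):
    """Gamma index at each reference point of a 1-D dose profile."""
    # Global dose criterion: a fraction of the maximum reference dose (assumed).
    dose_tolerance = dose_fraction * max(reference)
    gammas = []
    for x_ref, d_ref in zip(positions_mm, reference):
        # Smallest combined distance/dose deviation over all evaluated points.
        best = min(
            math.sqrt(((x_ev - x_ref) / dta_mm) ** 2
                      + ((d_ev - d_ref) / dose_tolerance) ** 2)
            for x_ev, d_ev in zip(positions_mm, evaluated)
        )
        gammas.append(best)
    return gammas


def pass_rate_percent(gammas):
    """Percentage of points with gamma <= 1."""
    return 100.0 * sum(g <= 1.0 for g in gammas) / len(gammas)


# Hypothetical measured (reference) and calculated (evaluated) doses.
positions = [0.0, 1.0, 2.0, 3.0]
measured = [1.0, 2.0, 2.0, 1.0]
calculated = [1.0, 2.0, 2.1, 1.0]
gammas = gamma_1d(positions, measured, calculated, dta_mm=3.0, dose_fraction=0.03)
rate = pass_rate_percent(gammas)
```

In this example the point at 2 mm fails the 3% dose criterion on its own, but a matching dose exists 1 mm away, so its gamma is 1/3 and it passes. This is why tightening the criteria from 3mm/3% to 2mm/2% lowers the pass rates reported above.
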
Prediction of total organic carbon content in shale reservoir based on a new integrated hybrid neural network and conventional well logging curves

NASA Astrophysics Data System (ADS)

Zhu, Linqi; Zhang, Chong; Zhang, Chaomo; Wei, Yang; Zhou, Xueqing; Cheng, Yuan; Huang, Yuyang; Zhang, Le

2018-06-01

There is increasing interest in shale gas reservoirs due to their abundant reserves. As a key evaluation criterion, the total organic carbon content (TOC) of the reservoirs can reflect its hydrocarbon generation potential. The existing TOC calculation model is not very accurate and there is still the possibility for improvement. In this paper, an integrated hybrid neural network (IHNN) model is proposed for predicting the TOC. This is based on the fact that the TOC information on the low TOC reservoir, where the TOC is easy to evaluate, comes from a prediction problem, which is the inherent problem of the existing algorithm. By comparing the prediction models established in 132 rock samples in the shale gas reservoir within the Jiaoshiba area, it can be seen that the accuracy of the proposed IHNN model is much higher than that of the other prediction models. The mean square error of the samples, which were not joined to the established models, was reduced from 0.586 to 0.442. The results show that TOC prediction is easier after logging prediction has been improved. Furthermore, this paper puts forward the next research direction of the prediction model. The IHNN algorithm can help evaluate the TOC of a shale gas reservoir.
Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Baichuan; Choudhury, Sutanay; Al-Hasan, Mohammad

2016-02-01

Estimating the confidence for a link is a critical task for Knowledge Graph construction. Link prediction, or predicting the likelihood of a link in a knowledge graph based on prior state is a key research direction within this area. We propose a Latent Feature Embedding based link recommendation model for prediction task and utilize Bayesian Personalized Ranking based optimization technique for learning models for each predicate. Experimental results on large-scale knowledge bases such as YAGO2 show that our approach achieves substantially higher performance than several state-of-art approaches. Furthermore, we also study the performance of the link prediction algorithm in terms of topological properties of the Knowledge Graph and present a linear regression model to reason about its expected level of accuracy.
Improved prediction of peptide detectability for targeted proteomics using a rank-based algorithm and organism-specific data.

PubMed

Qeli, Ermir; Omasits, Ulrich; Goetze, Sandra; Stekhoven, Daniel J; Frey, Juerg E; Basler, Konrad; Wollscheid, Bernd; Brunner, Erich; Ahrens, Christian H

2014-08-28

The in silico prediction of the best-observable "proteotypic" peptides in mass spectrometry-based workflows is a challenging problem. Being able to accurately predict such peptides would enable the informed selection of proteotypic peptides for targeted quantification of previously observed and non-observed proteins for any organism, with a significant impact for clinical proteomics and systems biology studies. Current prediction algorithms rely on physicochemical parameters in combination with positive and negative training sets to identify those peptide properties that most profoundly affect their general detectability. Here we present PeptideRank, an approach that uses learning to rank algorithm for peptide detectability prediction from shotgun proteomics data, and that eliminates the need to select a negative dataset for the training step. A large number of different peptide properties are used to train ranking models in order to predict a ranking of the best-observable peptides within a protein. Empirical evaluation with rank accuracy metrics showed that PeptideRank complements existing prediction algorithms. Our results indicate that the best performance is achieved when it is trained on organism-specific shotgun proteomics data, and that PeptideRank is most accurate for short to medium-sized and abundant proteins, without any loss in prediction accuracy for the important class of membrane proteins. Targeted proteomics approaches have been gaining a lot of momentum and hold immense potential for systems biology studies and clinical proteomics. However, since only very few complete proteomes have been reported to date, for a considerable fraction of a proteome there is no experimental proteomics evidence that would allow to guide the selection of the best-suited proteotypic peptides (PTPs), i.e. peptides that are specific to a given proteoform and that are repeatedly observed in a mass spectrometer. 
We describe a novel, rank-based approach for the prediction of the best-suited PTPs for targeted proteomics applications. By building on methods developed in the field of information retrieval (e.g. web search engines like Google's PageRank), we circumvent the delicate step of selecting positive and negative training sets and at the same time also more closely reflect the experimentalist´s need for selecting e.g. the 5 most promising peptides for targeting a protein of interest. This approach allows to predict PTPs for not yet observed proteins or for organisms without prior experimental proteomics data such as many non-model organisms. Copyright © 2014 Elsevier B.V. All rights reserved.
Stata Modules for Calculating Novel Predictive Performance Indices for Logistic Models

PubMed Central

Barkhordari, Mahnaz; Padyab, Mojgan; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza

2016-01-01

Background: Prediction is a fundamental part of prevention of cardiovascular diseases (CVD). The development of prediction algorithms based on multivariate regression models began several decades ago. In parallel with the development of predictive models, biomarker research emerged on an impressively large scale. The key question is how best to assess and quantify the improvement in risk prediction offered by new biomarkers or, more basically, how to assess the performance of a risk prediction model. Discrimination, calibration, and added predictive value have recently been suggested for comparing the predictive performance of models with and without novel biomarkers. Objectives: Lack of user-friendly statistical software has restricted implementation of novel model assessment methods while examining novel biomarkers. We intended, thus, to develop user-friendly software that could be used by researchers with few programming skills. Materials and Methods: We have written a Stata command that is intended to help researchers obtain the cut point-free and cut point-based net reclassification improvement index (NRI) and the relative and absolute integrated discriminatory improvement index (IDI) for logistic-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine if information on a family history of premature CVD, waist circumference, and fasting plasma glucose can improve predictive performance of the Framingham “general CVD risk” algorithm. Results: The command is addpred for logistic regression models. Conclusions: The Stata package provided herein can encourage the use of novel methods in examining the predictive capacity of the ever-emerging plethora of novel biomarkers. PMID:27279830
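
Two of the indices named in this abstract have short textbook definitions, sketched below in Python. This is not the addpred command and makes no claim about its options or output. It assumes the usual definitions: the cut point-free NRI is the net proportion of events whose predicted risk moves up plus the net proportion of non-events whose risk moves down, and the absolute IDI is the change in the difference between mean predicted risk in events and in non-events. All names and data are hypothetical.

```python
def cut_point_free_nri(outcomes, risk_old, risk_new):
    """Category-free NRI comparing an old and a new risk model."""
    def net_upward(pairs):
        up = sum(new > old for old, new in pairs)
        down = sum(new < old for old, new in pairs)
        return (up - down) / len(pairs)

    rows = list(zip(outcomes, risk_old, risk_new))
    events = [(old, new) for y, old, new in rows if y == 1]
    nonevents = [(old, new) for y, old, new in rows if y == 0]
    # Upward moves are good for events and bad for non-events.
    return net_upward(events) - net_upward(nonevents)


def absolute_idi(outcomes, risk_old, risk_new):
    """Change in discrimination slope between the new and old models."""
    def slope(risks):
        ev = [r for y, r in zip(outcomes, risks) if y == 1]
        ne = [r for y, r in zip(outcomes, risks) if y == 0]
        return sum(ev) / len(ev) - sum(ne) / len(ne)

    return slope(risk_new) - slope(risk_old)


# Hypothetical outcomes and predicted risks, for illustration only.
y = [1, 1, 0, 0]
p_old = [0.6, 0.4, 0.5, 0.2]
p_new = [0.8, 0.5, 0.3, 0.3]
nri = cut_point_free_nri(y, p_old, p_new)
idi = absolute_idi(y, p_old, p_new)
```

Here both events move up (net +1), while one non-event moves down and one moves up (net 0), so the NRI is 1.0. The discrimination slope rises from 0.15 to 0.35, giving an absolute IDI of 0.20.
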
Simulation of quantum dynamics based on the quantum stochastic differential equation.

PubMed

Li, Ming

2013-01-01

The quantum stochastic differential equation derived from the Lindblad form quantum master equation is investigated. The general formulation in terms of environment operators representing the quantum state diffusion is given. The numerical simulation algorithm of stochastic process of direct photodetection of a driven two-level system for the predictions of the dynamical behavior is proposed. The effectiveness and superiority of the algorithm are verified by the performance analysis of the accuracy and the computational cost in comparison with the classical Runge-Kutta algorithm.
ASPsiRNA: A Resource of ASP-siRNAs Having Therapeutic Potential for Human Genetic Disorders and Algorithm for Prediction of Their Inhibitory Efficacy

PubMed Central

Monga, Isha; Qureshi, Abid; Thakur, Nishant; Gupta, Amit Kumar; Kumar, Manoj

2017-01-01

Allele-specific siRNAs (ASP-siRNAs) have emerged as promising therapeutic molecules owing to their selectivity to inhibit the mutant allele or associated single-nucleotide polymorphisms (SNPs) sparing the expression of the wild-type counterpart. Thus, a dedicated bioinformatics platform encompassing updated ASP-siRNAs and an algorithm for the prediction of their inhibitory efficacy will be helpful in tackling currently intractable genetic disorders. In the present study, we have developed the ASPsiRNA resource (http://crdd.osdd.net/servers/aspsirna/) covering three components viz (i) ASPsiDb, (ii) ASPsiPred, and (iii) analysis tools like ASP-siOffTar. ASPsiDb is a manually curated database harboring 4543 (including 422 chemically modified) ASP-siRNAs targeting 78 unique genes involved in 51 different diseases. It furnishes comprehensive information from experimental studies on ASP-siRNAs along with multidimensional genetic and clinical information for numerous mutations. ASPsiPred is a two-layered algorithm to predict efficacy of ASP-siRNAs for fully complementary mutant (Effmut) and wild-type allele (Effwild) with one mismatch by ASPsiPredSVM and ASPsiPredmatrix, respectively. In ASPsiPredSVM, 922 unique ASP-siRNAs with experimentally validated quantitative Effmut were used. During 10-fold cross-validation (10nCV) employing various sequence features on the training/testing dataset (T737), the best predictive model achieved a maximum Pearson’s correlation coefficient (PCC) of 0.71. Further, the accuracy of the classifier to predict Effmut against novel genes was assessed by leave one target out cross-validation approach (LOTOCV). ASPsiPredmatrix was constructed from rule-based studies describing the effect of single siRNA:mRNA mismatches on the efficacy at 19 different locations of siRNA. 
Thus, ASPsiRNA encompasses the first database, prediction algorithm, and off-target analysis tool that is expected to accelerate research in the field of RNAi-based therapeutics for human genetic diseases. PMID:28696921
Learning Instance-Specific Predictive Models

PubMed Central

Visweswaran, Shyam; Cooper, Gregory F.

2013-01-01

This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including naive Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325
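
The Bayesian model averaging step mentioned in this abstract can be illustrated in a few lines. This sketch shows only the averaging itself: each model's prediction is weighted by its posterior probability, computed here from log scores under an assumed uniform prior over models. It does not reproduce the ISMB search heuristic or Markov blanket learning, and the function name and numbers are hypothetical.

```python
import math


def model_averaged_probability(log_scores, predictions):
    """Average per-model predictions, weighted by normalized exp(log score)."""
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    weights = [math.exp(score - top) for score in log_scores]
    total = sum(weights)
    return sum(w / total * p for w, p in zip(weights, predictions))


# Two hypothetical models: the first is three times as probable as the second.
log_scores = [math.log(3.0), 0.0]
predictions = [0.2, 0.6]  # each model's P(target = 1 | instance)
averaged = model_averaged_probability(log_scores, predictions)
```

With posterior weights of 0.75 and 0.25, the averaged prediction is 0.75 * 0.2 + 0.25 * 0.6 = 0.3, which lies closer to the better-supported model without discarding the other.
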
