Sample records for prediction methods based

  1. A review of propeller noise prediction methodology: 1919-1994

    NASA Technical Reports Server (NTRS)

    Metzger, F. Bruce

    1995-01-01

    This report summarizes a review of the literature regarding propeller noise prediction methods. The review is divided into six sections: (1) early methods; (2) more recent methods based on earlier theory; (3) more recent methods based on the Acoustic Analogy; (4) more recent methods based on Computational Acoustics; (5) empirical methods; and (6) broadband methods. The report concludes that there are a large number of noise prediction procedures available which vary markedly in complexity. Deficiencies in accuracy of methods in many cases may be related, not to the methods themselves, but the accuracy and detail of the aerodynamic inputs used to calculate noise. The steps recommended in the report to provide accurate and easy to use prediction methods are: (1) identify reliable test data; (2) define and conduct test programs to fill gaps in the existing data base; (3) identify the most promising prediction methods; (4) evaluate promising prediction methods relative to the data base; (5) identify and correct the weaknesses in the prediction methods, including lack of user friendliness, and include features now available only in research codes; (6) confirm the accuracy of improved prediction methods to the data base; and (7) make the methods widely available and provide training in their use.

  2. Learning linear transformations between counting-based and prediction-based word embeddings

    PubMed Central

    Hayashi, Kohei; Kawarabayashi, Ken-ichi

    2017-01-01

    Despite the growing interest in prediction-based word embedding learning methods, it remains unclear as to how the vector spaces learnt by the prediction-based methods differ from that of the counting-based methods, or whether one can be transformed into the other. To study the relationship between counting-based and prediction-based embeddings, we propose a method for learning a linear transformation between two given sets of word embeddings. Our proposal contributes to the word embedding learning research in three ways: (a) we propose an efficient method to learn a linear transformation between two sets of word embeddings, (b) using the transformation learnt in (a), we empirically show that it is possible to predict distributed word embeddings for novel unseen words, and (c) empirically it is possible to linearly transform counting-based embeddings to prediction-based embeddings, for frequent words, different POS categories, and varying degrees of ambiguities. PMID:28926629

  3. Selecting the minimum prediction base of historical data to perform 5-year predictions of the cancer burden: The GoF-optimal method.

    PubMed

    Valls, Joan; Castellà, Gerard; Dyba, Tadeusz; Clèries, Ramon

    2015-06-01

    Predicting the future burden of cancer is a key issue for health services planning, where a method for selecting the predictive model and the prediction base is a challenge. A method, named here Goodness-of-Fit optimal (GoF-optimal), is presented to determine the minimum prediction base of historical data to perform 5-year predictions of the number of new cancer cases or deaths. An empirical ex-post evaluation exercise for cancer mortality data in Spain and cancer incidence in Finland using simple linear and log-linear Poisson models was performed. Prediction bases were considered within the time periods 1951-2006 in Spain and 1975-2007 in Finland, and then predictions were made for 37 and 33 single years in these periods, respectively. The performance of three fixed different prediction bases (last 5, 10, and 20 years of historical data) was compared to that of the prediction base determined by the GoF-optimal method. The coverage (COV) of the 95% prediction interval and the discrepancy ratio (DR) were calculated to assess the success of the prediction. The results showed that (i) models using the prediction base selected through GoF-optimal method reached the highest COV and the lowest DR and (ii) the best alternative strategy to GoF-optimal was the one using the base of prediction of 5-years. The GoF-optimal approach can be used as a selection criterion in order to find an adequate base of prediction. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Prediction of hot regions in protein-protein interaction by combining density-based incremental clustering with feature-based classification.

    PubMed

    Hu, Jing; Zhang, Xiaolong; Liu, Xiaoming; Tang, Jinshan

    2015-06-01

    Discovering hot regions in protein-protein interaction is important for drug and protein design, while experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information, and others are based on a protein interaction network. However, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Improving local clustering based top-L link prediction methods via asymmetric link clustering information

    NASA Astrophysics Data System (ADS)

    Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

    2018-02-01

    Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis, and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. We can note that clustering information plays an important role in solving the link prediction problem. In previous literatures, we find node clustering coefficient appears frequently in many link prediction methods. However, node clustering coefficient is limited to describe the role of a common-neighbor in different local networks, because it cannot distinguish different clustering abilities of a node to different node pairs. In this paper, we shift our focus from nodes to links, and propose the concept of asymmetric link clustering (ALC) coefficient. Further, we improve three node clustering based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node clustering based methods, especially achieving remarkable improvements on food web, hamster friendship and Internet networks. Besides, comparing with other methods, the performance of ALC-based methods are very stable in both globalized and personalized top-L link prediction tasks.

  6. Ensemble-based prediction of RNA secondary structures.

    PubMed

    Aghaeepour, Nima; Hoos, Holger H

    2013-04-24

    Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that as been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.

  7. Methods of developing core collections based on the predicted genotypic value of rice ( Oryza sativa L.).

    PubMed

    Li, C T; Shi, C H; Wu, J G; Xu, H M; Zhang, H Z; Ren, Y L

    2004-04-01

    The selection of an appropriate sampling strategy and a clustering method is important in the construction of core collections based on predicted genotypic values in order to retain the greatest degree of genetic diversity of the initial collection. In this study, methods of developing rice core collections were evaluated based on the predicted genotypic values for 992 rice varieties with 13 quantitative traits. The genotypic values of the traits were predicted by the adjusted unbiased prediction (AUP) method. Based on the predicted genotypic values, Mahalanobis distances were calculated and employed to measure the genetic similarities among the rice varieties. Six hierarchical clustering methods, including the single linkage, median linkage, centroid, unweighted pair-group average, weighted pair-group average and flexible-beta methods, were combined with random, preferred and deviation sampling to develop 18 core collections of rice germplasm. The results show that the deviation sampling strategy in combination with the unweighted pair-group average method of hierarchical clustering retains the greatest degree of genetic diversities of the initial collection. The core collections sampled using predicted genotypic values had more genetic diversity than those based on phenotypic values.

  8. Prediction of welding shrinkage deformation of bridge steel box girder based on wavelet neural network

    NASA Astrophysics Data System (ADS)

    Tao, Yulong; Miao, Yunshui; Han, Jiaqi; Yan, Feiyun

    2018-05-01

    Aiming at the low accuracy of traditional forecasting methods such as linear regression method, this paper presents a prediction method for predicting the relationship between bridge steel box girder and its displacement with wavelet neural network. Compared with traditional forecasting methods, this scheme has better local characteristics and learning ability, which greatly improves the prediction ability of deformation. Through analysis of the instance and found that after compared with the traditional prediction method based on wavelet neural network, the rigid beam deformation prediction accuracy is higher, and is superior to the BP neural network prediction results, conform to the actual demand of engineering design.

  9. Predicting missing links via correlation between nodes

    NASA Astrophysics Data System (ADS)

    Liao, Hao; Zeng, An; Zhang, Yi-Cheng

    2015-10-01

    As a fundamental problem in many different fields, link prediction aims to estimate the likelihood of an existing link between two nodes based on the observed information. Since this problem is related to many applications ranging from uncovering missing data to predicting the evolution of networks, link prediction has been intensively investigated recently and many methods have been proposed so far. The essential challenge of link prediction is to estimate the similarity between nodes. Most of the existing methods are based on the common neighbor index and its variants. In this paper, we propose to calculate the similarity between nodes by the Pearson correlation coefficient. This method is found to be very effective when applied to calculate similarity based on high order paths. We finally fuse the correlation-based method with the resource allocation method, and find that the combined method can substantially outperform the existing methods, especially in sparse networks.

  10. An evidential link prediction method and link predictability based on Shannon entropy

    NASA Astrophysics Data System (ADS)

    Yin, Likang; Zheng, Haoyang; Bian, Tian; Deng, Yong

    2017-09-01

    Predicting missing links is of both theoretical value and practical interest in network science. In this paper, we empirically investigate a new link prediction method base on similarity and compare nine well-known local similarity measures on nine real networks. Most of the previous studies focus on the accuracy, however, it is crucial to consider the link predictability as an initial property of networks itself. Hence, this paper has proposed a new link prediction approach called evidential measure (EM) based on Dempster-Shafer theory. Moreover, this paper proposed a new method to measure link predictability via local information and Shannon entropy.

  11. A community resource benchmarking predictions of peptide binding to MHC-I molecules.

    PubMed

    Peters, Bjoern; Bui, Huynh-Hoa; Frankild, Sune; Nielson, Morten; Lundegaard, Claus; Kostem, Emrah; Basch, Derek; Lamberth, Kasper; Harndahl, Mikkel; Fleri, Ward; Wilson, Stephen S; Sidney, John; Lund, Ole; Buus, Soren; Sette, Alessandro

    2006-06-09

    Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide binding preference, which can be captured in prediction algorithms, allowing for the rapid scan of entire pathogen proteomes for peptide likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use this data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions mainly due to its ability to generalize even on a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparison of newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are not as familiar with as their own. The overall goal of this effort is to provide a transparent prediction evaluation allowing bioinformaticians to identify promising features of prediction methods and providing guidance to immunologists regarding the reliability of prediction tools.

  12. Life prediction for high temperature low cycle fatigue of two kinds of titanium alloys based on exponential function

    NASA Astrophysics Data System (ADS)

    Mu, G. Y.; Mi, X. Z.; Wang, F.

    2018-01-01

    The high temperature low cycle fatigue tests of TC4 titanium alloy and TC11 titanium alloy are carried out under strain controlled. The relationships between cyclic stress-life and strain-life are analyzed. The high temperature low cycle fatigue life prediction model of two kinds of titanium alloys is established by using Manson-Coffin method. The relationship between failure inverse number and plastic strain range presents nonlinear in the double logarithmic coordinates. Manson-Coffin method assumes that they have linear relation. Therefore, there is bound to be a certain prediction error by using the Manson-Coffin method. In order to solve this problem, a new method based on exponential function is proposed. The results show that the fatigue life of the two kinds of titanium alloys can be predicted accurately and effectively by using these two methods. Prediction accuracy is within ±1.83 times scatter zone. The life prediction capability of new methods based on exponential function proves more effective and accurate than Manson-Coffin method for two kinds of titanium alloys. The new method based on exponential function can give better fatigue life prediction results with the smaller standard deviation and scatter zone than Manson-Coffin method. The life prediction results of two methods for TC4 titanium alloy prove better than TC11 titanium alloy.

  13. Prediction of protein secondary structure content for the twilight zone sequences.

    PubMed

    Homaeian, Leila; Kurgan, Lukasz A; Ruan, Jishou; Cios, Krzysztof J; Chen, Ke

    2007-11-15

    Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. Significant majority of successful methods for prediction of the secondary structure is based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. At the same time, it also provides useful insight into design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure. (c) 2007 Wiley-Liss, Inc.

  14. Incorporating information on predicted solvent accessibility to the co-evolution-based study of protein interactions.

    PubMed

    Ochoa, David; García-Gutiérrez, Ponciano; Juan, David; Valencia, Alfonso; Pazos, Florencio

    2013-01-27

    A widespread family of methods for studying and predicting protein interactions using sequence information is based on co-evolution, quantified as similarity of phylogenetic trees. Part of the co-evolution observed between interacting proteins could be due to co-adaptation caused by inter-protein contacts. In this case, the co-evolution is expected to be more evident when evaluated on the surface of the proteins or the internal layers close to it. In this work we study the effect of incorporating information on predicted solvent accessibility to three methods for predicting protein interactions based on similarity of phylogenetic trees. We evaluate the performance of these methods in predicting different types of protein associations when trees based on positions with different characteristics of predicted accessibility are used as input. We found that predicted accessibility improves the results of two recent versions of the mirrortree methodology in predicting direct binary physical interactions, while it neither improves these methods, nor the original mirrortree method, in predicting other types of interactions. That improvement comes at no cost in terms of applicability since accessibility can be predicted for any sequence. We also found that predictions of protein-protein interactions are improved when multiple sequence alignments with a richer representation of sequences (including paralogs) are incorporated in the accessibility prediction.

  15. Lessons learned from participating in D3R 2016 Grand Challenge 2: compounds targeting the farnesoid X receptor

    NASA Astrophysics Data System (ADS)

    Duan, Rui; Xu, Xianjin; Zou, Xiaoqin

    2018-01-01

    D3R 2016 Grand Challenge 2 focused on predictions of binding modes and affinities for 102 compounds against the farnesoid X receptor (FXR). In this challenge, two distinct methods, a docking-based method and a template-based method, were employed by our team for the binding mode prediction. For the new template-based method, 3D ligand similarities were calculated for each query compound against the ligands in the co-crystal structures of FXR available in Protein Data Bank. The binding mode was predicted based on the co-crystal protein structure containing the ligand with the best ligand similarity score against the query compound. For the FXR dataset, the template-based method achieved a better performance than the docking-based method on the binding mode prediction. For the binding affinity prediction, an in-house knowledge-based scoring function ITScore2 and MM/PBSA approach were employed. Good performance was achieved for MM/PBSA, whereas the performance of ITScore2 was sensitive to ligand composition, e.g. the percentage of carbon atoms in the compounds. The sensitivity to ligand composition could be a clue for the further improvement of our knowledge-based scoring function.

  16. Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

    PubMed

    Adhikari, Badri; Hou, Jie; Cheng, Jianlin

    2018-03-01

    In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.

  17. Predicting links based on knowledge dissemination in complex network

    NASA Astrophysics Data System (ADS)

    Zhou, Wen; Jia, Yifan

    2017-04-01

    Link prediction is the task of mining the missing links in networks or predicting the next vertex pair to be connected by a link. A lot of link prediction methods were inspired by evolutionary processes of networks. In this paper, a new mechanism for the formation of complex networks called knowledge dissemination (KD) is proposed with the assumption of knowledge disseminating through the paths of a network. Accordingly, a new link prediction method-knowledge dissemination based link prediction (KDLP)-is proposed to test KD. KDLP characterizes vertex similarity based on knowledge quantity (KQ) which measures the importance of a vertex through H-index. Extensive numerical simulations on six real-world networks demonstrate that KDLP is a strong link prediction method which performs at a higher prediction accuracy than four well-known similarity measures including common neighbors, local path index, average commute time and matrix forest index. Furthermore, based on the common conclusion that an excellent link prediction method reveals a good evolving mechanism, the experiment results suggest that KD is a considerable network evolving mechanism for the formation of complex networks.

  18. Improving transmembrane protein consensus topology prediction using inter-helical interaction.

    PubMed

    Wang, Han; Zhang, Chao; Shi, Xiaohu; Zhang, Li; Zhou, You

    2012-11-01

    Alpha helix transmembrane proteins (αTMPs) represent roughly 30% of all open reading frames (ORFs) in a typical genome and are involved in many critical biological processes. Due to the special physicochemical properties, it is hard to crystallize and obtain high resolution structures experimentally, thus, sequence-based topology prediction is highly desirable for the study of transmembrane proteins (TMPs), both in structure prediction and function prediction. Various model-based topology prediction methods have been developed, but the accuracy of those individual predictors remain poor due to the limitation of the methods or the features they used. Thus, the consensus topology prediction method becomes practical for high accuracy applications by combining the advances of the individual predictors. Here, based on the observation that inter-helical interactions are commonly found within the transmembrane helixes (TMHs) and strongly indicate the existence of them, we present a novel consensus topology prediction method for αTMPs, CNTOP, which incorporates four top leading individual topology predictors, and further improves the prediction accuracy by using the predicted inter-helical interactions. The method achieved 87% prediction accuracy based on a benchmark dataset and 78% accuracy based on a non-redundant dataset which is composed of polytopic αTMPs. Our method derives the highest topology accuracy than any other individual predictors and consensus predictors, at the same time, the TMHs are more accurately predicted in their length and locations, where both the false positives (FPs) and the false negatives (FNs) decreased dramatically. The CNTOP is available at: http://ccst.jlu.edu.cn/JCSB/cntop/CNTOP.html. Copyright © 2012 Elsevier B.V. All rights reserved.

  19. An object programming based environment for protein secondary structure prediction.

    PubMed

    Giacomini, M; Ruggiero, C; Sacile, R

    1996-01-01

    The most frequently used methods for protein secondary structure prediction are empirical statistical methods and rule based methods. A consensus system based on object-oriented programming is presented, which integrates the two approaches with the aim of improving the prediction quality. This system uses an object-oriented knowledge representation based on the concepts of conformation, residue and protein, where the conformation class is the basis, the residue class derives from it and the protein class derives from the residue class. The system has been tested with satisfactory results on several proteins of the Brookhaven Protein Data Bank. Its results have been compared with the results of the most widely used prediction methods, and they show a higher prediction capability and greater stability. Moreover, the system itself provides an index of the reliability of its current prediction. This system can also be regarded as a basis structure for programs of this kind.

  20. Analysis of methods to estimate spring flows in a karst aquifer

    USGS Publications Warehouse

    Sepulveda, N.

    2009-01-01

    Hydraulically and statistically based methods were analyzed to identify the most reliable method to predict spring flows in a karst aquifer. Measured water levels at nearby observation wells, measured spring pool altitudes, and the distance between observation wells and the spring pool were the parameters used to match measured spring flows. Measured spring flows at six Upper Floridan aquifer springs in central Florida were used to assess the reliability of these methods to predict spring flows. Hydraulically based methods involved the application of the Theis, Hantush-Jacob, and Darcy-Weisbach equations, whereas the statistically based methods were the multiple linear regressions and the technology of artificial neural networks (ANNs). Root mean square errors between measured and predicted spring flows using the Darcy-Weisbach method ranged between 5% and 15% of the measured flows, lower than the 7% to 27% range for the Theis or Hantush-Jacob methods. Flows at all springs were estimated to be turbulent based on the Reynolds number derived from the Darcy-Weisbach equation for conduit flow. The multiple linear regression and the Darcy-Weisbach methods had similar spring flow prediction capabilities. The ANNs provided the lowest residuals between measured and predicted spring flows, ranging from 1.6% to 5.3% of the measured flows. The model prediction efficiency criteria also indicated that the ANNs were the most accurate method predicting spring flows in a karst aquifer. ?? 2008 National Ground Water Association.

  1. Analysis of methods to estimate spring flows in a karst aquifer.

    PubMed

    Sepúlveda, Nicasio

    2009-01-01

    Hydraulically and statistically based methods were analyzed to identify the most reliable method to predict spring flows in a karst aquifer. Measured water levels at nearby observation wells, measured spring pool altitudes, and the distance between observation wells and the spring pool were the parameters used to match measured spring flows. Measured spring flows at six Upper Floridan aquifer springs in central Florida were used to assess the reliability of these methods to predict spring flows. Hydraulically based methods involved the application of the Theis, Hantush-Jacob, and Darcy-Weisbach equations, whereas the statistically based methods were the multiple linear regressions and the technology of artificial neural networks (ANNs). Root mean square errors between measured and predicted spring flows using the Darcy-Weisbach method ranged between 5% and 15% of the measured flows, lower than the 7% to 27% range for the Theis or Hantush-Jacob methods. Flows at all springs were estimated to be turbulent based on the Reynolds number derived from the Darcy-Weisbach equation for conduit flow. The multiple linear regression and the Darcy-Weisbach methods had similar spring flow prediction capabilities. The ANNs provided the lowest residuals between measured and predicted spring flows, ranging from 1.6% to 5.3% of the measured flows. The model prediction efficiency criteria also indicated that the ANNs were the most accurate method predicting spring flows in a karst aquifer.

  2. Bayesian model aggregation for ensemble-based estimates of protein pKa values

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gosink, Luke J.; Hogan, Emilie A.; Pulsipher, Trenton C.

    2014-03-01

    This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pmore » $$K_a$$ predictions. Structure-based p$$K_a$$ calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exist for p$$K_a$$ prediction, ranging from empirical statistical models to {\\it ab initio} quantum mechanical approaches. However, each of these methods are based on a set of assumptions that have inherent bias and sensitivities that can effect a model's accuracy and generalizability for p$$K_a$$ prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative and the pKa measurements are based on experimental work conducted by the Garc{\\'i}a-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study with improvements from 40-70\\% over other method classes. This work illustrates a new possible mechanism for improving the accuracy of p$$K_a$$ prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.« less

  3. RRCRank: a fusion method using rank strategy for residue-residue contact prediction.

    PubMed

    Jing, Xiaoyang; Dong, Qiwen; Lu, Ruqian

    2017-09-02

    In structural biology area, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that the predicted residue-residue contacts could effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, related researchers have developed various methods to predict residue-residue contacts, especially, significant performance has been achieved by using fusion methods in recent years. In this work, a novel fusion method based on rank strategy has been proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses the learning-to-rank algorithm to predict contact probability of each residue pair. First, we perform two benchmark tests for the proposed fusion method (RRCRank) on CASP11 dataset and CASP12 dataset respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that the RRCRank could achieve comparable prediction precisions and is better than three methods in most assessment metrics. The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.

  4. The effect of using genealogy-based haplotypes for genomic prediction

    PubMed Central

    2013-01-01

    Background Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individuals markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971

  5. Evaluation and integration of existing methods for computational prediction of allergens

    PubMed Central

    2013-01-01

    Background Allergy involves a series of complex reactions and factors that contribute to the development of the disease and triggering of the symptoms, including rhinitis, asthma, atopic eczema, skin sensitivity, even acute and fatal anaphylactic shock. Prediction and evaluation of the potential allergenicity is of importance for safety evaluation of foods and other environment factors. Although several computational approaches for assessing the potential allergenicity of proteins have been developed, their performance and relative merits and shortcomings have not been compared systematically. Results To evaluate and improve the existing methods for allergen prediction, we collected an up-to-date definitive dataset consisting of 989 known allergens and massive putative non-allergens. The three most widely used allergen computational prediction approaches including sequence-, motif- and SVM-based (Support Vector Machine) methods were systematically compared using the defined parameters and we found that SVM-based method outperformed the other two methods with higher accuracy and specificity. The sequence-based method with the criteria defined by FAO/WHO (FAO: Food and Agriculture Organization of the United Nations; WHO: World Health Organization) has higher sensitivity of over 98%, but having a low specificity. The advantage of motif-based method is the ability to visualize the key motif within the allergen. Notably, the performances of the sequence-based method defined by FAO/WHO and motif eliciting strategy could be improved by the optimization of parameters. To facilitate the allergen prediction, we integrated these three methods in a web-based application proAP, which provides the global search of the known allergens and a powerful tool for allergen predication. Flexible parameter setting and batch prediction were also implemented. The proAP can be accessed at http://gmobl.sjtu.edu.cn/proAP/main.html. Conclusions This study comprehensively evaluated sequence-, motif- and SVM-based computational prediction approaches for allergens and optimized their parameters to obtain better performance. These findings may provide helpful guidance for the researchers in allergen-prediction. Furthermore, we integrated these methods into a web application proAP, greatly facilitating users to do customizable allergen search and prediction. PMID:23514097

  6. Evaluation and integration of existing methods for computational prediction of allergens.

    PubMed

    Wang, Jing; Yu, Yabin; Zhao, Yunan; Zhang, Dabing; Li, Jing

    2013-01-01

    Allergy involves a series of complex reactions and factors that contribute to the development of the disease and triggering of the symptoms, including rhinitis, asthma, atopic eczema, skin sensitivity, even acute and fatal anaphylactic shock. Prediction and evaluation of the potential allergenicity is of importance for safety evaluation of foods and other environment factors. Although several computational approaches for assessing the potential allergenicity of proteins have been developed, their performance and relative merits and shortcomings have not been compared systematically. To evaluate and improve the existing methods for allergen prediction, we collected an up-to-date definitive dataset consisting of 989 known allergens and massive putative non-allergens. The three most widely used allergen computational prediction approaches including sequence-, motif- and SVM-based (Support Vector Machine) methods were systematically compared using the defined parameters and we found that SVM-based method outperformed the other two methods with higher accuracy and specificity. The sequence-based method with the criteria defined by FAO/WHO (FAO: Food and Agriculture Organization of the United Nations; WHO: World Health Organization) has higher sensitivity of over 98%, but having a low specificity. The advantage of motif-based method is the ability to visualize the key motif within the allergen. Notably, the performances of the sequence-based method defined by FAO/WHO and motif eliciting strategy could be improved by the optimization of parameters. To facilitate the allergen prediction, we integrated these three methods in a web-based application proAP, which provides the global search of the known allergens and a powerful tool for allergen predication. Flexible parameter setting and batch prediction were also implemented. The proAP can be accessed at http://gmobl.sjtu.edu.cn/proAP/main.html. This study comprehensively evaluated sequence-, motif- and SVM-based computational prediction approaches for allergens and optimized their parameters to obtain better performance. These findings may provide helpful guidance for the researchers in allergen-prediction. Furthermore, we integrated these methods into a web application proAP, greatly facilitating users to do customizable allergen search and prediction.

  7. The Satellite Clock Bias Prediction Method Based on Takagi-Sugeno Fuzzy Neural Network

    NASA Astrophysics Data System (ADS)

    Cai, C. L.; Yu, H. G.; Wei, Z. C.; Pan, J. D.

    2017-05-01

    The continuous improvement of the prediction accuracy of Satellite Clock Bias (SCB) is the key problem of precision navigation. In order to improve the precision of SCB prediction and better reflect the change characteristics of SCB, this paper proposes an SCB prediction method based on the Takagi-Sugeno fuzzy neural network. Firstly, the SCB values are pre-treated based on their characteristics. Then, an accurate Takagi-Sugeno fuzzy neural network model is established based on the preprocessed data to predict SCB. This paper uses the precise SCB data with different sampling intervals provided by IGS (International Global Navigation Satellite System Service) to realize the short-time prediction experiment, and the results are compared with the ARIMA (Auto-Regressive Integrated Moving Average) model, GM(1,1) model, and the quadratic polynomial model. The results show that the Takagi-Sugeno fuzzy neural network model is feasible and effective for the SCB short-time prediction experiment, and performs well for different types of clocks. The prediction results for the proposed method are better than the conventional methods obviously.

  8. Prediction of global and local model quality in CASP8 using the ModFOLD server.

    PubMed

    McGuffin, Liam J

    2009-01-01

    The development of effective methods for predicting the quality of three-dimensional (3D) models is fundamentally important for the success of tertiary structure (TS) prediction strategies. Since CASP7, the Quality Assessment (QA) category has existed to gauge the ability of various model quality assessment programs (MQAPs) at predicting the relative quality of individual 3D models. For the CASP8 experiment, automated predictions were submitted in the QA category using two methods from the ModFOLD server-ModFOLD version 1.1 and ModFOLDclust. ModFOLD version 1.1 is a single-model machine learning based method, which was used for automated predictions of global model quality (QMODE1). ModFOLDclust is a simple clustering based method, which was used for automated predictions of both global and local quality (QMODE2). In addition, manual predictions of model quality were made using ModFOLD version 2.0--an experimental method that combines the scores from ModFOLDclust and ModFOLD v1.1. Predictions from the ModFOLDclust method were the most successful of the three in terms of the global model quality, whilst the ModFOLD v1.1 method was comparable in performance to other single-model based methods. In addition, the ModFOLDclust method performed well at predicting the per-residue, or local, model quality scores. Predictions of the per-residue errors in our own 3D models, selected using the ModFOLD v2.0 method, were also the most accurate compared with those from other methods. All of the MQAPs described are publicly accessible via the ModFOLD server at: http://www.reading.ac.uk/bioinf/ModFOLD/. The methods are also freely available to download from: http://www.reading.ac.uk/bioinf/downloads/. Copyright 2009 Wiley-Liss, Inc.

  9. Key Technology of Real-Time Road Navigation Method Based on Intelligent Data Research

    PubMed Central

    Tang, Haijing; Liang, Yu; Huang, Zhongnan; Wang, Taoyi; He, Lin; Du, Yicong; Ding, Gangyi

    2016-01-01

    The effect of traffic flow prediction plays an important role in routing selection. Traditional traffic flow forecasting methods mainly include linear, nonlinear, neural network, and Time Series Analysis method. However, all of them have some shortcomings. This paper analyzes the existing algorithms on traffic flow prediction and characteristics of city traffic flow and proposes a road traffic flow prediction method based on transfer probability. This method first analyzes the transfer probability of upstream of the target road and then makes the prediction of the traffic flow at the next time by using the traffic flow equation. Newton Interior-Point Method is used to obtain the optimal value of parameters. Finally, it uses the proposed model to predict the traffic flow at the next time. By comparing the existing prediction methods, the proposed model has proven to have good performance. It can fast get the optimal value of parameters faster and has higher prediction accuracy, which can be used to make real-time traffic flow prediction. PMID:27872637

  10. Research of Water Level Prediction for a Continuous Flood due to Typhoons Based on a Machine Learning Method

    NASA Astrophysics Data System (ADS)

    Nakatsugawa, M.; Kobayashi, Y.; Okazaki, R.; Taniguchi, Y.

    2017-12-01

    This research aims to improve accuracy of water level prediction calculations for more effective river management. In August 2016, Hokkaido was visited by four typhoons, whose heavy rainfall caused severe flooding. In the Tokoro river basin of Eastern Hokkaido, the water level (WL) at the Kamikawazoe gauging station, which is at the lower reaches exceeded the design high-water level and the water rose to the highest level on record. To predict such flood conditions and mitigate disaster damage, it is necessary to improve the accuracy of prediction as well as to prolong the lead time (LT) required for disaster mitigation measures such as flood-fighting activities and evacuation actions by residents. There is the need to predict the river water level around the peak stage earlier and more accurately. Previous research dealing with WL prediction had proposed a method in which the WL at the lower reaches is estimated by the correlation with the WL at the upper reaches (hereinafter: "the water level correlation method"). Additionally, a runoff model-based method has been generally used in which the discharge is estimated by giving rainfall prediction data to a runoff model such as a storage function model and then the WL is estimated from that discharge by using a WL discharge rating curve (H-Q curve). In this research, an attempt was made to predict WL by applying the Random Forest (RF) method, which is a machine learning method that can estimate the contribution of explanatory variables. Furthermore, from the practical point of view, we investigated the prediction of WL based on a multiple correlation (MC) method involving factors using explanatory variables with high contribution in the RF method, and we examined the proper selection of explanatory variables and the extension of LT. The following results were found: 1) Based on the RF method tuned up by learning from previous floods, the WL for the abnormal flood case of August 2016 was properly predicted with a lead time of 6 h. 2) Based on the contribution of explanatory variables, factors were selected for the MC method. In this way, plausible prediction results were obtained.

  11. Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2015-01-01

    Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.

  12. Applicability of a gene expression based prediction method to SD and Wistar rats: an example of CARCINOscreen®.

    PubMed

    Matsumoto, Hiroshi; Saito, Fumiyo; Takeyoshi, Masahiro

    2015-12-01

    Recently, the development of several gene expression-based prediction methods has been attempted in the fields of toxicology. CARCINOscreen® is a gene expression-based screening method to predict carcinogenicity of chemicals which target the liver with high accuracy. In this study, we investigated the applicability of the gene expression-based screening method to SD and Wistar rats by using CARCINOscreen®, originally developed with F344 rats, with two carcinogens, 2,4-diaminotoluen and thioacetamide, and two non-carcinogens, 2,6-diaminotoluen and sodium benzoate. After the 28-day repeated dose test was conducted with each chemical in SD and Wistar rats, microarray analysis was performed using total RNA extracted from each liver. Obtained gene expression data were applied to CARCINOscreen®. Predictive scores obtained by the CARCINOscreen® for known carcinogens were > 2 in all strains of rats, while non-carcinogens gave prediction scores below 0.5. These results suggested that the gene expression based screening method, CARCINOscreen®, can be applied to SD and Wistar rats, widely used strains in toxicological studies, by setting of an appropriate boundary line of prediction score to classify the chemicals into carcinogens and non-carcinogens.

  13. The wind power prediction research based on mind evolutionary algorithm

    NASA Astrophysics Data System (ADS)

    Zhuang, Ling; Zhao, Xinjian; Ji, Tianming; Miao, Jingwen; Cui, Haina

    2018-04-01

    When the wind power is connected to the power grid, its characteristics of fluctuation, intermittent and randomness will affect the stability of the power system. The wind power prediction can guarantee the power quality and reduce the operating cost of power system. There were some limitations in several traditional wind power prediction methods. On the basis, the wind power prediction method based on Mind Evolutionary Algorithm (MEA) is put forward and a prediction model is provided. The experimental results demonstrate that MEA performs efficiently in term of the wind power prediction. The MEA method has broad prospect of engineering application.

  14. Forecasting hotspots using predictive visual analytics approach

    DOEpatents

    Maciejewski, Ross; Hafen, Ryan; Rudolph, Stephen; Cleveland, William; Ebert, David

    2014-12-30

    A method for forecasting hotspots is provided. The method may include the steps of receiving input data at an input of the computational device, generating a temporal prediction based on the input data, generating a geospatial prediction based on the input data, and generating output data based on the time series and geospatial predictions. The output data may be configured to display at least one user interface at an output of the computational device.

  15. Prediction of human pharmacokinetics using physiologically based modeling: a retrospective analysis of 26 clinically tested drugs.

    PubMed

    De Buck, Stefan S; Sinha, Vikash K; Fenu, Luca A; Nijsen, Marjoleen J; Mackie, Claire E; Gilissen, Ron A H J

    2007-10-01

    The aim of this study was to evaluate different physiologically based modeling strategies for the prediction of human pharmacokinetics. Plasma profiles after intravenous and oral dosing were simulated for 26 clinically tested drugs. Two mechanism-based predictions of human tissue-to-plasma partitioning (P(tp)) from physicochemical input (method Vd1) were evaluated for their ability to describe human volume of distribution at steady state (V(ss)). This method was compared with a strategy that combined predicted and experimentally determined in vivo rat P(tp) data (method Vd2). Best V(ss) predictions were obtained using method Vd2, providing that rat P(tp) input was corrected for interspecies differences in plasma protein binding (84% within 2-fold). V(ss) predictions from physicochemical input alone were poor (32% within 2-fold). Total body clearance (CL) was predicted as the sum of scaled rat renal clearance and hepatic clearance projected from in vitro metabolism data. Best CL predictions were obtained by disregarding both blood and microsomal or hepatocyte binding (method CL2, 74% within 2-fold), whereas strong bias was seen using both blood and microsomal or hepatocyte binding (method CL1, 53% within 2-fold). The physiologically based pharmacokinetics (PBPK) model, which combined methods Vd2 and CL2 yielded the most accurate predictions of in vivo terminal half-life (69% within 2-fold). The Gastroplus advanced compartmental absorption and transit model was used to construct an absorption-disposition model and provided accurate predictions of area under the plasma concentration-time profile, oral apparent volume of distribution, and maximum plasma concentration after oral dosing, with 74%, 70%, and 65% within 2-fold, respectively. This evaluation demonstrates that PBPK models can lead to reasonable predictions of human pharmacokinetics.

  16. Thermodynamic heuristics with case-based reasoning: combined insights for RNA pseudoknot secondary structure.

    PubMed

    Al-Khatib, Ra'ed M; Rashid, Nur'Aini Abdul; Abdullah, Rosni

    2011-08-01

    The secondary structure of RNA pseudoknots has been extensively inferred and scrutinized by computational approaches. Experimental methods for determining RNA structure are time consuming and tedious; therefore, predictive computational approaches are required. Predicting the most accurate and energy-stable pseudoknot RNA secondary structure has been proven to be an NP-hard problem. In this paper, a new RNA folding approach, termed MSeeker, is presented; it includes KnotSeeker (a heuristic method) and Mfold (a thermodynamic algorithm). The global optimization of this thermodynamic heuristic approach was further enhanced by using a case-based reasoning technique as a local optimization method. MSeeker is a proposed algorithm for predicting RNA pseudoknot structure from individual sequences, especially long ones. This research demonstrates that MSeeker improves the sensitivity and specificity of existing RNA pseudoknot structure predictions. The performance and structural results from this proposed method were evaluated against seven other state-of-the-art pseudoknot prediction methods. The MSeeker method had better sensitivity than the DotKnot, FlexStem, HotKnots, pknotsRG, ILM, NUPACK and pknotsRE methods, with 79% of the predicted pseudoknot base-pairs being correct.

  17. Prediction of the total cycle 24 of solar activity by several autoregressive methods and by the precursor method

    NASA Astrophysics Data System (ADS)

    Ozheredov, V. A.; Breus, T. K.; Obridko, V. N.

    2012-12-01

    As follows from the statement of the Third Official Solar Cycle 24 Prediction Panel created by the National Aeronautics and Space Administration (NASA), the National Oceanic and Atmospheric Administration (NOAA), and the International Space Environment Service (ISES) based on the results of an analysis of many solar cycle 24 predictions, there has been no consensus on the amplitude and time of the maximum. There are two different scenarios: 90 units and August 2012 or 140 units and October 2011. The aim of our study is to revise the solar cycle 24 predictions by a comparative analysis of data obtained by three different methods: the singular spectral method, the nonlinear neural-based method, and the precursor method. As a precursor for solar cycle 24, we used the dynamics of the solar magnetic fields forming solar spots with Wolf numbers Rz. According to the prediction on the basis of the neural-based approach, it was established that the maximum of solar cycle 24 is expected to be 70. The precursor method predicted 50 units for the amplitude and April of 2012 for the time of the maximum. In view of the fact that the data used in the precursor method were averaged over 4.4 years, the amplitude of the maximum can be 20-30% larger (i.e., around 60-70 units), which is close to the values predicted by the neural-based method. The protracted minimum of solar cycle 23 and predicted low values of the maximum of solar cycle 24 are reminiscent of the historical Dalton minimum.

  18. A postprocessing method in the HMC framework for predicting gene function based on biological instrumental data

    NASA Astrophysics Data System (ADS)

    Feng, Shou; Fu, Ping; Zheng, Wenbin

    2018-03-01

    Predicting gene function based on biological instrumental data is a complicated and challenging hierarchical multi-label classification (HMC) problem. When using local approach methods to solve this problem, a preliminary results processing method is usually needed. This paper proposed a novel preliminary results processing method called the nodes interaction method. The nodes interaction method revises the preliminary results and guarantees that the predictions are consistent with the hierarchy constraint. This method exploits the label dependency and considers the hierarchical interaction between nodes when making decisions based on the Bayesian network in its first phase. In the second phase, this method further adjusts the results according to the hierarchy constraint. Implementing the nodes interaction method in the HMC framework also enhances the HMC performance for solving the gene function prediction problem based on the Gene Ontology (GO), the hierarchy of which is a directed acyclic graph that is more difficult to tackle. The experimental results validate the promising performance of the proposed method compared to state-of-the-art methods on eight benchmark yeast data sets annotated by the GO.

  19. Evaluating the predictive performance of empirical estimators of natural mortality rate using information on over 200 fish species

    USGS Publications Warehouse

    Then, Amy Y.; Hoenig, John M; Hall, Norman G.; Hewitt, David A.

    2015-01-01

    Many methods have been developed in the last 70 years to predict the natural mortality rate, M, of a stock based on empirical evidence from comparative life history studies. These indirect or empirical methods are used in most stock assessments to (i) obtain estimates of M in the absence of direct information, (ii) check on the reasonableness of a direct estimate of M, (iii) examine the range of plausible M estimates for the stock under consideration, and (iv) define prior distributions for Bayesian analyses. The two most cited empirical methods have appeared in the literature over 2500 times to date. Despite the importance of these methods, there is no consensus in the literature on how well these methods work in terms of prediction error or how their performance may be ranked. We evaluate estimators based on various combinations of maximum age (tmax), growth parameters, and water temperature by seeing how well they reproduce >200 independent, direct estimates of M. We use tenfold cross-validation to estimate the prediction error of the estimators and to rank their performance. With updated and carefully reviewed data, we conclude that a tmax-based estimator performs the best among all estimators evaluated. The tmax-based estimators in turn perform better than the Alverson–Carney method based on tmax and the von Bertalanffy K coefficient, Pauly’s method based on growth parameters and water temperature and methods based just on K. It is possible to combine two independent methods by computing a weighted mean but the improvement over the tmax-based methods is slight. Based on cross-validation prediction error, model residual patterns, model parsimony, and biological considerations, we recommend the use of a tmax-based estimator (M=4.899tmax−0.916">M=4.899t−0.916maxM=4.899tmax−0.916, prediction error = 0.32) when possible and a growth-based method (M=4.118K0.73L∞−0.33">M=4.118K0.73L−0.33∞M=4.118K0.73L∞−0.33 , prediction error = 0.6, length in cm) otherwise.

  20. Choosing the appropriate forecasting model for predictive parameter control.

    PubMed

    Aleti, Aldeida; Moser, Irene; Meedeniya, Indika; Grunske, Lars

    2014-01-01

    All commonly used stochastic optimisation algorithms have to be parameterised to perform effectively. Adaptive parameter control (APC) is an effective method used for this purpose. APC repeatedly adjusts parameter values during the optimisation process for optimal algorithm performance. The assignment of parameter values for a given iteration is based on previously measured performance. In recent research, time series prediction has been proposed as a method of projecting the probabilities to use for parameter value selection. In this work, we examine the suitability of a variety of prediction methods for the projection of future parameter performance based on previous data. All considered prediction methods have assumptions the time series data has to conform to for the prediction method to provide accurate projections. Looking specifically at parameters of evolutionary algorithms (EAs), we find that all standard EA parameters with the exception of population size conform largely to the assumptions made by the considered prediction methods. Evaluating the performance of these prediction methods, we find that linear regression provides the best results by a very small and statistically insignificant margin. Regardless of the prediction method, predictive parameter control outperforms state of the art parameter control methods when the performance data adheres to the assumptions made by the prediction method. When a parameter's performance data does not adhere to the assumptions made by the forecasting method, the use of prediction does not have a notable adverse impact on the algorithm's performance.

  1. Efficient depth intraprediction method for H.264/AVC-based three-dimensional video coding

    NASA Astrophysics Data System (ADS)

    Oh, Kwan-Jung; Oh, Byung Tae

    2015-04-01

    We present an intracoding method that is applicable to depth map coding in multiview plus depth systems. Our approach combines skip prediction and plane segmentation-based prediction. The proposed depth intraskip prediction uses the estimated direction at both the encoder and decoder, and does not need to encode residual data. Our plane segmentation-based intraprediction divides the current block into biregions, and applies a different prediction scheme for each segmented region. This method avoids incorrect estimations across different regions, resulting in higher prediction accuracy. Simulation results demonstrate that the proposed scheme is superior to H.264/advanced video coding intraprediction and has the ability to improve the subjective rendering quality.

  2. Prediction of Protein Structure by Template-Based Modeling Combined with the UNRES Force Field.

    PubMed

    Krupa, Paweł; Mozolewska, Magdalena A; Joo, Keehyoung; Lee, Jooyoung; Czaplewski, Cezary; Liwo, Adam

    2015-06-22

    A new approach to the prediction of protein structures that uses distance and backbone virtual-bond dihedral angle restraints derived from template-based models and simulations with the united residue (UNRES) force field is proposed. The approach combines the accuracy and reliability of template-based methods for the segments of the target sequence with high similarity to those having known structures with the ability of UNRES to pack the domains correctly. Multiplexed replica-exchange molecular dynamics with restraints derived from template-based models of a given target, in which each restraint is weighted according to the accuracy of the prediction of the corresponding section of the molecule, is used to search the conformational space, and the weighted histogram analysis method and cluster analysis are applied to determine the families of the most probable conformations, from which candidate predictions are selected. To test the capability of the method to recover template-based models from restraints, five single-domain proteins with structures that have been well-predicted by template-based methods were used; it was found that the resulting structures were of the same quality as the best of the original models. To assess whether the new approach can improve template-based predictions with incorrectly predicted domain packing, four such targets were selected from the CASP10 targets; for three of them the new approach resulted in significantly better predictions compared with the original template-based models. The new approach can be used to predict the structures of proteins for which good templates can be found for sections of the sequence or an overall good template can be found for the entire sequence but the prediction quality is remarkably weaker in putative domain-linker regions.

  3. Accuracy of different bioinformatics methods in detecting antibiotic resistance and virulence factors from Staphylococcus aureus whole genome sequences.

    PubMed

    Mason, Amy; Foster, Dona; Bradley, Phelim; Golubchik, Tanya; Doumith, Michel; Gordon, N Claire; Pichon, Bruno; Iqbal, Zamin; Staves, Peter; Crook, Derrick; Walker, A Sarah; Kearns, Angela; Peto, Tim

    2018-06-20

    Background : In principle, whole genome sequencing (WGS) can predict phenotypic resistance directly from genotype, replacing laboratory-based tests. However, the contribution of different bioinformatics methods to genotype-phenotype discrepancies has not been systematically explored to date. Methods : We compared three WGS-based bioinformatics methods (Genefinder (read-based), Mykrobe (de Bruijn graph-based) and Typewriter (BLAST-based)) for predicting presence/absence of 83 different resistance determinants and virulence genes, and overall antimicrobial susceptibility, in 1379 Staphylococcus aureus isolates previously characterised by standard laboratory methods (disc diffusion, broth and/or agar dilution and PCR). Results : 99.5% (113830/114457) of individual resistance-determinant/virulence gene predictions were identical between all three methods, with only 627 (0.5%) discordant predictions, demonstrating high overall agreement (Fliess-Kappa=0.98, p<0.0001). Discrepancies when identified were in only one of the three methods for all genes except the cassette recombinase, ccrC(b ). Genotypic antimicrobial susceptibility prediction matched laboratory phenotype in 98.3% (14224/14464) cases (2720 (18.8%) resistant, 11504 (79.5%) susceptible). There was greater disagreement between the laboratory phenotypes and the combined genotypic predictions (97 (0.7%) phenotypically-susceptible but all bioinformatic methods reported resistance; 89 (0.6%) phenotypically-resistant, but all bioinformatics methods reported susceptible) than within the three bioinformatics methods (54 (0.4%) cases, 16 phenotypically-resistant, 38 phenotypically-susceptible). However, in 36/54 (67%), the consensus genotype matched the laboratory phenotype. Conclusions : In this study, the choice between these three specific bioinformatic methods to identify resistance-determinants or other genes in S. aureus did not prove critical, with all demonstrating high concordance with each other and phenotypic/molecular methods. However, each has some limitations and therefore consensus methods provide some assurance. Copyright © 2018 American Society for Microbiology.

  4. Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity

    PubMed Central

    Lee, Hui Sun; Im, Wonpil

    2013-01-01

    Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite promising results of template-based BS prediction methods using global structure alignment (GSA), there is a room to improve the performance by properly incorporating local structure alignment (LSA) because BS are local structures and often similar for proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. A large benchmark set validation shows that G-LoSA predicts drug-like ligands’ positions in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and a non-template geometry-based method, fpocket, a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), is developed and shows improvement on prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA. PMID:23957286

  5. Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion.

    PubMed

    Zafar, Raheel; Dass, Sarat C; Malik, Aamir Saeed

    2017-01-01

    Electroencephalogram (EEG)-based decoding human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain-computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. Comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with current recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method is the most popular currently used feature extraction and prediction method. This method showed an accuracy of 65.7%. However, the proposed method predicts the novel data with improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction method.

  6. Connecting clinical and actuarial prediction with rule-based methods.

    PubMed

    Fokkema, Marjolein; Smits, Niels; Kelderman, Henk; Penninx, Brenda W J H

    2015-06-01

    Meta-analyses comparing the accuracy of clinical versus actuarial prediction have shown actuarial methods to outperform clinical methods, on average. However, actuarial methods are still not widely used in clinical practice, and there has been a call for the development of actuarial prediction methods for clinical practice. We argue that rule-based methods may be more useful than the linear main effect models usually employed in prediction studies, from a data and decision analytic as well as a practical perspective. In addition, decision rules derived with rule-based methods can be represented as fast and frugal trees, which, unlike main effects models, can be used in a sequential fashion, reducing the number of cues that have to be evaluated before making a prediction. We illustrate the usability of rule-based methods by applying RuleFit, an algorithm for deriving decision rules for classification and regression problems, to a dataset on prediction of the course of depressive and anxiety disorders from Penninx et al. (2011). The RuleFit algorithm provided a model consisting of 2 simple decision rules, requiring evaluation of only 2 to 4 cues. Predictive accuracy of the 2-rule model was very similar to that of a logistic regression model incorporating 20 predictor variables, originally applied to the dataset. In addition, the 2-rule model required, on average, evaluation of only 3 cues. Therefore, the RuleFit algorithm appears to be a promising method for creating decision tools that are less time consuming and easier to apply in psychological practice, and with accuracy comparable to traditional actuarial methods. (c) 2015 APA, all rights reserved).

  7. The effect of using genealogy-based haplotypes for genomic prediction.

    PubMed

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individuals markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

  8. In silico platform for predicting and initiating β-turns in a protein at desired locations.

    PubMed

    Singh, Harinder; Singh, Sandeep; Raghava, Gajendra P S

    2015-05-01

    Numerous studies have been performed for analysis and prediction of β-turns in a protein. This study focuses on analyzing, predicting, and designing of β-turns to understand the preference of amino acids in β-turn formation. We analyzed around 20,000 PDB chains to understand the preference of residues or pair of residues at different positions in β-turns. Based on the results, a propensity-based method has been developed for predicting β-turns with an accuracy of 82%. We introduced a new approach entitled "Turn level prediction method," which predicts the complete β-turn rather than focusing on the residues in a β-turn. Finally, we developed BetaTPred3, a Random forest based method for predicting β-turns by utilizing various features of four residues present in β-turns. The BetaTPred3 achieved an accuracy of 79% with 0.51 MCC that is comparable or better than existing methods on BT426 dataset. Additionally, models were developed to predict β-turn types with better performance than other methods available in the literature. In order to improve the quality of prediction of turns, we developed prediction models on a large and latest dataset of 6376 nonredundant protein chains. Based on this study, a web server has been developed for prediction of β-turns and their types in proteins. This web server also predicts minimum number of mutations required to initiate or break a β-turn in a protein at specified location of a protein. © 2015 Wiley Periodicals, Inc.

  9. Similarity indices based on link weight assignment for link prediction of unweighted complex networks

    NASA Astrophysics Data System (ADS)

    Liu, Shuxin; Ji, Xinsheng; Liu, Caixia; Bai, Yi

    2017-01-01

    Many link prediction methods have been proposed for predicting the likelihood that a link exists between two nodes in complex networks. Among these methods, similarity indices are receiving close attention. Most similarity-based methods assume that the contribution of links with different topological structures is the same in the similarity calculations. This paper proposes a local weighted method, which weights the strength of connection between each pair of nodes. Based on the local weighted method, six local weighted similarity indices extended from unweighted similarity indices (including Common Neighbor (CN), Adamic-Adar (AA), Resource Allocation (RA), Salton, Jaccard and Local Path (LP) index) are proposed. Empirical study has shown that the local weighted method can significantly improve the prediction accuracy of these unweighted similarity indices and that in sparse and weakly clustered networks, the indices perform even better.

  10. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

    PubMed

    Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical AbstractDecision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.

  11. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-27

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.

  12. Patient Similarity in Prediction Models Based on Health Data: A Scoping Review

    PubMed Central

    Sharafoddini, Anis; Dubin, Joel A

    2017-01-01

    Background Physicians and health policy makers are required to make predictions during their decision making in various medical problems. Many advances have been made in predictive modeling toward outcome prediction, but these innovations target an average patient and are insufficiently adjustable for individual patients. One developing idea in this field is individualized predictive analytics based on patient similarity. The goal of this approach is to identify patients who are similar to an index patient and derive insights from the records of similar patients to provide personalized predictions.. Objective The aim is to summarize and review published studies describing computer-based approaches for predicting patients’ future health status based on health data and patient similarity, identify gaps, and provide a starting point for related future research. Methods The method involved (1) conducting the review by performing automated searches in Scopus, PubMed, and ISI Web of Science, selecting relevant studies by first screening titles and abstracts then analyzing full-texts, and (2) documenting by extracting publication details and information on context, predictors, missing data, modeling algorithm, outcome, and evaluation methods into a matrix table, synthesizing data, and reporting results. Results After duplicate removal, 1339 articles were screened in abstracts and titles and 67 were selected for full-text review. In total, 22 articles met the inclusion criteria. Within included articles, hospitals were the main source of data (n=10). Cardiovascular disease (n=7) and diabetes (n=4) were the dominant patient diseases. Most studies (n=18) used neighborhood-based approaches in devising prediction models. Two studies showed that patient similarity-based modeling outperformed population-based predictive methods. Conclusions Interest in patient similarity-based predictive modeling for diagnosis and prognosis has been growing. In addition to raw/coded health data, wavelet transform and term frequency-inverse document frequency methods were employed to extract predictors. Selecting predictors with potential to highlight special cases and defining new patient similarity metrics were among the gaps identified in the existing literature that provide starting points for future work. Patient status prediction models based on patient similarity and health data offer exciting potential for personalizing and ultimately improving health care, leading to better patient outcomes. PMID:28258046

  13. Predict the fatigue life of crack based on extended finite element method and SVR

    NASA Astrophysics Data System (ADS)

    Song, Weizhen; Jiang, Zhansi; Jiang, Hui

    2018-05-01

    Using extended finite element method (XFEM) and support vector regression (SVR) to predict the fatigue life of plate crack. Firstly, the XFEM is employed to calculate the stress intensity factors (SIFs) with given crack sizes. Then predicetion model can be built based on the function relationship of the SIFs with the fatigue life or crack length. Finally, according to the prediction model predict the SIFs at different crack sizes or different cycles. Because of the accuracy of the forward Euler method only ensured by the small step size, a new prediction method is presented to resolve the issue. The numerical examples were studied to demonstrate the proposed method allow a larger step size and have a high accuracy.

  14. Lessons learned in induced fit docking and metadynamics in the Drug Design Data Resource Grand Challenge 2

    NASA Astrophysics Data System (ADS)

    Baumgartner, Matthew P.; Evans, David A.

    2018-01-01

    Two of the major ongoing challenges in computational drug discovery are predicting the binding pose and affinity of a compound to a protein. The Drug Design Data Resource Grand Challenge 2 was developed to address these problems and to drive development of new methods. The challenge provided the 2D structures of compounds for which the organizers help blinded data in the form of 35 X-ray crystal structures and 102 binding affinity measurements and challenged participants to predict the binding pose and affinity of the compounds. We tested a number of pose prediction methods as part of the challenge; we found that docking methods that incorporate protein flexibility (Induced Fit Docking) outperformed methods that treated the protein as rigid. We also found that using binding pose metadynamics, a molecular dynamics based method, to score docked poses provided the best predictions of our methods with an average RMSD of 2.01 Å. We tested both structure-based (e.g. docking) and ligand-based methods (e.g. QSAR) in the affinity prediction portion of the competition. We found that our structure-based methods based on docking with Smina (Spearman ρ = 0.614), performed slightly better than our ligand-based methods (ρ = 0.543), and had equivalent performance with the other top methods in the competition. Despite the overall good performance of our methods in comparison to other participants in the challenge, there exists significant room for improvement especially in cases such as these where protein flexibility plays such a large role.

  15. Protein Sorting Prediction.

    PubMed

    Nielsen, Henrik

    2017-01-01

    Many computational methods are available for predicting protein sorting in bacteria. When comparing them, it is important to know that they can be grouped into three fundamentally different approaches: signal-based, global-property-based and homology-based prediction. In this chapter, the strengths and drawbacks of each of these approaches is described through many examples of methods that predict secretion, integration into membranes, or subcellular locations in general. The aim of this chapter is to provide a user-level introduction to the field with a minimum of computational theory.

  16. SeqRate: sequence-based protein folding type classification and rates prediction

    PubMed Central

    2010-01-01

    Background Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic nature (two-state folding or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from only protein sequence with support vector machines. Results We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec-1) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Conclusions Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647

  17. PrePhyloPro: phylogenetic profile-based prediction of whole proteome linkages

    PubMed Central

    Niu, Yulong; Liu, Chengcheng; Moghimyfiroozabad, Shayan; Yang, Yi

    2017-01-01

    Direct and indirect functional links between proteins as well as their interactions as part of larger protein complexes or common signaling pathways may be predicted by analyzing the correlation of their evolutionary patterns. Based on phylogenetic profiling, here we present a highly scalable and time-efficient computational framework for predicting linkages within the whole human proteome. We have validated this method through analysis of 3,697 human pathways and molecular complexes and a comparison of our results with the prediction outcomes of previously published co-occurrency model-based and normalization methods. Here we also introduce PrePhyloPro, a web-based software that uses our method for accurately predicting proteome-wide linkages. We present data on interactions of human mitochondrial proteins, verifying the performance of this software. PrePhyloPro is freely available at http://prephylopro.org/phyloprofile/. PMID:28875072

  18. An auxiliary optimization method for complex public transit route network based on link prediction

    NASA Astrophysics Data System (ADS)

    Zhang, Lin; Lu, Jian; Yue, Xianfei; Zhou, Jialin; Li, Yunxuan; Wan, Qian

    2018-02-01

    Inspired by the missing (new) link prediction and the spurious existing link identification in link prediction theory, this paper establishes an auxiliary optimization method for public transit route network (PTRN) based on link prediction. First, link prediction applied to PTRN is described, and based on reviewing the previous studies, the summary indices set and its algorithms set are collected for the link prediction experiment. Second, through analyzing the topological properties of Jinan’s PTRN established by the Space R method, we found that this is a typical small-world network with a relatively large average clustering coefficient. This phenomenon indicates that the structural similarity-based link prediction will show a good performance in this network. Then, based on the link prediction experiment of the summary indices set, three indices with maximum accuracy are selected for auxiliary optimization of Jinan’s PTRN. Furthermore, these link prediction results show that the overall layout of Jinan’s PTRN is stable and orderly, except for a partial area that requires optimization and reconstruction. The above pattern conforms to the general pattern of the optimal development stage of PTRN in China. Finally, based on the missing (new) link prediction and the spurious existing link identification, we propose optimization schemes that can be used not only to optimize current PTRN but also to evaluate PTRN planning.

  19. A predictive estimation method for carbon dioxide transport by data-driven modeling with a physically-based data model

    NASA Astrophysics Data System (ADS)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young; Jun, Seong-Chun; Choung, Sungwook; Yun, Seong-Taek; Oh, Junho; Kim, Hyun-Jun

    2017-11-01

    In this study, a data-driven method for predicting CO2 leaks and associated concentrations from geological CO2 sequestration is developed. Several candidate models are compared based on their reproducibility and predictive capability for CO2 concentration measurements from the Environment Impact Evaluation Test (EIT) site in Korea. Based on the data mining results, a one-dimensional solution of the advective-dispersive equation for steady flow (i.e., Ogata-Banks solution) is found to be most representative for the test data, and this model is adopted as the data model for the developed method. In the validation step, the method is applied to estimate future CO2 concentrations with the reference estimation by the Ogata-Banks solution, where a part of earlier data is used as the training dataset. From the analysis, it is found that the ensemble mean of multiple estimations based on the developed method shows high prediction accuracy relative to the reference estimation. In addition, the majority of the data to be predicted are included in the proposed quantile interval, which suggests adequate representation of the uncertainty by the developed method. Therefore, the incorporation of a reasonable physically-based data model enhances the prediction capability of the data-driven model. The proposed method is not confined to estimations of CO2 concentration and may be applied to various real-time monitoring data from subsurface sites to develop automated control, management or decision-making systems.

  20. Post processing of protein-compound docking for fragment-based drug discovery (FBDD): in-silico structure-based drug screening and ligand-binding pose prediction.

    PubMed

    Fukunishi, Yoshifumi

    2010-01-01

    For fragment-based drug development, both hit (active) compound prediction and docking-pose (protein-ligand complex structure) prediction of the hit compound are important, since chemical modification (fragment linking, fragment evolution) subsequent to the hit discovery must be performed based on the protein-ligand complex structure. However, the naïve protein-compound docking calculation shows poor accuracy in terms of docking-pose prediction. Thus, post-processing of the protein-compound docking is necessary. Recently, several methods for the post-processing of protein-compound docking have been proposed. In FBDD, the compounds are smaller than those for conventional drug screening. This makes it difficult to perform the protein-compound docking calculation. A method to avoid this problem has been reported. Protein-ligand binding free energy estimation is useful to reduce the procedures involved in the chemical modification of the hit fragment. Several prediction methods have been proposed for high-accuracy estimation of protein-ligand binding free energy. This paper summarizes the various computational methods proposed for docking-pose prediction and their usefulness in FBDD.

  1. BEST: Improved Prediction of B-Cell Epitopes from Antigen Sequences

    PubMed Central

    Gao, Jianzhao; Faraggi, Eshel; Zhou, Yaoqi; Ruan, Jishou; Kurgan, Lukasz

    2012-01-01

    Accurate identification of immunogenic regions in a given antigen chain is a difficult and actively pursued problem. Although accurate predictors for T-cell epitopes are already in place, the prediction of the B-cell epitopes requires further research. We overview the available approaches for the prediction of B-cell epitopes and propose a novel and accurate sequence-based solution. Our BEST (B-cell Epitope prediction using Support vector machine Tool) method predicts epitopes from antigen sequences, in contrast to some method that predict only from short sequence fragments, using a new architecture based on averaging selected scores generated from sliding 20-mers by a Support Vector Machine (SVM). The SVM predictor utilizes a comprehensive and custom designed set of inputs generated by combining information derived from the chain, sequence conservation, similarity to known (training) epitopes, and predicted secondary structure and relative solvent accessibility. Empirical evaluation on benchmark datasets demonstrates that BEST outperforms several modern sequence-based B-cell epitope predictors including ABCPred, method by Chen et al. (2007), BCPred, COBEpro, BayesB, and CBTOPE, when considering the predictions from antigen chains and from the chain fragments. Our method obtains a cross-validated area under the receiver operating characteristic curve (AUC) for the fragment-based prediction at 0.81 and 0.85, depending on the dataset. The AUCs of BEST on the benchmark sets of full antigen chains equal 0.57 and 0.6, which is significantly and slightly better than the next best method we tested. We also present case studies to contrast the propensity profiles generated by BEST and several other methods. PMID:22761950

  2. On some methods for assessing earthquake predictions

    NASA Astrophysics Data System (ADS)

    Molchan, G.; Romashkova, L.; Peresan, A.

    2017-09-01

    A regional approach to the problem of assessing earthquake predictions inevitably faces a deficit of data. We point out some basic limits of assessment methods reported in the literature, considering the practical case of the performance of the CN pattern recognition method in the prediction of large Italian earthquakes. Along with the classical hypothesis testing, a new game approach, the so-called parimutuel gambling (PG) method, is examined. The PG, originally proposed for the evaluation of the probabilistic earthquake forecast, has been recently adapted for the case of 'alarm-based' CN prediction. The PG approach is a non-standard method; therefore it deserves careful examination and theoretical analysis. We show that the PG alarm-based version leads to an almost complete loss of information about predicted earthquakes (even for a large sample). As a result, any conclusions based on the alarm-based PG approach are not to be trusted. We also show that the original probabilistic PG approach does not necessarily identifies the genuine forecast correctly among competing seismicity rate models, even when applied to extensive data.

  3. Modeling of BN Lifetime Prediction of a System Based on Integrated Multi-Level Information

    PubMed Central

    Wang, Xiaohong; Wang, Lizhi

    2017-01-01

    Predicting system lifetime is important to ensure safe and reliable operation of products, which requires integrated modeling based on multi-level, multi-sensor information. However, lifetime characteristics of equipment in a system are different and failure mechanisms are inter-coupled, which leads to complex logical correlations and the lack of a uniform lifetime measure. Based on a Bayesian network (BN), a lifetime prediction method for systems that combine multi-level sensor information is proposed. The method considers the correlation between accidental failures and degradation failure mechanisms, and achieves system modeling and lifetime prediction under complex logic correlations. This method is applied in the lifetime prediction of a multi-level solar-powered unmanned system, and the predicted results can provide guidance for the improvement of system reliability and for the maintenance and protection of the system. PMID:28926930

  4. Modeling of BN Lifetime Prediction of a System Based on Integrated Multi-Level Information.

    PubMed

    Wang, Jingbin; Wang, Xiaohong; Wang, Lizhi

    2017-09-15

    Predicting system lifetime is important to ensure safe and reliable operation of products, which requires integrated modeling based on multi-level, multi-sensor information. However, lifetime characteristics of equipment in a system are different and failure mechanisms are inter-coupled, which leads to complex logical correlations and the lack of a uniform lifetime measure. Based on a Bayesian network (BN), a lifetime prediction method for systems that combine multi-level sensor information is proposed. The method considers the correlation between accidental failures and degradation failure mechanisms, and achieves system modeling and lifetime prediction under complex logic correlations. This method is applied in the lifetime prediction of a multi-level solar-powered unmanned system, and the predicted results can provide guidance for the improvement of system reliability and for the maintenance and protection of the system.

  5. Methods for estimating flood frequency in Montana based on data through water year 1998

    USGS Publications Warehouse

    Parrett, Charles; Johnson, Dave R.

    2004-01-01

    Annual peak discharges having recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years (T-year floods) were determined for 660 gaged sites in Montana and in adjacent areas of Idaho, Wyoming, and Canada, based on data through water year 1998. The updated flood-frequency information was subsequently used in regression analyses, either ordinary or generalized least squares, to develop equations relating T-year floods to various basin and climatic characteristics, equations relating T-year floods to active-channel width, and equations relating T-year floods to bankfull width. The equations can be used to estimate flood frequency at ungaged sites. Montana was divided into eight regions, within which flood characteristics were considered to be reasonably homogeneous, and the three sets of regression equations were developed for each region. A measure of the overall reliability of the regression equations is the average standard error of prediction. The average standard errors of prediction for the equations based on basin and climatic characteristics ranged from 37.4 percent to 134.1 percent. Average standard errors of prediction for the equations based on active-channel width ranged from 57.2 percent to 141.3 percent. Average standard errors of prediction for the equations based on bankfull width ranged from 63.1 percent to 155.5 percent. In most regions, the equations based on basin and climatic characteristics generally had smaller average standard errors of prediction than equations based on active-channel or bankfull width. An exception was the Southeast Plains Region, where all equations based on active-channel width had smaller average standard errors of prediction than equations based on basin and climatic characteristics or bankfull width. Methods for weighting estimates derived from the basin- and climatic-characteristic equations and the channel-width equations also were developed. The weights were based on the cross correlation of residuals from the different methods and the average standard errors of prediction. When all three methods were combined, the average standard errors of prediction ranged from 37.4 percent to 120.2 percent. Weighting of estimates reduced the standard errors of prediction for all T-year flood estimates in four regions, reduced the standard errors of prediction for some T-year flood estimates in two regions, and provided no reduction in average standard error of prediction in two regions. A computer program for solving the regression equations, weighting estimates, and determining reliability of individual estimates was developed and placed on the USGS Montana District World Wide Web page. A new regression method, termed Region of Influence regression, also was tested. Test results indicated that the Region of Influence method was not as reliable as the regional equations based on generalized least squares regression. Two additional methods for estimating flood frequency at ungaged sites located on the same streams as gaged sites also are described. The first method, based on a drainage-area-ratio adjustment, is intended for use on streams where the ungaged site of interest is located near a gaged site. The second method, based on interpolation between gaged sites, is intended for use on streams that have two or more streamflow-gaging stations.

  6. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.

  7. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616

  8. Nondestructive testing methods to predict effect of degradation on wood : a critical assessment

    Treesearch

    J. Kaiserlik

    1978-01-01

    Results are reported for an assessment of methods for predicting strength of wood, wood-based, or related material. Research directly applicable to nondestructive strength prediction was very limited. In wood, strength prediction research is limited to vibration decay, wave attenuation, and multiparameter "degradation models." Nonwood methods with potential...

  9. Systems and Methods for Automated Vessel Navigation Using Sea State Prediction

    NASA Technical Reports Server (NTRS)

    Huntsberger, Terrance L. (Inventor); Howard, Andrew B. (Inventor); Reinhart, Rene Felix (Inventor); Aghazarian, Hrand (Inventor); Rankin, Arturo (Inventor)

    2017-01-01

    Systems and methods for sea state prediction and autonomous navigation in accordance with embodiments of the invention are disclosed. One embodiment of the invention includes a method of predicting a future sea state including generating a sequence of at least two 3D images of a sea surface using at least two image sensors, detecting peaks and troughs in the 3D images using a processor, identifying at least one wavefront in each 3D image based upon the detected peaks and troughs using the processor, characterizing at least one propagating wave based upon the propagation of wavefronts detected in the sequence of 3D images using the processor, and predicting a future sea state using at least one propagating wave characterizing the propagation of wavefronts in the sequence of 3D images using the processor. Another embodiment includes a method of autonomous vessel navigation based upon a predicted sea state and target location.

  10. Systems and Methods for Automated Vessel Navigation Using Sea State Prediction

    NASA Technical Reports Server (NTRS)

    Aghazarian, Hrand (Inventor); Reinhart, Rene Felix (Inventor); Huntsberger, Terrance L. (Inventor); Rankin, Arturo (Inventor); Howard, Andrew B. (Inventor)

    2015-01-01

    Systems and methods for sea state prediction and autonomous navigation in accordance with embodiments of the invention are disclosed. One embodiment of the invention includes a method of predicting a future sea state including generating a sequence of at least two 3D images of a sea surface using at least two image sensors, detecting peaks and troughs in the 3D images using a processor, identifying at least one wavefront in each 3D image based upon the detected peaks and troughs using the processor, characterizing at least one propagating wave based upon the propagation of wavefronts detected in the sequence of 3D images using the processor, and predicting a future sea state using at least one propagating wave characterizing the propagation of wavefronts in the sequence of 3D images using the processor. Another embodiment includes a method of autonomous vessel navigation based upon a predicted sea state and target location.

  11. Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion

    PubMed Central

    2017-01-01

    Electroencephalogram (EEG)-based decoding human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain–computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. Comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with current recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method is the most popular currently used feature extraction and prediction method. This method showed an accuracy of 65.7%. However, the proposed method predicts the novel data with improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction method. PMID:28558002

  12. AptRank: an adaptive PageRank model for protein function prediction on   bi-relational graphs.

    PubMed

    Jiang, Biaobin; Kloster, Kyle; Gleich, David F; Gribskov, Michael

    2017-06-15

    Diffusion-based network models are widely used for protein function prediction using protein network data and have been shown to outperform neighborhood-based and module-based methods. Recent studies have shown that integrating the hierarchical structure of the Gene Ontology (GO) data dramatically improves prediction accuracy. However, previous methods usually either used the GO hierarchy to refine the prediction results of multiple classifiers, or flattened the hierarchy into a function-function similarity kernel. No study has taken the GO hierarchy into account together with the protein network as a two-layer network model. We first construct a Bi-relational graph (Birg) model comprised of both protein-protein association and function-function hierarchical networks. We then propose two diffusion-based methods, BirgRank and AptRank, both of which use PageRank to diffuse information on this two-layer graph model. BirgRank is a direct application of traditional PageRank with fixed decay parameters. In contrast, AptRank utilizes an adaptive diffusion mechanism to improve the performance of BirgRank. We evaluate the ability of both methods to predict protein function on yeast, fly and human protein datasets, and compare with four previous methods: GeneMANIA, TMC, ProteinRank and clusDCA. We design four different validation strategies: missing function prediction, de novo function prediction, guided function prediction and newly discovered function prediction to comprehensively evaluate predictability of all six methods. We find that both BirgRank and AptRank outperform the previous methods, especially in missing function prediction when using only 10% of the data for training. The MATLAB code is available at https://github.rcac.purdue.edu/mgribsko/aptrank . gribskov@purdue.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  13. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments.

    PubMed

    Zheng, Ce; Kurgan, Lukasz

    2008-10-10

    beta-turn is a secondary protein structure type that plays significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of beta-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based beta-turn predictor stems from the usage of a window based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates with recent studies. The Asn, Asp, Gly, and Pro indicate potential beta-turns, while the remaining four amino acids are useful to predict non-beta-turns. Empirical evaluation using three nonredundant datasets shows favorable Q total, Q predicted and MCC values when compared with over a dozen of modern competing methods. Our method is the first to break the 80% Q total barrier and achieves Q total = 80.9%, MCC = 0.47, and Q predicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between beta-turns and non-beta-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at http://biomine.ece.ualberta.ca/BTNpred/BTNpred.html.

  14. A Hybrid RANS/LES Approach for Predicting Jet Noise

    NASA Technical Reports Server (NTRS)

    Goldstein, Marvin E.

    2006-01-01

    Hybrid acoustic prediction methods have an important advantage over the current Reynolds averaged Navier-Stokes (RANS) based methods in that they only involve modeling of the relatively universal subscale motion and not the configuration dependent larger scale turbulence. Unfortunately, they are unable to account for the high frequency sound generated by the turbulence in the initial mixing layers. This paper introduces an alternative approach that directly calculates the sound from a hybrid RANS/LES flow model (which can resolve the steep gradients in the initial mixing layers near the nozzle lip) and adopts modeling techniques similar to those used in current RANS based noise prediction methods to determine the unknown sources in the equations for the remaining unresolved components of the sound field. The resulting prediction method would then be intermediate between the current noise prediction codes and previously proposed hybrid noise prediction methods.

  15. Towards an Airframe Noise Prediction Methodology: Survey of Current Approaches

    NASA Technical Reports Server (NTRS)

    Farassat, Fereidoun; Casper, Jay H.

    2006-01-01

    In this paper, we present a critical survey of the current airframe noise (AFN) prediction methodologies. Four methodologies are recognized. These are the fully analytic method, CFD combined with the acoustic analogy, the semi-empirical method and fully numerical method. It is argued that for the immediate need of the aircraft industry, the semi-empirical method based on recent high quality acoustic database is the best available method. The method based on CFD and the Ffowcs William- Hawkings (FW-H) equation with penetrable data surface (FW-Hpds ) has advanced considerably and much experience has been gained in its use. However, more research is needed in the near future particularly in the area of turbulence simulation. The fully numerical method will take longer to reach maturity. Based on the current trends, it is predicted that this method will eventually develop into the method of choice. Both the turbulence simulation and propagation methods need to develop more for this method to become useful. Nonetheless, the authors propose that the method based on a combination of numerical and analytical techniques, e.g., CFD combined with FW-H equation, should also be worked on. In this effort, the current symbolic algebra software will allow more analytical approaches to be incorporated into AFN prediction methods.

  16. Machine learning-based methods for prediction of linear B-cell epitopes.

    PubMed

    Wang, Hsin-Wei; Pai, Tun-Wen

    2014-01-01

    B-cell epitope prediction facilitates immunologists in designing peptide-based vaccine, diagnostic test, disease prevention, treatment, and antibody production. In comparison with T-cell epitope prediction, the performance of variable length B-cell epitope prediction is still yet to be satisfied. Fortunately, due to increasingly available verified epitope databases, bioinformaticians could adopt machine learning-based algorithms on all curated data to design an improved prediction tool for biomedical researchers. Here, we have reviewed related epitope prediction papers, especially those for linear B-cell epitope prediction. It should be noticed that a combination of selected propensity scales and statistics of epitope residues with machine learning-based tools formulated a general way for constructing linear B-cell epitope prediction systems. It is also observed from most of the comparison results that the kernel method of support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, except reviewing recently published papers, we have introduced the fundamentals of B-cell epitope and SVM techniques. In addition, an example of linear B-cell prediction system based on physicochemical features and amino acid combinations is illustrated in details.

  17. Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

    PubMed Central

    Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

    2012-01-01

    In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999

  18. An improved predictive functional control method with application to PMSM systems

    NASA Astrophysics Data System (ADS)

    Li, Shihua; Liu, Huixian; Fu, Wenshu

    2017-01-01

    In common design of prediction model-based control method, usually disturbances are not considered in the prediction model as well as the control design. For the control systems with large amplitude or strong disturbances, it is difficult to precisely predict the future outputs according to the conventional prediction model, and thus the desired optimal closed-loop performance will be degraded to some extent. To this end, an improved predictive functional control (PFC) method is developed in this paper by embedding disturbance information into the system model. Here, a composite prediction model is thus obtained by embedding the estimated value of disturbances, where disturbance observer (DOB) is employed to estimate the lumped disturbances. So the influence of disturbances on system is taken into account in optimisation procedure. Finally, considering the speed control problem for permanent magnet synchronous motor (PMSM) servo system, a control scheme based on the improved PFC method is designed to ensure an optimal closed-loop performance even in the presence of disturbances. Simulation and experimental results based on a hardware platform are provided to confirm the effectiveness of the proposed algorithm.

  19. Grain growth prediction based on data assimilation by implementing 4DVar on multi-phase-field model

    NASA Astrophysics Data System (ADS)

    Ito, Shin-ichi; Nagao, Hiromichi; Kasuya, Tadashi; Inoue, Junya

    2017-12-01

    We propose a method to predict grain growth based on data assimilation by using a four-dimensional variational method (4DVar). When implemented on a multi-phase-field model, the proposed method allows us to calculate the predicted grain structures and uncertainties in them that depend on the quality and quantity of the observational data. We confirm through numerical tests involving synthetic data that the proposed method correctly reproduces the true phase-field assumed in advance. Furthermore, it successfully quantifies uncertainties in the predicted grain structures, where such uncertainty quantifications provide valuable information to optimize the experimental design.

  20. Predicting infant cortical surface development using a 4D varifold-based learning framework and local topography-based shape morphing.

    PubMed

    Rekik, Islem; Li, Gang; Lin, Weili; Shen, Dinggang

    2016-02-01

    Longitudinal neuroimaging analysis methods have remarkably advanced our understanding of early postnatal brain development. However, learning predictive models to trace forth the evolution trajectories of both normal and abnormal cortical shapes remains broadly absent. To fill this critical gap, we pioneered the first prediction model for longitudinal developing cortical surfaces in infants using a spatiotemporal current-based learning framework solely from the baseline cortical surface. In this paper, we detail this prediction model and even further improve its performance by introducing two key variants. First, we use the varifold metric to overcome the limitations of the current metric for surface registration that was used in our preliminary study. We also extend the conventional varifold-based surface registration model for pairwise registration to a spatiotemporal surface regression model. Second, we propose a morphing process of the baseline surface using its topographic attributes such as normal direction and principal curvature sign. Specifically, our method learns from longitudinal data both the geometric (vertices positions) and dynamic (temporal evolution trajectories) features of the infant cortical surface, comprising a training stage and a prediction stage. In the training stage, we use the proposed varifold-based shape regression model to estimate geodesic cortical shape evolution trajectories for each training subject. We then build an empirical mean spatiotemporal surface atlas. In the prediction stage, given an infant, we select the best learnt features from training subjects to simultaneously predict the cortical surface shapes at all later timepoints, based on similarity metrics between this baseline surface and the learnt baseline population average surface atlas. We used a leave-one-out cross validation method to predict the inner cortical surface shape at 3, 6, 9 and 12 months of age from the baseline cortical surface shape at birth. Our method attained a higher prediction accuracy and better captured the spatiotemporal dynamic change of the highly folded cortical surface than the previous proposed prediction method. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Is social projection based on simulation or theory? Why new methods are needed for differentiating

    PubMed Central

    Bazinger, Claudia; Kühberger, Anton

    2012-01-01

    The literature on social cognition reports many instances of a phenomenon titled ‘social projection’ or ‘egocentric bias’. These terms indicate egocentric predictions, i.e., an over-reliance on the self when predicting the cognition, emotion, or behavior of other people. The classic method to diagnose egocentric prediction is to establish high correlations between our own and other people's cognition, emotion, or behavior. We argue that this method is incorrect because there is a different way to come to a correlation between own and predicted states, namely, through the use of theoretical knowledge. Thus, the use of correlational measures is not sufficient to identify the source of social predictions. Based on the distinction between simulation theory and theory theory, we propose the following alternative methods for inferring prediction strategies: independent vs. juxtaposed predictions, the use of ‘hot’ mental processes, and the use of participants’ self-reports. PMID:23209342

  2. Assessment of quantitative structure-activity relationship of toxicity prediction models for Korean chemical substance control legislation

    PubMed Central

    Kim, Kwang-Yon; Shin, Seong Eun; No, Kyoung Tai

    2015-01-01

    Objectives For successful adoption of legislation controlling registration and assessment of chemical substances, it is important to obtain sufficient toxicological experimental evidence and other related information. It is also essential to obtain a sufficient number of predicted risk and toxicity results. Particularly, methods used in predicting toxicities of chemical substances during acquisition of required data, ultimately become an economic method for future dealings with new substances. Although the need for such methods is gradually increasing, the-required information about reliability and applicability range has not been systematically provided. Methods There are various representative environmental and human toxicity models based on quantitative structure-activity relationships (QSAR). Here, we secured the 10 representative QSAR-based prediction models and its information that can make predictions about substances that are expected to be regulated. We used models that predict and confirm usability of the information expected to be collected and submitted according to the legislation. After collecting and evaluating each predictive model and relevant data, we prepared methods quantifying the scientific validity and reliability, which are essential conditions for using predictive models. Results We calculated predicted values for the models. Furthermore, we deduced and compared adequacies of the models using the Alternative non-testing method assessed for Registration, Evaluation, Authorization, and Restriction of Chemicals Substances scoring system, and deduced the applicability domains for each model. Additionally, we calculated and compared inclusion rates of substances expected to be regulated, to confirm the applicability. Conclusions We evaluated and compared the data, adequacy, and applicability of our selected QSAR-based toxicity prediction models, and included them in a database. Based on this data, we aimed to construct a system that can be used with predicted toxicity results. Furthermore, by presenting the suitability of individual predicted results, we aimed to provide a foundation that could be used in actual assessments and regulations. PMID:26206368

  3. Practical quantum mechanics-based fragment methods for predicting molecular crystal properties.

    PubMed

    Wen, Shuhao; Nanda, Kaushik; Huang, Yuanhang; Beran, Gregory J O

    2012-06-07

    Significant advances in fragment-based electronic structure methods have created a real alternative to force-field and density functional techniques in condensed-phase problems such as molecular crystals. This perspective article highlights some of the important challenges in modeling molecular crystals and discusses techniques for addressing them. First, we survey recent developments in fragment-based methods for molecular crystals. Second, we use examples from our own recent research on a fragment-based QM/MM method, the hybrid many-body interaction (HMBI) model, to analyze the physical requirements for a practical and effective molecular crystal model chemistry. We demonstrate that it is possible to predict molecular crystal lattice energies to within a couple kJ mol(-1) and lattice parameters to within a few percent in small-molecule crystals. Fragment methods provide a systematically improvable approach to making predictions in the condensed phase, which is critical to making robust predictions regarding the subtle energy differences found in molecular crystals.

  4. Relative Packing Groups in Template-Based Structure Prediction: Cooperative Effects of True Positive Constraints

    PubMed Central

    Day, Ryan; Qu, Xiaotao; Swanson, Rosemarie; Bohannan, Zach; Bliss, Robert

    2011-01-01

    Abstract Most current template-based structure prediction methods concentrate on finding the correct backbone conformation and then packing sidechains within that backbone. Our packing-based method derives distance constraints from conserved relative packing groups (RPGs). In our refinement approach, the RPGs provide a level of resolution that restrains global topology while allowing conformational sampling. In this study, we test our template-based structure prediction method using 51 prediction units from CASP7 experiments. RPG-based constraints are able to substantially improve approximately two-thirds of starting templates. Upon deeper investigation, we find that true positive spatial constraints, especially those non-local in sequence, derived from the RPGs were important to building nearer native models. Surprisingly, the fraction of incorrect or false positive constraints does not strongly influence the quality of the final candidate. This result indicates that our RPG-based true positive constraints sample the self-consistent, cooperative interactions of the native structure. The lack of such reinforcing cooperativity explains the weaker effect of false positive constraints. Generally, these findings are encouraging indications that RPGs will improve template-based structure prediction. PMID:21210729

  5. Improving the spectral measurement accuracy based on temperature distribution and spectra-temperature relationship

    NASA Astrophysics Data System (ADS)

    Li, Zhe; Feng, Jinchao; Liu, Pengyu; Sun, Zhonghua; Li, Gang; Jia, Kebin

    2018-05-01

    Temperature is usually considered as a fluctuation in near-infrared spectral measurement. Chemometric methods were extensively studied to correct the effect of temperature variations. However, temperature can be considered as a constructive parameter that provides detailed chemical information when systematically changed during the measurement. Our group has researched the relationship between temperature-induced spectral variation (TSVC) and normalized squared temperature. In this study, we focused on the influence of temperature distribution in calibration set. Multi-temperature calibration set selection (MTCS) method was proposed to improve the prediction accuracy by considering the temperature distribution of calibration samples. Furthermore, double-temperature calibration set selection (DTCS) method was proposed based on MTCS method and the relationship between TSVC and normalized squared temperature. We compare the prediction performance of PLS models based on random sampling method and proposed methods. The results from experimental studies showed that the prediction performance was improved by using proposed methods. Therefore, MTCS method and DTCS method will be the alternative methods to improve prediction accuracy in near-infrared spectral measurement.

  6. Deep learning architecture for air quality predictions.

    PubMed

    Li, Xiang; Peng, Ling; Hu, Yuan; Shao, Jing; Chi, Tianhe

    2016-11-01

    With the rapid development of urbanization and industrialization, many developing countries are suffering from heavy air pollution. Governments and citizens have expressed increasing concern regarding air pollution because it affects human health and sustainable development worldwide. Current air quality prediction methods mainly use shallow models; however, these methods produce unsatisfactory results, which inspired us to investigate methods of predicting air quality based on deep architecture models. In this paper, a novel spatiotemporal deep learning (STDL)-based air quality prediction method that inherently considers spatial and temporal correlations is proposed. A stacked autoencoder (SAE) model is used to extract inherent air quality features, and it is trained in a greedy layer-wise manner. Compared with traditional time series prediction models, our model can predict the air quality of all stations simultaneously and shows the temporal stability in all seasons. Moreover, a comparison with the spatiotemporal artificial neural network (STANN), auto regression moving average (ARMA), and support vector regression (SVR) models demonstrates that the proposed method of performing air quality predictions has a superior performance.

  7. Prediction of protein-protein interaction network using a multi-objective optimization approach.

    PubMed

    Chowdhury, Archana; Rakshit, Pratyusha; Konar, Amit

    2016-06-01

    Protein-Protein Interactions (PPIs) are very important as they coordinate almost all cellular processes. This paper attempts to formulate PPI prediction problem in a multi-objective optimization framework. The scoring functions for the trial solution deal with simultaneous maximization of functional similarity, strength of the domain interaction profiles, and the number of common neighbors of the proteins predicted to be interacting. The above optimization problem is solved using the proposed Firefly Algorithm with Nondominated Sorting. Experiments undertaken reveal that the proposed PPI prediction technique outperforms existing methods, including gene ontology-based Relative Specific Similarity, multi-domain-based Domain Cohesion Coupling method, domain-based Random Decision Forest method, Bagging with REP Tree, and evolutionary/swarm algorithm-based approaches, with respect to sensitivity, specificity, and F1 score.

  8. Application of two direct runoff prediction methods in Puerto Rico

    USGS Publications Warehouse

    Sepulveda, N.

    1997-01-01

    Two methods for predicting direct runoff from rainfall data were applied to several basins and the resulting hydrographs compared to measured values. The first method uses a geomorphology-based unit hydrograph to predict direct runoff through its convolution with the excess rainfall hyetograph. The second method shows how the resulting hydraulic routing flow equation from a kinematic wave approximation is solved using a spectral method based on the matrix representation of the spatial derivative with Chebyshev collocation and a fourth-order Runge-Kutta time discretization scheme. The calibrated Green-Ampt (GA) infiltration parameters are obtained by minimizing the sum, over several rainfall events, of absolute differences between the total excess rainfall volume computed from the GA equations and the total direct runoff volume computed from a hydrograph separation technique. The improvement made in predicting direct runoff using a geomorphology-based unit hydrograph with the ephemeral and perennial stream network instead of the strictly perennial stream network is negligible. The hydraulic routing scheme presented here is highly accurate in predicting the magnitude and time of the hydrograph peak although the much faster unit hydrograph method also yields reasonable results.

  9. Comparative Study of Different Methods for the Prediction of Drug-Polymer Solubility.

    PubMed

    Knopp, Matthias Manne; Tajber, Lidia; Tian, Yiwei; Olesen, Niels Erik; Jones, David S; Kozyra, Agnieszka; Löbmann, Korbinian; Paluch, Krzysztof; Brennan, Claire Marie; Holm, René; Healy, Anne Marie; Andrews, Gavin P; Rades, Thomas

    2015-09-08

    In this study, a comparison of different methods to predict drug-polymer solubility was carried out on binary systems consisting of five model drugs (paracetamol, chloramphenicol, celecoxib, indomethacin, and felodipine) and polyvinylpyrrolidone/vinyl acetate copolymers (PVP/VA) of different monomer weight ratios. The drug-polymer solubility at 25 °C was predicted using the Flory-Huggins model, from data obtained at elevated temperature using thermal analysis methods based on the recrystallization of a supersaturated amorphous solid dispersion and two variations of the melting point depression method. These predictions were compared with the solubility in the low molecular weight liquid analogues of the PVP/VA copolymer (N-vinylpyrrolidone and vinyl acetate). The predicted solubilities at 25 °C varied considerably depending on the method used. However, the three thermal analysis methods ranked the predicted solubilities in the same order, except for the felodipine-PVP system. Furthermore, the magnitude of the predicted solubilities from the recrystallization method and melting point depression method correlated well with the estimates based on the solubility in the liquid analogues, which suggests that this method can be used as an initial screening tool if a liquid analogue is available. The learnings of this important comparative study provided general guidance for the selection of the most suitable method(s) for the screening of drug-polymer solubility.

  10. An accelerated non-Gaussianity based multichannel predictive deconvolution method with the limited supporting region of filters

    NASA Astrophysics Data System (ADS)

    Li, Zhong-xiao; Li, Zhen-chun

    2016-09-01

    The multichannel predictive deconvolution can be conducted in overlapping temporal and spatial data windows to solve the 2D predictive filter for multiple removal. Generally, the 2D predictive filter can better remove multiples at the cost of more computation time compared with the 1D predictive filter. In this paper we first use the cross-correlation strategy to determine the limited supporting region of filters where the coefficients play a major role for multiple removal in the filter coefficient space. To solve the 2D predictive filter the traditional multichannel predictive deconvolution uses the least squares (LS) algorithm, which requires primaries and multiples are orthogonal. To relax the orthogonality assumption the iterative reweighted least squares (IRLS) algorithm and the fast iterative shrinkage thresholding (FIST) algorithm have been used to solve the 2D predictive filter in the multichannel predictive deconvolution with the non-Gaussian maximization (L1 norm minimization) constraint of primaries. The FIST algorithm has been demonstrated as a faster alternative to the IRLS algorithm. In this paper we introduce the FIST algorithm to solve the filter coefficients in the limited supporting region of filters. Compared with the FIST based multichannel predictive deconvolution without the limited supporting region of filters the proposed method can reduce the computation burden effectively while achieving a similar accuracy. Additionally, the proposed method can better balance multiple removal and primary preservation than the traditional LS based multichannel predictive deconvolution and FIST based single channel predictive deconvolution. Synthetic and field data sets demonstrate the effectiveness of the proposed method.

  11. Prediction of ground effects on aircraft noise

    NASA Technical Reports Server (NTRS)

    Pao, S. P.; Wenzel, A. R.; Oncley, P. B.

    1978-01-01

    A unified method is recommended for predicting ground effects on noise. This method may be used in flyover noise predictions and in correcting static test-stand data to free-field conditions. The recommendation is based on a review of recent progress in the theory of ground effects and of the experimental evidence which supports this theory. It is shown that a surface wave must be included sometimes in the prediction method. Prediction equations are collected conveniently in a single section of the paper. Methods of measuring ground impedance and the resulting ground-impedance data are also reviewed because the recommended method is based on a locally reactive impedance boundary model. Current practice of estimating ground effects are reviewed and consideration is given to practical problems in applying the recommended method. These problems include finite frequency-band filters, finite source dimension, wind and temperature gradients, and signal incoherence.

  12. A predictive estimation method for carbon dioxide transport by data-driven modeling with a physically-based data model.

    PubMed

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young; Jun, Seong-Chun; Choung, Sungwook; Yun, Seong-Taek; Oh, Junho; Kim, Hyun-Jun

    2017-11-01

    In this study, a data-driven method for predicting CO 2 leaks and associated concentrations from geological CO 2 sequestration is developed. Several candidate models are compared based on their reproducibility and predictive capability for CO 2 concentration measurements from the Environment Impact Evaluation Test (EIT) site in Korea. Based on the data mining results, a one-dimensional solution of the advective-dispersive equation for steady flow (i.e., Ogata-Banks solution) is found to be most representative for the test data, and this model is adopted as the data model for the developed method. In the validation step, the method is applied to estimate future CO 2 concentrations with the reference estimation by the Ogata-Banks solution, where a part of earlier data is used as the training dataset. From the analysis, it is found that the ensemble mean of multiple estimations based on the developed method shows high prediction accuracy relative to the reference estimation. In addition, the majority of the data to be predicted are included in the proposed quantile interval, which suggests adequate representation of the uncertainty by the developed method. Therefore, the incorporation of a reasonable physically-based data model enhances the prediction capability of the data-driven model. The proposed method is not confined to estimations of CO 2 concentration and may be applied to various real-time monitoring data from subsurface sites to develop automated control, management or decision-making systems. Copyright © 2017 Elsevier B.V. All rights reserved.

  13. Advanced Methods for Determining Prediction Uncertainty in Model-Based Prognostics with Application to Planetary Rovers

    NASA Technical Reports Server (NTRS)

    Daigle, Matthew J.; Sankararaman, Shankar

    2013-01-01

    Prognostics is centered on predicting the time of and time until adverse events in components, subsystems, and systems. It typically involves both a state estimation phase, in which the current health state of a system is identified, and a prediction phase, in which the state is projected forward in time. Since prognostics is mainly a prediction problem, prognostic approaches cannot avoid uncertainty, which arises due to several sources. Prognostics algorithms must both characterize this uncertainty and incorporate it into the predictions so that informed decisions can be made about the system. In this paper, we describe three methods to solve these problems, including Monte Carlo-, unscented transform-, and first-order reliability-based methods. Using a planetary rover as a case study, we demonstrate and compare the different methods in simulation for battery end-of-discharge prediction.

  14. Uncertainty analysis of neural network based flood forecasting models: An ensemble based approach for constructing prediction interval

    NASA Astrophysics Data System (ADS)

    Kasiviswanathan, K.; Sudheer, K.

    2013-05-01

    Artificial neural network (ANN) based hydrologic models have gained lot of attention among water resources engineers and scientists, owing to their potential for accurate prediction of flood flows as compared to conceptual or physics based hydrologic models. The ANN approximates the non-linear functional relationship between the complex hydrologic variables in arriving at the river flow forecast values. Despite a large number of applications, there is still some criticism that ANN's point prediction lacks in reliability since the uncertainty of predictions are not quantified, and it limits its use in practical applications. A major concern in application of traditional uncertainty analysis techniques on neural network framework is its parallel computing architecture with large degrees of freedom, which makes the uncertainty assessment a challenging task. Very limited studies have considered assessment of predictive uncertainty of ANN based hydrologic models. In this study, a novel method is proposed that help construct the prediction interval of ANN flood forecasting model during calibration itself. The method is designed to have two stages of optimization during calibration: at stage 1, the ANN model is trained with genetic algorithm (GA) to obtain optimal set of weights and biases vector, and during stage 2, the optimal variability of ANN parameters (obtained in stage 1) is identified so as to create an ensemble of predictions. During the 2nd stage, the optimization is performed with multiple objectives, (i) minimum residual variance for the ensemble mean, (ii) maximum measured data points to fall within the estimated prediction interval and (iii) minimum width of prediction interval. The method is illustrated using a real world case study of an Indian basin. The method was able to produce an ensemble that has an average prediction interval width of 23.03 m3/s, with 97.17% of the total validation data points (measured) lying within the interval. The derived prediction interval for a selected hydrograph in the validation data set is presented in Fig 1. It is noted that most of the observed flows lie within the constructed prediction interval, and therefore provides information about the uncertainty of the prediction. One specific advantage of the method is that when ensemble mean value is considered as a forecast, the peak flows are predicted with improved accuracy by this method compared to traditional single point forecasted ANNs. Fig. 1 Prediction Interval for selected hydrograph

  15. Improved nonlinear prediction method

    NASA Astrophysics Data System (ADS)

    Adenan, Nur Hamiza; Md Noorani, Mohd Salmi

    2014-06-01

    The analysis and prediction of time series data have been addressed by researchers. Many techniques have been developed to be applied in various areas, such as weather forecasting, financial markets and hydrological phenomena involving data that are contaminated by noise. Therefore, various techniques to improve the method have been introduced to analyze and predict time series data. In respect of the importance of analysis and the accuracy of the prediction result, a study was undertaken to test the effectiveness of the improved nonlinear prediction method for data that contain noise. The improved nonlinear prediction method involves the formation of composite serial data based on the successive differences of the time series. Then, the phase space reconstruction was performed on the composite data (one-dimensional) to reconstruct a number of space dimensions. Finally the local linear approximation method was employed to make a prediction based on the phase space. This improved method was tested with data series Logistics that contain 0%, 5%, 10%, 20% and 30% of noise. The results show that by using the improved method, the predictions were found to be in close agreement with the observed ones. The correlation coefficient was close to one when the improved method was applied on data with up to 10% noise. Thus, an improvement to analyze data with noise without involving any noise reduction method was introduced to predict the time series data.

  16. Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11

    PubMed Central

    Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2015-01-01

    Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complimentary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques of Protein Structure prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. PMID:26369671

  17. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments

    PubMed Central

    Zheng, Ce; Kurgan, Lukasz

    2008-01-01

    Background β-turn is a secondary protein structure type that plays significant role in protein folding, stability, and molecular recognition. To date, several methods for prediction of β-turns from protein sequences were developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based β-turn predictor stems from the usage of a window based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. Results We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates with recent studies. The Asn, Asp, Gly, and Pro indicate potential β-turns, while the remaining four amino acids are useful to predict non-β-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen of modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and Qpredicted higher by over 6% when compared with the second best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second and two third-best (with respect to MCC) competing methods, respectively. Conclusion Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between β-turns and non-β-turns due to obtaining lower numbers of false positive predictions. The prediction model and datasets are freely available at . PMID:18847492

  18. A unified frame of predicting side effects of drugs by using linear neighborhood similarity.

    PubMed

    Zhang, Wen; Yue, Xiang; Liu, Feng; Chen, Yanlin; Tu, Shikui; Zhang, Xining

    2017-12-14

    Drug side effects are one of main concerns in the drug discovery, which gains wide attentions. Investigating drug side effects is of great importance, and the computational prediction can help to guide wet experiments. As far as we known, a great number of computational methods have been proposed for the side effect predictions. The assumption that similar drugs may induce same side effects is usually employed for modeling, and how to calculate the drug-drug similarity is critical in the side effect predictions. In this paper, we present a novel measure of drug-drug similarity named "linear neighborhood similarity", which is calculated in a drug feature space by exploring linear neighborhood relationship. Then, we transfer the similarity from the feature space into the side effect space, and predict drug side effects by propagating known side effect information through a similarity-based graph. Under a unified frame based on the linear neighborhood similarity, we propose method "LNSM" and its extension "LNSM-SMI" to predict side effects of new drugs, and propose the method "LNSM-MSE" to predict unobserved side effect of approved drugs. We evaluate the performances of LNSM and LNSM-SMI in predicting side effects of new drugs, and evaluate the performances of LNSM-MSE in predicting missing side effects of approved drugs. The results demonstrate that the linear neighborhood similarity can improve the performances of side effect prediction, and the linear neighborhood similarity-based methods can outperform existing side effect prediction methods. More importantly, the proposed methods can predict side effects of new drugs as well as unobserved side effects of approved drugs under a unified frame.

  19. DBH Prediction Using Allometry Described by Bivariate Copula Distribution

    NASA Astrophysics Data System (ADS)

    Xu, Q.; Hou, Z.; Li, B.; Greenberg, J. A.

    2017-12-01

    Forest biomass mapping based on single tree detection from the airborne laser scanning (ALS) usually depends on an allometric equation that relates diameter at breast height (DBH) with per-tree aboveground biomass. The incapability of the ALS technology in directly measuring DBH leads to the need to predict DBH with other ALS-measured tree-level structural parameters. A copula-based method is proposed in the study to predict DBH with the ALS-measured tree height and crown diameter using a dataset measured in the Lassen National Forest in California. Instead of exploring an explicit mathematical equation that explains the underlying relationship between DBH and other structural parameters, the copula-based prediction method utilizes the dependency between cumulative distributions of these variables, and solves the DBH based on an assumption that for a single tree, the cumulative probability of each structural parameter is identical. Results show that compared with the bench-marking least-square linear regression and the k-MSN imputation, the copula-based method obtains better accuracy in the DBH for the Lassen National Forest. To assess the generalization of the proposed method, prediction uncertainty is quantified using bootstrapping techniques that examine the variability of the RMSE of the predicted DBH. We find that the copula distribution is reliable in describing the allometric relationship between tree-level structural parameters, and it contributes to the reduction of prediction uncertainty.

  20. Predicting Fluctuations in Cryptocurrency Transactions Based on User Comments and Replies.

    PubMed

    Kim, Young Bin; Kim, Jun Gi; Kim, Wook; Im, Jae Ho; Kim, Tae Hyeong; Kang, Shin Jin; Kim, Chang Hun

    2016-01-01

    This paper proposes a method to predict fluctuations in the prices of cryptocurrencies, which are increasingly used for online transactions worldwide. Little research has been conducted on predicting fluctuations in the price and number of transactions of a variety of cryptocurrencies. Moreover, the few methods proposed to predict fluctuation in currency prices are inefficient because they fail to take into account the differences in attributes between real currencies and cryptocurrencies. This paper analyzes user comments in online cryptocurrency communities to predict fluctuations in the prices of cryptocurrencies and the number of transactions. By focusing on three cryptocurrencies, each with a large market size and user base, this paper attempts to predict such fluctuations by using a simple and efficient method.

  1. Predicting Fluctuations in Cryptocurrency Transactions Based on User Comments and Replies

    PubMed Central

    Kim, Young Bin; Kim, Jun Gi; Kim, Wook; Im, Jae Ho; Kim, Tae Hyeong; Kang, Shin Jin; Kim, Chang Hun

    2016-01-01

    This paper proposes a method to predict fluctuations in the prices of cryptocurrencies, which are increasingly used for online transactions worldwide. Little research has been conducted on predicting fluctuations in the price and number of transactions of a variety of cryptocurrencies. Moreover, the few methods proposed to predict fluctuation in currency prices are inefficient because they fail to take into account the differences in attributes between real currencies and cryptocurrencies. This paper analyzes user comments in online cryptocurrency communities to predict fluctuations in the prices of cryptocurrencies and the number of transactions. By focusing on three cryptocurrencies, each with a large market size and user base, this paper attempts to predict such fluctuations by using a simple and efficient method. PMID:27533113

  2. Pseudo CT estimation from MRI using patch-based random forest

    NASA Astrophysics Data System (ADS)

    Yang, Xiaofeng; Lei, Yang; Shu, Hui-Kuo; Rossi, Peter; Mao, Hui; Shim, Hyunsuk; Curran, Walter J.; Liu, Tian

    2017-02-01

    Recently, MR simulators gain popularity because of unnecessary radiation exposure of CT simulators being used in radiation therapy planning. We propose a method for pseudo CT estimation from MR images based on a patch-based random forest. Patient-specific anatomical features are extracted from the aligned training images and adopted as signatures for each voxel. The most robust and informative features are identified using feature selection to train the random forest. The well-trained random forest is used to predict the pseudo CT of a new patient. This prediction technique was tested with human brain images and the prediction accuracy was assessed using the original CT images. Peak signal-to-noise ratio (PSNR) and feature similarity (FSIM) indexes were used to quantify the differences between the pseudo and original CT images. The experimental results showed the proposed method could accurately generate pseudo CT images from MR images. In summary, we have developed a new pseudo CT prediction method based on patch-based random forest, demonstrated its clinical feasibility, and validated its prediction accuracy. This pseudo CT prediction technique could be a useful tool for MRI-based radiation treatment planning and attenuation correction in a PET/MRI scanner.

  3. Formation enthalpies for transition metal alloys using machine learning

    NASA Astrophysics Data System (ADS)

    Ubaru, Shashanka; Miedlar, Agnieszka; Saad, Yousef; Chelikowsky, James R.

    2017-06-01

    The enthalpy of formation is an important thermodynamic property. Developing fast and accurate methods for its prediction is of practical interest in a variety of applications. Material informatics techniques based on machine learning have recently been introduced in the literature as an inexpensive means of exploiting materials data, and can be used to examine a variety of thermodynamics properties. We investigate the use of such machine learning tools for predicting the formation enthalpies of binary intermetallic compounds that contain at least one transition metal. We consider certain easily available properties of the constituting elements complemented by some basic properties of the compounds, to predict the formation enthalpies. We show how choosing these properties (input features) based on a literature study (using prior physics knowledge) seems to outperform machine learning based feature selection methods such as sensitivity analysis and LASSO (least absolute shrinkage and selection operator) based methods. A nonlinear kernel based support vector regression method is employed to perform the predictions. The predictive ability of our model is illustrated via several experiments on a dataset containing 648 binary alloys. We train and validate the model using the formation enthalpies calculated using a model by Miedema, which is a popular semiempirical model used for the prediction of formation enthalpies of metal alloys.

  4. Predicting miRNA targets for head and neck squamous cell carcinoma using an ensemble method.

    PubMed

    Gao, Hong; Jin, Hui; Li, Guijun

    2018-01-01

    This study aimed to uncover potential microRNA (miRNA) targets in head and neck squamous cell carcinoma (HNSCC) using an ensemble method which combined 3 different methods: Pearson's correlation coefficient (PCC), Lasso and a causal inference method (i.e., intervention calculus when the directed acyclic graph (DAG) is absent [IDA]), based on Borda count election. The Borda count election method was used to integrate the top 100 predicted targets of each miRNA generated by individual methods. Afterwards, to validate the performance ability of our method, we checked the TarBase v6.0, miRecords v2013, miRWalk v2.0 and miRTarBase v4.5 databases to validate predictions for miRNAs. Pathway enrichment analysis of target genes in the top 1,000 miRNA-messenger RNA (mRNA) interactions was conducted to focus on significant KEGG pathways. Finally, we extracted target genes based on occurrence frequency ≥3. Based on an absolute value of PCC >0.7, we found 33 miRNAs and 288 mRNAs for further analysis. We extracted 10 target genes with predicted frequencies not less than 3. The target gene MYO5C possessed the highest frequency, which was predicted by 7 different miRNAs. Significantly, a total of 8 pathways were identified; the pathways of cytokine-cytokine receptor interaction and chemokine signaling pathway were the most significant. We successfully predicted target genes and pathways for HNSCC relying on miRNA expression data, mRNA expression profile, an ensemble method and pathway information. Our results may offer new information for the diagnosis and estimation of the prognosis of HNSCC.

  5. Application Study of Comprehensive Forecasting Model Based on Entropy Weighting Method on Trend of PM2.5 Concentration in Guangzhou, China

    PubMed Central

    Liu, Dong-jun; Li, Li

    2015-01-01

    For the issue of haze-fog, PM2.5 is the main influence factor of haze-fog pollution in China. The trend of PM2.5 concentration was analyzed from a qualitative point of view based on mathematical models and simulation in this study. The comprehensive forecasting model (CFM) was developed based on the combination forecasting ideas. Autoregressive Integrated Moving Average Model (ARIMA), Artificial Neural Networks (ANNs) model and Exponential Smoothing Method (ESM) were used to predict the time series data of PM2.5 concentration. The results of the comprehensive forecasting model were obtained by combining the results of three methods based on the weights from the Entropy Weighting Method. The trend of PM2.5 concentration in Guangzhou China was quantitatively forecasted based on the comprehensive forecasting model. The results were compared with those of three single models, and PM2.5 concentration values in the next ten days were predicted. The comprehensive forecasting model balanced the deviation of each single prediction method, and had better applicability. It broadens a new prediction method for the air quality forecasting field. PMID:26110332

  6. Application Study of Comprehensive Forecasting Model Based on Entropy Weighting Method on Trend of PM2.5 Concentration in Guangzhou, China.

    PubMed

    Liu, Dong-jun; Li, Li

    2015-06-23

    For the issue of haze-fog, PM2.5 is the main influence factor of haze-fog pollution in China. The trend of PM2.5 concentration was analyzed from a qualitative point of view based on mathematical models and simulation in this study. The comprehensive forecasting model (CFM) was developed based on the combination forecasting ideas. Autoregressive Integrated Moving Average Model (ARIMA), Artificial Neural Networks (ANNs) model and Exponential Smoothing Method (ESM) were used to predict the time series data of PM2.5 concentration. The results of the comprehensive forecasting model were obtained by combining the results of three methods based on the weights from the Entropy Weighting Method. The trend of PM2.5 concentration in Guangzhou China was quantitatively forecasted based on the comprehensive forecasting model. The results were compared with those of three single models, and PM2.5 concentration values in the next ten days were predicted. The comprehensive forecasting model balanced the deviation of each single prediction method, and had better applicability. It broadens a new prediction method for the air quality forecasting field.

  7. Group-regularized individual prediction: theory and application to pain.

    PubMed

    Lindquist, Martin A; Krishnan, Anjali; López-Solà, Marina; Jepma, Marieke; Woo, Choong-Wan; Koban, Leonie; Roy, Mathieu; Atlas, Lauren Y; Schmidt, Liane; Chang, Luke J; Reynolds Losin, Elizabeth A; Eisenbarth, Hedwig; Ashar, Yoni K; Delk, Elizabeth; Wager, Tor D

    2017-01-15

    Multivariate pattern analysis (MVPA) has become an important tool for identifying brain representations of psychological processes and clinical outcomes using fMRI and related methods. Such methods can be used to predict or 'decode' psychological states in individual subjects. Single-subject MVPA approaches, however, are limited by the amount and quality of individual-subject data. In spite of higher spatial resolution, predictive accuracy from single-subject data often does not exceed what can be accomplished using coarser, group-level maps, because single-subject patterns are trained on limited amounts of often-noisy data. Here, we present a method that combines population-level priors, in the form of biomarker patterns developed on prior samples, with single-subject MVPA maps to improve single-subject prediction. Theoretical results and simulations motivate a weighting based on the relative variances of biomarker-based prediction-based on population-level predictive maps from prior groups-and individual-subject, cross-validated prediction. Empirical results predicting pain using brain activity on a trial-by-trial basis (single-trial prediction) across 6 studies (N=180 participants) confirm the theoretical predictions. Regularization based on a population-level biomarker-in this case, the Neurologic Pain Signature (NPS)-improved single-subject prediction accuracy compared with idiographic maps based on the individuals' data alone. The regularization scheme that we propose, which we term group-regularized individual prediction (GRIP), can be applied broadly to within-person MVPA-based prediction. We also show how GRIP can be used to evaluate data quality and provide benchmarks for the appropriateness of population-level maps like the NPS for a given individual or study. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. ProTox: a web server for the in silico prediction of rodent oral toxicity

    PubMed Central

    Drwal, Malgorzata N.; Banerjee, Priyanka; Dunkel, Mathias; Wettig, Martin R.; Preissner, Robert

    2014-01-01

    Animal trials are currently the major method for determining the possible toxic effects of drug candidates and cosmetics. In silico prediction methods represent an alternative approach and aim to rationalize the preclinical drug development, thus enabling the reduction of the associated time, costs and animal experiments. Here, we present ProTox, a web server for the prediction of rodent oral toxicity. The prediction method is based on the analysis of the similarity of compounds with known median lethal doses (LD50) and incorporates the identification of toxic fragments, therefore representing a novel approach in toxicity prediction. In addition, the web server includes an indication of possible toxicity targets which is based on an in-house collection of protein–ligand-based pharmacophore models (‘toxicophores’) for targets associated with adverse drug reactions. The ProTox web server is open to all users and can be accessed without registration at: http://tox.charite.de/tox. The only requirement for the prediction is the two-dimensional structure of the input compounds. All ProTox methods have been evaluated based on a diverse external validation set and displayed strong performance (sensitivity, specificity and precision of 76, 95 and 75%, respectively) and superiority over other toxicity prediction tools, indicating their possible applicability for other compound classes. PMID:24838562

  9. Validation of Skeletal Muscle cis-Regulatory Module Predictions Reveals Nucleotide Composition Bias in Functional Enhancers

    PubMed Central

    Kwon, Andrew T.; Chou, Alice Yi; Arenillas, David J.; Wasserman, Wyeth W.

    2011-01-01

    We performed a genome-wide scan for muscle-specific cis-regulatory modules (CRMs) using three computational prediction programs. Based on the predictions, 339 candidate CRMs were tested in cell culture with NIH3T3 fibroblasts and C2C12 myoblasts for capacity to direct selective reporter gene expression to differentiated C2C12 myotubes. A subset of 19 CRMs validated as functional in the assay. The rate of predictive success reveals striking limitations of computational regulatory sequence analysis methods for CRM discovery. Motif-based methods performed no better than predictions based only on sequence conservation. Analysis of the properties of the functional sequences relative to inactive sequences identifies nucleotide sequence composition can be an important characteristic to incorporate in future methods for improved predictive specificity. Muscle-related TFBSs predicted within the functional sequences display greater sequence conservation than non-TFBS flanking regions. Comparison with recent MyoD and histone modification ChIP-Seq data supports the validity of the functional regions. PMID:22144875

  10. Template‐based field map prediction for rapid whole brain B0 shimming

    PubMed Central

    Shi, Yuhang; Vannesjo, S. Johanna; Miller, Karla L.

    2017-01-01

    Purpose In typical MRI protocols, time is spent acquiring a field map to calculate the shim settings for best image quality. We propose a fast template‐based field map prediction method that yields near‐optimal shims without measuring the field. Methods The template‐based prediction method uses prior knowledge of the B0 distribution in the human brain, based on a large database of field maps acquired from different subjects, together with subject‐specific structural information from a quick localizer scan. The shimming performance of using the template‐based prediction is evaluated in comparison to a range of potential fast shimming methods. Results Static B0 shimming based on predicted field maps performed almost as well as shimming based on individually measured field maps. In experimental evaluations at 7 T, the proposed approach yielded a residual field standard deviation in the brain of on average 59 Hz, compared with 50 Hz using measured field maps and 176 Hz using no subject‐specific shim. Conclusions This work demonstrates that shimming based on predicted field maps is feasible. The field map prediction accuracy could potentially be further improved by generating the template from a subset of subjects, based on parameters such as head rotation and body mass index. Magn Reson Med 80:171–180, 2018. © 2017 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. PMID:29193340

  11. A new method for enhancer prediction based on deep belief network.

    PubMed

    Bu, Hongda; Gan, Yanglan; Wang, Yang; Zhou, Shuigeng; Guan, Jihong

    2017-10-16

    Studies have shown that enhancers are significant regulatory elements to play crucial roles in gene expression regulation. Since enhancers are unrelated to the orientation and distance to their target genes, it is a challenging mission for scholars and researchers to accurately predicting distal enhancers. In the past years, with the high-throughout ChiP-seq technologies development, several computational techniques emerge to predict enhancers using epigenetic or genomic features. Nevertheless, the inconsistency of computational models across different cell-lines and the unsatisfactory prediction performance call for further research in this area. Here, we propose a new Deep Belief Network (DBN) based computational method for enhancer prediction, which is called EnhancerDBN. This method combines diverse features, composed of DNA sequence compositional features, DNA methylation and histone modifications. Our computational results indicate that 1) EnhancerDBN outperforms 13 existing methods in prediction, and 2) GC content and DNA methylation can serve as relevant features for enhancer prediction. Deep learning is effective in boosting the performance of enhancer prediction.

  12. Prediction of essential proteins based on gene expression programming.

    PubMed

    Zhong, Jiancheng; Wang, Jianxin; Peng, Wei; Zhang, Zhen; Pan, Yi

    2013-01-01

    Essential proteins are indispensable for cell survive. Identifying essential proteins is very important for improving our understanding the way of a cell working. There are various types of features related to the essentiality of proteins. Many methods have been proposed to combine some of them to predict essential proteins. However, it is still a big challenge for designing an effective method to predict them by integrating different features, and explaining how these selected features decide the essentiality of protein. Gene expression programming (GEP) is a learning algorithm and what it learns specifically is about relationships between variables in sets of data and then builds models to explain these relationships. In this work, we propose a GEP-based method to predict essential protein by combing some biological features and topological features. We carry out experiments on S. cerevisiae data. The experimental results show that the our method achieves better prediction performance than those methods using individual features. Moreover, our method outperforms some machine learning methods and performs as well as a method which is obtained by combining the outputs of eight machine learning methods. The accuracy of predicting essential proteins can been improved by using GEP method to combine some topological features and biological features.

  13. A method of predicting the energy-absorption capability of composite subfloor beams

    NASA Technical Reports Server (NTRS)

    Farley, Gary L.

    1987-01-01

    A simple method of predicting the energy-absorption capability of composite subfloor beam structure was developed. The method is based upon the weighted sum of the energy-absorption capability of constituent elements of a subfloor beam. An empirical data base of energy absorption results from circular and square cross section tube specimens were used in the prediction capability. The procedure is applicable to a wide range of subfloor beam structure. The procedure was demonstrated on three subfloor beam concepts. Agreement between test and prediction was within seven percent for all three cases.

  14. Literature-based condition-specific miRNA-mRNA target prediction.

    PubMed

    Oh, Minsik; Rhee, Sungmin; Moon, Ji Hwan; Chae, Heejoon; Lee, Sunwon; Kang, Jaewoo; Kim, Sun

    2017-01-01

    miRNAs are small non-coding RNAs that regulate gene expression by binding to the 3'-UTR of genes. Many recent studies have reported that miRNAs play important biological roles by regulating specific mRNAs or genes. Many sequence-based target prediction algorithms have been developed to predict miRNA targets. However, these methods are not designed for condition-specific target predictions and produce many false positives; thus, expression-based target prediction algorithms have been developed for condition-specific target predictions. A typical strategy to utilize expression data is to leverage the negative control roles of miRNAs on genes. To control false positives, a stringent cutoff value is typically set, but in this case, these methods tend to reject many true target relationships, i.e., false negatives. To overcome these limitations, additional information should be utilized. The literature is probably the best resource that we can utilize. Recent literature mining systems compile millions of articles with experiments designed for specific biological questions, and the systems provide a function to search for specific information. To utilize the literature information, we used a literature mining system, BEST, that automatically extracts information from the literature in PubMed and that allows the user to perform searches of the literature with any English words. By integrating omics data analysis methods and BEST, we developed Context-MMIA, a miRNA-mRNA target prediction method that combines expression data analysis results and the literature information extracted based on the user-specified context. In the pathway enrichment analysis using genes included in the top 200 miRNA-targets, Context-MMIA outperformed the four existing target prediction methods that we tested. In another test on whether prediction methods can re-produce experimentally validated target relationships, Context-MMIA outperformed the four existing target prediction methods. In summary, Context-MMIA allows the user to specify a context of the experimental data to predict miRNA targets, and we believe that Context-MMIA is very useful for predicting condition-specific miRNA targets.

  15. A novel one-class SVM based negative data sampling method for reconstructing proteome-wide HTLV-human protein interaction networks.

    PubMed

    Mei, Suyu; Zhu, Hao

    2015-01-26

    Protein-protein interaction (PPI) prediction is generally treated as a problem of binary classification wherein negative data sampling is still an open problem to be addressed. The commonly used random sampling is prone to yield less representative negative data with considerable false negatives. Meanwhile rational constraints are seldom exerted on model selection to reduce the risk of false positive predictions for most of the existing computational methods. In this work, we propose a novel negative data sampling method based on one-class SVM (support vector machine, SVM) to predict proteome-wide protein interactions between HTLV retrovirus and Homo sapiens, wherein one-class SVM is used to choose reliable and representative negative data, and two-class SVM is used to yield proteome-wide outcomes as predictive feedback for rational model selection. Computational results suggest that one-class SVM is more suited to be used as negative data sampling method than two-class PPI predictor, and the predictive feedback constrained model selection helps to yield a rational predictive model that reduces the risk of false positive predictions. Some predictions have been validated by the recent literature. Lastly, gene ontology based clustering of the predicted PPI networks is conducted to provide valuable cues for the pathogenesis of HTLV retrovirus.

  16. Strainrange partitioning behavior of the nickel-base superalloys, Rene' 80 and in 100

    NASA Technical Reports Server (NTRS)

    Halford, G. R.; Nachtigall, A. J.

    1978-01-01

    A study was made to assess the ability of the method of Strainrange Partitioning (SRP) to both correlate and predict high-temperature, low cycle fatigue lives of nickel base superalloys for gas turbine applications. The partitioned strainrange versus life relationships for uncoated Rene' 80 and cast IN 100 were also determined from the ductility normalized-Strainrange Partitioning equations. These were used to predict the cyclic lives of the baseline tests. The life predictability of the method was verified for cast IN 100 by applying the baseline results to the cyclic life prediction of a series of complex strain cycling tests with multiple hold periods at constant strain. It was concluded that the method of SRP can correlate and predict the cyclic lives of laboratory specimens of the nickel base superalloys evaluated in this program.

  17. Research on cross - Project software defect prediction based on transfer learning

    NASA Astrophysics Data System (ADS)

    Chen, Ya; Ding, Xiaoming

    2018-04-01

    According to the two challenges in the prediction of cross-project software defects, the distribution differences between the source project and the target project dataset and the class imbalance in the dataset, proposing a cross-project software defect prediction method based on transfer learning, named NTrA. Firstly, solving the source project data's class imbalance based on the Augmented Neighborhood Cleaning Algorithm. Secondly, the data gravity method is used to give different weights on the basis of the attribute similarity of source project and target project data. Finally, a defect prediction model is constructed by using Trad boost algorithm. Experiments were conducted using data, come from NASA and SOFTLAB respectively, from a published PROMISE dataset. The results show that the method has achieved good values of recall and F-measure, and achieved good prediction results.

  18. Model-Based and Model-Free Pavlovian Reward Learning: Revaluation, Revision and Revelation

    PubMed Central

    Dayan, Peter; Berridge, Kent C.

    2014-01-01

    Evidence supports at least two methods for learning about reward and punishment and making predictions for guiding actions. One method, called model-free, progressively acquires cached estimates of the long-run values of circumstances and actions from retrospective experience. The other method, called model-based, uses representations of the environment, expectations and prospective calculations to make cognitive predictions of future value. Extensive attention has been paid to both methods in computational analyses of instrumental learning. By contrast, although a full computational analysis has been lacking, Pavlovian learning and prediction has typically been presumed to be solely model-free. Here, we revise that presumption and review compelling evidence from Pavlovian revaluation experiments showing that Pavlovian predictions can involve their own form of model-based evaluation. In model-based Pavlovian evaluation, prevailing states of the body and brain influence value computations, and thereby produce powerful incentive motivations that can sometimes be quite new. We consider the consequences of this revised Pavlovian view for the computational landscape of prediction, response and choice. We also revisit differences between Pavlovian and instrumental learning in the control of incentive motivation. PMID:24647659

  19. Model-based and model-free Pavlovian reward learning: revaluation, revision, and revelation.

    PubMed

    Dayan, Peter; Berridge, Kent C

    2014-06-01

    Evidence supports at least two methods for learning about reward and punishment and making predictions for guiding actions. One method, called model-free, progressively acquires cached estimates of the long-run values of circumstances and actions from retrospective experience. The other method, called model-based, uses representations of the environment, expectations, and prospective calculations to make cognitive predictions of future value. Extensive attention has been paid to both methods in computational analyses of instrumental learning. By contrast, although a full computational analysis has been lacking, Pavlovian learning and prediction has typically been presumed to be solely model-free. Here, we revise that presumption and review compelling evidence from Pavlovian revaluation experiments showing that Pavlovian predictions can involve their own form of model-based evaluation. In model-based Pavlovian evaluation, prevailing states of the body and brain influence value computations, and thereby produce powerful incentive motivations that can sometimes be quite new. We consider the consequences of this revised Pavlovian view for the computational landscape of prediction, response, and choice. We also revisit differences between Pavlovian and instrumental learning in the control of incentive motivation.

  20. High Speed Jet Noise Prediction Using Large Eddy Simulation

    NASA Technical Reports Server (NTRS)

    Lele, Sanjiva K.

    2002-01-01

    Current methods for predicting the noise of high speed jets are largely empirical. These empirical methods are based on the jet noise data gathered by varying primarily the jet flow speed, and jet temperature for a fixed nozzle geometry. Efforts have been made to correlate the noise data of co-annular (multi-stream) jets and for the changes associated with the forward flight within these empirical correlations. But ultimately these emipirical methods fail to provide suitable guidance in the selection of new, low-noise nozzle designs. This motivates the development of a new class of prediction methods which are based on computational simulations, in an attempt to remove the empiricism of the present day noise predictions.

  1. Protein asparagine deamidation prediction based on structures with machine learning methods.

    PubMed

    Jia, Lei; Sun, Yaxiong

    2017-01-01

    Chemical stability is a major concern in the development of protein therapeutics due to its impact on both efficacy and safety. Protein "hotspots" are amino acid residues that are subject to various chemical modifications, including deamidation, isomerization, glycosylation, oxidation etc. A more accurate prediction method for potential hotspot residues would allow their elimination or reduction as early as possible in the drug discovery process. In this work, we focus on prediction models for asparagine (Asn) deamidation. Sequence-based prediction method simply identifies the NG motif (amino acid asparagine followed by a glycine) to be liable to deamidation. It still dominates deamidation evaluation process in most pharmaceutical setup due to its convenience. However, the simple sequence-based method is less accurate and often causes over-engineering a protein. We introduce structure-based prediction models by mining available experimental and structural data of deamidated proteins. Our training set contains 194 Asn residues from 25 proteins that all have available high-resolution crystal structures. Experimentally measured deamidation half-life of Asn in penta-peptides as well as 3D structure-based properties, such as solvent exposure, crystallographic B-factors, local secondary structure and dihedral angles etc., were used to train prediction models with several machine learning algorithms. The prediction tools were cross-validated as well as tested with an external test data set. The random forest model had high enrichment in ranking deamidated residues higher than non-deamidated residues while effectively eliminated false positive predictions. It is possible that such quantitative protein structure-function relationship tools can also be applied to other protein hotspot predictions. In addition, we extensively discussed metrics being used to evaluate the performance of predicting unbalanced data sets such as the deamidation case.

  2. A Data Driven Model for Predicting RNA-Protein Interactions based on Gradient Boosting Machine.

    PubMed

    Jain, Dharm Skandh; Gupte, Sanket Rajan; Aduri, Raviprasad

    2018-06-22

    RNA protein interactions (RPI) play a pivotal role in the regulation of various biological processes. Experimental validation of RPI has been time-consuming, paving the way for computational prediction methods. The major limiting factor of these methods has been the accuracy and confidence of the predictions, and our in-house experiments show that they fail to accurately predict RPI involving short RNA sequences such as TERRA RNA. Here, we present a data-driven model for RPI prediction using a gradient boosting classifier. Amino acids and nucleotides are classified based on the high-resolution structural data of RNA protein complexes. The minimum structural unit consisting of five residues is used as the descriptor. Comparative analysis of existing methods shows the consistently higher performance of our method irrespective of the length of RNA present in the RPI. The method has been successfully applied to map RPI networks involving both long noncoding RNA as well as TERRA RNA. The method is also shown to successfully predict RNA and protein hubs present in RPI networks of four different organisms. The robustness of this method will provide a way for predicting RPI networks of yet unknown interactions for both long noncoding RNA and microRNA.

  3. Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yan, Shiju; Qian, Wei; Guan, Yubao

    2016-06-15

    Purpose: This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLS patients by integrating oversampling, feature selection, and score fusion techniques and develop an optimal prediction model. Methods: A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained DFS and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initiallymore » computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built based on QI features and CB markers separately. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers and also tested fusion methods to combine QI and CB based prediction results. Results: Using a leave-one-case-out cross-validation (K-fold cross-validation) method, the computed areas under a receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061, when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%. Conclusions: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled RBFN based classifier to yield improved prediction accuracy.« less

  4. Analysis of energy-based algorithms for RNA secondary structure prediction

    PubMed Central

    2012-01-01

    Background RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Conclusions Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets. PMID:22296803

  5. Analysis of energy-based algorithms for RNA secondary structure prediction.

    PubMed

    Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H

    2012-02-01

    RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.

  6. TH-CD-207A-07: Prediction of High Dimensional State Subject to Respiratory Motion: A Manifold Learning Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, W; Sawant, A; Ruan, D

    Purpose: The development of high dimensional imaging systems (e.g. volumetric MRI, CBCT, photogrammetry systems) in image-guided radiotherapy provides important pathways to the ultimate goal of real-time volumetric/surface motion monitoring. This study aims to develop a prediction method for the high dimensional state subject to respiratory motion. Compared to conventional linear dimension reduction based approaches, our method utilizes manifold learning to construct a descriptive feature submanifold, where more efficient and accurate prediction can be performed. Methods: We developed a prediction framework for high-dimensional state subject to respiratory motion. The proposed method performs dimension reduction in a nonlinear setting to permit moremore » descriptive features compared to its linear counterparts (e.g., classic PCA). Specifically, a kernel PCA is used to construct a proper low-dimensional feature manifold, where low-dimensional prediction is performed. A fixed-point iterative pre-image estimation method is applied subsequently to recover the predicted value in the original state space. We evaluated and compared the proposed method with PCA-based method on 200 level-set surfaces reconstructed from surface point clouds captured by the VisionRT system. The prediction accuracy was evaluated with respect to root-mean-squared-error (RMSE) for both 200ms and 600ms lookahead lengths. Results: The proposed method outperformed PCA-based approach with statistically higher prediction accuracy. In one-dimensional feature subspace, our method achieved mean prediction accuracy of 0.86mm and 0.89mm for 200ms and 600ms lookahead lengths respectively, compared to 0.95mm and 1.04mm from PCA-based method. The paired t-tests further demonstrated the statistical significance of the superiority of our method, with p-values of 6.33e-3 and 5.78e-5, respectively. Conclusion: The proposed approach benefits from the descriptiveness of a nonlinear manifold and the prediction reliability in such low dimensional manifold. The fixed-point iterative approach turns out to work well practically for the pre-image recovery. Our approach is particularly suitable to facilitate managing respiratory motion in image-guide radiotherapy. This work is supported in part by NIH grant R01 CA169102-02.« less

  7. Nonlinear-drifted Brownian motion with multiple hidden states for remaining useful life prediction of rechargeable batteries

    NASA Astrophysics Data System (ADS)

    Wang, Dong; Zhao, Yang; Yang, Fangfang; Tsui, Kwok-Leung

    2017-09-01

    Brownian motion with adaptive drift has attracted much attention in prognostics because its first hitting time is highly relevant to remaining useful life prediction and it follows the inverse Gaussian distribution. Besides linear degradation modeling, nonlinear-drifted Brownian motion has been developed to model nonlinear degradation. Moreover, the first hitting time distribution of the nonlinear-drifted Brownian motion has been approximated by time-space transformation. In the previous studies, the drift coefficient is the only hidden state used in state space modeling of the nonlinear-drifted Brownian motion. Besides the drift coefficient, parameters of a nonlinear function used in the nonlinear-drifted Brownian motion should be treated as additional hidden states of state space modeling to make the nonlinear-drifted Brownian motion more flexible. In this paper, a prognostic method based on nonlinear-drifted Brownian motion with multiple hidden states is proposed and then it is applied to predict remaining useful life of rechargeable batteries. 26 sets of rechargeable battery degradation samples are analyzed to validate the effectiveness of the proposed prognostic method. Moreover, some comparisons with a standard particle filter based prognostic method, a spherical cubature particle filter based prognostic method and two classic Bayesian prognostic methods are conducted to highlight the superiority of the proposed prognostic method. Results show that the proposed prognostic method has lower average prediction errors than the particle filter based prognostic methods and the classic Bayesian prognostic methods for battery remaining useful life prediction.

  8. Flight effects on exhaust noise for turbojet and turbofan engines: Comparison of experimental data with prediction

    NASA Technical Reports Server (NTRS)

    Stone, J. R.

    1976-01-01

    It was demonstrated that static and in flight jet engine exhaust noise can be predicted with reasonable accuracy when the multiple source nature of the problem is taken into account. Jet mixing noise was predicted from the interim prediction method. Provisional methods of estimating internally generated noise and shock noise flight effects were used, based partly on existing prediction methods and partly on recent reported engine data.

  9. Stability theory applications to laminar-flow control

    NASA Technical Reports Server (NTRS)

    Malik, Mujeeb R.

    1987-01-01

    In order to design Laminar Flow Control (LFC) configurations, reliable methods are needed for boundary-layer transition predictions. Among the available methods, there are correlations based upon R sub e, shape factors, Goertler number and crossflow Reynolds number. The most advanced transition prediction method is based upon linear stability theory in the form of the e sup N method which has proven to be successful in predicting transition in two- and three-dimensional boundary layers. When transition occurs in a low disturbance environment, the e sup N method provides a viable design tool for transition prediction and LFC in both 2-D and 3-D subsonic/supersonic flows. This is true for transition dominated by either TS, crossflow, or Goertler instability. If Goertler/TS or crossflow/TS interaction is present, the e sup N will fail to predict transition. However, there is no evidence of such interaction at low amplitudes of Goertler and crossflow vortices.

  10. Prediction of Body Fluids where Proteins are Secreted into Based on Protein Interaction Network

    PubMed Central

    Hu, Le-Le; Huang, Tao; Cai, Yu-Dong; Chou, Kuo-Chen

    2011-01-01

    Determining the body fluids where secreted proteins can be secreted into is important for protein function annotation and disease biomarker discovery. In this study, we developed a network-based method to predict which kind of body fluids human proteins can be secreted into. For a newly constructed benchmark dataset that consists of 529 human-secreted proteins, the prediction accuracy for the most possible body fluid location predicted by our method via the jackknife test was 79.02%, significantly higher than the success rate by a random guess (29.36%). The likelihood that the predicted body fluids of the first four orders contain all the true body fluids where the proteins can be secreted into is 62.94%. Our method was further demonstrated with two independent datasets: one contains 57 proteins that can be secreted into blood; while the other contains 61 proteins that can be secreted into plasma/serum and were possible biomarkers associated with various cancers. For the 57 proteins in first dataset, 55 were correctly predicted as blood-secrete proteins. For the 61 proteins in the second dataset, 58 were predicted to be most possible in plasma/serum. These encouraging results indicate that the network-based prediction method is quite promising. It is anticipated that the method will benefit the relevant areas for both basic research and drug development. PMID:21829572

  11. Link Prediction in Evolving Networks Based on Popularity of Nodes.

    PubMed

    Wang, Tong; He, Xing-Sheng; Zhou, Ming-Yang; Fu, Zhong-Qian

    2017-08-02

    Link prediction aims to uncover the underlying relationship behind networks, which could be utilized to predict missing edges or identify the spurious edges. The key issue of link prediction is to estimate the likelihood of potential links in networks. Most classical static-structure based methods ignore the temporal aspects of networks, limited by the time-varying features, such approaches perform poorly in evolving networks. In this paper, we propose a hypothesis that the ability of each node to attract links depends not only on its structural importance, but also on its current popularity (activeness), since active nodes have much more probability to attract future links. Then a novel approach named popularity based structural perturbation method (PBSPM) and its fast algorithm are proposed to characterize the likelihood of an edge from both existing connectivity structure and current popularity of its two endpoints. Experiments on six evolving networks show that the proposed methods outperform state-of-the-art methods in accuracy and robustness. Besides, visual results and statistical analysis reveal that the proposed methods are inclined to predict future edges between active nodes, rather than edges between inactive nodes.

  12. Computational Methods in Drug Discovery

    PubMed Central

    Sliwoski, Gregory; Kothiwale, Sandeepkumar; Meiler, Jens

    2014-01-01

    Computer-aided drug discovery/design methods have played a major role in the development of therapeutically important small molecules for over three decades. These methods are broadly classified as either structure-based or ligand-based methods. Structure-based methods are in principle analogous to high-throughput screening in that both target and ligand structure information is imperative. Structure-based approaches include ligand docking, pharmacophore, and ligand design methods. The article discusses theory behind the most important methods and recent successful applications. Ligand-based methods use only ligand information for predicting activity depending on its similarity/dissimilarity to previously known active ligands. We review widely used ligand-based methods such as ligand-based pharmacophores, molecular descriptors, and quantitative structure-activity relationships. In addition, important tools such as target/ligand data bases, homology modeling, ligand fingerprint methods, etc., necessary for successful implementation of various computer-aided drug discovery/design methods in a drug discovery campaign are discussed. Finally, computational methods for toxicity prediction and optimization for favorable physiologic properties are discussed with successful examples from literature. PMID:24381236

  13. Automated prediction of protein function and detection of functional sites from structure.

    PubMed

    Pazos, Florencio; Sternberg, Michael J E

    2004-10-12

    Current structural genomics projects are yielding structures for proteins whose functions are unknown. Accordingly, there is a pressing requirement for computational methods for function prediction. Here we present PHUNCTIONER, an automatic method for structure-based function prediction using automatically extracted functional sites (residues associated to functions). The method relates proteins with the same function through structural alignments and extracts 3D profiles of conserved residues. Functional features to train the method are extracted from the Gene Ontology (GO) database. The method extracts these features from the entire GO hierarchy and hence is applicable across the whole range of function specificity. 3D profiles associated with 121 GO annotations were extracted. We tested the power of the method both for the prediction of function and for the extraction of functional sites. The success of function prediction by our method was compared with the standard homology-based method. In the zone of low sequence similarity (approximately 15%), our method assigns the correct GO annotation in 90% of the protein structures considered, approximately 20% higher than inheritance of function from the closest homologue.

  14. Novel method to predict body weight in children based on age and morphological facial features.

    PubMed

    Huang, Ziyin; Barrett, Jeffrey S; Barrett, Kyle; Barrett, Ryan; Ng, Chee M

    2015-04-01

    A new and novel approach of predicting the body weight of children based on age and morphological facial features using a three-layer feed-forward artificial neural network (ANN) model is reported. The model takes in four parameters, including age-based CDC-inferred median body weight and three facial feature distances measured from digital facial images. In this study, thirty-nine volunteer subjects with age ranging from 6-18 years old and BW ranging from 18.6-96.4 kg were used for model development and validation. The final model has a mean prediction error of 0.48, a mean squared error of 18.43, and a coefficient of correlation of 0.94. The model shows significant improvement in prediction accuracy over several age-based body weight prediction methods. Combining with a facial recognition algorithm that can detect, extract and measure the facial features used in this study, mobile applications that incorporate this body weight prediction method may be developed for clinical investigations where access to scales is limited. © 2014, The American College of Clinical Pharmacology.

  15. A Tale of Two Methods: Chart and Interview Methods for Identifying Delirium

    PubMed Central

    Saczynski, Jane S.; Kosar, Cyrus M.; Xu, Guoquan; Puelle, Margaret R.; Schmitt, Eva; Jones, Richard N.; Marcantonio, Edward R.; Wong, Bonnie; Isaza, Ilean; Inouye, Sharon K.

    2014-01-01

    Background Interview and chart-based methods for identifying delirium have been validated. However, relative strengths and limitations of each method have not been described, nor has a combined approach (using both interviews and chart), been systematically examined. Objectives To compare chart and interview-based methods for identification of delirium. Design, Setting and Participants Participants were 300 patients aged 70+ undergoing major elective surgery (majority were orthopedic surgery) interviewed daily during hospitalization for delirium using the Confusion Assessment Method (CAM; interview-based method) and whose medical charts were reviewed for delirium using a validated chart-review method (chart-based method). We examined rate of agreement on the two methods and patient characteristics of those identified using each approach. Predictive validity for clinical outcomes (length of stay, postoperative complications, discharge disposition) was compared. In the absence of a gold-standard, predictive value could not be calculated. Results The cumulative incidence of delirium was 23% (n= 68) by the interview-based method, 12% (n=35) by the chart-based method and 27% (n=82) by the combined approach. Overall agreement was 80%; kappa was 0.30. The methods differed in detection of psychomotor features and time of onset. The chart-based method missed delirium in CAM-identified patients laacking features of psychomotor agitation or inappropriate behavior. The CAM-based method missed chart-identified cases occurring during the night shift. The combined method had high predictive validity for all clinical outcomes. Conclusions Interview and chart-based methods have specific strengths for identification of delirium. A combined approach captures the largest number and the broadest range of delirium cases. PMID:24512042

  16. Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11.

    PubMed

    Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2016-09-01

    Model evaluation and selection is an important step and a big challenge in template-based protein structure prediction. Individual model quality assessment methods designed for recognizing some specific properties of protein structures often fail to consistently select good models from a model pool because of their limitations. Therefore, combining multiple complimentary quality assessment methods is useful for improving model ranking and consequently tertiary structure prediction. Here, we report the performance and analysis of our human tertiary structure predictor (MULTICOM) based on the massive integration of 14 diverse complementary quality assessment methods that was successfully benchmarked in the 11th Critical Assessment of Techniques of Protein Structure prediction (CASP11). The predictions of MULTICOM for 39 template-based domains were rigorously assessed by six scoring metrics covering global topology of Cα trace, local all-atom fitness, side chain quality, and physical reasonableness of the model. The results show that the massive integration of complementary, diverse single-model and multi-model quality assessment methods can effectively leverage the strength of single-model methods in distinguishing quality variation among similar good models and the advantage of multi-model quality assessment methods of identifying reasonable average-quality models. The overall excellent performance of the MULTICOM predictor demonstrates that integrating a large number of model quality assessment methods in conjunction with model clustering is a useful approach to improve the accuracy, diversity, and consequently robustness of template-based protein structure prediction. Proteins 2016; 84(Suppl 1):247-259. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.

  17. A New Hybrid-Multiscale SSA Prediction of Non-Stationary Time Series

    NASA Astrophysics Data System (ADS)

    Ghanbarzadeh, Mitra; Aminghafari, Mina

    2016-02-01

    Singular spectral analysis (SSA) is a non-parametric method used in the prediction of non-stationary time series. It has two parameters, which are difficult to determine and very sensitive to their values. Since, SSA is a deterministic-based method, it does not give good results when the time series is contaminated with a high noise level and correlated noise. Therefore, we introduce a novel method to handle these problems. It is based on the prediction of non-decimated wavelet (NDW) signals by SSA and then, prediction of residuals by wavelet regression. The advantages of our method are the automatic determination of parameters and taking account of the stochastic structure of time series. As shown through the simulated and real data, we obtain better results than SSA, a non-parametric wavelet regression method and Holt-Winters method.

  18. International journal of computational fluid dynamics real-time prediction of unsteady flow based on POD reduced-order model and particle filter

    NASA Astrophysics Data System (ADS)

    Kikuchi, Ryota; Misaka, Takashi; Obayashi, Shigeru

    2016-04-01

    An integrated method consisting of a proper orthogonal decomposition (POD)-based reduced-order model (ROM) and a particle filter (PF) is proposed for real-time prediction of an unsteady flow field. The proposed method is validated using identical twin experiments of an unsteady flow field around a circular cylinder for Reynolds numbers of 100 and 1000. In this study, a PF is employed (ROM-PF) to modify the temporal coefficient of the ROM based on observation data because the prediction capability of the ROM alone is limited due to the stability issue. The proposed method reproduces the unsteady flow field several orders faster than a reference numerical simulation based on Navier-Stokes equations. Furthermore, the effects of parameters, related to observation and simulation, on the prediction accuracy are studied. Most of the energy modes of the unsteady flow field are captured, and it is possible to stably predict the long-term evolution with ROM-PF.

  19. Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy

    PubMed Central

    Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao; Song, Qing

    2016-01-01

    Antioxidant proteins perform significant functions in maintaining oxidation/antioxidation balance and have potential therapies for some diseases. Accurate identification of antioxidant proteins could contribute to revealing physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction results of the ensemble predictor are determined by an average of prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection) method is adopted to obtain optimal features from hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthew’s Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc. PMID:27662651

  20. SU-D-BRB-01: A Comparison of Learning Methods for Knowledge Based Dose Prediction for Coplanar and Non-Coplanar Liver Radiotherapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tran, A; Ruan, D; Woods, K

    Purpose: The predictive power of knowledge based planning (KBP) has considerable potential in the development of automated treatment planning. Here, we examine the predictive capabilities and accuracy of previously reported KBP methods, as well as an artificial neural networks (ANN) method. Furthermore, we compare the predictive accuracy of these methods on coplanar volumetric-modulated arc therapy (VMAT) and non-coplanar 4π radiotherapy. Methods: 30 liver SBRT patients previously treated using coplanar VMAT were selected for this study. The patients were re-planned using 4π radiotherapy, which involves 20 optimally selected non-coplanar IMRT fields. ANNs were used to incorporate enhanced geometric information including livermore » and PTV size, prescription dose, patient girth, and proximity to beams. The performance of ANN was compared to three methods from statistical voxel dose learning (SVDL), wherein the doses of voxels sharing the same distance to the PTV are approximated by either taking the median of the distribution, non-parametric fitting, or skew-normal fitting. These three methods were shown to be capable of predicting DVH, but only median approximation can predict 3D dose. Prediction methods were tested using leave-one-out cross-validation tests and evaluated using residual sum of squares (RSS) for DVH and 3D dose predictions. Results: DVH prediction using non-parametric fitting had the lowest average RSS with 0.1176(4π) and 0.1633(VMAT), compared to 0.4879(4π) and 1.8744(VMAT) RSS for ANN. 3D dose prediction with median approximation had lower RSS with 12.02(4π) and 29.22(VMAT), compared to 27.95(4π) and 130.9(VMAT) for ANN. Conclusion: Paradoxically, although the ANNs included geometric features in addition to the distances to the PTV, it did not perform better in predicting DVH or 3D dose compared to simpler, faster methods based on the distances alone. The study further confirms that the prediction of 4π non-coplanar plans were more accurate than VMAT. NIH R43CA183390 and R01CA188300.« less

  1. Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster.

    PubMed

    Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T

    2017-10-01

    Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.

  2. PatchSurfers: Two methods for local molecular property-based binding ligand prediction.

    PubMed

    Shin, Woong-Hee; Bures, Mark Gregory; Kihara, Daisuke

    2016-01-15

    Protein function prediction is an active area of research in computational biology. Function prediction can help biologists make hypotheses for characterization of genes and help interpret biological assays, and thus is a productive area for collaboration between experimental and computational biologists. Among various function prediction methods, predicting binding ligand molecules for a target protein is an important class because ligand binding events for a protein are usually closely intertwined with the proteins' biological function, and also because predicted binding ligands can often be directly tested by biochemical assays. Binding ligand prediction methods can be classified into two types: those which are based on protein-protein (or pocket-pocket) comparison, and those that compare a target pocket directly to ligands. Recently, our group proposed two computational binding ligand prediction methods, Patch-Surfer, which is a pocket-pocket comparison method, and PL-PatchSurfer, which compares a pocket to ligand molecules. The two programs apply surface patch-based descriptions to calculate similarity or complementarity between molecules. A surface patch is characterized by physicochemical properties such as shape, hydrophobicity, and electrostatic potentials. These properties on the surface are represented using three-dimensional Zernike descriptors (3DZD), which are based on a series expansion of a 3 dimensional function. Utilizing 3DZD for describing the physicochemical properties has two main advantages: (1) rotational invariance and (2) fast comparison. Here, we introduce Patch-Surfer and PL-PatchSurfer with an emphasis on PL-PatchSurfer, which is more recently developed. Illustrative examples of PL-PatchSurfer performance on binding ligand prediction as well as virtual drug screening are also provided. Copyright © 2015 Elsevier Inc. All rights reserved.

  3. Comparison of four statistical and machine learning methods for crash severity prediction.

    PubMed

    Iranitalab, Amirfarrokh; Khattak, Aemal

    2017-11-01

    Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset and the correct prediction rates for each crash severity level, overall correct prediction rate and a proposed crash costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed NNC had the best prediction performance in overall and in more severe crashes. RF and SVM had the next two sufficient performances and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. Overall correct prediction rate had almost the exact opposite results compared to the proposed approach, showing that neglecting the crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. Improved Density Functional Tight Binding Potentials for Metalloid Aluminum Clusters

    DTIC Science & Technology

    2016-06-01

    simulations of the oxidation of Al4Cp * 4 show reasonable comparison with a DFT-based Car -Parrinello method, including correct prediction of hydride transfers...comparison with a DFT-based Car -Parrinello method, including correct prediction of hydride transfers from Cp* to the metal centers during the...initio molecular dynamics of the oxidation of Al4Cp * 4 using a DFT-based Car -Parrinello method. This simulation, which 43 several months on the

  5. Using string invariants for prediction searching for optimal parameters

    NASA Astrophysics Data System (ADS)

    Bundzel, Marek; Kasanický, Tomáš; Pinčák, Richard

    2016-02-01

    We have developed a novel prediction method based on string invariants. The method does not require learning but a small set of parameters must be set to achieve optimal performance. We have implemented an evolutionary algorithm for the parametric optimization. We have tested the performance of the method on artificial and real world data and compared the performance to statistical methods and to a number of artificial intelligence methods. We have used data and the results of a prediction competition as a benchmark. The results show that the method performs well in single step prediction but the method's performance for multiple step prediction needs to be improved. The method works well for a wide range of parameters.

  6. TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers.

    PubMed

    Cao, Han; Ng, Marcus C K; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W I

    2017-09-01

    [Formula: see text]-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, therefore computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method TMDIM based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are introduction of the packing type classification, the multiple-condition decoy filtering, and the cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD <2.0 Å) and 78% of the cases are better predicted than the two other methods compared. Our method provides an alternative for modeling TM bitopic dimers of unknown structures for further computational studies. TMDIM is freely available on the web at https://cbbio.cis.umac.mo/TMDIM . Website is implemented in PHP, MySQL and Apache, with all major browsers supported.

  7. TMDIM: an improved algorithm for the structure prediction of transmembrane domains of bitopic dimers

    NASA Astrophysics Data System (ADS)

    Cao, Han; Ng, Marcus C. K.; Jusoh, Siti Azma; Tai, Hio Kuan; Siu, Shirley W. I.

    2017-09-01

    α-Helical transmembrane proteins are the most important drug targets in rational drug development. However, solving the experimental structures of these proteins remains difficult, therefore computational methods to accurately and efficiently predict the structures are in great demand. We present an improved structure prediction method TMDIM based on Park et al. (Proteins 57:577-585, 2004) for predicting bitopic transmembrane protein dimers. Three major algorithmic improvements are introduction of the packing type classification, the multiple-condition decoy filtering, and the cluster-based candidate selection. In a test of predicting nine known bitopic dimers, approximately 78% of our predictions achieved a successful fit (RMSD <2.0 Å) and 78% of the cases are better predicted than the two other methods compared. Our method provides an alternative for modeling TM bitopic dimers of unknown structures for further computational studies. TMDIM is freely available on the web at https://cbbio.cis.umac.mo/TMDIM. Website is implemented in PHP, MySQL and Apache, with all major browsers supported.

  8. RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model.

    PubMed

    Jabbari, Hosna; Wark, Ian; Montemagno, Carlo

    2018-01-01

    RNA is a biopolymer with various applications inside the cell and in biotechnology. Structure of an RNA molecule mainly determines its function and is essential to guide nanostructure design. Since experimental structure determination is time-consuming and expensive, accurate computational prediction of RNA structure is of great importance. Prediction of RNA secondary structure is relatively simpler than its tertiary structure and provides information about its tertiary structure, therefore, RNA secondary structure prediction has received attention in the past decades. Numerous methods with different folding approaches have been developed for RNA secondary structure prediction. While methods for prediction of RNA pseudoknot-free structure (structures with no crossing base pairs) have greatly improved in terms of their accuracy, methods for prediction of RNA pseudoknotted secondary structure (structures with crossing base pairs) still have room for improvement. A long-standing question for improving the prediction accuracy of RNA pseudoknotted secondary structure is whether to focus on the prediction algorithm or the underlying energy model, as there is a trade-off on computational cost of the prediction algorithm versus the generality of the method. The aim of this work is to argue when comparing different methods for RNA pseudoknotted structure prediction, the combination of algorithm and energy model should be considered and a method should not be considered superior or inferior to others if they do not use the same scoring model. We demonstrate that while the folding approach is important in structure prediction, it is not the only important factor in prediction accuracy of a given method as the underlying energy model is also as of great value. Therefore we encourage researchers to pay particular attention in comparing methods with different energy models.

  9. Hierarchical Interactions Model for Predicting Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) Conversion

    PubMed Central

    Li, Han; Liu, Yashu; Gong, Pinghua; Zhang, Changshui; Ye, Jieping

    2014-01-01

    Identifying patients with Mild Cognitive Impairment (MCI) who are likely to convert to dementia has recently attracted increasing attention in Alzheimer's disease (AD) research. An accurate prediction of conversion from MCI to AD can aid clinicians to initiate treatments at early stage and monitor their effectiveness. However, existing prediction systems based on the original biosignatures are not satisfactory. In this paper, we propose to fit the prediction models using pairwise biosignature interactions, thus capturing higher-order relationship among biosignatures. Specifically, we employ hierarchical constraints and sparsity regularization to prune the high-dimensional input features. Based on the significant biosignatures and underlying interactions identified, we build classifiers to predict the conversion probability based on the selected features. We further analyze the underlying interaction effects of different biosignatures based on the so-called stable expectation scores. We have used 293 MCI subjects from Alzheimer's Disease Neuroimaging Initiative (ADNI) database that have MRI measurements at the baseline to evaluate the effectiveness of the proposed method. Our proposed method achieves better classification performance than state-of-the-art methods. Moreover, we discover several significant interactions predictive of MCI-to-AD conversion. These results shed light on improving the prediction performance using interaction features. PMID:24416143

  10. Comparisons of survival predictions using survival risk ratios based on International Classification of Diseases, Ninth Revision and Abbreviated Injury Scale trauma diagnosis codes.

    PubMed

    Clarke, John R; Ragone, Andrew V; Greenwald, Lloyd

    2005-09-01

    We conducted a comparison of methods for predicting survival using survival risk ratios (SRRs), including new comparisons based on International Classification of Diseases, Ninth Revision (ICD-9) versus Abbreviated Injury Scale (AIS) six-digit codes. From the Pennsylvania trauma center's registry, all direct trauma admissions were collected through June 22, 1999. Patients with no comorbid medical diagnoses and both ICD-9 and AIS injury codes were used for comparisons based on a single set of data. SRRs for ICD-9 and then for AIS diagnostic codes were each calculated two ways: from the survival rate of patients with each diagnosis and when each diagnosis was an isolated diagnosis. Probabilities of survival for the cohort were calculated using each set of SRRs by the multiplicative ICISS method and, where appropriate, the minimum SRR method. These prediction sets were then internally validated against actual survival by the Hosmer-Lemeshow goodness-of-fit statistic. The 41,364 patients had 1,224 different ICD-9 injury diagnoses in 32,261 combinations and 1,263 corresponding AIS injury diagnoses in 31,755 combinations, ranging from 1 to 27 injuries per patient. All conventional ICD-9-based combinations of SRRs and methods had better Hosmer-Lemeshow goodness-of-fit statistic fits than their AIS-based counterparts. The minimum SRR method produced better calibration than the multiplicative methods, presumably because it did not magnify inaccuracies in the SRRs that might occur with multiplication. Predictions of survival based on anatomic injury alone can be performed using ICD-9 codes, with no advantage from extra coding of AIS diagnoses. Predictions based on the single worst SRR were closer to actual outcomes than those based on multiplying SRRs.

  11. Integrated Detection and Prediction of Influenza Activity for Real-Time Surveillance: Algorithm Design

    PubMed Central

    2017-01-01

    Background Influenza is a viral respiratory disease capable of causing epidemics that represent a threat to communities worldwide. The rapidly growing availability of electronic “big data” from diagnostic and prediagnostic sources in health care and public health settings permits advance of a new generation of methods for local detection and prediction of winter influenza seasons and influenza pandemics. Objective The aim of this study was to present a method for integrated detection and prediction of influenza virus activity in local settings using electronically available surveillance data and to evaluate its performance by retrospective application on authentic data from a Swedish county. Methods An integrated detection and prediction method was formally defined based on a design rationale for influenza detection and prediction methods adapted for local surveillance. The novel method was retrospectively applied on data from the winter influenza season 2008-09 in a Swedish county (population 445,000). Outcome data represented individuals who met a clinical case definition for influenza (based on International Classification of Diseases version 10 [ICD-10] codes) from an electronic health data repository. Information from calls to a telenursing service in the county was used as syndromic data source. Results The novel integrated detection and prediction method is based on nonmechanistic statistical models and is designed for integration in local health information systems. The method is divided into separate modules for detection and prediction of local influenza virus activity. The function of the detection module is to alert for an upcoming period of increased load of influenza cases on local health care (using influenza-diagnosis data), whereas the function of the prediction module is to predict the timing of the activity peak (using syndromic data) and its intensity (using influenza-diagnosis data). For detection modeling, exponential regression was used based on the assumption that the beginning of a winter influenza season has an exponential growth of infected individuals. For prediction modeling, linear regression was applied on 7-day periods at the time in order to find the peak timing, whereas a derivate of a normal distribution density function was used to find the peak intensity. We found that the integrated detection and prediction method detected the 2008-09 winter influenza season on its starting day (optimal timeliness 0 days), whereas the predicted peak was estimated to occur 7 days ahead of the factual peak and the predicted peak intensity was estimated to be 26% lower than the factual intensity (6.3 compared with 8.5 influenza-diagnosis cases/100,000). Conclusions Our detection and prediction method is one of the first integrated methods specifically designed for local application on influenza data electronically available for surveillance. The performance of the method in a retrospective study indicates that further prospective evaluations of the methods are justified. PMID:28619700

  12. Flight-Test Evaluation of Flutter-Prediction Methods

    NASA Technical Reports Server (NTRS)

    Lind, RIck; Brenner, Marty

    2003-01-01

    The flight-test community routinely spends considerable time and money to determine a range of flight conditions, called a flight envelope, within which an aircraft is safe to fly. The cost of determining a flight envelope could be greatly reduced if there were a method of safely and accurately predicting the speed associated with the onset of an instability called flutter. Several methods have been developed with the goal of predicting flutter speeds to improve the efficiency of flight testing. These methods include (1) data-based methods, in which one relies entirely on information obtained from the flight tests and (2) model-based approaches, in which one relies on a combination of flight data and theoretical models. The data-driven methods include one based on extrapolation of damping trends, one that involves an envelope function, one that involves the Zimmerman-Weissenburger flutter margin, and one that involves a discrete-time auto-regressive model. An example of a model-based approach is that of the flutterometer. These methods have all been shown to be theoretically valid and have been demonstrated on simple test cases; however, until now, they have not been thoroughly evaluated in flight tests. An experimental apparatus called the Aerostructures Test Wing (ATW) was developed to test these prediction methods.

  13. A deep learning-based multi-model ensemble method for cancer prediction.

    PubMed

    Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

    2018-01-01

    Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Comparison of two DSC-based methods to predict drug-polymer solubility.

    PubMed

    Rask, Malte Bille; Knopp, Matthias Manne; Olesen, Niels Erik; Holm, René; Rades, Thomas

    2018-04-05

    The aim of the present study was to compare two DSC-based methods to predict drug-polymer solubility (melting point depression method and recrystallization method) and propose a guideline for selecting the most suitable method based on physicochemical properties of both the drug and the polymer. Using the two methods, the solubilities of celecoxib, indomethacin, carbamazepine, and ritonavir in polyvinylpyrrolidone, hydroxypropyl methylcellulose, and Soluplus® were determined at elevated temperatures and extrapolated to room temperature using the Flory-Huggins model. For the melting point depression method, it was observed that a well-defined drug melting point was required in order to predict drug-polymer solubility, since the method is based on the depression of the melting point as a function of polymer content. In contrast to previous findings, it was possible to measure melting point depression up to 20 °C below the glass transition temperature (T g ) of the polymer for some systems. Nevertheless, in general it was possible to obtain solubility measurements at lower temperatures using polymers with a low T g . Finally, for the recrystallization method it was found that the experimental composition dependence of the T g must be differentiable for compositions ranging from 50 to 90% drug (w/w) so that one T g corresponds to only one composition. Based on these findings, a guideline for selecting the most suitable thermal method to predict drug-polymer solubility based on the physicochemical properties of the drug and polymer is suggested in the form of a decision tree. Copyright © 2018 Elsevier B.V. All rights reserved.

  15. A class-based link prediction using Distance Dependent Chinese Restaurant Process

    NASA Astrophysics Data System (ADS)

    Andalib, Azam; Babamir, Seyed Morteza

    2016-08-01

    One of the important tasks in relational data analysis is link prediction which has been successfully applied on many applications such as bioinformatics, information retrieval, etc. The link prediction is defined as predicting the existence or absence of edges between nodes of a network. In this paper, we propose a novel method for link prediction based on Distance Dependent Chinese Restaurant Process (DDCRP) model which enables us to utilize the information of the topological structure of the network such as shortest path and connectivity of the nodes. We also propose a new Gibbs sampling algorithm for computing the posterior distribution of the hidden variables based on the training data. Experimental results on three real-world datasets show the superiority of the proposed method over other probabilistic models for link prediction problem.

  16. Long-Term Deflection Prediction from Computer Vision-Measured Data History for High-Speed Railway Bridges

    PubMed Central

    Lee, Jaebeom; Lee, Young-Joo

    2018-01-01

    Management of the vertical long-term deflection of a high-speed railway bridge is a crucial factor to guarantee traffic safety and passenger comfort. Therefore, there have been efforts to predict the vertical deflection of a railway bridge based on physics-based models representing various influential factors to vertical deflection such as concrete creep and shrinkage. However, it is not an easy task because the vertical deflection of a railway bridge generally involves several sources of uncertainty. This paper proposes a probabilistic method that employs a Gaussian process to construct a model to predict the vertical deflection of a railway bridge based on actual vision-based measurement and temperature. To deal with the sources of uncertainty which may cause prediction errors, a Gaussian process is modeled with multiple kernels and hyperparameters. Once the hyperparameters are identified through the Gaussian process regression using training data, the proposed method provides a 95% prediction interval as well as a predictive mean about the vertical deflection of the bridge. The proposed method is applied to an arch bridge under operation for high-speed trains in South Korea. The analysis results obtained from the proposed method show good agreement with the actual measurement data on the vertical deflection of the example bridge, and the prediction results can be utilized for decision-making on railway bridge maintenance. PMID:29747421

  17. Long-Term Deflection Prediction from Computer Vision-Measured Data History for High-Speed Railway Bridges.

    PubMed

    Lee, Jaebeom; Lee, Kyoung-Chan; Lee, Young-Joo

    2018-05-09

    Management of the vertical long-term deflection of a high-speed railway bridge is a crucial factor to guarantee traffic safety and passenger comfort. Therefore, there have been efforts to predict the vertical deflection of a railway bridge based on physics-based models representing various influential factors to vertical deflection such as concrete creep and shrinkage. However, it is not an easy task because the vertical deflection of a railway bridge generally involves several sources of uncertainty. This paper proposes a probabilistic method that employs a Gaussian process to construct a model to predict the vertical deflection of a railway bridge based on actual vision-based measurement and temperature. To deal with the sources of uncertainty which may cause prediction errors, a Gaussian process is modeled with multiple kernels and hyperparameters. Once the hyperparameters are identified through the Gaussian process regression using training data, the proposed method provides a 95% prediction interval as well as a predictive mean about the vertical deflection of the bridge. The proposed method is applied to an arch bridge under operation for high-speed trains in South Korea. The analysis results obtained from the proposed method show good agreement with the actual measurement data on the vertical deflection of the example bridge, and the prediction results can be utilized for decision-making on railway bridge maintenance.

  18. ProTox: a web server for the in silico prediction of rodent oral toxicity.

    PubMed

    Drwal, Malgorzata N; Banerjee, Priyanka; Dunkel, Mathias; Wettig, Martin R; Preissner, Robert

    2014-07-01

    Animal trials are currently the major method for determining the possible toxic effects of drug candidates and cosmetics. In silico prediction methods represent an alternative approach and aim to rationalize the preclinical drug development, thus enabling the reduction of the associated time, costs and animal experiments. Here, we present ProTox, a web server for the prediction of rodent oral toxicity. The prediction method is based on the analysis of the similarity of compounds with known median lethal doses (LD50) and incorporates the identification of toxic fragments, therefore representing a novel approach in toxicity prediction. In addition, the web server includes an indication of possible toxicity targets which is based on an in-house collection of protein-ligand-based pharmacophore models ('toxicophores') for targets associated with adverse drug reactions. The ProTox web server is open to all users and can be accessed without registration at: http://tox.charite.de/tox. The only requirement for the prediction is the two-dimensional structure of the input compounds. All ProTox methods have been evaluated based on a diverse external validation set and displayed strong performance (sensitivity, specificity and precision of 76, 95 and 75%, respectively) and superiority over other toxicity prediction tools, indicating their possible applicability for other compound classes. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Stability of rigid rotors supported by air foil bearings: Comparison of two fundamental approaches

    NASA Astrophysics Data System (ADS)

    Larsen, Jon S.; Santos, Ilmar F.; von Osmanski, Sebastian

    2016-10-01

    High speed direct drive motors enable the use of Air Foil Bearings (AFB) in a wide range of applications due to the elimination of gear forces. Unfortunately, AFB supported rotors are lightly damped, and an accurate prediction of their Onset Speed of Instability (OSI) is therefore important. This paper compares two fundamental methods for predicting the OSI. One is based on a nonlinear time domain simulation and another is based on a linearised frequency domain method and a perturbation of the Reynolds equation. Both methods are based on equivalent models and should predict similar results. Significant discrepancies are observed leading to the question, is the classical frequency domain method sufficiently accurate? The discrepancies and possible explanations are discussed in detail.

  20. Linear reduction methods for tag SNP selection.

    PubMed

    He, Jingwu; Zelikovsky, Alex

    2004-01-01

    It is widely hoped that constructing a complete human haplotype map will help to associate complex diseases with certain SNP's. Unfortunately, the number of SNP's is huge and it is very costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNP's that should be sequenced to considerably small number of informative representatives, so called tag SNP's. In this paper, we propose a new linear algebra based method for selecting and using tag SNP's. Our method is purely combinatorial and can be combined with linkage disequilibrium (LD) and block based methods. We measure the quality of our tag SNP selection algorithm by comparing actual SNP's with SNP's linearly predicted from linearly chosen tag SNP's. We obtain an extremely good compression and prediction rates. For example, for long haplotypes (>25000 SNP's), knowing only 0.4% of all SNP's we predict the entire unknown haplotype with 2% accuracy while the prediction method is based on a 10% sample of the population.

  1. Protein subcellular localization prediction using artificial intelligence technology.

    PubMed

    Nair, Rajesh; Rost, Burkhard

    2008-01-01

    Proteins perform many important tasks in living organisms, such as catalysis of biochemical reactions, transport of nutrients, and recognition and transmission of signals. The plethora of aspects of the role of any particular protein is referred to as its "function." One aspect of protein function that has been the target of intensive research by computational biologists is its subcellular localization. Proteins must be localized in the same subcellular compartment to cooperate toward a common physiological function. Aberrant subcellular localization of proteins can result in several diseases, including kidney stones, cancer, and Alzheimer's disease. To date, sequence homology remains the most widely used method for inferring the function of a protein. However, the application of advanced artificial intelligence (AI)-based techniques in recent years has resulted in significant improvements in our ability to predict the subcellular localization of a protein. The prediction accuracy has risen steadily over the years, in large part due to the application of AI-based methods such as hidden Markov models (HMMs), neural networks (NNs), and support vector machines (SVMs), although the availability of larger experimental datasets has also played a role. Automatic methods that mine textual information from the biological literature and molecular biology databases have considerably sped up the process of annotation for proteins for which some information regarding function is available in the literature. State-of-the-art methods based on NNs and HMMs can predict the presence of N-terminal sorting signals extremely accurately. Ab initio methods that predict subcellular localization for any protein sequence using only the native amino acid sequence and features predicted from the native sequence have shown the most remarkable improvements. The prediction accuracy of these methods has increased by over 30% in the past decade. The accuracy of these methods is now on par with high-throughput methods for predicting localization, and they are beginning to play an important role in directing experimental research. In this chapter, we review some of the most important methods for the prediction of subcellular localization.

  2. A novel artificial neural network method for biomedical prediction based on matrix pseudo-inversion.

    PubMed

    Cai, Binghuang; Jiang, Xia

    2014-04-01

    Biomedical prediction based on clinical and genome-wide data has become increasingly important in disease diagnosis and classification. To solve the prediction problem in an effective manner for the improvement of clinical care, we develop a novel Artificial Neural Network (ANN) method based on Matrix Pseudo-Inversion (MPI) for use in biomedical applications. The MPI-ANN is constructed as a three-layer (i.e., input, hidden, and output layers) feed-forward neural network, and the weights connecting the hidden and output layers are directly determined based on MPI without a lengthy learning iteration. The LASSO (Least Absolute Shrinkage and Selection Operator) method is also presented for comparative purposes. Single Nucleotide Polymorphism (SNP) simulated data and real breast cancer data are employed to validate the performance of the MPI-ANN method via 5-fold cross validation. Experimental results demonstrate the efficacy of the developed MPI-ANN for disease classification and prediction, in view of the significantly superior accuracy (i.e., the rate of correct predictions), as compared with LASSO. The results based on the real breast cancer data also show that the MPI-ANN has better performance than other machine learning methods (including support vector machine (SVM), logistic regression (LR), and an iterative ANN). In addition, experiments demonstrate that our MPI-ANN could be used for bio-marker selection as well. Copyright © 2013 Elsevier Inc. All rights reserved.

  3. Prediction of Drug-Target Interaction Networks from the Integration of Protein Sequences and Drug Chemical Structures.

    PubMed

    Meng, Fan-Rong; You, Zhu-Hong; Chen, Xing; Zhou, Yong; An, Ji-Yong

    2017-07-05

    Knowledge of drug-target interaction (DTI) plays an important role in discovering new drug candidates. Unfortunately, there are unavoidable shortcomings; including the time-consuming and expensive nature of the experimental method to predict DTI. Therefore, it motivates us to develop an effective computational method to predict DTI based on protein sequence. In the paper, we proposed a novel computational approach based on protein sequence, namely PDTPS (Predicting Drug Targets with Protein Sequence) to predict DTI. The PDTPS method combines Bi-gram probabilities (BIGP), Position Specific Scoring Matrix (PSSM), and Principal Component Analysis (PCA) with Relevance Vector Machine (RVM). In order to evaluate the prediction capacity of the PDTPS, the experiment was carried out on enzyme, ion channel, GPCR, and nuclear receptor datasets by using five-fold cross-validation tests. The proposed PDTPS method achieved average accuracy of 97.73%, 93.12%, 86.78%, and 87.78% on enzyme, ion channel, GPCR and nuclear receptor datasets, respectively. The experimental results showed that our method has good prediction performance. Furthermore, in order to further evaluate the prediction performance of the proposed PDTPS method, we compared it with the state-of-the-art support vector machine (SVM) classifier on enzyme and ion channel datasets, and other exiting methods on four datasets. The promising comparison results further demonstrate that the efficiency and robust of the proposed PDTPS method. This makes it a useful tool and suitable for predicting DTI, as well as other bioinformatics tasks.

  4. HomPPI: a class of sequence homology based protein-protein interface prediction methods

    PubMed Central

    2011-01-01

    Background Although homology-based methods are among the most widely used methods for predicting the structure and function of proteins, the question as to whether interface sequence conservation can be effectively exploited in predicting protein-protein interfaces has been a subject of debate. Results We studied more than 300,000 pair-wise alignments of protein sequences from structurally characterized protein complexes, including both obligate and transient complexes. We identified sequence similarity criteria required for accurate homology-based inference of interface residues in a query protein sequence. Based on these analyses, we developed HomPPI, a class of sequence homology-based methods for predicting protein-protein interface residues. We present two variants of HomPPI: (i) NPS-HomPPI (Non partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner; and (ii) PS-HomPPI (Partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. Our experiments on a benchmark dataset of obligate homodimeric complexes show that NPS-HomPPI can reliably predict protein-protein interface residues in a given protein, with an average correlation coefficient (CC) of 0.76, sensitivity of 0.83, and specificity of 0.78, when sequence homologs of the query protein can be reliably identified. NPS-HomPPI also reliably predicts the interface residues of intrinsically disordered proteins. Our experiments suggest that NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. The partner-specific classifier, PS-HomPPI can, on a large dataset of transient complexes, predict the interface residues of a query protein with a specific target, with a CC of 0.65, sensitivity of 0.69, and specificity of 0.70, when homologs of both the query and the target can be reliably identified. The HomPPI web server is available at http://homppi.cs.iastate.edu/. Conclusions Sequence homology-based methods offer a class of computationally efficient and reliable approaches for predicting the protein-protein interface residues that participate in either obligate or transient interactions. For query proteins involved in transient interactions, the reliability of interface residue prediction can be improved by exploiting knowledge of putative interaction partners. PMID:21682895

  5. Predictive Analytical Model for Isolator Shock-Train Location in a Mach 2.2 Direct-Connect Supersonic Combustion Tunnel

    NASA Astrophysics Data System (ADS)

    Lingren, Joe; Vanstone, Leon; Hashemi, Kelley; Gogineni, Sivaram; Donbar, Jeffrey; Akella, Maruthi; Clemens, Noel

    2016-11-01

    This study develops an analytical model for predicting the leading shock of a shock-train in the constant area isolator section in a Mach 2.2 direct-connect scramjet simulation tunnel. The effective geometry of the isolator is assumed to be a weakly converging duct owing to boundary-layer growth. For some given pressure rise across the isolator, quasi-1D equations relating to isentropic or normal shock flows can be used to predict the normal shock location in the isolator. The surface pressure distribution through the isolator was measured during experiments and both the actual and predicted locations can be calculated. Three methods of finding the shock-train location are examined, one based on the measured pressure rise, one using a non-physics-based control model, and one using the physics-based analytical model. It is shown that the analytical model performs better than the non-physics-based model in all cases. The analytic model is less accurate than the pressure threshold method but requires significantly less information to compute. In contrast to other methods for predicting shock-train location, this method is relatively accurate and requires as little as a single pressure measurement. This makes this method potentially useful for unstart control applications.

  6. A Prediction Model for Functional Outcomes in Spinal Cord Disorder Patients Using Gaussian Process Regression.

    PubMed

    Lee, Sunghoon Ivan; Mortazavi, Bobak; Hoffman, Haydn A; Lu, Derek S; Li, Charles; Paak, Brian H; Garst, Jordan H; Razaghy, Mehrdad; Espinal, Marie; Park, Eunjeong; Lu, Daniel C; Sarrafzadeh, Majid

    2016-01-01

    Predicting the functional outcomes of spinal cord disorder patients after medical treatments, such as a surgical operation, has always been of great interest. Accurate posttreatment prediction is especially beneficial for clinicians, patients, care givers, and therapists. This paper introduces a prediction method for postoperative functional outcomes by a novel use of Gaussian process regression. The proposed method specifically considers the restricted value range of the target variables by modeling the Gaussian process based on a truncated Normal distribution, which significantly improves the prediction results. The prediction has been made in assistance with target tracking examinations using a highly portable and inexpensive handgrip device, which greatly contributes to the prediction performance. The proposed method has been validated through a dataset collected from a clinical cohort pilot involving 15 patients with cervical spinal cord disorder. The results show that the proposed method can accurately predict postoperative functional outcomes, Oswestry disability index and target tracking scores, based on the patient's preoperative information with a mean absolute error of 0.079 and 0.014 (out of 1.0), respectively.

  7. ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences.

    PubMed

    Bonizzoni, Paola; Rizzi, Raffaella; Pesole, Graziano

    2005-10-05

    Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems--hence the need to develop novel strategies. We propose a method, based on a novel multiple genome-EST alignment algorithm, for the detection of splice sites. To avoid limitations of splice sites prediction (mainly, over-predictions) due to independent single EST alignments to the genomic sequence our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations. We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It is distinguished from other methods based on BLAST-like tools by the incorporation of entirely new ad hoc procedures for accurate and computationally efficient transcript alignment and adopts dynamic programming for the refinement of intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility. Extensive bench marking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires a lower computation time for processing a single gene and an EST cluster. The ASPIC web resource is available at http://aspic.algo.disco.unimib.it/aspic-devel/.

  8. A novel method to predict current voltage characteristics of positive corona discharges based on a perturbation technique. I. Local analysis

    NASA Astrophysics Data System (ADS)

    Shibata, Hisaichi; Takaki, Ryoji

    2017-11-01

    A novel method to compute current-voltage characteristics (CVCs) of direct current positive corona discharges is formulated based on a perturbation technique. We use linearized fluid equations coupled with the linearized Poisson's equation. Townsend relation is assumed to predict CVCs apart from the linearization point. We choose coaxial cylinders as a test problem, and we have successfully predicted parameters which can determine CVCs with arbitrary inner and outer radii. It is also confirmed that the proposed method essentially does not induce numerical instabilities.

  9. Integrated Detection and Prediction of Influenza Activity for Real-Time Surveillance: Algorithm Design.

    PubMed

    Spreco, Armin; Eriksson, Olle; Dahlström, Örjan; Cowling, Benjamin John; Timpka, Toomas

    2017-06-15

    Influenza is a viral respiratory disease capable of causing epidemics that represent a threat to communities worldwide. The rapidly growing availability of electronic "big data" from diagnostic and prediagnostic sources in health care and public health settings permits advance of a new generation of methods for local detection and prediction of winter influenza seasons and influenza pandemics. The aim of this study was to present a method for integrated detection and prediction of influenza virus activity in local settings using electronically available surveillance data and to evaluate its performance by retrospective application on authentic data from a Swedish county. An integrated detection and prediction method was formally defined based on a design rationale for influenza detection and prediction methods adapted for local surveillance. The novel method was retrospectively applied on data from the winter influenza season 2008-09 in a Swedish county (population 445,000). Outcome data represented individuals who met a clinical case definition for influenza (based on International Classification of Diseases version 10 [ICD-10] codes) from an electronic health data repository. Information from calls to a telenursing service in the county was used as syndromic data source. The novel integrated detection and prediction method is based on nonmechanistic statistical models and is designed for integration in local health information systems. The method is divided into separate modules for detection and prediction of local influenza virus activity. The function of the detection module is to alert for an upcoming period of increased load of influenza cases on local health care (using influenza-diagnosis data), whereas the function of the prediction module is to predict the timing of the activity peak (using syndromic data) and its intensity (using influenza-diagnosis data). For detection modeling, exponential regression was used based on the assumption that the beginning of a winter influenza season has an exponential growth of infected individuals. For prediction modeling, linear regression was applied on 7-day periods at the time in order to find the peak timing, whereas a derivate of a normal distribution density function was used to find the peak intensity. We found that the integrated detection and prediction method detected the 2008-09 winter influenza season on its starting day (optimal timeliness 0 days), whereas the predicted peak was estimated to occur 7 days ahead of the factual peak and the predicted peak intensity was estimated to be 26% lower than the factual intensity (6.3 compared with 8.5 influenza-diagnosis cases/100,000). Our detection and prediction method is one of the first integrated methods specifically designed for local application on influenza data electronically available for surveillance. The performance of the method in a retrospective study indicates that further prospective evaluations of the methods are justified. ©Armin Spreco, Olle Eriksson, Örjan Dahlström, Benjamin John Cowling, Toomas Timpka. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.06.2017.

  10. Prediction of Drug-Target Interactions and Drug Repositioning via Network-Based Inference

    PubMed Central

    Jiang, Jing; Lu, Weiqiang; Li, Weihua; Liu, Guixia; Zhou, Weixing; Huang, Jin; Tang, Yun

    2012-01-01

    Drug-target interaction (DTI) is the basis of drug discovery and design. It is time consuming and costly to determine DTI experimentally. Hence, it is necessary to develop computational methods for the prediction of potential DTI. Based on complex network theory, three supervised inference methods were developed here to predict DTI and used for drug repositioning, namely drug-based similarity inference (DBSI), target-based similarity inference (TBSI) and network-based inference (NBI). Among them, NBI performed best on four benchmark data sets. Then a drug-target network was created with NBI based on 12,483 FDA-approved and experimental drug-target binary links, and some new DTIs were further predicted. In vitro assays confirmed that five old drugs, namely montelukast, diclofenac, simvastatin, ketoconazole, and itraconazole, showed polypharmacological features on estrogen receptors or dipeptidyl peptidase-IV with half maximal inhibitory or effective concentration ranged from 0.2 to 10 µM. Moreover, simvastatin and ketoconazole showed potent antiproliferative activities on human MDA-MB-231 breast cancer cell line in MTT assays. The results indicated that these methods could be powerful tools in prediction of DTIs and drug repositioning. PMID:22589709

  11. A Method to Predict the Structure and Stability of RNA/RNA Complexes.

    PubMed

    Xu, Xiaojun; Chen, Shi-Jie

    2016-01-01

    RNA/RNA interactions are essential for genomic RNA dimerization and regulation of gene expression. Intermolecular loop-loop base pairing is a widespread and functionally important tertiary structure motif in RNA machinery. However, computational prediction of intermolecular loop-loop base pairing is challenged by the entropy and free energy calculation due to the conformational constraint and the intermolecular interactions. In this chapter, we describe a recently developed statistical mechanics-based method for the prediction of RNA/RNA complex structures and stabilities. The method is based on the virtual bond RNA folding model (Vfold). The main emphasis in the method is placed on the evaluation of the entropy and free energy for the loops, especially tertiary kissing loops. The method also uses recursive partition function calculations and two-step screening algorithm for large, complicated structures of RNA/RNA complexes. As case studies, we use the HIV-1 Mal dimer and the siRNA/HIV-1 mutant (T4) to illustrate the method.

  12. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480

  13. Linear Elastic and Cohesive Fracture Analysis to Model Hydraulic Fracture in Brittle and Ductile Rocks

    NASA Astrophysics Data System (ADS)

    Yao, Yao

    2012-05-01

    Hydraulic fracturing technology is being widely used within the oil and gas industry for both waste injection and unconventional gas production wells. It is essential to predict the behavior of hydraulic fractures accurately based on understanding the fundamental mechanism(s). The prevailing approach for hydraulic fracture modeling continues to rely on computational methods based on Linear Elastic Fracture Mechanics (LEFM). Generally, these methods give reasonable predictions for hard rock hydraulic fracture processes, but still have inherent limitations, especially when fluid injection is performed in soft rock/sand or other non-conventional formations. These methods typically give very conservative predictions on fracture geometry and inaccurate estimation of required fracture pressure. One of the reasons the LEFM-based methods fail to give accurate predictions for these materials is that the fracture process zone ahead of the crack tip and softening effect should not be neglected in ductile rock fracture analysis. A 3D pore pressure cohesive zone model has been developed and applied to predict hydraulic fracturing under fluid injection. The cohesive zone method is a numerical tool developed to model crack initiation and growth in quasi-brittle materials considering the material softening effect. The pore pressure cohesive zone model has been applied to investigate the hydraulic fracture with different rock properties. The hydraulic fracture predictions of a three-layer water injection case have been compared using the pore pressure cohesive zone model with revised parameters, LEFM-based pseudo 3D model, a Perkins-Kern-Nordgren (PKN) model, and an analytical solution. Based on the size of the fracture process zone and its effect on crack extension in ductile rock, the fundamental mechanical difference of LEFM and cohesive fracture mechanics-based methods is discussed. An effective fracture toughness method has been proposed to consider the fracture process zone effect on the ductile rock fracture.

  14. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction

    PubMed Central

    Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian

    2017-01-01

    Abstract Motivation: Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. Results: We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Availability and Implementation: Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28453681

  15. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.

    PubMed

    Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian; Deane, Charlotte M

    2017-05-01

    Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  16. Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures.

    PubMed

    Liu, Guang-Hui; Shen, Hong-Bin; Yu, Dong-Jun

    2016-04-01

    Accurately predicting protein-protein interaction sites (PPIs) is currently a hot topic because it has been demonstrated to be very useful for understanding disease mechanisms and designing drugs. Machine-learning-based computational approaches have been broadly utilized and demonstrated to be useful for PPI prediction. However, directly applying traditional machine learning algorithms, which often assume that samples in different classes are balanced, often leads to poor performance because of the severe class imbalance that exists in the PPI prediction problem. In this study, we propose a novel method for improving PPI prediction performance by relieving the severity of class imbalance using a data-cleaning procedure and reducing predicted false positives with a post-filtering procedure: First, a machine-learning-based data-cleaning procedure is applied to remove those marginal targets, which may potentially have a negative effect on training a model with a clear classification boundary, from the majority samples to relieve the severity of class imbalance in the original training dataset; then, a prediction model is trained on the cleaned dataset; finally, an effective post-filtering procedure is further used to reduce potential false positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which exhibits highly competitive performance compared with existing state-of-the-art sequence-based PPIs predictors and should supplement existing PPI prediction methods.

  17. Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context.

    PubMed

    Wang, Yong-Cui; Wang, Yong; Yang, Zhi-Xia; Deng, Nai-Yang

    2011-06-20

    Enzymes are known as the largest class of proteins and their functions are usually annotated by the Enzyme Commission (EC), which uses a hierarchy structure, i.e., four numbers separated by periods, to classify the function of enzymes. Automatically categorizing enzyme into the EC hierarchy is crucial to understand its specific molecular mechanism. In this paper, we introduce two key improvements in predicting enzyme function within the machine learning framework. One is to introduce the efficient sequence encoding methods for representing given proteins. The second one is to develop a structure-based prediction method with low computational complexity. In particular, we propose to use the conjoint triad feature (CTF) to represent the given protein sequences by considering not only the composition of amino acids but also the neighbor relationships in the sequence. Then we develop a support vector machine (SVM)-based method, named as SVMHL (SVM for hierarchy labels), to output enzyme function by fully considering the hierarchical structure of EC. The experimental results show that our SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature both in predictive accuracy and Matthew's correlation coefficient (MCC). In addition, SVMHL with the CTF obtains the accuracy and MCC ranging from 81% to 98% and 0.82 to 0.98 when predicting the first three EC digits on a low-homologous enzyme dataset. We further demonstrate that our method outperforms the methods which do not take account of hierarchical relationship among enzyme categories and alternative methods which incorporate prior knowledge about inter-class relationships. Our structure-based prediction model, SVMHL with the CTF, reduces the computational complexity and outperforms the alternative approaches in enzyme function prediction. Therefore our new method will be a useful tool for enzyme function prediction community.

  18. GalaxyGPCRloop: Template-Based and Ab Initio Structure Sampling of the Extracellular Loops of G-Protein-Coupled Receptors.

    PubMed

    Won, Jonghun; Lee, Gyu Rie; Park, Hahnbeom; Seok, Chaok

    2018-06-07

    The second extracellular loops (ECL2s) of G-protein-coupled receptors (GPCRs) are often involved in GPCR functions, and their structures have important implications in drug discovery. However, structure prediction of ECL2 is difficult because of its long length and the structural diversity among different GPCRs. In this study, a new ECL2 conformational sampling method involving both template-based and ab initio sampling was developed. Inspired by the observation of similar ECL2 structures of closely related GPCRs, a template-based sampling method employing loop structure templates selected from the structure database was developed. A new metric for evaluating similarity of the target loop to templates was introduced for template selection. An ab initio loop sampling method was also developed to treat cases without highly similar templates. The ab initio method is based on the previously developed fragment assembly and loop closure method. A new sampling component that takes advantage of secondary structure prediction was added. In addition, a conserved disulfide bridge restraining ECL2 conformation was predicted and analytically incorporated into sampling, reducing the effective dimension of the conformational search space. The sampling method was combined with an existing energy function for comparison with previously reported loop structure prediction methods, and the benchmark test demonstrated outstanding performance.

  19. A Coarse-Grained Elastic Network Atom Contact Model and Its Use in the Simulation of Protein Dynamics and the Prediction of the Effect of Mutations

    PubMed Central

    Frappier, Vincent; Najmanovich, Rafael J.

    2014-01-01

    Normal mode analysis (NMA) methods are widely used to study dynamic aspects of protein structures. Two critical components of NMA methods are coarse-graining in the level of simplification used to represent protein structures and the choice of potential energy functional form. There is a trade-off between speed and accuracy in different choices. In one extreme one finds accurate but slow molecular-dynamics based methods with all-atom representations and detailed atom potentials. On the other extreme, fast elastic network model (ENM) methods with Cα−only representations and simplified potentials that based on geometry alone, thus oblivious to protein sequence. Here we present ENCoM, an Elastic Network Contact Model that employs a potential energy function that includes a pairwise atom-type non-bonded interaction term and thus makes it possible to consider the effect of the specific nature of amino-acids on dynamics within the context of NMA. ENCoM is as fast as existing ENM methods and outperforms such methods in the generation of conformational ensembles. Here we introduce a new application for NMA methods with the use of ENCoM in the prediction of the effect of mutations on protein stability. While existing methods are based on machine learning or enthalpic considerations, the use of ENCoM, based on vibrational normal modes, is based on entropic considerations. This represents a novel area of application for NMA methods and a novel approach for the prediction of the effect of mutations. We compare ENCoM to a large number of methods in terms of accuracy and self-consistency. We show that the accuracy of ENCoM is comparable to that of the best existing methods. We show that existing methods are biased towards the prediction of destabilizing mutations and that ENCoM is less biased at predicting stabilizing mutations. PMID:24762569

  20. Application of empirical Bayes methods to predict the rate of decline in ERG at the individual level among patients with retinitis pigmentosa.

    PubMed

    Qiu, Weiliang; Sandberg, Michael A; Rosner, Bernard

    2018-05-31

    Retinitis pigmentosa is one of the most common forms of inherited retinal degeneration. The electroretinogram (ERG) can be used to determine the severity of retinitis pigmentosa-the lower the ERG amplitude, the more severe the disease is. In practice for career, lifestyle, and treatment counseling, it is of interest to predict the ERG amplitude of a patient at a future time. One approach is prediction based on the average rate of decline for individual patients. However, there is considerable variation both in initial amplitude and in rate of decline. In this article, we propose an empirical Bayes (EB) approach to incorporate the variations in initial amplitude and rate of decline for the prediction of ERG amplitude at the individual level. We applied the EB method to a collection of ERGs from 898 patients with 3 or more visits over 5 or more years of follow-up tested in the Berman-Gund Laboratory and observed that the predicted values at the last (kth) visit obtained by using the proposed method based on data for the first k-1 visits are highly correlated with the observed values at the kth visit (Spearman correlation =0.93) and have a higher correlation with the observed values than those obtained based on either the population average decline rate or those obtained based on the individual decline rate. The mean square errors for predicted values obtained by the EB method are also smaller than those predicted by the other methods. Copyright © 2018 John Wiley & Sons, Ltd.

  1. LBSizeCleav: improved support vector machine (SVM)-based prediction of Dicer cleavage sites using loop/bulge length.

    PubMed

    Bao, Yu; Hayashida, Morihiro; Akutsu, Tatsuya

    2016-11-25

    Dicer is necessary for the process of mature microRNA (miRNA) formation because the Dicer enzyme cleaves pre-miRNA correctly to generate miRNA with correct seed regions. Nonetheless, the mechanism underlying the selection of a Dicer cleavage site is still not fully understood. To date, several studies have been conducted to solve this problem, for example, a recent discovery indicates that the loop/bulge structure plays a central role in the selection of Dicer cleavage sites. In accordance with this breakthrough, a support vector machine (SVM)-based method called PHDCleav was developed to predict Dicer cleavage sites which outperforms other methods based on random forest and naive Bayes. PHDCleav, however, tests only whether a position in the shift window belongs to a loop/bulge structure. In this paper, we used the length of loop/bulge structures (in addition to their presence or absence) to develop an improved method, LBSizeCleav, for predicting Dicer cleavage sites. To evaluate our method, we used 810 empirically validated sequences of human pre-miRNAs and performed fivefold cross-validation. In both 5p and 3p arms of pre-miRNAs, LBSizeCleav showed greater prediction accuracy than PHDCleav did. This result suggests that the length of loop/bulge structures is useful for prediction of Dicer cleavage sites. We developed a novel algorithm for feature space mapping based on the length of a loop/bulge for predicting Dicer cleavage sites. The better performance of our method indicates the usefulness of the length of loop/bulge structures for such predictions.

  2. LDAP: a web server for lncRNA-disease association prediction.

    PubMed

    Lan, Wei; Li, Min; Zhao, Kaijie; Liu, Jin; Wu, Fang-Xiang; Pan, Yi; Wang, Jianxin

    2017-02-01

    Increasing evidences have demonstrated that long noncoding RNAs (lncRNAs) play important roles in many human diseases. Therefore, predicting novel lncRNA-disease associations would contribute to dissect the complex mechanisms of disease pathogenesis. Some computational methods have been developed to infer lncRNA-disease associations. However, most of these methods infer lncRNA-disease associations only based on single data resource. In this paper, we propose a new computational method to predict lncRNA-disease associations by integrating multiple biological data resources. Then, we implement this method as a web server for lncRNA-disease association prediction (LDAP). The input of the LDAP server is the lncRNA sequence. The LDAP predicts potential lncRNA-disease associations by using a bagging SVM classifier based on lncRNA similarity and disease similarity. The web server is available at http://bioinformatics.csu.edu.cn/ldap jxwang@mail.csu.edu.cn. Supplementary data are available at Bioinformatics online.

  3. Exploring Mouse Protein Function via Multiple Approaches.

    PubMed

    Huang, Guohua; Chu, Chen; Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning; Cai, Yu-Dong

    2016-01-01

    Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when there are no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality.

  4. Exploring Mouse Protein Function via Multiple Approaches

    PubMed Central

    Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning

    2016-01-01

    Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when there are no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality. PMID:27846315

  5. Predictive local receptive fields based respiratory motion tracking for motion-adaptive radiotherapy.

    PubMed

    Yubo Wang; Tatinati, Sivanagaraja; Liyu Huang; Kim Jeong Hong; Shafiq, Ghufran; Veluvolu, Kalyana C; Khong, Andy W H

    2017-07-01

    Extracranial robotic radiotherapy employs external markers and a correlation model to trace the tumor motion caused by the respiration. The real-time tracking of tumor motion however requires a prediction model to compensate the latencies induced by the software (image data acquisition and processing) and hardware (mechanical and kinematic) limitations of the treatment system. A new prediction algorithm based on local receptive fields extreme learning machines (pLRF-ELM) is proposed for respiratory motion prediction. All the existing respiratory motion prediction methods model the non-stationary respiratory motion traces directly to predict the future values. Unlike these existing methods, the pLRF-ELM performs prediction by modeling the higher-level features obtained by mapping the raw respiratory motion into the random feature space of ELM instead of directly modeling the raw respiratory motion. The developed method is evaluated using the dataset acquired from 31 patients for two horizons in-line with the latencies of treatment systems like CyberKnife. Results showed that pLRF-ELM is superior to that of existing prediction methods. Results further highlight that the abstracted higher-level features are suitable to approximate the nonlinear and non-stationary characteristics of respiratory motion for accurate prediction.

  6. PREDICTING THE EFFECTIVENESS OF CHEMICAL-PROTECTIVE CLOTHING MODEL AND TEST METHOD DEVELOPMENT

    EPA Science Inventory

    A predictive model and test method were developed for determining the chemical resistance of protective polymeric gloves exposed to liquid organic chemicals. The prediction of permeation through protective gloves by solvents was based on theories of the solution thermodynamics of...

  7. A Video-Based Measure of Preservice Teachers' Abilities to Predict Elementary Students' Scientific Reasoning

    ERIC Educational Resources Information Center

    Akerson, Valarie L.; Carter, Ingrid S.; Park Rogers, Meredith A.; Pongsanon, Khemmawadee

    2018-01-01

    In this mixed methods study, the researchers developed a video-based measure called a "Prediction Assessment" to determine preservice elementary teachers' abilities to predict students' scientific reasoning. The instrument is based on teachers' need to develop pedagogical content knowledge for teaching science. Developing a knowledge…

  8. Prediction of dynamical systems by symbolic regression

    NASA Astrophysics Data System (ADS)

    Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.

    2016-07-01

    We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered as generalized regression methods. We investigate two particular algorithms, the so-called fast function extraction which is a generalized linear regression algorithm, and genetic programming which is a very general method. Both are able to combine functions in a certain way such that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and as a real-world application, the prediction of solar power production based on energy production observations at a given site together with the weather forecast.

  9. Practical theories for service life prediction of critical aerospace structural components

    NASA Technical Reports Server (NTRS)

    Ko, William L.; Monaghan, Richard C.; Jackson, Raymond H.

    1992-01-01

    A new second-order theory was developed for predicting the service lives of aerospace structural components. The predictions based on this new theory were compared with those based on the Ko first-order theory and the classical theory of service life predictions. The new theory gives very accurate service life predictions. An equivalent constant-amplitude stress cycle method was proposed for representing the random load spectrum for crack growth calculations. This method predicts the most conservative service life. The proposed use of minimum detectable crack size, instead of proof load established crack size as an initial crack size for crack growth calculations, could give a more realistic service life.

  10. A Generalized Approach for Measuring Relationships Among Genes.

    PubMed

    Wang, Lijun; Ahsan, Md Asif; Chen, Ming

    2017-07-21

    Several methods for identifying relationships among pairs of genes have been developed. In this article, we present a generalized approach for measuring relationships between any pairs of genes, which is based on statistical prediction. We derive two particular versions of the generalized approach, least squares estimation (LSE) and nearest neighbors prediction (NNP). According to mathematical proof, LSE is equivalent to the methods based on correlation; and NNP is approximate to one popular method called the maximal information coefficient (MIC) according to the performances in simulations and real dataset. Moreover, the approach based on statistical prediction can be extended from two-genes relationships to multi-genes relationships. This application would help to identify relationships among multi-genes.

  11. Prediction Interval Development for Wind-Tunnel Balance Check-Loading

    NASA Technical Reports Server (NTRS)

    Landman, Drew; Toro, Kenneth G.; Commo, Sean A.; Lynn, Keith C.

    2014-01-01

    Results from the Facility Analysis Verification and Operational Reliability project revealed a critical gap in capability in ground-based aeronautics research applications. Without a standardized process for check-loading the wind-tunnel balance or the model system, the quality of the aerodynamic force data collected varied significantly between facilities. A prediction interval is required in order to confirm a check-loading. The prediction interval provides an expected upper and lower bound on balance load prediction at a given confidence level. A method has been developed which accounts for sources of variability due to calibration and check-load application. The prediction interval method of calculation and a case study demonstrating its use is provided. Validation of the methods is demonstrated for the case study based on the probability of capture of confirmation points.

  12. A Predictive Model for Medical Events Based on Contextual Embedding of Temporal Sequences

    PubMed Central

    Wang, Zhimu; Huang, Yingxiang; Wang, Shuang; Wang, Fei; Jiang, Xiaoqian

    2016-01-01

    Background Medical concepts are inherently ambiguous and error-prone due to human fallibility, which makes it hard for them to be fully used by classical machine learning methods (eg, for tasks like early stage disease prediction). Objective Our work was to create a new machine-friendly representation that resembles the semantics of medical concepts. We then developed a sequential predictive model for medical events based on this new representation. Methods We developed novel contextual embedding techniques to combine different medical events (eg, diagnoses, prescriptions, and labs tests). Each medical event is converted into a numerical vector that resembles its “semantics,” via which the similarity between medical events can be easily measured. We developed simple and effective predictive models based on these vectors to predict novel diagnoses. Results We evaluated our sequential prediction model (and standard learning methods) in estimating the risk of potential diseases based on our contextual embedding representation. Our model achieved an area under the receiver operating characteristic (ROC) curve (AUC) of 0.79 on chronic systolic heart failure and an average AUC of 0.67 (over the 80 most common diagnoses) using the Medical Information Mart for Intensive Care III (MIMIC-III) dataset. Conclusions We propose a general early prognosis predictor for 80 different diagnoses. Our method computes numeric representation for each medical event to uncover the potential meaning of those events. Our results demonstrate the efficiency of the proposed method, which will benefit patients and physicians by offering more accurate diagnosis. PMID:27888170

  13. Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

    PubMed Central

    Wang, Bingkun; Huang, Yongfeng; Li, Xing

    2016-01-01

    E-commerce develops rapidly. Learning and taking good advantage of the myriad reviews from online customers has become crucial to the success in this game, which calls for increasingly more accuracy in sentiment classification of these reviews. Therefore the finer-grained review rating prediction is preferred over the rough binary sentiment classification. There are mainly two types of method in current review rating prediction. One includes methods based on review text content which focus almost exclusively on textual content and seldom relate to those reviewers and items remarked in other relevant reviews. The other one contains methods based on collaborative filtering which extract information from previous records in the reviewer-item rating matrix, however, ignoring review textual content. Here we proposed a framework for review rating prediction which shows the effective combination of the two. Then we further proposed three specific methods under this framework. Experiments on two movie review datasets demonstrate that our review rating prediction framework has better performance than those previous methods. PMID:26880879

  14. Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating.

    PubMed

    Wang, Bingkun; Huang, Yongfeng; Li, Xing

    2016-01-01

    E-commerce develops rapidly. Learning and taking good advantage of the myriad reviews from online customers has become crucial to the success in this game, which calls for increasingly more accuracy in sentiment classification of these reviews. Therefore the finer-grained review rating prediction is preferred over the rough binary sentiment classification. There are mainly two types of method in current review rating prediction. One includes methods based on review text content which focus almost exclusively on textual content and seldom relate to those reviewers and items remarked in other relevant reviews. The other one contains methods based on collaborative filtering which extract information from previous records in the reviewer-item rating matrix, however, ignoring review textual content. Here we proposed a framework for review rating prediction which shows the effective combination of the two. Then we further proposed three specific methods under this framework. Experiments on two movie review datasets demonstrate that our review rating prediction framework has better performance than those previous methods.

  15. Soil-pipe interaction modeling for pipe behavior prediction with super learning based methods

    NASA Astrophysics Data System (ADS)

    Shi, Fang; Peng, Xiang; Liu, Huan; Hu, Yafei; Liu, Zheng; Li, Eric

    2018-03-01

    Underground pipelines are subject to severe distress from the surrounding expansive soil. To investigate the structural response of water mains to varying soil movements, field data, including pipe wall strains in situ soil water content, soil pressure and temperature, was collected. The research on monitoring data analysis has been reported, but the relationship between soil properties and pipe deformation has not been well-interpreted. To characterize the relationship between soil property and pipe deformation, this paper presents a super learning based approach combining feature selection algorithms to predict the water mains structural behavior in different soil environments. Furthermore, automatic variable selection method, e.i. recursive feature elimination algorithm, were used to identify the critical predictors contributing to the pipe deformations. To investigate the adaptability of super learning to different predictive models, this research employed super learning based methods to three different datasets. The predictive performance was evaluated by R-squared, root-mean-square error and mean absolute error. Based on the prediction performance evaluation, the superiority of super learning was validated and demonstrated by predicting three types of pipe deformations accurately. In addition, a comprehensive understand of the water mains working environments becomes possible.

  16. Semi-supervised protein subcellular localization.

    PubMed

    Xu, Qian; Hu, Derek Hao; Xue, Hong; Yu, Weichuan; Yang, Qiang

    2009-01-30

    Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational method. The location information can indicate key functionalities of proteins. Accurate predictions of subcellular localizations of protein can aid the prediction of protein function and genome annotation, as well as the identification of drug targets. Computational methods based on machine learning, such as support vector machine approaches, have already been widely used in the prediction of protein subcellular localization. However, a major drawback of these machine learning-based approaches is that a large amount of data should be labeled in order to let the prediction system learn a classifier of good generalization ability. However, in real world cases, it is laborious, expensive and time-consuming to experimentally determine the subcellular localization of a protein and prepare instances of labeled data. In this paper, we present an approach based on a new learning framework, semi-supervised learning, which can use much fewer labeled instances to construct a high quality prediction model. We construct an initial classifier using a small set of labeled examples first, and then use unlabeled instances to refine the classifier for future predictions. Experimental results show that our methods can effectively reduce the workload for labeling data using the unlabeled data. Our method is shown to enhance the state-of-the-art prediction results of SVM classifiers by more than 10%.

  17. Improved regulatory element prediction based on tissue-specific local epigenomic signatures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Yupeng; Gorkin, David U.; Dickel, Diane E.

    Accurate enhancer identification is critical for understanding the spatiotemporal transcriptional regulation during development as well as the functional impact of disease-related noncoding genetic variants. Computational methods have been developed to predict the genomic locations of active enhancers based on histone modifications, but the accuracy and resolution of these methods remain limited. Here, we present an algorithm, regulator y element prediction based on tissue-specific local epigenetic marks (REPTILE), which integrates histone modification and whole-genome cytosine DNA methylation profiles to identify the precise location of enhancers. We tested the ability of REPTILE to identify enhancers previously validated in reporter assays. Compared withmore » existing methods, REPTILE shows consistently superior performance across diverse cell and tissue types, and the enhancer locations are significantly more refined. We show that, by incorporating base-resolution methylation data, REPTILE greatly improves upon current methods for annotation of enhancers across a variety of cell and tissue types.« less

  18. Improved regulatory element prediction based on tissue-specific local epigenomic signatures

    DOE PAGES

    He, Yupeng; Gorkin, David U.; Dickel, Diane E.; ...

    2017-02-13

    Accurate enhancer identification is critical for understanding the spatiotemporal transcriptional regulation during development as well as the functional impact of disease-related noncoding genetic variants. Computational methods have been developed to predict the genomic locations of active enhancers based on histone modifications, but the accuracy and resolution of these methods remain limited. Here, we present an algorithm, regulator y element prediction based on tissue-specific local epigenetic marks (REPTILE), which integrates histone modification and whole-genome cytosine DNA methylation profiles to identify the precise location of enhancers. We tested the ability of REPTILE to identify enhancers previously validated in reporter assays. Compared withmore » existing methods, REPTILE shows consistently superior performance across diverse cell and tissue types, and the enhancer locations are significantly more refined. We show that, by incorporating base-resolution methylation data, REPTILE greatly improves upon current methods for annotation of enhancers across a variety of cell and tissue types.« less

  19. Comparative analysis of machine learning methods in ligand-based virtual screening of large compound libraries.

    PubMed

    Ma, Xiao H; Jia, Jia; Zhu, Feng; Xue, Ying; Li, Ze R; Chen, Yu Z

    2009-05-01

    Machine learning methods have been explored as ligand-based virtual screening tools for facilitating drug lead discovery. These methods predict compounds of specific pharmacodynamic, pharmacokinetic or toxicological properties based on their structure-derived structural and physicochemical properties. Increasing attention has been directed at these methods because of their capability in predicting compounds of diverse structures and complex structure-activity relationships without requiring the knowledge of target 3D structure. This article reviews current progresses in using machine learning methods for virtual screening of pharmacodynamically active compounds from large compound libraries, and analyzes and compares the reported performances of machine learning tools with those of structure-based and other ligand-based (such as pharmacophore and clustering) virtual screening methods. The feasibility to improve the performance of machine learning methods in screening large libraries is discussed.

  20. Short-arc measurement and fitting based on the bidirectional prediction of observed data

    NASA Astrophysics Data System (ADS)

    Fei, Zhigen; Xu, Xiaojie; Georgiadis, Anthimos

    2016-02-01

    To measure a short arc is a notoriously difficult problem. In this study, the bidirectional prediction method based on the Radial Basis Function Neural Network (RBFNN) to the observed data distributed along a short arc is proposed to increase the corresponding arc length, and thus improve its fitting accuracy. Firstly, the rationality of regarding observed data as a time series is discussed in accordance with the definition of a time series. Secondly, the RBFNN is constructed to predict the observed data where the interpolation method is used for enlarging the size of training examples in order to improve the learning accuracy of the RBFNN’s parameters. Finally, in the numerical simulation section, we focus on simulating how the size of the training sample and noise level influence the learning error and prediction error of the built RBFNN. Typically, the observed data coming from a 5{}^\\circ short arc are used to evaluate the performance of the Hyper method known as the ‘unbiased fitting method of circle’ with a different noise level before and after prediction. A number of simulation experiments reveal that the fitting stability and accuracy of the Hyper method after prediction are far superior to the ones before prediction.

  1. Long-Term Prediction of the Arctic Ionospheric TEC Based on Time-Varying Periodograms

    PubMed Central

    Liu, Jingbin; Chen, Ruizhi; Wang, Zemin; An, Jiachun; Hyyppä, Juha

    2014-01-01

    Knowledge of the polar ionospheric total electron content (TEC) and its future variations is of scientific and engineering relevance. In this study, a new method is developed to predict Arctic mean TEC on the scale of a solar cycle using previous data covering 14 years. The Arctic TEC is derived from global positioning system measurements using the spherical cap harmonic analysis mapping method. The study indicates that the variability of the Arctic TEC results in highly time-varying periodograms, which are utilized for prediction in the proposed method. The TEC time series is divided into two components of periodic oscillations and the average TEC. The newly developed method of TEC prediction is based on an extrapolation method that requires no input of physical observations of the time interval of prediction, and it is performed in both temporally backward and forward directions by summing the extrapolation of the two components. The backward prediction indicates that the Arctic TEC variability includes a 9 years period for the study duration, in addition to the well-established periods. The long-term prediction has an uncertainty of 4.8–5.6 TECU for different period sets. PMID:25369066

  2. Improved lossless intra coding for H.264/MPEG-4 AVC.

    PubMed

    Lee, Yung-Lyul; Han, Ki-Hun; Sullivan, Gary J

    2006-09-01

    A new lossless intra coding method based on sample-by-sample differential pulse code modulation (DPCM) is presented as an enhancement of the H.264/MPEG-4 AVC standard. The H.264/AVC design includes a multidirectional spatial prediction method to reduce spatial redundancy by using neighboring samples as a prediction for the samples in a block of data to be encoded. In the new lossless intra coding method, the spatial prediction is performed based on samplewise DPCM instead of in the block-based manner used in the current H.264/AVC standard, while the block structure is retained for the residual difference entropy coding process. We show that the new method, based on samplewise DPCM, does not have a major complexity penalty, despite its apparent pipeline dependencies. Experiments show that the new lossless intra coding method reduces the bit rate by approximately 12% in comparison with the lossless intra coding method previously included in the H.264/AVC standard. As a result, the new method is currently being adopted into the H.264/AVC standard in a new enhancement project.

  3. Structural reliability analysis under evidence theory using the active learning kriging model

    NASA Astrophysics Data System (ADS)

    Yang, Xufeng; Liu, Yongshou; Ma, Panke

    2017-11-01

    Structural reliability analysis under evidence theory is investigated. It is rigorously proved that a surrogate model providing only correct sign prediction of the performance function can meet the accuracy requirement of evidence-theory-based reliability analysis. Accordingly, a method based on the active learning kriging model which only correctly predicts the sign of the performance function is proposed. Interval Monte Carlo simulation and a modified optimization method based on Karush-Kuhn-Tucker conditions are introduced to make the method more efficient in estimating the bounds of failure probability based on the kriging model. Four examples are investigated to demonstrate the efficiency and accuracy of the proposed method.

  4. The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes.

    PubMed

    Clark, Samuel A; Hickey, John M; Daetwyler, Hans D; van der Werf, Julius H J

    2012-02-09

    The theory of genomic selection is based on the prediction of the effects of genetic markers in linkage disequilibrium with quantitative trait loci. However, genomic selection also relies on relationships between individuals to accurately predict genetic value. This study aimed to examine the importance of information on relatives versus that of unrelated or more distantly related individuals on the estimation of genomic breeding values. Simulated and real data were used to examine the effects of various degrees of relationship on the accuracy of genomic selection. Genomic Best Linear Unbiased Prediction (gBLUP) was compared to two pedigree based BLUP methods, one with a shallow one generation pedigree and the other with a deep ten generation pedigree. The accuracy of estimated breeding values for different groups of selection candidates that had varying degrees of relationships to a reference data set of 1750 animals was investigated. The gBLUP method predicted breeding values more accurately than BLUP. The most accurate breeding values were estimated using gBLUP for closely related animals. Similarly, the pedigree based BLUP methods were also accurate for closely related animals, however when the pedigree based BLUP methods were used to predict unrelated animals, the accuracy was close to zero. In contrast, gBLUP breeding values, for animals that had no pedigree relationship with animals in the reference data set, allowed substantial accuracy. An animal's relationship to the reference data set is an important factor for the accuracy of genomic predictions. Animals that share a close relationship to the reference data set had the highest accuracy from genomic predictions. However a baseline accuracy that is driven by the reference data set size and the overall population effective population size enables gBLUP to estimate a breeding value for unrelated animals within a population (breed), using information previously ignored by pedigree based BLUP methods.

  5. Risk Factors Analysis and Death Prediction in Some Life-Threatening Ailments Using Chi-Square Case-Based Reasoning (χ2 CBR) Model.

    PubMed

    Adeniyi, D A; Wei, Z; Yang, Y

    2018-01-30

    A wealth of data are available within the health care system, however, effective analysis tools for exploring the hidden patterns in these datasets are lacking. To alleviate this limitation, this paper proposes a simple but promising hybrid predictive model by suitably combining the Chi-square distance measurement with case-based reasoning technique. The study presents the realization of an automated risk calculator and death prediction in some life-threatening ailments using Chi-square case-based reasoning (χ 2 CBR) model. The proposed predictive engine is capable of reducing runtime and speeds up execution process through the use of critical χ 2 distribution value. This work also showcases the development of a novel feature selection method referred to as frequent item based rule (FIBR) method. This FIBR method is used for selecting the best feature for the proposed χ 2 CBR model at the preprocessing stage of the predictive procedures. The implementation of the proposed risk calculator is achieved through the use of an in-house developed PHP program experimented with XAMP/Apache HTTP server as hosting server. The process of data acquisition and case-based development is implemented using the MySQL application. Performance comparison between our system, the NBY, the ED-KNN, the ANN, the SVM, the Random Forest and the traditional CBR techniques shows that the quality of predictions produced by our system outperformed the baseline methods studied. The result of our experiment shows that the precision rate and predictive quality of our system in most cases are equal to or greater than 70%. Our result also shows that the proposed system executes faster than the baseline methods studied. Therefore, the proposed risk calculator is capable of providing useful, consistent, faster, accurate and efficient risk level prediction to both the patients and the physicians at any time, online and on a real-time basis.

  6. Five-equation and robust three-equation methods for solution verification of large eddy simulation

    NASA Astrophysics Data System (ADS)

    Dutta, Rabijit; Xing, Tao

    2018-02-01

    This study evaluates the recently developed general framework for solution verification methods for large eddy simulation (LES) using implicitly filtered LES of periodic channel flows at friction Reynolds number of 395 on eight systematically refined grids. The seven-equation method shows that the coupling error based on Hypothesis I is much smaller as compared with the numerical and modeling errors and therefore can be neglected. The authors recommend five-equation method based on Hypothesis II, which shows a monotonic convergence behavior of the predicted numerical benchmark ( S C ), and provides realistic error estimates without the need of fixing the orders of accuracy for either numerical or modeling errors. Based on the results from seven-equation and five-equation methods, less expensive three and four-equation methods for practical LES applications were derived. It was found that the new three-equation method is robust as it can be applied to any convergence types and reasonably predict the error trends. It was also observed that the numerical and modeling errors usually have opposite signs, which suggests error cancellation play an essential role in LES. When Reynolds averaged Navier-Stokes (RANS) based error estimation method is applied, it shows significant error in the prediction of S C on coarse meshes. However, it predicts reasonable S C when the grids resolve at least 80% of the total turbulent kinetic energy.

  7. Near Real-Time Optimal Prediction of Adverse Events in Aviation Data

    NASA Technical Reports Server (NTRS)

    Martin, Rodney Alexander; Das, Santanu

    2010-01-01

    The prediction of anomalies or adverse events is a challenging task, and there are a variety of methods which can be used to address the problem. In this paper, we demonstrate how to recast the anomaly prediction problem into a form whose solution is accessible as a level-crossing prediction problem. The level-crossing prediction problem has an elegant, optimal, yet untested solution under certain technical constraints, and only when the appropriate modeling assumptions are made. As such, we will thoroughly investigate the resilience of these modeling assumptions, and show how they affect final performance. Finally, the predictive capability of this method will be assessed by quantitative means, using both validation and test data containing anomalies or adverse events from real aviation data sets that have previously been identified as operationally significant by domain experts. It will be shown that the formulation proposed yields a lower false alarm rate on average than competing methods based on similarly advanced concepts, and a higher correct detection rate than a standard method based upon exceedances that is commonly used for prediction.

  8. Text Mining Improves Prediction of Protein Functional Sites

    PubMed Central

    Cohn, Judith D.; Ravikumar, Komandur E.

    2012-01-01

    We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining extracts mentions of residues in the literature, and predicts that residues mentioned are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions. PMID:22393388

  9. A Grammatical Approach to RNA-RNA Interaction Prediction

    NASA Astrophysics Data System (ADS)

    Kato, Yuki; Akutsu, Tatsuya; Seki, Hiroyuki

    2007-11-01

    Much attention has been paid to two interacting RNA molecules involved in post-transcriptional control of gene expression. Although there have been a few studies on RNA-RNA interaction prediction based on dynamic programming algorithm, no grammar-based approach has been proposed. The purpose of this paper is to provide a new modeling for RNA-RNA interaction based on multiple context-free grammar (MCFG). We present a polynomial time parsing algorithm for finding the most likely derivation tree for the stochastic version of MCFG, which is applicable to RNA joint secondary structure prediction including kissing hairpin loops. Also, elementary tests on RNA-RNA interaction prediction have shown that the proposed method is comparable to Alkan et al.'s method.

  10. Predicting missing links in complex networks based on common neighbors and distance

    PubMed Central

    Yang, Jinxuan; Zhang, Xiao-Dong

    2016-01-01

    The algorithms based on common neighbors metric to predict missing links in complex networks are very popular, but most of these algorithms do not account for missing links between nodes with no common neighbors. It is not accurate enough to reconstruct networks by using these methods in some cases especially when between nodes have less common neighbors. We proposed in this paper a new algorithm based on common neighbors and distance to improve accuracy of link prediction. Our proposed algorithm makes remarkable effect in predicting the missing links between nodes with no common neighbors and performs better than most existing currently used methods for a variety of real-world networks without increasing complexity. PMID:27905526

  11. Predicting plant biomass accumulation from image-derived parameters

    PubMed Central

    Chen, Dijun; Shi, Rongli; Pape, Jean-Michel; Neumann, Kerstin; Graner, Andreas; Chen, Ming; Klukas, Christian

    2018-01-01

    Abstract Background Image-based high-throughput phenotyping technologies have been rapidly developed in plant science recently, and they provide a great potential to gain more valuable information than traditionally destructive methods. Predicting plant biomass is regarded as a key purpose for plant breeders and ecologists. However, it is a great challenge to find a predictive biomass model across experiments. Results In the present study, we constructed 4 predictive models to examine the quantitative relationship between image-based features and plant biomass accumulation. Our methodology has been applied to 3 consecutive barley (Hordeum vulgare) experiments with control and stress treatments. The results proved that plant biomass can be accurately predicted from image-based parameters using a random forest model. The high prediction accuracy based on this model will contribute to relieving the phenotyping bottleneck in biomass measurement in breeding applications. The prediction performance is still relatively high across experiments under similar conditions. The relative contribution of individual features for predicting biomass was further quantified, revealing new insights into the phenotypic determinants of the plant biomass outcome. Furthermore, methods could also be used to determine the most important image-based features related to plant biomass accumulation, which would be promising for subsequent genetic mapping to uncover the genetic basis of biomass. Conclusions We have developed quantitative models to accurately predict plant biomass accumulation from image data. We anticipate that the analysis results will be useful to advance our views of the phenotypic determinants of plant biomass outcome, and the statistical methods can be broadly used for other plant species. PMID:29346559

  12. Predictive Big Data Analytics: A Study of Parkinson’s Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations

    PubMed Central

    Dinov, Ivo D.; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W.; Price, Nathan D.; Van Horn, John D.; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M.; Dauer, William; Toga, Arthur W.

    2016-01-01

    Background A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data–large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources–all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Methods and Findings Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson’s disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Conclusions Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson’s disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer’s, Huntington’s, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications. PMID:27494614

  13. Novel Near-Lossless Compression Algorithm for Medical Sequence Images with Adaptive Block-Based Spatial Prediction.

    PubMed

    Song, Xiaoying; Huang, Qijun; Chang, Sheng; He, Jin; Wang, Hao

    2016-12-01

    To address the low compression efficiency of lossless compression and the low image quality of general near-lossless compression, a novel near-lossless compression algorithm based on adaptive spatial prediction is proposed for medical sequence images for possible diagnostic use in this paper. The proposed method employs adaptive block size-based spatial prediction to predict blocks directly in the spatial domain and Lossless Hadamard Transform before quantization to improve the quality of reconstructed images. The block-based prediction breaks the pixel neighborhood constraint and takes full advantage of the local spatial correlations found in medical images. The adaptive block size guarantees a more rational division of images and the improved use of the local structure. The results indicate that the proposed algorithm can efficiently compress medical images and produces a better peak signal-to-noise ratio (PSNR) under the same pre-defined distortion than other near-lossless methods.

  14. Analysis of spatial distribution of land cover maps accuracy

    NASA Astrophysics Data System (ADS)

    Khatami, R.; Mountrakis, G.; Stehman, S. V.

    2017-12-01

    Land cover maps have become one of the most important products of remote sensing science. However, classification errors will exist in any classified map and affect the reliability of subsequent map usage. Moreover, classification accuracy often varies over different regions of a classified map. These variations of accuracy will affect the reliability of subsequent analyses of different regions based on the classified maps. The traditional approach of map accuracy assessment based on an error matrix does not capture the spatial variation in classification accuracy. Here, per-pixel accuracy prediction methods are proposed based on interpolating accuracy values from a test sample to produce wall-to-wall accuracy maps. Different accuracy prediction methods were developed based on four factors: predictive domain (spatial versus spectral), interpolation function (constant, linear, Gaussian, and logistic), incorporation of class information (interpolating each class separately versus grouping them together), and sample size. Incorporation of spectral domain as explanatory feature spaces of classification accuracy interpolation was done for the first time in this research. Performance of the prediction methods was evaluated using 26 test blocks, with 10 km × 10 km dimensions, dispersed throughout the United States. The performance of the predictions was evaluated using the area under the curve (AUC) of the receiver operating characteristic. Relative to existing accuracy prediction methods, our proposed methods resulted in improvements of AUC of 0.15 or greater. Evaluation of the four factors comprising the accuracy prediction methods demonstrated that: i) interpolations should be done separately for each class instead of grouping all classes together; ii) if an all-classes approach is used, the spectral domain will result in substantially greater AUC than the spatial domain; iii) for the smaller sample size and per-class predictions, the spectral and spatial domain yielded similar AUC; iv) for the larger sample size (i.e., very dense spatial sample) and per-class predictions, the spatial domain yielded larger AUC; v) increasing the sample size improved accuracy predictions with a greater benefit accruing to the spatial domain; and vi) the function used for interpolation had the smallest effect on AUC.

  15. A group contribution method for the prediction of thermal conductivity of liquids and its application to the Prandtl number for vegetable oils

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rodenbush, C.M.; Viswanath, D.S.; Hsieh, F.H.

    Data on thermal conductivity of liquids, as a function of temperature, are essential in the design of heat- and mass- transfer equipment. A number of correlations have been developed to predict thermal conductivity of liquids with limited success. Among the correlations proposed so far, only the correlation due to Nagvekar and Daubert is based on group contributions. In this paper, a new group contribution method is developed based on the Klaas and Viswanath method for prediction of thermal conductivity of liquids and the results are compared to the method of Nagvekar and Daubert and other existing correlations. The present methodmore » predicts thermal conductivity of some 228 liquids that encompass 1487 experimental data points with an average absolute deviation of 2.5%. The group contribution method is used to examine the temperature dependence of Prandtl number for vegetable oils.« less

  16. Relative Humidity Has Uneven Effects on Shifts From Snow to Rain Over the Western U.S.

    NASA Astrophysics Data System (ADS)

    Harpold, A. A.; Rajagopal, S.; Crews, J. B.; Winchell, T.; Schumer, R.

    2017-10-01

    Predicting the phase of precipitation is fundamental to water supply and hazard forecasting. Phase prediction methods (PPMs) are used to predict snow fraction, or the ratio of snowfall to total precipitation. Common temperature-based regression (Dai method) and threshold at freezing (0°C) PPMs had comparable accuracy to a humidity-based PPM (TRH method) using 6 and 24 h observations. Using a daily climate data set from 1980 to 2015, the TRH method estimates 14% and 6% greater precipitation-weighted snow fraction than the 0°C and Dai methods, respectively. The TRH method predicts four times less area with declining snow fraction than the Dai method (2.1% and 8.1% of the study domain, respectively) from 1980 to 2015, with the largest differences in the Cascade and Sierra Nevada mountains and southwestern U.S. Future Representative Concentration Pathway (RCP) 8.5 projections suggest warming temperatures of 4.2°C and declining relative humidity of 1% over the 21st century. The TRH method predicts a smaller reduction in snow fraction than temperature-only PPMs by 2100, consistent with lower humidity buffering declines in snow fraction caused by regional warming.

  17. A component prediction method for flue gas of natural gas combustion based on nonlinear partial least squares method.

    PubMed

    Cao, Hui; Yan, Xingyu; Li, Yaojiang; Wang, Yanxia; Zhou, Yan; Yang, Sanchun

    2014-01-01

    Quantitative analysis for the flue gas of natural gas-fired generator is significant for energy conservation and emission reduction. The traditional partial least squares method may not deal with the nonlinear problems effectively. In the paper, a nonlinear partial least squares method with extended input based on radial basis function neural network (RBFNN) is used for components prediction of flue gas. For the proposed method, the original independent input matrix is the input of RBFNN and the outputs of hidden layer nodes of RBFNN are the extension term of the original independent input matrix. Then, the partial least squares regression is performed on the extended input matrix and the output matrix to establish the components prediction model of flue gas. A near-infrared spectral dataset of flue gas of natural gas combustion is used for estimating the effectiveness of the proposed method compared with PLS. The experiments results show that the root-mean-square errors of prediction values of the proposed method for methane, carbon monoxide, and carbon dioxide are, respectively, reduced by 4.74%, 21.76%, and 5.32% compared to those of PLS. Hence, the proposed method has higher predictive capabilities and better robustness.

  18. Self-Adaptive Prediction of Cloud Resource Demands Using Ensemble Model and Subtractive-Fuzzy Clustering Based Fuzzy Neural Network

    PubMed Central

    Chen, Zhijia; Zhu, Yuanchang; Di, Yanqiang; Feng, Shaochong

    2015-01-01

    In IaaS (infrastructure as a service) cloud environment, users are provisioned with virtual machines (VMs). To allocate resources for users dynamically and effectively, accurate resource demands predicting is essential. For this purpose, this paper proposes a self-adaptive prediction method using ensemble model and subtractive-fuzzy clustering based fuzzy neural network (ESFCFNN). We analyze the characters of user preferences and demands. Then the architecture of the prediction model is constructed. We adopt some base predictors to compose the ensemble model. Then the structure and learning algorithm of fuzzy neural network is researched. To obtain the number of fuzzy rules and the initial value of the premise and consequent parameters, this paper proposes the fuzzy c-means combined with subtractive clustering algorithm, that is, the subtractive-fuzzy clustering. Finally, we adopt different criteria to evaluate the proposed method. The experiment results show that the method is accurate and effective in predicting the resource demands. PMID:25691896

  19. Local Geometry and Evolutionary Conservation of Protein Surfaces Reveal the Multiple Recognition Patches in Protein-Protein Interactions

    PubMed Central

    Laine, Elodie; Carbone, Alessandra

    2015-01-01

    Protein-protein interactions (PPIs) are essential to all biological processes and they represent increasingly important therapeutic targets. Here, we present a new method for accurately predicting protein-protein interfaces, understanding their properties, origins and binding to multiple partners. Contrary to machine learning approaches, our method combines in a rational and very straightforward way three sequence- and structure-based descriptors of protein residues: evolutionary conservation, physico-chemical properties and local geometry. The implemented strategy yields very precise predictions for a wide range of protein-protein interfaces and discriminates them from small-molecule binding sites. Beyond its predictive power, the approach permits to dissect interaction surfaces and unravel their complexity. We show how the analysis of the predicted patches can foster new strategies for PPIs modulation and interaction surface redesign. The approach is implemented in JET2, an automated tool based on the Joint Evolutionary Trees (JET) method for sequence-based protein interface prediction. JET2 is freely available at www.lcqb.upmc.fr/JET2. PMID:26690684

  20. Point-Mass Aircraft Trajectory Prediction Using a Hierarchical, Highly-Adaptable Software Design

    NASA Technical Reports Server (NTRS)

    Karr, David A.; Vivona, Robert A.; Woods, Sharon E.; Wing, David J.

    2017-01-01

    A highly adaptable and extensible method for predicting four-dimensional trajectories of civil aircraft has been developed. This method, Behavior-Based Trajectory Prediction, is based on taxonomic concepts developed for the description and comparison of trajectory prediction software. A hierarchical approach to the "behavioral" layer of a point-mass model of aircraft flight, a clear separation between the "behavioral" and "mathematical" layers of the model, and an abstraction of the methods of integrating differential equations in the "mathematical" layer have been demonstrated to support aircraft models of different types (in particular, turbojet vs. turboprop aircraft) using performance models at different levels of detail and in different formats, and promise to be easily extensible to other aircraft types and sources of data. The resulting trajectories predict location, altitude, lateral and vertical speeds, and fuel consumption along the flight path of the subject aircraft accurately and quickly, accounting for local conditions of wind and outside air temperature. The Behavior-Based Trajectory Prediction concept was implemented in NASA's Traffic Aware Planner (TAP) flight-optimizing cockpit software application.

  1. Blind test of physics-based prediction of protein structures.

    PubMed

    Shell, M Scott; Ozkan, S Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A

    2009-02-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.

  2. Prediction of the Chapman-Jouguet chemical equilibrium state in a detonation wave from first principles based reactive molecular dynamics.

    PubMed

    Guo, Dezhou; Zybin, Sergey V; An, Qi; Goddard, William A; Huang, Fenglei

    2016-01-21

    The combustion or detonation of reacting materials at high temperature and pressure can be characterized by the Chapman-Jouguet (CJ) state that describes the chemical equilibrium of the products at the end of the reaction zone of the detonation wave for sustained detonation. This provides the critical properties and product kinetics for input to macroscale continuum simulations of energetic materials. We propose the ReaxFF Reactive Dynamics to CJ point protocol (Rx2CJ) for predicting the CJ state parameters, providing the means to predict the performance of new materials prior to synthesis and characterization, allowing the simulation based design to be done in silico. Our Rx2CJ method is based on atomistic reactive molecular dynamics (RMD) using the QM-derived ReaxFF force field. We validate this method here by predicting the CJ point and detonation products for three typical energetic materials. We find good agreement between the predicted and experimental detonation velocities, indicating that this method can reliably predict the CJ state using modest levels of computation.

  3. Blind Test of Physics-Based Prediction of Protein Structures

    PubMed Central

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  4. An improved method for predicting the effects of flight on jet mixing noise

    NASA Technical Reports Server (NTRS)

    Stone, J. R.

    1979-01-01

    The NASA method (1976) for predicting the effects of flight on jet mixing noise was improved. The earlier method agreed reasonably well with experimental flight data for jet velocities up to about 520 m/sec (approximately 1700 ft/sec). The poorer agreement at high jet velocities appeared to be due primarily to the manner in which supersonic convection effects were formulated. The purely empirical supersonic convection formulation of the earlier method was replaced by one based on theoretical considerations. Other improvements of an empirical nature included were based on model-jet/free-jet simulated flight tests. The revised prediction method is presented and compared with experimental data obtained from the Bertin Aerotrain with a J85 engine, the DC-10 airplane with JT9D engines, and the DC-9 airplane with refanned JT8D engines. It is shown that the new method agrees better with the data base than a recently proposed SAE method.

  5. A new method for the prediction of chatter stability lobes based on dynamic cutting force simulation model and support vector machine

    NASA Astrophysics Data System (ADS)

    Peng, Chong; Wang, Lun; Liao, T. Warren

    2015-10-01

    Currently, chatter has become the critical factor in hindering machining quality and productivity in machining processes. To avoid cutting chatter, a new method based on dynamic cutting force simulation model and support vector machine (SVM) is presented for the prediction of chatter stability lobes. The cutting force is selected as the monitoring signal, and the wavelet energy entropy theory is used to extract the feature vectors. A support vector machine is constructed using the MATLAB LIBSVM toolbox for pattern classification based on the feature vectors derived from the experimental cutting data. Then combining with the dynamic cutting force simulation model, the stability lobes diagram (SLD) can be estimated. Finally, the predicted results are compared with existing methods such as zero-order analytical (ZOA) and semi-discretization (SD) method as well as actual cutting experimental results to confirm the validity of this new method.

  6. HIV-1 protease cleavage site prediction based on two-stage feature selection method.

    PubMed

    Niu, Bing; Yuan, Xiao-Cheng; Roeper, Preston; Su, Qiang; Peng, Chun-Rong; Yin, Jing-Yuan; Ding, Juan; Li, HaiPeng; Lu, Wen-Cong

    2013-03-01

    Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. Searching for an accurate, robust, and rapid method to correctly predict the cleavage sites in proteins is crucial when searching for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with Genetic Algorithms method. Thirty important biochemical features were found based on a jackknife test from the original data set containing 4,248 features. By using the AdaBoost method with the thirty selected features the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, with increased accuracy over the original dataset by 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.

  7. Predictive Big Data Analytics: A Study of Parkinson's Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-Source and Incomplete Observations.

    PubMed

    Dinov, Ivo D; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W; Price, Nathan D; Van Horn, John D; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M; Dauer, William; Toga, Arthur W

    2016-01-01

    A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data-large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources-all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data. We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson's disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer's, Huntington's, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.

  8. Using the surface panel method to predict the steady performance of ducted propellers

    NASA Astrophysics Data System (ADS)

    Cai, Hao-Peng; Su, Yu-Min; Li, Xin; Shen, Hai-Long

    2009-12-01

    A new numerical method was developed for predicting the steady hydrodynamic performance of ducted propellers. A potential based surface panel method was applied both to the duct and the propeller, and the interaction between them was solved by an induced velocity potential iterative method. Compared with the induced velocity iterative method, the method presented can save programming and calculating time. Numerical results for a JD simplified ducted propeller series showed that the method presented is effective for predicting the steady hydrodynamic performance of ducted propellers.

  9. New approach to predict photoallergic potentials of chemicals based on murine local lymph node assay.

    PubMed

    Maeda, Yosuke; Hirosaki, Haruka; Yamanaka, Hidenori; Takeyoshi, Masahiro

    2018-05-23

    Photoallergic dermatitis, caused by pharmaceuticals and other consumer products, is a very important issue in human health. However, S10 guidelines of the International Conference on Harmonization do not recommend the existing prediction methods for photoallergy because of their low predictability in human cases. We applied local lymph node assay (LLNA), a reliable, quantitative skin sensitization prediction test, to develop a new photoallergy prediction method. This method involves a three-step approach: (1) ultraviolet (UV) absorption analysis; (2) determination of no observed adverse effect level for skin phototoxicity based on LLNA; and (3) photoallergy evaluation based on LLNA. Photoallergic potential of chemicals was evaluated by comparing lymph node cell proliferation among groups treated with chemicals with minimal effect levels of skin sensitization and skin phototoxicity under UV irradiation (UV+) or non-UV irradiation (UV-). A case showing significant difference (P < .05) in lymph node cell proliferation rates between UV- and UV+ groups was considered positive for photoallergic reaction. After testing 13 chemicals, seven human photoallergens tested positive and the other six, with no evidence of causing photoallergic dermatitis or UV absorption, tested negative. Among these chemicals, both doxycycline hydrochloride and minocycline hydrochloride were tetracycline antibiotics with different photoallergic properties, and the new method clearly distinguished between the photoallergic properties of these chemicals. These findings suggested high predictability of our method; therefore, it is promising and effective in predicting human photoallergens. Copyright © 2018 John Wiley & Sons, Ltd.

  10. SPRINT: ultrafast protein-protein interaction prediction of the entire human interactome.

    PubMed

    Li, Yiwei; Ilie, Lucian

    2017-11-15

    Proteins perform their functions usually by interacting with other proteins. Predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and have a high rate of error. Many computational methods have been proposed among which sequence-based ones are very promising. However, so far no such method is able to predict effectively the entire human interactome: they require too much time or memory. We present SPRINT (Scoring PRotein INTeractions), a new sequence-based algorithm and tool for predicting protein-protein interactions. We comprehensively compare SPRINT with state-of-the-art programs on seven most reliable human PPI datasets and show that it is more accurate while running orders of magnitude faster and using very little memory. SPRINT is the only sequence-based program that can effectively predict the entire human interactome: it requires between 15 and 100 min, depending on the dataset. Our goal is to transform the very challenging problem of predicting the entire human interactome into a routine task. The source code of SPRINT is freely available from https://github.com/lucian-ilie/SPRINT/ and the datasets and predicted PPIs from www.csd.uwo.ca/faculty/ilie/SPRINT/ .

  11. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains.

    PubMed

    Bulashevska, Alla; Eils, Roland

    2006-06-14

    The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is the need for further research to improve the accuracy of prediction. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six various datasets; among them are Gram-negative bacteria dataset, data for discriminating outer membrane proteins and apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies as empirical results indicate. The code for the software is available upon request.

  12. Univariate Time Series Prediction of Solar Power Using a Hybrid Wavelet-ARMA-NARX Prediction Method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nazaripouya, Hamidreza; Wang, Yubo; Chu, Chi-Cheng

    This paper proposes a new hybrid method for super short-term solar power prediction. Solar output power usually has a complex, nonstationary, and nonlinear characteristic due to intermittent and time varying behavior of solar radiance. In addition, solar power dynamics is fast and is inertia less. An accurate super short-time prediction is required to compensate for the fluctuations and reduce the impact of solar power penetration on the power system. The objective is to predict one step-ahead solar power generation based only on historical solar power time series data. The proposed method incorporates discrete wavelet transform (DWT), Auto-Regressive Moving Average (ARMA)more » models, and Recurrent Neural Networks (RNN), while the RNN architecture is based on Nonlinear Auto-Regressive models with eXogenous inputs (NARX). The wavelet transform is utilized to decompose the solar power time series into a set of richer-behaved forming series for prediction. ARMA model is employed as a linear predictor while NARX is used as a nonlinear pattern recognition tool to estimate and compensate the error of wavelet-ARMA prediction. The proposed method is applied to the data captured from UCLA solar PV panels and the results are compared with some of the common and most recent solar power prediction methods. The results validate the effectiveness of the proposed approach and show a considerable improvement in the prediction precision.« less

  13. Object-color-signal prediction using wraparound Gaussian metamers.

    PubMed

    Mirzaei, Hamidreza; Funt, Brian

    2014-07-01

    Alexander Logvinenko introduced an object-color atlas based on idealized reflectances called rectangular metamers in 2009. For a given color signal, the atlas specifies a unique reflectance that is metameric to it under the given illuminant. The atlas is complete and illuminant invariant, but not possible to implement in practice. He later introduced a parametric representation of the object-color atlas based on smoother "wraparound Gaussian" functions. In this paper, these wraparound Gaussians are used in predicting illuminant-induced color signal changes. The method proposed in this paper is based on computationally "relighting" that reflectance to determine what its color signal would be under any other illuminant. Since that reflectance is in the metamer set the prediction is also physically realizable, which cannot be guaranteed for predictions obtained via von Kries scaling. Testing on Munsell spectra and a multispectral image shows that the proposed method outperforms the predictions of both those based on von Kries scaling and those based on the Bradford transform.

  14. SCMPSP: Prediction and characterization of photosynthetic proteins based on a scoring card method.

    PubMed

    Vasylenko, Tamara; Liou, Yi-Fan; Chen, Hong-An; Charoenkwan, Phasit; Huang, Hui-Ling; Ho, Shinn-Ying

    2015-01-01

    Photosynthetic proteins (PSPs) greatly differ in their structure and function as they are involved in numerous subprocesses that take place inside an organelle called a chloroplast. Few studies predict PSPs from sequences due to their high variety of sequences and structues. This work aims to predict and characterize PSPs by establishing the datasets of PSP and non-PSP sequences and developing prediction methods. A novel bioinformatics method of predicting and characterizing PSPs based on scoring card method (SCMPSP) was used. First, a dataset consisting of 649 PSPs was established by using a Gene Ontology term GO:0015979 and 649 non-PSPs from the SwissProt database with sequence identity <= 25%.- Several prediction methods are presented based on support vector machine (SVM), decision tree J48, Bayes, BLAST, and SCM. The SVM method using dipeptide features-performed well and yielded - a test accuracy of 72.31%. The SCMPSP method uses the estimated propensity scores of 400 dipeptides - as PSPs and has a test accuracy of 71.54%, which is comparable to that of the SVM method. The derived propensity scores of 20 amino acids were further used to identify informative physicochemical properties for characterizing PSPs. The analytical results reveal the following four characteristics of PSPs: 1) PSPs favour hydrophobic side chain amino acids; 2) PSPs are composed of the amino acids prone to form helices in membrane environments; 3) PSPs have low interaction with water; and 4) PSPs prefer to be composed of the amino acids of electron-reactive side chains. The SCMPSP method not only estimates the propensity of a sequence to be PSPs, it also discovers characteristics that further improve understanding of PSPs. The SCMPSP source code and the datasets used in this study are available at http://iclab.life.nctu.edu.tw/SCMPSP/.

  15. Intertransaction Class Association Rule Mining Based on Genetic Network Programming and Its Application to Stock Market Prediction

    NASA Astrophysics Data System (ADS)

    Yang, Yuchen; Mabu, Shingo; Shimada, Kaoru; Hirasawa, Kotaro

    Intertransaction association rules have been reported to be useful in many fields such as stock market prediction, but still there are not so many efficient methods to dig them out from large data sets. Furthermore, how to use and measure these more complex rules should be considered carefully. In this paper, we propose a new intertransaction class association rule mining method based on Genetic Network Programming (GNP), which has the ability to overcome some shortages of Apriori-like based intertransaction association methods. Moreover, a general classifier model for intertransaction rules is also introduced. In experiments on the real world application of stock market prediction, the method shows its efficiency and ability to obtain good results and can bring more benefits with a suitable classifier considering larger interval span.

  16. An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms.

    PubMed

    Hua, Hong-Li; Zhang, Fa-Zhan; Labena, Abraham Alemayehu; Dong, Chuan; Jin, Yan-Ting; Guo, Feng-Biao

    Investigation of essential genes is significant to comprehend the minimal gene sets of cell and discover potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning method was introduced to predict essential genes. We focused on 25 bacteria which have characterized essential genes. The predictions yielded the highest area under receiver operating characteristic (ROC) curve (AUC) of 0.9716 through tenfold cross-validation test. Proper features were utilized to construct models to make predictions in distantly related bacteria. The accuracy of predictions was evaluated via the consistency of predictions and known essential genes of target species. The highest AUC of 0.9552 and average AUC of 0.8314 were achieved when making predictions across organisms. An independent dataset from Synechococcus elongatus , which was released recently, was obtained for further assessment of the performance of our model. The AUC score of predictions is 0.7855, which is higher than other methods. This research presents that features obtained by homology mapping uniquely can achieve quite great or even better results than those integrated features. Meanwhile, the work indicates that machine learning-based method can assign more efficient weight coefficients than using empirical formula based on biological knowledge.

  17. Prediction-based dynamic load-sharing heuristics

    NASA Technical Reports Server (NTRS)

    Goswami, Kumar K.; Devarakonda, Murthy; Iyer, Ravishankar K.

    1993-01-01

    The authors present dynamic load-sharing heuristics that use predicted resource requirements of processes to manage workloads in a distributed system. A previously developed statistical pattern-recognition method is employed for resource prediction. While nonprediction-based heuristics depend on a rapidly changing system status, the new heuristics depend on slowly changing program resource usage patterns. Furthermore, prediction-based heuristics can be more effective since they use future requirements rather than just the current system state. Four prediction-based heuristics, two centralized and two distributed, are presented. Using trace driven simulations, they are compared against random scheduling and two effective nonprediction based heuristics. Results show that the prediction-based centralized heuristics achieve up to 30 percent better response times than the nonprediction centralized heuristic, and that the prediction-based distributed heuristics achieve up to 50 percent improvements relative to their nonprediction counterpart.

  18. A novel knowledge-based potential for RNA 3D structure evaluation

    NASA Astrophysics Data System (ADS)

    Yang, Yi; Gu, Qi; Zhang, Ben-Gong; Shi, Ya-Zhou; Shao, Zhi-Gang

    2018-03-01

    Ribonucleic acids (RNAs) play a vital role in biology, and knowledge of their three-dimensional (3D) structure is required to understand their biological functions. Recently structural prediction methods have been developed to address this issue, but a series of RNA 3D structures are generally predicted by most existing methods. Therefore, the evaluation of the predicted structures is generally indispensable. Although several methods have been proposed to assess RNA 3D structures, the existing methods are not precise enough. In this work, a new all-atom knowledge-based potential is developed for more accurately evaluating RNA 3D structures. The potential not only includes local and nonlocal interactions but also fully considers the specificity of each RNA by introducing a retraining mechanism. Based on extensive test sets generated from independent methods, the proposed potential correctly distinguished the native state and ranked near-native conformations to effectively select the best. Furthermore, the proposed potential precisely captured RNA structural features such as base-stacking and base-pairing. Comparisons with existing potential methods show that the proposed potential is very reliable and accurate in RNA 3D structure evaluation. Project supported by the National Science Foundation of China (Grants Nos. 11605125, 11105054, 11274124, and 11401448).

  19. DrugE-Rank: improving drug–target interaction prediction of new candidate drugs or targets by ensemble learning to rank

    PubMed Central

    Yuan, Qingjun; Gao, Junning; Wu, Dongliang; Zhang, Shihua; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2016-01-01

    Motivation: Identifying drug–target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug–target interactions of new candidate drugs or targets. Methods: Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by nicely combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. Availability: http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307615

  20. Empirical Observations on the Sensitivity of Hot Cathode Ionization Type Vacuum Gages

    NASA Technical Reports Server (NTRS)

    Summers, R. L.

    1969-01-01

    A study of empirical methods of predicting tile relative sensitivities of hot cathode ionization gages is presented. Using previously published gage sensitivities, several rules for predicting relative sensitivity are tested. The relative sensitivity to different gases is shown to be invariant with gage type, in the linear range of gage operation. The total ionization cross section, molecular and molar polarizability, and refractive index are demonstrated to be useful parameters for predicting relative gage sensitivity. Using data from the literature, the probable error of predictions of relative gage sensitivity based on these molecular properties is found to be about 10 percent. A comprehensive table of predicted relative sensitivities, based on empirical methods, is presented.

  1. Benchmark data sets for structure-based computational target prediction.

    PubMed

    Schomburg, Karen T; Rarey, Matthias

    2014-08-25

    Structure-based computational target prediction methods identify potential targets for a bioactive compound. Methods based on protein-ligand docking so far face many challenges, where the greatest probably is the ranking of true targets in a large data set of protein structures. Currently, no standard data sets for evaluation exist, rendering comparison and demonstration of improvements of methods cumbersome. Therefore, we propose two data sets and evaluation strategies for a meaningful evaluation of new target prediction methods, i.e., a small data set consisting of three target classes for detailed proof-of-concept and selectivity studies and a large data set consisting of 7992 protein structures and 72 drug-like ligands allowing statistical evaluation with performance metrics on a drug-like chemical space. Both data sets are built from openly available resources, and any information needed to perform the described experiments is reported. We describe the composition of the data sets, the setup of screening experiments, and the evaluation strategy. Performance metrics capable to measure the early recognition of enrichments like AUC, BEDROC, and NSLR are proposed. We apply a sequence-based target prediction method to the large data set to analyze its content of nontrivial evaluation cases. The proposed data sets are used for method evaluation of our new inverse screening method iRAISE. The small data set reveals the method's capability and limitations to selectively distinguish between rather similar protein structures. The large data set simulates real target identification scenarios. iRAISE achieves in 55% excellent or good enrichment a median AUC of 0.67 and RMSDs below 2.0 Å for 74% and was able to predict the first true target in 59 out of 72 cases in the top 2% of the protein data set of about 8000 structures.

  2. Comparison of Several Methods of Predicting the Pressure Loss at Altitude Across a Baffled Aircraft-Engine Cylinder

    NASA Technical Reports Server (NTRS)

    Neustein, Joseph; Schafer, Louis J , Jr

    1946-01-01

    Several methods of predicting the compressible-flow pressure loss across a baffled aircraft-engine cylinder were analytically related and were experimentally investigated on a typical air-cooled aircraft-engine cylinder. Tests with and without heat transfer covered a wide range of cooling-air flows and simulated altitudes from sea level to 40,000 feet. Both the analysis and the test results showed that the method based on the density determined by the static pressure and the stagnation temperature at the baffle exit gave results comparable with those obtained from methods derived by one-dimensional-flow theory. The method based on a characteristic Mach number, although related analytically to one-dimensional-flow theory, was found impractical in the present tests because of the difficulty encountered in defining the proper characteristic state of the cooling air. Accurate predictions of altitude pressure loss can apparently be made by these methods, provided that they are based on the results of sea-level tests with heat transfer.

  3. Comparing statistical and process-based flow duration curve models in ungauged basins and changing rain regimes

    NASA Astrophysics Data System (ADS)

    Müller, M. F.; Thompson, S. E.

    2016-02-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by frequent wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are favored over statistical models.

  4. Automatic prediction of protein domains from sequence information using a hybrid learning system.

    PubMed

    Nagarajan, Niranjan; Yona, Golan

    2004-06-12

    We describe a novel method for detecting the domain structure of a protein from sequence information alone. The method is based on analyzing multiple sequence alignments that are derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence and are combined into a single predictor using a neural network. The output is further smoothed and post-processed using a probabilistic model to predict the most likely transition positions between domains. The method was assessed using the domain definitions in SCOP and CATH for proteins of known structure and was compared with several other existing methods. Our method performs well both in terms of accuracy and sensitivity. It improves significantly over the best methods available, even some of the semi-manual ones, while being fully automatic. Our method can also be used to suggest and verify domain partitions based on structural data. A few examples of predicted domain definitions and alternative partitions, as suggested by our method, are also discussed. An online domain-prediction server is available at http://biozon.org/tools/domains/

  5. Prediction of protein subcellular localization by weighted gene ontology terms.

    PubMed

    Chi, Sang-Mun

    2010-08-27

    We develop a new weighting approach of gene ontology (GO) terms for predicting protein subcellular localization. The weights of individual GO terms, corresponding to their contribution to the prediction algorithm, are determined by the term-weighting methods used in text categorization. We evaluate several term-weighting methods, which are based on inverse document frequency, information gain, gain ratio, odds ratio, and chi-square and its variants. Additionally, we propose a new term-weighting method based on the logarithmic transformation of chi-square. The proposed term-weighting method performs better than other term-weighting methods, and also outperforms state-of-the-art subcellular prediction methods. Our proposed method achieves 98.1%, 99.3%, 98.1%, 98.1%, and 95.9% overall accuracies for the animal BaCelLo independent dataset (IDS), fungal BaCelLo IDS, animal Höglund IDS, fungal Höglund IDS, and PLOC dataset, respectively. Furthermore, the close correlation between high-weighted GO terms and subcellular localizations suggests that our proposed method appropriately weights GO terms according to their relevance to the localizations. Copyright 2010 Elsevier Inc. All rights reserved.

  6. Predictive onboard flow control for packet switching satellites

    NASA Technical Reports Server (NTRS)

    Bobinsky, Eric A.

    1992-01-01

    We outline two alternate approaches to predicting the onset of congestion in a packet switching satellite, and argue that predictive, rather than reactive, flow control is necessary for the efficient operation of such a system. The first method discussed is based on standard, statistical techniques which are used to periodically calculate a probability of near-term congestion based on arrival rate statistics. If this probability exceeds a present threshold, the satellite would transmit a rate-reduction signal to all active ground stations. The second method discussed would utilize a neural network to periodically predict the occurrence of buffer overflow based on input data which would include, in addition to arrival rates, the distributions of packet lengths, source addresses, and destination addresses.

  7. Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites.

    PubMed

    Jelínek, Jan; Škoda, Petr; Hoksza, David

    2017-12-06

    Protein-protein interactions (PPI) play a key role in an investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. We have developed a novel prediction method called INSPiRE which benefits from a knowledge base built from data available in Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. A structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to Matthews correlation coefficient. In this paper we introduce a new knowledge-based method for identification of protein-protein interaction sites called INSPiRE. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.

  8. One- and two-stage Arrhenius models for pharmaceutical shelf life prediction.

    PubMed

    Fan, Zhewen; Zhang, Lanju

    2015-01-01

    One of the most challenging aspects of the pharmaceutical development is the demonstration and estimation of chemical stability. It is imperative that pharmaceutical products be stable for two or more years. Long-term stability studies are required to support such shelf life claim at registration. However, during drug development to facilitate formulation and dosage form selection, an accelerated stability study with stressed storage condition is preferred to quickly obtain a good prediction of shelf life under ambient storage conditions. Such a prediction typically uses Arrhenius equation that describes relationship between degradation rate and temperature (and humidity). Existing methods usually rely on the assumption of normality of the errors. In addition, shelf life projection is usually based on confidence band of a regression line. However, the coverage probability of a method is often overlooked or under-reported. In this paper, we introduce two nonparametric bootstrap procedures for shelf life estimation based on accelerated stability testing, and compare them with a one-stage nonlinear Arrhenius prediction model. Our simulation results demonstrate that one-stage nonlinear Arrhenius method has significant lower coverage than nominal levels. Our bootstrap method gave better coverage and led to a shelf life prediction closer to that based on long-term stability data.

  9. Predicting online ratings based on the opinion spreading process

    NASA Astrophysics Data System (ADS)

    He, Xing-Sheng; Zhou, Ming-Yang; Zhuo, Zhao; Fu, Zhong-Qian; Liu, Jian-Guo

    2015-10-01

    Predicting users' online ratings is always a challenge issue and has drawn lots of attention. In this paper, we present a rating prediction method by combining the user opinion spreading process with the collaborative filtering algorithm, where user similarity is defined by measuring the amount of opinion a user transfers to another based on the primitive user-item rating matrix. The proposed method could produce a more precise rating prediction for each unrated user-item pair. In addition, we introduce a tunable parameter λ to regulate the preferential diffusion relevant to the degree of both opinion sender and receiver. The numerical results for Movielens and Netflix data sets show that this algorithm has a better accuracy than the standard user-based collaborative filtering algorithm using Cosine and Pearson correlation without increasing computational complexity. By tuning λ, our method could further boost the prediction accuracy when using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as measurements. In the optimal cases, on Movielens and Netflix data sets, the corresponding algorithmic accuracy (MAE and RMSE) are improved 11.26% and 8.84%, 13.49% and 10.52% compared to the item average method, respectively.

  10. A Noise Level Prediction Method Based on Electro-Mechanical Frequency Response Function for Capacitors

    PubMed Central

    Zhu, Lingyu; Ji, Shengchang; Shen, Qi; Liu, Yuan; Li, Jinyu; Liu, Hao

    2013-01-01

    The capacitors in high-voltage direct-current (HVDC) converter stations radiate a lot of audible noise which can reach higher than 100 dB. The existing noise level prediction methods are not satisfying enough. In this paper, a new noise level prediction method is proposed based on a frequency response function considering both electrical and mechanical characteristics of capacitors. The electro-mechanical frequency response function (EMFRF) is defined as the frequency domain quotient of the vibration response and the squared capacitor voltage, and it is obtained from impulse current experiment. Under given excitations, the vibration response of the capacitor tank is the product of EMFRF and the square of the given capacitor voltage in frequency domain, and the radiated audible noise is calculated by structure acoustic coupling formulas. The noise level under the same excitations is also measured in laboratory, and the results are compared with the prediction. The comparison proves that the noise prediction method is effective. PMID:24349105

  11. A noise level prediction method based on electro-mechanical frequency response function for capacitors.

    PubMed

    Zhu, Lingyu; Ji, Shengchang; Shen, Qi; Liu, Yuan; Li, Jinyu; Liu, Hao

    2013-01-01

    The capacitors in high-voltage direct-current (HVDC) converter stations radiate a lot of audible noise which can reach higher than 100 dB. The existing noise level prediction methods are not satisfying enough. In this paper, a new noise level prediction method is proposed based on a frequency response function considering both electrical and mechanical characteristics of capacitors. The electro-mechanical frequency response function (EMFRF) is defined as the frequency domain quotient of the vibration response and the squared capacitor voltage, and it is obtained from impulse current experiment. Under given excitations, the vibration response of the capacitor tank is the product of EMFRF and the square of the given capacitor voltage in frequency domain, and the radiated audible noise is calculated by structure acoustic coupling formulas. The noise level under the same excitations is also measured in laboratory, and the results are compared with the prediction. The comparison proves that the noise prediction method is effective.

  12. Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 2; referees: 2 approved

    DOE PAGES

    McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; ...

    2015-03-09

    There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequencesimilarity, or by large groups of similar sequences with different functions. Transporters are one such protein group in that the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first showmore » that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.« less

  13. Probabilistic Analysis of Aircraft Gas Turbine Disk Life and Reliability

    NASA Technical Reports Server (NTRS)

    Melis, Matthew E.; Zaretsky, Erwin V.; August, Richard

    1999-01-01

    Two series of low cycle fatigue (LCF) test data for two groups of different aircraft gas turbine engine compressor disk geometries were reanalyzed and compared using Weibull statistics. Both groups of disks were manufactured from titanium (Ti-6Al-4V) alloy. A NASA Glenn Research Center developed probabilistic computer code Probable Cause was used to predict disk life and reliability. A material-life factor A was determined for titanium (Ti-6Al-4V) alloy based upon fatigue disk data and successfully applied to predict the life of the disks as a function of speed. A comparison was made with the currently used life prediction method based upon crack growth rate. Applying an endurance limit to the computer code did not significantly affect the predicted lives under engine operating conditions. Failure location prediction correlates with those experimentally observed in the LCF tests. A reasonable correlation was obtained between the predicted disk lives using the Probable Cause code and a modified crack growth method for life prediction. Both methods slightly overpredict life for one disk group and significantly under predict it for the other.

  14. Posterior Predictive Bayesian Phylogenetic Model Selection

    PubMed Central

    Lewis, Paul O.; Xie, Wangang; Chen, Ming-Hui; Fan, Yu; Kuo, Lynn

    2014-01-01

    We present two distinctly different posterior predictive approaches to Bayesian phylogenetic model selection and illustrate these methods using examples from green algal protein-coding cpDNA sequences and flowering plant rDNA sequences. The Gelfand–Ghosh (GG) approach allows dissection of an overall measure of model fit into components due to posterior predictive variance (GGp) and goodness-of-fit (GGg), which distinguishes this method from the posterior predictive P-value approach. The conditional predictive ordinate (CPO) method provides a site-specific measure of model fit useful for exploratory analyses and can be combined over sites yielding the log pseudomarginal likelihood (LPML) which is useful as an overall measure of model fit. CPO provides a useful cross-validation approach that is computationally efficient, requiring only a sample from the posterior distribution (no additional simulation is required). Both GG and CPO add new perspectives to Bayesian phylogenetic model selection based on the predictive abilities of models and complement the perspective provided by the marginal likelihood (including Bayes Factor comparisons) based solely on the fit of competing models to observed data. [Bayesian; conditional predictive ordinate; CPO; L-measure; LPML; model selection; phylogenetics; posterior predictive.] PMID:24193892

  15. Protein loop modeling using a new hybrid energy function and its application to modeling in inaccurate structural environments.

    PubMed

    Park, Hahnbeom; Lee, Gyu Rie; Heo, Lim; Seok, Chaok

    2014-01-01

    Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and de novo protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. The web server for this method is available at http://galaxy.seoklab.org/loop with the PS2 option for the scoring function.

  16. Comparison of modeling methods to predict the spatial distribution of deep-sea coral and sponge in the Gulf of Alaska

    NASA Astrophysics Data System (ADS)

    Rooper, Christopher N.; Zimmermann, Mark; Prescott, Megan M.

    2017-08-01

    Deep-sea coral and sponge ecosystems are widespread throughout most of Alaska's marine waters, and are associated with many different species of fishes and invertebrates. These ecosystems are vulnerable to the effects of commercial fishing activities and climate change. We compared four commonly used species distribution models (general linear models, generalized additive models, boosted regression trees and random forest models) and an ensemble model to predict the presence or absence and abundance of six groups of benthic invertebrate taxa in the Gulf of Alaska. All four model types performed adequately on training data for predicting presence and absence, with regression forest models having the best overall performance measured by the area under the receiver-operating-curve (AUC). The models also performed well on the test data for presence and absence with average AUCs ranging from 0.66 to 0.82. For the test data, ensemble models performed the best. For abundance data, there was an obvious demarcation in performance between the two regression-based methods (general linear models and generalized additive models), and the tree-based models. The boosted regression tree and random forest models out-performed the other models by a wide margin on both the training and testing data. However, there was a significant drop-off in performance for all models of invertebrate abundance ( 50%) when moving from the training data to the testing data. Ensemble model performance was between the tree-based and regression-based methods. The maps of predictions from the models for both presence and abundance agreed very well across model types, with an increase in variability in predictions for the abundance data. We conclude that where data conforms well to the modeled distribution (such as the presence-absence data and binomial distribution in this study), the four types of models will provide similar results, although the regression-type models may be more consistent with biological theory. For data with highly zero-inflated distributions and non-normal distributions such as the abundance data from this study, the tree-based methods performed better. Ensemble models that averaged predictions across the four model types, performed better than the GLM or GAM models but slightly poorer than the tree-based methods, suggesting ensemble models might be more robust to overfitting than tree methods, while mitigating some of the disadvantages in predictive performance of regression methods.

  17. A Critical Review for Developing Accurate and Dynamic Predictive Models Using Machine Learning Methods in Medicine and Health Care.

    PubMed

    Alanazi, Hamdan O; Abdullah, Abdul Hanan; Qureshi, Kashif Naseer

    2017-04-01

    Recently, Artificial Intelligence (AI) has been used widely in medicine and health care sector. In machine learning, the classification or prediction is a major field of AI. Today, the study of existing predictive models based on machine learning methods is extremely active. Doctors need accurate predictions for the outcomes of their patients' diseases. In addition, for accurate predictions, timing is another significant factor that influences treatment decisions. In this paper, existing predictive models in medicine and health care have critically reviewed. Furthermore, the most famous machine learning methods have explained, and the confusion between a statistical approach and machine learning has clarified. A review of related literature reveals that the predictions of existing predictive models differ even when the same dataset is used. Therefore, existing predictive models are essential, and current methods must be improved.

  18. Climate-based models for pulsed resources improve predictability of consumer population dynamics: outbreaks of house mice in forest ecosystems.

    PubMed

    Holland, E Penelope; James, Alex; Ruscoe, Wendy A; Pech, Roger P; Byrom, Andrea E

    2015-01-01

    Accurate predictions of the timing and magnitude of consumer responses to episodic seeding events (masts) are important for understanding ecosystem dynamics and for managing outbreaks of invasive species generated by masts. While models relating consumer populations to resource fluctuations have been developed successfully for a range of natural and modified ecosystems, a critical gap that needs addressing is better prediction of resource pulses. A recent model used change in summer temperature from one year to the next (ΔT) for predicting masts for forest and grassland plants in New Zealand. We extend this climate-based method in the framework of a model for consumer-resource dynamics to predict invasive house mouse (Mus musculus) outbreaks in forest ecosystems. Compared with previous mast models based on absolute temperature, the ΔT method for predicting masts resulted in an improved model for mouse population dynamics. There was also a threshold effect of ΔT on the likelihood of an outbreak occurring. The improved climate-based method for predicting resource pulses and consumer responses provides a straightforward rule of thumb for determining, with one year's advance warning, whether management intervention might be required in invaded ecosystems. The approach could be applied to consumer-resource systems worldwide where climatic variables are used to model the size and duration of resource pulses, and may have particular relevance for ecosystems where global change scenarios predict increased variability in climatic events.

  19. Evaluating the predictive power of multivariate tensor-based morphometry in Alzheimer's disease progression via convex fused sparse group Lasso

    NASA Astrophysics Data System (ADS)

    Tsao, Sinchai; Gajawelli, Niharika; Zhou, Jiayu; Shi, Jie; Ye, Jieping; Wang, Yalin; Lepore, Natasha

    2014-03-01

    Prediction of Alzheimers disease (AD) progression based on baseline measures allows us to understand disease progression and has implications in decisions concerning treatment strategy. To this end we combine a predictive multi-task machine learning method1 with novel MR-based multivariate morphometric surface map of the hippocampus2 to predict future cognitive scores of patients. Previous work by Zhou et al.1 has shown that a multi-task learning framework that performs prediction of all future time points (or tasks) simultaneously can be used to encode both sparsity as well as temporal smoothness. They showed that this can be used in predicting cognitive outcomes of Alzheimers Disease Neuroimaging Initiative (ADNI) subjects based on FreeSurfer-based baseline MRI features, MMSE score demographic information and ApoE status. Whilst volumetric information may hold generalized information on brain status, we hypothesized that hippocampus specific information may be more useful in predictive modeling of AD. To this end, we applied Shi et al.2s recently developed multivariate tensor-based (mTBM) parametric surface analysis method to extract features from the hippocampal surface. We show that by combining the power of the multi-task framework with the sensitivity of mTBM features of the hippocampus surface, we are able to improve significantly improve predictive performance of ADAS cognitive scores 6, 12, 24, 36 and 48 months from baseline.

  20. New support vector machine-based method for microRNA target prediction.

    PubMed

    Li, L; Gao, Q; Mao, X; Cao, Y

    2014-06-09

    MicroRNA (miRNA) plays important roles in cell differentiation, proliferation, growth, mobility, and apoptosis. An accurate list of precise target genes is necessary in order to fully understand the importance of miRNAs in animal development and disease. Several computational methods have been proposed for miRNA target-gene identification. However, these methods still have limitations with respect to their sensitivity and accuracy. Thus, we developed a new miRNA target-prediction method based on the support vector machine (SVM) model. The model supplies information of two binding sites (primary and secondary) for a radial basis function kernel as a similarity measure for SVM features. The information is categorized based on structural, thermodynamic, and sequence conservation. Using high-confidence datasets selected from public miRNA target databases, we obtained a human miRNA target SVM classifier model with high performance and provided an efficient tool for human miRNA target gene identification. Experiments have shown that our method is a reliable tool for miRNA target-gene prediction, and a successful application of an SVM classifier. Compared with other methods, the method proposed here improves the sensitivity and accuracy of miRNA prediction. Its performance can be further improved by providing more training examples.

  1. Multiplier method may be unreliable to predict the timing of temporary hemiepiphysiodesis for coronal angular deformity.

    PubMed

    Wu, Zhenkai; Ding, Jing; Zhao, Dahang; Zhao, Li; Li, Hai; Liu, Jianlin

    2017-07-10

    The multiplier method was introduced by Paley to calculate the timing for temporary hemiepiphysiodesis. However, this method has not been verified in terms of clinical outcome measure. We aimed to (1) predict the rate of angular correction per year (ACPY) at the various corresponding ages by means of multiplier method and verify the reliability based on the data from the published studies and (2) screen out risk factors for deviation of prediction. A comprehensive search was performed in the following electronic databases: Cochrane, PubMed, and EMBASE™. A total of 22 studies met the inclusion criteria. If the actual value of ACPY from the collected date was located out of the range of the predicted value based on the multiplier method, it was considered as the deviation of prediction (DOP). The associations of patient characteristics with DOP were assessed with the use of univariate logistic regression. Only one article was evaluated as moderate evidence; the remaining articles were evaluated as poor quality. The rate of DOP was 31.82%. In the detailed individual data of included studies, the rate of DOP was 55.44%. The multiplier method is not reliable in predicting the timing for temporary hemiepiphysiodesis, even though it is prone to be more reliable for the younger patients with idiopathic genu coronal deformity.

  2. Modified-Fibonacci-Dual-Lucas method for earthquake prediction

    NASA Astrophysics Data System (ADS)

    Boucouvalas, A. C.; Gkasios, M.; Tselikas, N. T.; Drakatos, G.

    2015-06-01

    The FDL method makes use of Fibonacci, Dual and Lucas numbers and has shown considerable success in predicting earthquake events locally as well as globally. Predicting the location of the epicenter of an earthquake is one difficult challenge the other being the timing and magnitude. One technique for predicting the onset of earthquakes is the use of cycles, and the discovery of periodicity. Part of this category is the reported FDL method. The basis of the reported FDL method is the creation of FDL future dates based on the onset date of significant earthquakes. The assumption being that each occurred earthquake discontinuity can be thought of as a generating source of FDL time series The connection between past earthquakes and future earthquakes based on FDL numbers has also been reported with sample earthquakes since 1900. Using clustering methods it has been shown that significant earthquakes (<6.5R) can be predicted with very good accuracy window (+-1 day). In this contribution we present an improvement modification to the FDL method, the MFDL method, which performs better than the FDL. We use the FDL numbers to develop possible earthquakes dates but with the important difference that the starting seed date is a trigger planetary aspect prior to the earthquake. Typical planetary aspects are Moon conjunct Sun, Moon opposite Sun, Moon conjunct or opposite North or South Modes. In order to test improvement of the method we used all +8R earthquakes recorded since 1900, (86 earthquakes from USGS data). We have developed the FDL numbers for each of those seeds, and examined the earthquake hit rates (for a window of 3, i.e. +-1 day of target date) and for <6.5R. The successes are counted for each one of the 86 earthquake seeds and we compare the MFDL method with the FDL method. In every case we find improvement when the starting seed date is on the planetary trigger date prior to the earthquake. We observe no improvement only when a planetary trigger coincided with the earthquake date and in this case the FDL method coincides with the MFDL. Based on the MDFL method we present the prediction method capable of predicting global events or localized earthquakes and we will discuss the accuracy of the method in as far as the prediction and location parts of the method. We show example calendar style predictions for global events as well as for the Greek region using planetary alignment seeds.

  3. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness

    PubMed Central

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods that are variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF) were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models. PMID:26890307

  4. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness.

    PubMed

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods that are variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF) were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models.

  5. Algorithm for predicting the evolution of series of dynamics of complex systems in solving information problems

    NASA Astrophysics Data System (ADS)

    Kasatkina, T. I.; Dushkin, A. V.; Pavlov, V. A.; Shatovkin, R. R.

    2018-03-01

    In the development of information, systems and programming to predict the series of dynamics, neural network methods have recently been applied. They are more flexible, in comparison with existing analogues and are capable of taking into account the nonlinearities of the series. In this paper, we propose a modified algorithm for predicting the series of dynamics, which includes a method for training neural networks, an approach to describing and presenting input data, based on the prediction by the multilayer perceptron method. To construct a neural network, the values of a series of dynamics at the extremum points and time values corresponding to them, formed based on the sliding window method, are used as input data. The proposed algorithm can act as an independent approach to predicting the series of dynamics, and be one of the parts of the forecasting system. The efficiency of predicting the evolution of the dynamics series for a short-term one-step and long-term multi-step forecast by the classical multilayer perceptron method and a modified algorithm using synthetic and real data is compared. The result of this modification was the minimization of the magnitude of the iterative error that arises from the previously predicted inputs to the inputs to the neural network, as well as the increase in the accuracy of the iterative prediction of the neural network.

  6. Improving prediction of heterodimeric protein complexes using combination with pairwise kernel.

    PubMed

    Ruan, Peiying; Hayashida, Morihiro; Akutsu, Tatsuya; Vert, Jean-Philippe

    2018-02-19

    Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) network. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is however an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. In this paper, we use three promising kernel functions, Min kernel and two pairwise kernels, which are Metric Learning Pairwise Kernel (MLPK) and Tensor Product Pairwise Kernel (TPPK). We also consider the normalization forms of Min kernel. Then, we combine Min kernel or its normalization form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of normalized-Min-kernel and MLPK leads to the best F-measure and improved the performance of our previous work, which had been the best existing method so far. We propose new methods to predict heterodimers, using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on informations extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state-of-the-art.

  7. The energetic cost of walking: a comparison of predictive methods.

    PubMed

    Kramer, Patricia Ann; Sylvester, Adam D

    2011-01-01

    The energy that animals devote to locomotion has been of intense interest to biologists for decades and two basic methodologies have emerged to predict locomotor energy expenditure: those based on metabolic and those based on mechanical energy. Metabolic energy approaches share the perspective that prediction of locomotor energy expenditure should be based on statistically significant proxies of metabolic function, while mechanical energy approaches, which derive from many different perspectives, focus on quantifying the energy of movement. Some controversy exists as to which mechanical perspective is "best", but from first principles all mechanical methods should be equivalent if the inputs to the simulation are of similar quality. Our goals in this paper are 1) to establish the degree to which the various methods of calculating mechanical energy are correlated, and 2) to investigate to what degree the prediction methods explain the variation in energy expenditure. We use modern humans as the model organism in this experiment because their data are readily attainable, but the methodology is appropriate for use in other species. Volumetric oxygen consumption and kinematic and kinetic data were collected on 8 adults while walking at their self-selected slow, normal and fast velocities. Using hierarchical statistical modeling via ordinary least squares and maximum likelihood techniques, the predictive ability of several metabolic and mechanical approaches were assessed. We found that all approaches are correlated and that the mechanical approaches explain similar amounts of the variation in metabolic energy expenditure. Most methods predict the variation within an individual well, but are poor at accounting for variation between individuals. Our results indicate that the choice of predictive method is dependent on the question(s) of interest and the data available for use as inputs. Although we used modern humans as our model organism, these results can be extended to other species.

  8. Variable context Markov chains for HIV protease cleavage site prediction.

    PubMed

    Oğul, Hasan

    2009-06-01

    Deciphering the knowledge of HIV protease specificity and developing computational tools for detecting its cleavage sites in protein polypeptide chain are very desirable for designing efficient and specific chemical inhibitors to prevent acquired immunodeficiency syndrome. In this study, we developed a generative model based on a generalization of variable order Markov chains (VOMC) for peptide sequences and adapted the model for prediction of their cleavability by certain proteases. The new method, called variable context Markov chains (VCMC), attempts to identify the context equivalence based on the evolutionary similarities between individual amino acids. It was applied for HIV-1 protease cleavage site prediction problem and shown to outperform existing methods in terms of prediction accuracy on a common dataset. In general, the method is a promising tool for prediction of cleavage sites of all proteases and encouraged to be used for any kind of peptide classification problem as well.

  9. New insights from cluster analysis methods for RNA secondary structure prediction

    PubMed Central

    Rogers, Emily; Heitsch, Christine

    2016-01-01

    A widening gap exists between the best practices for RNA secondary structure prediction developed by computational researchers and the methods used in practice by experimentalists. Minimum free energy (MFE) predictions, although broadly used, are outperformed by methods which sample from the Boltzmann distribution and data mine the results. In particular, moving beyond the single structure prediction paradigm yields substantial gains in accuracy. Furthermore, the largest improvements in accuracy and precision come from viewing secondary structures not at the base pair level but at lower granularity/higher abstraction. This suggests that random errors affecting precision and systematic ones affecting accuracy are both reduced by this “fuzzier” view of secondary structures. Thus experimentalists who are willing to adopt a more rigorous, multilayered approach to secondary structure prediction by iterating through these levels of granularity will be much better able to capture fundamental aspects of RNA base pairing. PMID:26971529

  10. Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.

    PubMed

    Yehdego, Daniel T; Zhang, Boyu; Kodimala, Vikram K R; Johnson, Kyle L; Taufer, Michela; Leung, Ming-Ying

    2013-05-01

    Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and optimized method. Each step of searching for inversions, chunking, and predictions can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are possible computationally. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structure.

  11. A Micromechanics-Based Method for Multiscale Fatigue Prediction

    NASA Astrophysics Data System (ADS)

    Moore, John Allan

    An estimated 80% of all structural failures are due to mechanical fatigue, often resulting in catastrophic, dangerous and costly failure events. However, an accurate model to predict fatigue remains an elusive goal. One of the major challenges is that fatigue is intrinsically a multiscale process, which is dependent on a structure's geometric design as well as its material's microscale morphology. The following work begins with a microscale study of fatigue nucleation around non- metallic inclusions. Based on this analysis, a novel multiscale method for fatigue predictions is developed. This method simulates macroscale geometries explicitly while concurrently calculating the simplified response of microscale inclusions. Thus, providing adequate detail on multiple scales for accurate fatigue life predictions. The methods herein provide insight into the multiscale nature of fatigue, while also developing a tool to aid in geometric design and material optimization for fatigue critical devices such as biomedical stents and artificial heart valves.

  12. Binding ligand prediction for proteins using partial matching of local surface patches.

    PubMed

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group.

  13. Binding Ligand Prediction for Proteins Using Partial Matching of Local Surface Patches

    PubMed Central

    Sael, Lee; Kihara, Daisuke

    2010-01-01

    Functional elucidation of uncharacterized protein structures is an important task in bioinformatics. We report our new approach for structure-based function prediction which captures local surface features of ligand binding pockets. Function of proteins, specifically, binding ligands of proteins, can be predicted by finding similar local surface regions of known proteins. To enable partial comparison of binding sites in proteins, a weighted bipartite matching algorithm is used to match pairs of surface patches. The surface patches are encoded with the 3D Zernike descriptors. Unlike the existing methods which compare global characteristics of the protein fold or the global pocket shape, the local surface patch method can find functional similarity between non-homologous proteins and binding pockets for flexible ligand molecules. The proposed method improves prediction results over global pocket shape-based method which was previously developed by our group. PMID:21614188

  14. Prediction-based manufacturing center self-adaptive demand side energy optimization in cyber physical systems

    NASA Astrophysics Data System (ADS)

    Sun, Xinyao; Wang, Xue; Wu, Jiangwei; Liu, Youda

    2014-05-01

    Cyber physical systems(CPS) recently emerge as a new technology which can provide promising approaches to demand side management(DSM), an important capability in industrial power systems. Meanwhile, the manufacturing center is a typical industrial power subsystem with dozens of high energy consumption devices which have complex physical dynamics. DSM, integrated with CPS, is an effective methodology for solving energy optimization problems in manufacturing center. This paper presents a prediction-based manufacturing center self-adaptive energy optimization method for demand side management in cyber physical systems. To gain prior knowledge of DSM operating results, a sparse Bayesian learning based componential forecasting method is introduced to predict 24-hour electric load levels for specific industrial areas in China. From this data, a pricing strategy is designed based on short-term load forecasting results. To minimize total energy costs while guaranteeing manufacturing center service quality, an adaptive demand side energy optimization algorithm is presented. The proposed scheme is tested in a machining center energy optimization experiment. An AMI sensing system is then used to measure the demand side energy consumption of the manufacturing center. Based on the data collected from the sensing system, the load prediction-based energy optimization scheme is implemented. By employing both the PSO and the CPSO method, the problem of DSM in the manufacturing center is solved. The results of the experiment show the self-adaptive CPSO energy optimization method enhances optimization by 5% compared with the traditional PSO optimization method.

  15. Three methods to construct predictive models using logistic regression and likelihood ratios to facilitate adjustment for pretest probability give similar results.

    PubMed

    Chan, Siew Foong; Deeks, Jonathan J; Macaskill, Petra; Irwig, Les

    2008-01-01

    To compare three predictive models based on logistic regression to estimate adjusted likelihood ratios allowing for interdependency between diagnostic variables (tests). This study was a review of the theoretical basis, assumptions, and limitations of published models; and a statistical extension of methods and application to a case study of the diagnosis of obstructive airways disease based on history and clinical examination. Albert's method includes an offset term to estimate an adjusted likelihood ratio for combinations of tests. Spiegelhalter and Knill-Jones method uses the unadjusted likelihood ratio for each test as a predictor and computes shrinkage factors to allow for interdependence. Knottnerus' method differs from the other methods because it requires sequencing of tests, which limits its application to situations where there are few tests and substantial data. Although parameter estimates differed between the models, predicted "posttest" probabilities were generally similar. Construction of predictive models using logistic regression is preferred to the independence Bayes' approach when it is important to adjust for dependency of tests errors. Methods to estimate adjusted likelihood ratios from predictive models should be considered in preference to a standard logistic regression model to facilitate ease of interpretation and application. Albert's method provides the most straightforward approach.

  16. Improved regulatory element prediction based on tissue-specific local epigenomic signatures

    PubMed Central

    He, Yupeng; Gorkin, David U.; Dickel, Diane E.; Nery, Joseph R.; Castanon, Rosa G.; Lee, Ah Young; Shen, Yin; Visel, Axel; Pennacchio, Len A.; Ren, Bing; Ecker, Joseph R.

    2017-01-01

    Accurate enhancer identification is critical for understanding the spatiotemporal transcriptional regulation during development as well as the functional impact of disease-related noncoding genetic variants. Computational methods have been developed to predict the genomic locations of active enhancers based on histone modifications, but the accuracy and resolution of these methods remain limited. Here, we present an algorithm, regulatory element prediction based on tissue-specific local epigenetic marks (REPTILE), which integrates histone modification and whole-genome cytosine DNA methylation profiles to identify the precise location of enhancers. We tested the ability of REPTILE to identify enhancers previously validated in reporter assays. Compared with existing methods, REPTILE shows consistently superior performance across diverse cell and tissue types, and the enhancer locations are significantly more refined. We show that, by incorporating base-resolution methylation data, REPTILE greatly improves upon current methods for annotation of enhancers across a variety of cell and tissue types. REPTILE is available at https://github.com/yupenghe/REPTILE/. PMID:28193886

  17. Guiding Conformation Space Search with an All-Atom Energy Potential

    PubMed Central

    Brunette, TJ; Brock, Oliver

    2009-01-01

    The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landscape too rugged for existing search methods to consistently find near-optimal minima. To alleviate this problem, we present model-based search, a novel conformation space search method. Model-based search uses highly accurate information obtained during search to build an approximate, partial model of the energy landscape. Model-based search aggregates information in the model as it progresses, and in turn uses this information to guide exploration towards regions most likely to contain a near-optimal minimum. We validate our method by predicting the structure of 32 proteins, ranging in length from 49 to 213 amino acids. Our results demonstrate that model-based search is more effective at finding low-energy conformations in high-dimensional conformation spaces than existing search methods. The reduction in energy translates into structure predictions of increased accuracy. PMID:18536015

  18. Predictive modeling and reducing cyclic variability in autoignition engines

    DOEpatents

    Hellstrom, Erik; Stefanopoulou, Anna; Jiang, Li; Larimore, Jacob

    2016-08-30

    Methods and systems are provided for controlling a vehicle engine to reduce cycle-to-cycle combustion variation. A predictive model is applied to predict cycle-to-cycle combustion behavior of an engine based on observed engine performance variables. Conditions are identified, based on the predicted cycle-to-cycle combustion behavior, that indicate high cycle-to-cycle combustion variation. Corrective measures are then applied to prevent the predicted high cycle-to-cycle combustion variation.

  19. HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features.

    PubMed

    Zaman, Rianon; Chowdhury, Shahana Yasmin; Rashid, Mahmood A; Sharma, Alok; Dehzangi, Abdollah; Shatabda, Swakkhar

    2017-01-01

    DNA-binding proteins often play important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to solve this problem. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM profile based features for the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as a classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature.

  20. Efficient operation scheduling for adsorption chillers using predictive optimization-based control methods

    NASA Astrophysics Data System (ADS)

    Bürger, Adrian; Sawant, Parantapa; Bohlayer, Markus; Altmann-Dieses, Angelika; Braun, Marco; Diehl, Moritz

    2017-10-01

    Within this work, the benefits of using predictive control methods for the operation of Adsorption Cooling Machines (ACMs) are shown on a simulation study. Since the internal control decisions of series-manufactured ACMs often cannot be influenced, the work focuses on optimized scheduling of an ACM considering its internal functioning as well as forecasts for load and driving energy occurrence. For illustration, an assumed solar thermal climate system is introduced and a system model suitable for use within gradient-based optimization methods is developed. The results of a system simulation using a conventional scheme for ACM scheduling are compared to the results of a predictive, optimization-based scheduling approach for the same exemplary scenario of load and driving energy occurrence. The benefits of the latter approach are shown and future actions for application of these methods for system control are addressed.

  1. Normalized Rotational Multiple Yield Surface Framework (NRMYSF) stress-strain curve prediction method based on small strain triaxial test data on undisturbed Auckland residual clay soils

    NASA Astrophysics Data System (ADS)

    Noor, M. J. Md; Ibrahim, A.; Rahman, A. S. A.

    2018-04-01

    Small strain triaxial test measurement is considered to be significantly accurate compared to the external strain measurement using conventional method due to systematic errors normally associated with the test. Three submersible miniature linear variable differential transducer (LVDT) mounted on yokes which clamped directly onto the soil sample at equally 120° from the others. The device setup using 0.4 N resolution load cell and 16 bit AD converter was capable of consistently resolving displacement of less than 1µm and measuring axial strains ranging from less than 0.001% to 2.5%. Further analysis of small strain local measurement data was performed using new Normalized Multiple Yield Surface Framework (NRMYSF) method and compared with existing Rotational Multiple Yield Surface Framework (RMYSF) prediction method. The prediction of shear strength based on combined intrinsic curvilinear shear strength envelope using small strain triaxial test data confirmed the significant improvement and reliability of the measurement and analysis methods. Moreover, the NRMYSF method shows an excellent data prediction and significant improvement toward more reliable prediction of soil strength that can reduce the cost and time of experimental laboratory test.

  2. Construction of prediction intervals for Palmer Drought Severity Index using bootstrap

    NASA Astrophysics Data System (ADS)

    Beyaztas, Ufuk; Bickici Arikan, Bugrayhan; Beyaztas, Beste Hamiye; Kahya, Ercan

    2018-04-01

    In this study, we propose an approach based on the residual-based bootstrap method to obtain valid prediction intervals using monthly, short-term (three-months) and mid-term (six-months) drought observations. The effects of North Atlantic and Arctic Oscillation indexes on the constructed prediction intervals are also examined. Performance of the proposed approach is evaluated for the Palmer Drought Severity Index (PDSI) obtained from Konya closed basin located in Central Anatolia, Turkey. The finite sample properties of the proposed method are further illustrated by an extensive simulation study. Our results revealed that the proposed approach is capable of producing valid prediction intervals for future PDSI values.

  3. A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

    PubMed Central

    Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A

    2006-01-01

    The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943

  4. Multiaxial Fatigue Life Prediction Based on Nonlinear Continuum Damage Mechanics and Critical Plane Method

    NASA Astrophysics Data System (ADS)

    Wu, Z. R.; Li, X.; Fang, L.; Song, Y. D.

    2018-04-01

    A new multiaxial fatigue life prediction model has been proposed in this paper. The concepts of nonlinear continuum damage mechanics and critical plane criteria were incorporated in the proposed model. The shear strain-based damage control parameter was chosen to account for multiaxial fatigue damage under constant amplitude loading. Fatigue tests were conducted on nickel-based superalloy GH4169 tubular specimens at the temperature of 400 °C under proportional and nonproportional loading. The proposed method was checked against the multiaxial fatigue test data of GH4169. Most of prediction results are within a factor of two scatter band of the test results.

  5. DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank.

    PubMed

    Yuan, Qingjun; Gao, Junning; Wu, Dongliang; Zhang, Shihua; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2016-06-15

    Identifying drug-target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug-target interactions of new candidate drugs or targets. Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by nicely combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. http://datamining-iip.fudan.edu.cn/service/DrugE-Rank zhusf@fudan.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  6. Network-based ranking methods for prediction of novel disease associated microRNAs.

    PubMed

    Le, Duc-Hau

    2015-10-01

    Many studies have shown roles of microRNAs on human disease and a number of computational methods have been proposed to predict such associations by ranking candidate microRNAs according to their relevance to a disease. Among them, machine learning-based methods usually have a limitation in specifying non-disease microRNAs as negative training samples. Meanwhile, network-based methods are becoming dominant since they well exploit a "disease module" principle in microRNA functional similarity networks. Of which, random walk with restart (RWR) algorithm-based method is currently state-of-the-art. The use of this algorithm was inspired from its success in predicting disease gene because the "disease module" principle also exists in protein interaction networks. Besides, many algorithms designed for webpage ranking have been successfully applied in ranking disease candidate genes because web networks share topological properties with protein interaction networks. However, these algorithms have not yet been utilized for disease microRNA prediction. We constructed microRNA functional similarity networks based on shared targets of microRNAs, and then we integrated them with a microRNA functional synergistic network, which was recently identified. After analyzing topological properties of these networks, in addition to RWR, we assessed the performance of (i) PRINCE (PRIoritizatioN and Complex Elucidation), which was proposed for disease gene prediction; (ii) PageRank with Priors (PRP) and K-Step Markov (KSM), which were used for studying web networks; and (iii) a neighborhood-based algorithm. Analyses on topological properties showed that all microRNA functional similarity networks are small-worldness and scale-free. The performance of each algorithm was assessed based on average AUC values on 35 disease phenotypes and average rankings of newly discovered disease microRNAs. As a result, the performance on the integrated network was better than that on individual ones. In addition, the performance of PRINCE, PRP and KSM was comparable with that of RWR, whereas it was worst for the neighborhood-based algorithm. Moreover, all the algorithms were stable with the change of parameters. Final, using the integrated network, we predicted six novel miRNAs (i.e., hsa-miR-101, hsa-miR-181d, hsa-miR-192, hsa-miR-423-3p, hsa-miR-484 and hsa-miR-98) associated with breast cancer. Network-based ranking algorithms, which were successfully applied for either disease gene prediction or for studying social/web networks, can be also used effectively for disease microRNA prediction. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Improved NASA-ANOPP Noise Prediction Computer Code for Advanced Subsonic Propulsion Systems

    NASA Technical Reports Server (NTRS)

    Kontos, K. B.; Janardan, B. A.; Gliebe, P. R.

    1996-01-01

    Recent experience using ANOPP to predict turbofan engine flyover noise suggests that it over-predicts overall EPNL by a significant amount. An improvement in this prediction method is desired for system optimization and assessment studies of advanced UHB engines. An assessment of the ANOPP fan inlet, fan exhaust, jet, combustor, and turbine noise prediction methods is made using static engine component noise data from the CF6-8OC2, E(3), and QCSEE turbofan engines. It is shown that the ANOPP prediction results are generally higher than the measured GE data, and that the inlet noise prediction method (Heidmann method) is the most significant source of this overprediction. Fan noise spectral comparisons show that improvements to the fan tone, broadband, and combination tone noise models are required to yield results that more closely simulate the GE data. Suggested changes that yield improved fan noise predictions but preserve the Heidmann model structure are identified and described. These changes are based on the sets of engine data mentioned, as well as some CFM56 engine data that was used to expand the combination tone noise database. It should be noted that the recommended changes are based on an analysis of engines that are limited to single stage fans with design tip relative Mach numbers greater than one.

  8. Inferring parturition and neonate survival from movement patterns of female ungulates: a case study using woodland caribou

    PubMed Central

    DeMars, Craig A; Auger-Méthé, Marie; Schlägel, Ulrike E; Boutin, Stan

    2013-01-01

    Analyses of animal movement data have primarily focused on understanding patterns of space use and the behavioural processes driving them. Here, we analyzed animal movement data to infer components of individual fitness, specifically parturition and neonate survival. We predicted that parturition and neonate loss events could be identified by sudden and marked changes in female movement patterns. Using GPS radio-telemetry data from female woodland caribou (Rangifer tarandus caribou), we developed and tested two novel movement-based methods for inferring parturition and neonate survival. The first method estimated movement thresholds indicative of parturition and neonate loss from population-level data then applied these thresholds in a moving-window analysis on individual time-series data. The second method used an individual-based approach that discriminated among three a priori models representing the movement patterns of non-parturient females, females with surviving offspring, and females losing offspring. The models assumed that step lengths (the distance between successive GPS locations) were exponentially distributed and that abrupt changes in the scale parameter of the exponential distribution were indicative of parturition and offspring loss. Both methods predicted parturition with near certainty (>97% accuracy) and produced appropriate predictions of parturition dates. Prediction of neonate survival was affected by data quality for both methods; however, when using high quality data (i.e., with few missing GPS locations), the individual-based method performed better, predicting neonate survival status with an accuracy rate of 87%. Understanding ungulate population dynamics often requires estimates of parturition and neonate survival rates. With GPS radio-collars increasingly being used in research and management of ungulates, our movement-based methods represent a viable approach for estimating rates of both parameters. PMID:24324866

  9. Inferring parturition and neonate survival from movement patterns of female ungulates: a case study using woodland caribou.

    PubMed

    Demars, Craig A; Auger-Méthé, Marie; Schlägel, Ulrike E; Boutin, Stan

    2013-10-01

    Analyses of animal movement data have primarily focused on understanding patterns of space use and the behavioural processes driving them. Here, we analyzed animal movement data to infer components of individual fitness, specifically parturition and neonate survival. We predicted that parturition and neonate loss events could be identified by sudden and marked changes in female movement patterns. Using GPS radio-telemetry data from female woodland caribou (Rangifer tarandus caribou), we developed and tested two novel movement-based methods for inferring parturition and neonate survival. The first method estimated movement thresholds indicative of parturition and neonate loss from population-level data then applied these thresholds in a moving-window analysis on individual time-series data. The second method used an individual-based approach that discriminated among three a priori models representing the movement patterns of non-parturient females, females with surviving offspring, and females losing offspring. The models assumed that step lengths (the distance between successive GPS locations) were exponentially distributed and that abrupt changes in the scale parameter of the exponential distribution were indicative of parturition and offspring loss. Both methods predicted parturition with near certainty (>97% accuracy) and produced appropriate predictions of parturition dates. Prediction of neonate survival was affected by data quality for both methods; however, when using high quality data (i.e., with few missing GPS locations), the individual-based method performed better, predicting neonate survival status with an accuracy rate of 87%. Understanding ungulate population dynamics often requires estimates of parturition and neonate survival rates. With GPS radio-collars increasingly being used in research and management of ungulates, our movement-based methods represent a viable approach for estimating rates of both parameters.

  10. Determination of parameters of a new method for predicting alloy properties

    NASA Technical Reports Server (NTRS)

    Bozzolo, Guillermo; Ferrante, John

    1992-01-01

    Recently, a semiempirical method for alloys based on equivalent crystal theory was introduced. The method successfully predicts the concentration dependence of the heat of formation and lattice parameter of binary alloys. A study of the parameters of the method is presented, along with new results for (gamma)Fe-Pd and (gamma)Fe-Ni alloys.

  11. FUN-LDA: A Latent Dirichlet Allocation Model for Predicting Tissue-Specific Functional Effects of Noncoding Variation: Methods and Applications.

    PubMed

    Backenroth, Daniel; He, Zihuai; Kiryluk, Krzysztof; Boeva, Valentina; Pethukova, Lynn; Khurana, Ekta; Christiano, Angela; Buxbaum, Joseph D; Ionita-Laza, Iuliana

    2018-05-03

    We describe a method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell-type- and/or tissue-specific way (FUN-LDA). Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome in 127 different tissues and cell types. We demonstrate the usefulness of our predictions by using several validation experiments. Using eQTL data from several sources, including the GTEx project, Geuvadis project, and TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used for (1) deriving the most likely cell or tissue type causally implicated for a complex trait by using summary statistics from genome-wide association studies and (2) estimating a tissue-based correlation matrix of various complex traits. We found large enrichment of heritability in functional components of relevant tissues for various complex traits, and FUN-LDA yielded higher enrichment estimates than existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA with state-of-the-art functional annotation methods and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In particular, our results suggest that tissue- and cell-type-specific functional prediction methods tend to have substantially better prediction accuracy than organism-level prediction methods. Scores for each position in the human genome and for each ENCODE and Roadmap tissue are available online (see Web Resources). Copyright © 2018 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  12. R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter.

    PubMed

    Yang, Jing; Jin, Qi-Yu; Zhang, Biao; Shen, Hong-Bin

    2016-08-15

    Inter-residue contacts in proteins dictate the topology of protein structures. They are crucial for protein folding and structural stability. Accurate prediction of residue contacts especially for long-range contacts is important to the quality of ab inito structure modeling since they can enforce strong restraints to structure assembly. In this paper, we present a new Residue-Residue Contact predictor called R2C that combines machine learning-based and correlated mutation analysis-based methods, together with a two-dimensional Gaussian noise filter to enhance the long-range residue contact prediction. Our results show that the outputs from the machine learning-based method are concentrated with better performance on short-range contacts; while for correlated mutation analysis-based approach, the predictions are widespread with higher accuracy on long-range contacts. An effective query-driven dynamic fusion strategy proposed here takes full advantages of the two different methods, resulting in an impressive overall accuracy improvement. We also show that the contact map directly from the prediction model contains the interesting Gaussian noise, which has not been discovered before. Different from recent studies that tried to further enhance the quality of contact map by removing its transitive noise, we designed a new two-dimensional Gaussian noise filter, which was especially helpful for reinforcing the long-range residue contact prediction. Tested on recent CASP10/11 datasets, the overall top L/5 accuracy of our final R2C predictor is 17.6%/15.5% higher than the pure machine learning-based method and 7.8%/8.3% higher than the correlated mutation analysis-based approach for the long-range residue contact prediction. http://www.csbio.sjtu.edu.cn/bioinf/R2C/Contact:hbshen@sjtu.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Clustering gene expression data based on predicted differential effects of GV interaction.

    PubMed

    Pan, Hai-Yan; Zhu, Jun; Han, Dan-Fu

    2005-02-01

    Microarray has become a popular biotechnology in biological and medical research. However, systematic and stochastic variabilities in microarray data are expected and unavoidable, resulting in the problem that the raw measurements have inherent "noise" within microarray experiments. Currently, logarithmic ratios are usually analyzed by various clustering methods directly, which may introduce bias interpretation in identifying groups of genes or samples. In this paper, a statistical method based on mixed model approaches was proposed for microarray data cluster analysis. The underlying rationale of this method is to partition the observed total gene expression level into various variations caused by different factors using an ANOVA model, and to predict the differential effects of GV (gene by variety) interaction using the adjusted unbiased prediction (AUP) method. The predicted GV interaction effects can then be used as the inputs of cluster analysis. We illustrated the application of our method with a gene expression dataset and elucidated the utility of our approach using an external validation.

  14. Effects of inductive bias on computational evaluations of ligand-based modeling and on drug discovery

    NASA Astrophysics Data System (ADS)

    Cleves, Ann E.; Jain, Ajay N.

    2008-03-01

    Inductive bias is the set of assumptions that a person or procedure makes in making a prediction based on data. Different methods for ligand-based predictive modeling have different inductive biases, with a particularly sharp contrast between 2D and 3D similarity methods. A unique aspect of ligand design is that the data that exist to test methodology have been largely man-made, and that this process of design involves prediction. By analyzing the molecular similarities of known drugs, we show that the inductive bias of the historic drug discovery process has a very strong 2D bias. In studying the performance of ligand-based modeling methods, it is critical to account for this issue in dataset preparation, use of computational controls, and in the interpretation of results. We propose specific strategies to explicitly address the problems posed by inductive bias considerations.

  15. Unsteady Fast Random Particle Mesh method for efficient prediction of tonal and broadband noises of a centrifugal fan unit

    NASA Astrophysics Data System (ADS)

    Heo, Seung; Cheong, Cheolung; Kim, Taehoon

    2015-09-01

    In this study, efficient numerical method is proposed for predicting tonal and broadband noises of a centrifugal fan unit. The proposed method is based on Hybrid Computational Aero-Acoustic (H-CAA) techniques combined with Unsteady Fast Random Particle Mesh (U-FRPM) method. The U-FRPM method is developed by extending the FRPM method proposed by Ewert et al. and is utilized to synthesize turbulence flow field from unsteady RANS solutions. The H-CAA technique combined with U-FRPM method is applied to predict broadband as well as tonal noises of a centrifugal fan unit in a household refrigerator. Firstly, unsteady flow field driven by a rotating fan is computed by solving the RANS equations with Computational Fluid Dynamic (CFD) techniques. Main source regions around the rotating fan are identified by examining the computed flow fields. Then, turbulence flow fields in the main source regions are synthesized by applying the U-FRPM method. The acoustic analogy is applied to model acoustic sources in the main source regions. Finally, the centrifugal fan noise is predicted by feeding the modeled acoustic sources into an acoustic solver based on the Boundary Element Method (BEM). The sound spectral levels predicted using the current numerical method show good agreements with the measured spectra at the Blade Pass Frequencies (BPFs) as well as in the high frequency range. On the more, the present method enables quantitative assessment of relative contributions of identified source regions to the sound field by comparing predicted sound pressure spectrum due to modeled sources.

  16. Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

    PubMed Central

    Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

    2014-01-01

    Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592

  17. RF-Phos: A Novel General Phosphorylation Site Prediction Tool Based on Random Forest.

    PubMed

    Ismail, Hamid D; Jones, Ahoi; Kim, Jung H; Newman, Robert H; Kc, Dukka B

    2016-01-01

    Protein phosphorylation is one of the most widespread regulatory mechanisms in eukaryotes. Over the past decade, phosphorylation site prediction has emerged as an important problem in the field of bioinformatics. Here, we report a new method, termed Random Forest-based Phosphosite predictor 2.0 (RF-Phos 2.0), to predict phosphorylation sites given only the primary amino acid sequence of a protein as input. RF-Phos 2.0, which uses random forest with sequence and structural features, is able to identify putative sites of phosphorylation across many protein families. In side-by-side comparisons based on 10-fold cross validation and an independent dataset, RF-Phos 2.0 compares favorably to other popular mammalian phosphosite prediction methods, such as PhosphoSVM, GPS2.1, and Musite.

  18. Pharmacokinetics of low-dose nedaplatin and validation of AUC prediction in patients with non-small-cell lung carcinoma.

    PubMed

    Niioka, Takenori; Uno, Tsukasa; Yasui-Furukori, Norio; Takahata, Takenori; Shimizu, Mikiko; Sugawara, Kazunobu; Tateishi, Tomonori

    2007-04-01

    The aim of this study was to determine the pharmacokinetics of low-dose nedaplatin combined with paclitaxel and radiation therapy in patients having non-small-cell lung carcinoma and establish the optimal dosage regimen for low-dose nedaplatin. We also evaluated predictive accuracy of reported formulas to estimate the area under the plasma concentration-time curve (AUC) of low-dose nedaplatin. A total of 19 patients were administered a constant intravenous infusion of 20 mg/m(2) body surface area (BSA) nedaplatin for an hour, and blood samples were collected at 1, 2, 3, 4, 6, 8, and 19 h after the administration. Plasma concentrations of unbound platinum were measured, and the actual value of platinum AUC (actual AUC) was calculated based on these data. The predicted value of platinum AUC (predicted AUC) was determined by three predictive methods reported in previous studies, consisting of Bayesian method, limited sampling strategies with plasma concentration at a single time point, and simple formula method (SFM) without measured plasma concentration. Three error indices, mean prediction error (ME, measure of bias), mean absolute error (MAE, measure of accuracy), and root mean squared prediction error (RMSE, measure of precision), were obtained from the difference between the actual and the predicted AUC, to compare the accuracy between the three predictive methods. The AUC showed more than threefold inter-patient variation, and there was a favorable correlation between nedaplatin clearance and creatinine clearance (Ccr) (r = 0.832, P < 0.01). In three error indices, MAE and RMSE showed significant difference between the three AUC predictive methods, and the method of SFM had the most favorable results, in which %ME, %MAE, and %RMSE were 5.5, 10.7, and 15.4, respectively. The dosage regimen of low-dose nedaplatin should be established based on Ccr rather than on BSA. Since prediction accuracy of SFM, which did not require measured plasma concentration, was most favorable among the three methods evaluated in this study, SFM could be the most practical method to predict AUC of low-dose nedaplatin in a clinical situation judging from its high accuracy in predicting AUC without measured plasma concentration.

  19. Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms

    DOE PAGES

    Tramontana, Gianluca; Jung, Martin; Schwalm, Christopher R.; ...

    2016-07-29

    Spatio-temporal fields of land–atmosphere fluxes derived from data-driven models can complement simulations by process-based land surface models. While a number of strategies for empirical models with eddy-covariance flux data have been applied, a systematic intercomparison of these methods has been missing so far. In this study, we performed a cross-validation experiment for predicting carbon dioxide, latent heat, sensible heat and net radiation fluxes across different ecosystem types with 11 machine learning (ML) methods from four different classes (kernel methods, neural networks, tree methods, and regression splines). We applied two complementary setups: (1) 8-day average fluxes based on remotely sensed data andmore » (2) daily mean fluxes based on meteorological data and a mean seasonal cycle of remotely sensed variables. The patterns of predictions from different ML and experimental setups were highly consistent. There were systematic differences in performance among the fluxes, with the following ascending order: net ecosystem exchange ( R 2 < 0.5), ecosystem respiration ( R 2 > 0.6), gross primary production ( R 2> 0.7), latent heat ( R 2 > 0.7), sensible heat ( R 2 > 0.7), and net radiation ( R 2 > 0.8). The ML methods predicted the across-site variability and the mean seasonal cycle of the observed fluxes very well ( R 2 > 0.7), while the 8-day deviations from the mean seasonal cycle were not well predicted ( R 2 < 0.5). Fluxes were better predicted at forested and temperate climate sites than at sites in extreme climates or less represented by training data (e.g., the tropics). Finally, the evaluated large ensemble of ML-based models will be the basis of new global flux products.« less

  20. Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tramontana, Gianluca; Jung, Martin; Schwalm, Christopher R.

    Spatio-temporal fields of land–atmosphere fluxes derived from data-driven models can complement simulations by process-based land surface models. While a number of strategies for empirical models with eddy-covariance flux data have been applied, a systematic intercomparison of these methods has been missing so far. In this study, we performed a cross-validation experiment for predicting carbon dioxide, latent heat, sensible heat and net radiation fluxes across different ecosystem types with 11 machine learning (ML) methods from four different classes (kernel methods, neural networks, tree methods, and regression splines). We applied two complementary setups: (1) 8-day average fluxes based on remotely sensed data andmore » (2) daily mean fluxes based on meteorological data and a mean seasonal cycle of remotely sensed variables. The patterns of predictions from different ML and experimental setups were highly consistent. There were systematic differences in performance among the fluxes, with the following ascending order: net ecosystem exchange ( R 2 < 0.5), ecosystem respiration ( R 2 > 0.6), gross primary production ( R 2> 0.7), latent heat ( R 2 > 0.7), sensible heat ( R 2 > 0.7), and net radiation ( R 2 > 0.8). The ML methods predicted the across-site variability and the mean seasonal cycle of the observed fluxes very well ( R 2 > 0.7), while the 8-day deviations from the mean seasonal cycle were not well predicted ( R 2 < 0.5). Fluxes were better predicted at forested and temperate climate sites than at sites in extreme climates or less represented by training data (e.g., the tropics). Finally, the evaluated large ensemble of ML-based models will be the basis of new global flux products.« less

  1. Window-Based Channel Impulse Response Prediction for Time-Varying Ultra-Wideband Channels.

    PubMed

    Al-Samman, A M; Azmi, M H; Rahman, T A; Khan, I; Hindia, M N; Fattouh, A

    2016-01-01

    This work proposes channel impulse response (CIR) prediction for time-varying ultra-wideband (UWB) channels by exploiting the fast movement of channel taps within delay bins. Considering the sparsity of UWB channels, we introduce a window-based CIR (WB-CIR) to approximate the high temporal resolutions of UWB channels. A recursive least square (RLS) algorithm is adopted to predict the time evolution of the WB-CIR. For predicting the future WB-CIR tap of window wk, three RLS filter coefficients are computed from the observed WB-CIRs of the left wk-1, the current wk and the right wk+1 windows. The filter coefficient with the lowest RLS error is used to predict the future WB-CIR tap. To evaluate our proposed prediction method, UWB CIRs are collected through measurement campaigns in outdoor environments considering line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios. Under similar computational complexity, our proposed method provides an improvement in prediction errors of approximately 80% for LOS and 63% for NLOS scenarios compared with a conventional method.

  2. Window-Based Channel Impulse Response Prediction for Time-Varying Ultra-Wideband Channels

    PubMed Central

    Al-Samman, A. M.; Azmi, M. H.; Rahman, T. A.; Khan, I.; Hindia, M. N.; Fattouh, A.

    2016-01-01

    This work proposes channel impulse response (CIR) prediction for time-varying ultra-wideband (UWB) channels by exploiting the fast movement of channel taps within delay bins. Considering the sparsity of UWB channels, we introduce a window-based CIR (WB-CIR) to approximate the high temporal resolutions of UWB channels. A recursive least square (RLS) algorithm is adopted to predict the time evolution of the WB-CIR. For predicting the future WB-CIR tap of window wk, three RLS filter coefficients are computed from the observed WB-CIRs of the left wk−1, the current wk and the right wk+1 windows. The filter coefficient with the lowest RLS error is used to predict the future WB-CIR tap. To evaluate our proposed prediction method, UWB CIRs are collected through measurement campaigns in outdoor environments considering line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios. Under similar computational complexity, our proposed method provides an improvement in prediction errors of approximately 80% for LOS and 63% for NLOS scenarios compared with a conventional method. PMID:27992445

  3. Stochastic or statistic? Comparing flow duration curve models in ungauged basins and changing climates

    NASA Astrophysics Data System (ADS)

    Müller, M. F.; Thompson, S. E.

    2015-09-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drives of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by a strong wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are strongly favored over statistical models.

  4. The use of genomic information increases the accuracy of breeding value predictions for sea louse (Caligus rogercresseyi) resistance in Atlantic salmon (Salmo salar).

    PubMed

    Correa, Katharina; Bangera, Rama; Figueroa, René; Lhorente, Jean P; Yáñez, José M

    2017-01-31

    Sea lice infestations caused by Caligus rogercresseyi are a main concern to the salmon farming industry due to associated economic losses. Resistance to this parasite was shown to have low to moderate genetic variation and its genetic architecture was suggested to be polygenic. The aim of this study was to compare accuracies of breeding value predictions obtained with pedigree-based best linear unbiased prediction (P-BLUP) methodology against different genomic prediction approaches: genomic BLUP (G-BLUP), Bayesian Lasso, and Bayes C. To achieve this, 2404 individuals from 118 families were measured for C. rogercresseyi count after a challenge and genotyped using 37 K single nucleotide polymorphisms. Accuracies were assessed using fivefold cross-validation and SNP densities of 0.5, 1, 5, 10, 25 and 37 K. Accuracy of genomic predictions increased with increasing SNP density and was higher than pedigree-based BLUP predictions by up to 22%. Both Bayesian and G-BLUP methods can predict breeding values with higher accuracies than pedigree-based BLUP, however, G-BLUP may be the preferred method because of reduced computation time and ease of implementation. A relatively low marker density (i.e. 10 K) is sufficient for maximal increase in accuracy when using G-BLUP or Bayesian methods for genomic prediction of C. rogercresseyi resistance in Atlantic salmon.

  5. Correaltion of full-scale drag predictions with flight measurements on the C-141A aircraft. Phase 2: Wind tunnel test, analysis, and prediction techniques. Volume 1: Drag predictions, wind tunnel data analysis and correlation

    NASA Technical Reports Server (NTRS)

    Macwilkinson, D. G.; Blackerby, W. T.; Paterson, J. H.

    1974-01-01

    The degree of cruise drag correlation on the C-141A aircraft is determined between predictions based on wind tunnel test data, and flight test results. An analysis of wind tunnel tests on a 0.0275 scale model at Reynolds number up to 3.05 x 1 million/MAC is reported. Model support interference corrections are evaluated through a series of tests, and fully corrected model data are analyzed to provide details on model component interference factors. It is shown that predicted minimum profile drag for the complete configuration agrees within 0.75% of flight test data, using a wind tunnel extrapolation method based on flat plate skin friction and component shape factors. An alternative method of extrapolation, based on computed profile drag from a subsonic viscous theory, results in a prediction four percent lower than flight test data.

  6. Improved protein-protein interactions prediction via weighted sparse representation model combining continuous wavelet descriptor and PseAA composition.

    PubMed

    Huang, Yu-An; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying

    2016-12-23

    Protein-protein interactions (PPIs) are essential to most biological processes. Since bioscience has entered into the era of genome and proteome, there is a growing demand for the knowledge about PPI network. High-throughput biological technologies can be used to identify new PPIs, but they are expensive, time-consuming, and tedious. Therefore, computational methods for predicting PPIs have an important role. For the past years, an increasing number of computational methods such as protein structure-based approaches have been proposed for predicting PPIs. The major limitation in principle of these methods lies in the prior information of the protein to infer PPIs. Therefore, it is of much significance to develop computational methods which only use the information of protein amino acids sequence. Here, we report a highly efficient approach for predicting PPIs. The main improvements come from the use of a novel protein sequence representation by combining continuous wavelet descriptor and Chou's pseudo amino acid composition (PseAAC), and from adopting weighted sparse representation based classifier (WSRC). This method, cross-validated on the PPIs datasets of Saccharomyces cerevisiae, Human and H. pylori, achieves an excellent results with accuracies as high as 92.50%, 95.54% and 84.28% respectively, significantly better than previously proposed methods. Extensive experiments are performed to compare the proposed method with state-of-the-art Support Vector Machine (SVM) classifier. The outstanding results yield by our model that the proposed feature extraction method combing two kinds of descriptors have strong expression ability and are expected to provide comprehensive and effective information for machine learning-based classification models. In addition, the prediction performance in the comparison experiments shows the well cooperation between the combined feature and WSRC. Thus, the proposed method is a very efficient method to predict PPIs and may be a useful supplementary tool for future proteomics studies.

  7. Space vehicle engine and heat shield environment review. Volume 1: Engineering analysis

    NASA Technical Reports Server (NTRS)

    Mcanelly, W. B.; Young, C. T. K.

    1973-01-01

    Methods for predicting the base heating characteristics of a multiple rocket engine installation are discussed. The environmental data is applied to the design of adequate protection system for the engine components. The methods for predicting the base region thermal environment are categorized as: (1) scale model testing, (2) extrapolation of previous and related flight test results, and (3) semiempirical analytical techniques.

  8. LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST

    PubMed Central

    Xie, Dan; Li, Ao; Wang, Minghui; Fan, Zhewen; Feng, Huanqing

    2005-01-01

    Subcellular location of a protein is one of the key functional characters as proteins must be localized correctly at the subcellular level to have normal biological function. In this paper, a novel method named LOCSVMPSI has been introduced, which is based on the support vector machine (SVM) and the position-specific scoring matrix generated from profiles of PSI-BLAST. With a jackknife test on the RH2427 data set, LOCSVMPSI achieved a high overall prediction accuracy of 90.2%, which is higher than the prediction results by SubLoc and ESLpred on this data set. In addition, prediction performance of LOCSVMPSI was evaluated with 5-fold cross validation test on the PK7579 data set and the prediction results were consistently better than the previous method based on several SVMs using composition of both amino acids and amino acid pairs. Further test on the SWISSPROT new-unique data set showed that LOCSVMPSI also performed better than some widely used prediction methods, such as PSORTII, TargetP and LOCnet. All these results indicate that LOCSVMPSI is a powerful tool for the prediction of eukaryotic protein subcellular localization. An online web server (current version is 1.3) based on this method has been developed and is freely available to both academic and commercial users, which can be accessed by at . PMID:15980436

  9. Using physics-based pose predictions and free energy perturbation calculations to predict binding poses and relative binding affinities for FXR ligands in the D3R Grand Challenge 2

    NASA Astrophysics Data System (ADS)

    Athanasiou, Christina; Vasilakaki, Sofia; Dellis, Dimitris; Cournia, Zoe

    2018-01-01

    Computer-aided drug design has become an integral part of drug discovery and development in the pharmaceutical and biotechnology industry, and is nowadays extensively used in the lead identification and lead optimization phases. The drug design data resource (D3R) organizes challenges against blinded experimental data to prospectively test computational methodologies as an opportunity for improved methods and algorithms to emerge. We participated in Grand Challenge 2 to predict the crystallographic poses of 36 Farnesoid X Receptor (FXR)-bound ligands and the relative binding affinities for two designated subsets of 18 and 15 FXR-bound ligands. Here, we present our methodology for pose and affinity predictions and its evaluation after the release of the experimental data. For predicting the crystallographic poses, we used docking and physics-based pose prediction methods guided by the binding poses of native ligands. For FXR ligands with known chemotypes in the PDB, we accurately predicted their binding modes, while for those with unknown chemotypes the predictions were more challenging. Our group ranked #1st (based on the median RMSD) out of 46 groups, which submitted complete entries for the binding pose prediction challenge. For the relative binding affinity prediction challenge, we performed free energy perturbation (FEP) calculations coupled with molecular dynamics (MD) simulations. FEP/MD calculations displayed a high success rate in identifying compounds with better or worse binding affinity than the reference (parent) compound. Our studies suggest that when ligands with chemical precedent are available in the literature, binding pose predictions using docking and physics-based methods are reliable; however, predictions are challenging for ligands with completely unknown chemotypes. We also show that FEP/MD calculations hold predictive value and can nowadays be used in a high throughput mode in a lead optimization project provided that crystal structures of sufficiently high quality are available.

  10. Recommendation Techniques for Drug-Target Interaction Prediction and Drug Repositioning.

    PubMed

    Alaimo, Salvatore; Giugno, Rosalba; Pulvirenti, Alfredo

    2016-01-01

    The usage of computational methods in drug discovery is a common practice. More recently, by exploiting the wealth of biological knowledge bases, a novel approach called drug repositioning has raised. Several computational methods are available, and these try to make a high-level integration of all the knowledge in order to discover unknown mechanisms. In this chapter, we review drug-target interaction prediction methods based on a recommendation system. We also give some extensions which go beyond the bipartite network case.

  11. Predictive and Experimental Approaches for Elucidating Protein–Protein Interactions and Quaternary Structures

    PubMed Central

    Nealon, John Oliver; Philomina, Limcy Seby

    2017-01-01

    The elucidation of protein–protein interactions is vital for determining the function and action of quaternary protein structures. Here, we discuss the difficulty and importance of establishing protein quaternary structure and review in vitro and in silico methods for doing so. Determining the interacting partner proteins of predicted protein structures is very time-consuming when using in vitro methods, this can be somewhat alleviated by use of predictive methods. However, developing reliably accurate predictive tools has proved to be difficult. We review the current state of the art in predictive protein interaction software and discuss the problem of scoring and therefore ranking predictions. Current community-based predictive exercises are discussed in relation to the growth of protein interaction prediction as an area within these exercises. We suggest a fusion of experimental and predictive methods that make use of sparse experimental data to determine higher resolution predicted protein interactions as being necessary to drive forward development. PMID:29206185

  12. A study of microstructural characteristics and differential thermal analysis of Ni-based superalloys

    NASA Technical Reports Server (NTRS)

    Aggarwal, M. D.; Lal, R. B.; Oyekenu, Samuel A.; Parr, Richard; Gentz, Stephen

    1989-01-01

    The objective of this work is to correlate the mechanical properties of the Ni-based superalloy MAR M246(Hf) used in the Space Shuttle Main Engine with its structural characteristics by systematic study of optical photomicrographs and differential thermal analysis. The authors developed a method of predicting the liquidus and solidus temperature of various nickel based superalloys (MAR-M247, Waspaloy, Udimet-41, polycrystalline and single crystals of CMSX-2 and CMSX-3) and comparing the predictions with the experimental differential thermal analysis (DTA) curves using Perkin-Elmer DTA 1700. The method of predicting these temperatures is based on the additive effect of the components dissolved in nickel. The results were compared with the experimental values.

  13. Stata Modules for Calculating Novel Predictive Performance Indices for Logistic Models

    PubMed Central

    Barkhordari, Mahnaz; Padyab, Mojgan; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza

    2016-01-01

    Background Prediction is a fundamental part of prevention of cardiovascular diseases (CVD). The development of prediction algorithms based on the multivariate regression models loomed several decades ago. Parallel with predictive models development, biomarker researches emerged in an impressively great scale. The key question is how best to assess and quantify the improvement in risk prediction offered by new biomarkers or more basically how to assess the performance of a risk prediction model. Discrimination, calibration, and added predictive value have been recently suggested to be used while comparing the predictive performances of the predictive models’ with and without novel biomarkers. Objectives Lack of user-friendly statistical software has restricted implementation of novel model assessment methods while examining novel biomarkers. We intended, thus, to develop a user-friendly software that could be used by researchers with few programming skills. Materials and Methods We have written a Stata command that is intended to help researchers obtain cut point-free and cut point-based net reclassification improvement index and (NRI) and relative and absolute Integrated discriminatory improvement index (IDI) for logistic-based regression analyses.We applied the commands to a real data on women participating the Tehran lipid and glucose study (TLGS) to examine if information of a family history of premature CVD, waist circumference, and fasting plasma glucose can improve predictive performance of the Framingham’s “general CVD risk” algorithm. Results The command is addpred for logistic regression models. Conclusions The Stata package provided herein can encourage the use of novel methods in examining predictive capacity of ever-emerging plethora of novel biomarkers. PMID:27279830

  14. Predicting locations of rare aquatic species’ habitat with a combination of species-specific and assemblage-based models

    USGS Publications Warehouse

    McKenna, James E.; Carlson, Douglas M.; Payne-Wynne, Molly L.

    2013-01-01

    Aim: Rare aquatic species are a substantial component of biodiversity, and their conservation is a major objective of many management plans. However, they are difficult to assess, and their optimal habitats are often poorly known. Methods to effectively predict the likely locations of suitable rare aquatic species habitats are needed. We combine two modelling approaches to predict occurrence and general abundance of several rare fish species. Location: Allegheny watershed of western New York State (USA) Methods: Our method used two empirical neural network modelling approaches (species specific and assemblage based) to predict stream-by-stream occurrence and general abundance of rare darters, based on broad-scale habitat conditions. Species-specific models were developed for longhead darter (Percina macrocephala), spotted darter (Etheostoma maculatum) and variegate darter (Etheostoma variatum) in the Allegheny drainage. An additional model predicted the type of rare darter-containing assemblage expected in each stream reach. Predictions from both models were then combined inclusively and exclusively and compared with additional independent data. Results Example rare darter predictions demonstrate the method's effectiveness. Models performed well (R2 ≥ 0.79), identified where suitable darter habitat was most likely to occur, and predictions matched well to those of collection sites. Additional independent data showed that the most conservative (exclusive) model slightly underestimated the distributions of these rare darters or predictions were displaced by one stream reach, suggesting that new darter habitat types were detected in the later collections. Main conclusions Broad-scale habitat variables can be used to effectively identify rare species' habitats. Combining species-specific and assemblage-based models enhances our ability to make use of the sparse data on rare species and to identify habitat units most likely and least likely to support those species. This hybrid approach may assist managers with the prioritization of habitats to be examined or conserved for rare species.

  15. Analysis of Physicochemical and Structural Properties Determining HIV-1 Coreceptor Usage

    PubMed Central

    Bozek, Katarzyna; Lengauer, Thomas; Sierra, Saleta; Kaiser, Rolf; Domingues, Francisco S.

    2013-01-01

    The relationship of HIV tropism with disease progression and the recent development of CCR5-blocking drugs underscore the importance of monitoring virus coreceptor usage. As an alternative to costly phenotypic assays, computational methods aim at predicting virus tropism based on the sequence and structure of the V3 loop of the virus gp120 protein. Here we present a numerical descriptor of the V3 loop encoding its physicochemical and structural properties. The descriptor allows for structure-based prediction of HIV tropism and identification of properties of the V3 loop that are crucial for coreceptor usage. Use of the proposed descriptor for prediction results in a statistically significant improvement over the prediction based solely on V3 sequence with 3 percentage points improvement in AUC and 7 percentage points in sensitivity at the specificity of the 11/25 rule (95%). We additionally assessed the predictive power of the new method on clinically derived ‘bulk’ sequence data and obtained a statistically significant improvement in AUC of 3 percentage points over sequence-based prediction. Furthermore, we demonstrated the capacity of our method to predict therapy outcome by applying it to 53 samples from patients undergoing Maraviroc therapy. The analysis of structural features of the loop informative of tropism indicates the importance of two loop regions and their physicochemical properties. The regions are located on opposite strands of the loop stem and the respective features are predominantly charge-, hydrophobicity- and structure-related. These regions are in close proximity in the bound conformation of the loop potentially forming a site determinant for the coreceptor binding. The method is available via server under http://structure.bioinf.mpi-inf.mpg.de/. PMID:23555214

  16. A variable capacitance based modeling and power capability predicting method for ultracapacitor

    NASA Astrophysics Data System (ADS)

    Liu, Chang; Wang, Yujie; Chen, Zonghai; Ling, Qiang

    2018-01-01

    Methods of accurate modeling and power capability predicting for ultracapacitors are of great significance in management and application of lithium-ion battery/ultracapacitor hybrid energy storage system. To overcome the simulation error coming from constant capacitance model, an improved ultracapacitor model based on variable capacitance is proposed, where the main capacitance varies with voltage according to a piecewise linear function. A novel state-of-charge calculation approach is developed accordingly. After that, a multi-constraint power capability prediction is developed for ultracapacitor, in which a Kalman-filter-based state observer is designed for tracking ultracapacitor's real-time behavior. Finally, experimental results verify the proposed methods. The accuracy of the proposed model is verified by terminal voltage simulating results under different temperatures, and the effectiveness of the designed observer is proved by various test conditions. Additionally, the power capability prediction results of different time scales and temperatures are compared, to study their effects on ultracapacitor's power capability.

  17. Sea surface temperature predictions using a multi-ocean analysis ensemble scheme

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Zhu, Jieshun; Li, Zhongxian; Chen, Haishan; Zeng, Gang

    2017-08-01

    This study examined the global sea surface temperature (SST) predictions by a so-called multiple-ocean analysis ensemble (MAE) initialization method which was applied in the National Centers for Environmental Prediction (NCEP) Climate Forecast System Version 2 (CFSv2). Different from most operational climate prediction practices which are initialized by a specific ocean analysis system, the MAE method is based on multiple ocean analyses. In the paper, the MAE method was first justified by analyzing the ocean temperature variability in four ocean analyses which all are/were applied for operational climate predictions either at the European Centre for Medium-range Weather Forecasts or at NCEP. It was found that these systems exhibit substantial uncertainties in estimating the ocean states, especially at the deep layers. Further, a set of MAE hindcasts was conducted based on the four ocean analyses with CFSv2, starting from each April during 1982-2007. The MAE hindcasts were verified against a subset of hindcasts from the NCEP CFS Reanalysis and Reforecast (CFSRR) Project. Comparisons suggested that MAE shows better SST predictions than CFSRR over most regions where ocean dynamics plays a vital role in SST evolutions, such as the El Niño and Atlantic Niño regions. Furthermore, significant improvements were also found in summer precipitation predictions over the equatorial eastern Pacific and Atlantic oceans, for which the local SST prediction improvements should be responsible. The prediction improvements by MAE imply a problem for most current climate predictions which are based on a specific ocean analysis system. That is, their predictions would drift towards states biased by errors inherent in their ocean initialization system, and thus have large prediction errors. In contrast, MAE arguably has an advantage by sampling such structural uncertainties, and could efficiently cancel these errors out in their predictions.

  18. A temperature match based optimization method for daily load prediction considering DLC effect

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yu, Z.

    This paper presents a unique optimization method for short term load forecasting. The new method is based on the optimal template temperature match between the future and past temperatures. The optimal error reduction technique is a new concept introduced in this paper. Two case studies show that for hourly load forecasting, this method can yield results as good as the rather complicated Box-Jenkins Transfer Function method, and better than the Box-Jenkins method; for peak load prediction, this method is comparable in accuracy to the neural network method with back propagation, and can produce more accurate results than the multi-linear regressionmore » method. The DLC effect on system load is also considered in this method.« less

  19. Strainrange partitioning life predictions of the long time metal properties council creep-fatigue tests

    NASA Technical Reports Server (NTRS)

    Saltsman, J. F.; Halford, G. R.

    1979-01-01

    The method of strainrange partitioning is used to predict the cyclic lives of the Metal Properties Council's long time creep-fatigue interspersion tests of several steel alloys. Comparisons are made with predictions based upon the time- and cycle-fraction approach. The method of strainrange partitioning is shown to give consistently more accurate predictions of cyclic life than is given by the time- and cycle-fraction approach.

  20. Propeller noise prediction

    NASA Technical Reports Server (NTRS)

    Zorumski, W. E.

    1983-01-01

    Analytic propeller noise prediction involves a sequence of computations culminating in the application of acoustic equations. The prediction sequence currently used by NASA in its ANOPP (aircraft noise prediction) program is described. The elements of the sequence are called program modules. The first group of modules analyzes the propeller geometry, the aerodynamics, including both potential and boundary layer flow, the propeller performance, and the surface loading distribution. This group of modules is based entirely on aerodynamic strip theory. The next group of modules deals with the actual noise prediction, based on data from the first group. Deterministic predictions of periodic thickness and loading noise are made using Farassat's time-domain methods. Broadband noise is predicted by the semi-empirical Schlinker-Amiet method. Near-field predictions of fuselage surface pressures include the effects of boundary layer refraction and (for a cylinder) scattering. Far-field predictions include atmospheric and ground effects. Experimental data from subsonic and transonic propellers are compared and NASA's future direction is propeller noise technology development are indicated.

  1. SVM and SVM Ensembles in Breast Cancer Prediction.

    PubMed

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers.

  2. SVM and SVM Ensembles in Breast Cancer Prediction

    PubMed Central

    Huang, Min-Wei; Chen, Chih-Wen; Lin, Wei-Chao; Ke, Shih-Wen; Tsai, Chih-Fong

    2017-01-01

    Breast cancer is an all too common disease in women, making how to effectively predict it an active research problem. A number of statistical and machine learning techniques have been employed to develop various breast cancer prediction models. Among them, support vector machines (SVM) have been shown to outperform many related techniques. To construct the SVM classifier, it is first necessary to decide the kernel function, and different kernel functions can result in different prediction performance. However, there have been very few studies focused on examining the prediction performances of SVM based on different kernel functions. Moreover, it is unknown whether SVM classifier ensembles which have been proposed to improve the performance of single classifiers can outperform single SVM classifiers in terms of breast cancer prediction. Therefore, the aim of this paper is to fully assess the prediction performance of SVM and SVM ensembles over small and large scale breast cancer datasets. The classification accuracy, ROC, F-measure, and computational times of training SVM and SVM ensembles are compared. The experimental results show that linear kernel based SVM ensembles based on the bagging method and RBF kernel based SVM ensembles with the boosting method can be the better choices for a small scale dataset, where feature selection should be performed in the data pre-processing stage. For a large scale dataset, RBF kernel based SVM ensembles based on boosting perform better than the other classifiers. PMID:28060807

  3. Computational Prediction of Metabolism: Sites, Products, SAR, P450 Enzyme Dynamics, and Mechanisms

    PubMed Central

    2012-01-01

    Metabolism of xenobiotics remains a central challenge for the discovery and development of drugs, cosmetics, nutritional supplements, and agrochemicals. Metabolic transformations are frequently related to the incidence of toxic effects that may result from the emergence of reactive species, the systemic accumulation of metabolites, or by induction of metabolic pathways. Experimental investigation of the metabolism of small organic molecules is particularly resource demanding; hence, computational methods are of considerable interest to complement experimental approaches. This review provides a broad overview of structure- and ligand-based computational methods for the prediction of xenobiotic metabolism. Current computational approaches to address xenobiotic metabolism are discussed from three major perspectives: (i) prediction of sites of metabolism (SOMs), (ii) elucidation of potential metabolites and their chemical structures, and (iii) prediction of direct and indirect effects of xenobiotics on metabolizing enzymes, where the focus is on the cytochrome P450 (CYP) superfamily of enzymes, the cardinal xenobiotics metabolizing enzymes. For each of these domains, a variety of approaches and their applications are systematically reviewed, including expert systems, data mining approaches, quantitative structure–activity relationships (QSARs), and machine learning-based methods, pharmacophore-based algorithms, shape-focused techniques, molecular interaction fields (MIFs), reactivity-focused techniques, protein–ligand docking, molecular dynamics (MD) simulations, and combinations of methods. Predictive metabolism is a developing area, and there is still enormous potential for improvement. However, it is clear that the combination of rapidly increasing amounts of available ligand- and structure-related experimental data (in particular, quantitative data) with novel and diverse simulation and modeling approaches is accelerating the development of effective tools for prediction of in vivo metabolism, which is reflected by the diverse and comprehensive data sources and methods for metabolism prediction reviewed here. This review attempts to survey the range and scope of computational methods applied to metabolism prediction and also to compare and contrast their applicability and performance. PMID:22339582

  4. Particle Pollution Estimation Based on Image Analysis

    PubMed Central

    Liu, Chenbin; Tsow, Francis; Zou, Yi; Tao, Nongjian

    2016-01-01

    Exposure to fine particles can cause various diseases, and an easily accessible method to monitor the particles can help raise public awareness and reduce harmful exposures. Here we report a method to estimate PM air pollution based on analysis of a large number of outdoor images available for Beijing, Shanghai (China) and Phoenix (US). Six image features were extracted from the images, which were used, together with other relevant data, such as the position of the sun, date, time, geographic information and weather conditions, to predict PM2.5 index. The results demonstrate that the image analysis method provides good prediction of PM2.5 indexes, and different features have different significance levels in the prediction. PMID:26828757

  5. Particle Pollution Estimation Based on Image Analysis.

    PubMed

    Liu, Chenbin; Tsow, Francis; Zou, Yi; Tao, Nongjian

    2016-01-01

    Exposure to fine particles can cause various diseases, and an easily accessible method to monitor the particles can help raise public awareness and reduce harmful exposures. Here we report a method to estimate PM air pollution based on analysis of a large number of outdoor images available for Beijing, Shanghai (China) and Phoenix (US). Six image features were extracted from the images, which were used, together with other relevant data, such as the position of the sun, date, time, geographic information and weather conditions, to predict PM2.5 index. The results demonstrate that the image analysis method provides good prediction of PM2.5 indexes, and different features have different significance levels in the prediction.

  6. Method to predict relative hydriding within a group of zirconium alloys under nuclear irradiation

    DOEpatents

    Johnson, A.B. Jr.; Levy, I.S.; Trimble, D.J.; Lanning, D.D.; Gerber, F.S.

    1990-04-10

    An out-of-reactor method for screening to predict relative in-reactor hydriding behavior of zirconium-based materials is disclosed. Samples of zirconium-based materials having different compositions and/or fabrication methods are autoclaved in a relatively concentrated (0.3 to 1.0M) aqueous lithium hydroxide solution at constant temperatures within the water reactor coolant temperature range (280 to 316 C). Samples tested by this out-of-reactor procedure, when compared on the basis of the ratio of hydrogen weight gain to oxide weight gain, accurately predict the relative rate of hydriding for the same materials when subject to in-reactor (irradiated) corrosion. 1 figure.

  7. Receptor-based 3D QSAR analysis of estrogen receptor ligands - merging the accuracy of receptor-based alignments with the computational efficiency of ligand-based methods

    NASA Astrophysics Data System (ADS)

    Sippl, Wolfgang

    2000-08-01

    One of the major challenges in computational approaches to drug design is the accurate prediction of binding affinity of biomolecules. In the present study several prediction methods for a published set of estrogen receptor ligands are investigated and compared. The binding modes of 30 ligands were determined using the docking program AutoDock and were compared with available X-ray structures of estrogen receptor-ligand complexes. On the basis of the docking results an interaction energy-based model, which uses the information of the whole ligand-receptor complex, was generated. Several parameters were modified in order to analyze their influence onto the correlation between binding affinities and calculated ligand-receptor interaction energies. The highest correlation coefficient ( r 2 = 0.617, q 2 LOO = 0.570) was obtained considering protein flexibility during the interaction energy evaluation. The second prediction method uses a combination of receptor-based and 3D quantitative structure-activity relationships (3D QSAR) methods. The ligand alignment obtained from the docking simulations was taken as basis for a comparative field analysis applying the GRID/GOLPE program. Using the interaction field derived with a water probe and applying the smart region definition (SRD) variable selection, a significant and robust model was obtained ( r 2 = 0.991, q 2 LOO = 0.921). The predictive ability of the established model was further evaluated by using a test set of six additional compounds. The comparison with the generated interaction energy-based model and with a traditional CoMFA model obtained using a ligand-based alignment ( r 2 = 0.951, q 2 LOO = 0.796) indicates that the combination of receptor-based and 3D QSAR methods is able to improve the quality of the underlying model.

  8. United3D: a protein model quality assessment program that uses two consensus based methods.

    PubMed

    Terashi, Genki; Oosawa, Makoto; Nakamura, Yuuki; Kanou, Kazuhiko; Takeda-Shitaka, Mayuko

    2012-01-01

    In protein structure prediction, such as template-based modeling and free modeling (ab initio modeling), the step that assesses the quality of protein models is very important. We have developed a model quality assessment (QA) program United3D that uses an optimized clustering method and a simple Cα atom contact-based potential. United3D automatically estimates the quality scores (Qscore) of predicted protein models that are highly correlated with the actual quality (GDT_TS). The performance of United3D was tested in the ninth Critical Assessment of protein Structure Prediction (CASP9) experiment. In CASP9, United3D showed the lowest average loss of GDT_TS (5.3) among the QA methods participated in CASP9. This result indicates that the performance of United3D to identify the high quality models from the models predicted by CASP9 servers on 116 targets was best among the QA methods that were tested in CASP9. United3D also produced high average Pearson correlation coefficients (0.93) and acceptable Kendall rank correlation coefficients (0.68) between the Qscore and GDT_TS. This performance was competitive with the other top ranked QA methods that were tested in CASP9. These results indicate that United3D is a useful tool for selecting high quality models from many candidate model structures provided by various modeling methods. United3D will improve the accuracy of protein structure prediction.

  9. NAPR: a Cloud-Based Framework for Neuroanatomical Age Prediction.

    PubMed

    Pardoe, Heath R; Kuzniecky, Ruben

    2018-01-01

    The availability of cloud computing services has enabled the widespread adoption of the "software as a service" (SaaS) approach for software distribution, which utilizes network-based access to applications running on centralized servers. In this paper we apply the SaaS approach to neuroimaging-based age prediction. Our system, named "NAPR" (Neuroanatomical Age Prediction using R), provides access to predictive modeling software running on a persistent cloud-based Amazon Web Services (AWS) compute instance. The NAPR framework allows external users to estimate the age of individual subjects using cortical thickness maps derived from their own locally processed T1-weighted whole brain MRI scans. As a demonstration of the NAPR approach, we have developed two age prediction models that were trained using healthy control data from the ABIDE, CoRR, DLBS and NKI Rockland neuroimaging datasets (total N = 2367, age range 6-89 years). The provided age prediction models were trained using (i) relevance vector machines and (ii) Gaussian processes machine learning methods applied to cortical thickness surfaces obtained using Freesurfer v5.3. We believe that this transparent approach to out-of-sample evaluation and comparison of neuroimaging age prediction models will facilitate the development of improved age prediction models and allow for robust evaluation of the clinical utility of these methods.

  10. Linear regression models for solvent accessibility prediction in proteins.

    PubMed

    Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław

    2005-04-01

    The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.

  11. Generalized Predictive and Neural Generalized Predictive Control of Aerospace Systems

    NASA Technical Reports Server (NTRS)

    Kelkar, Atul G.

    2000-01-01

    The research work presented in this thesis addresses the problem of robust control of uncertain linear and nonlinear systems using Neural network-based Generalized Predictive Control (NGPC) methodology. A brief overview of predictive control and its comparison with Linear Quadratic (LQ) control is given to emphasize advantages and drawbacks of predictive control methods. It is shown that the Generalized Predictive Control (GPC) methodology overcomes the drawbacks associated with traditional LQ control as well as conventional predictive control methods. It is shown that in spite of the model-based nature of GPC it has good robustness properties being special case of receding horizon control. The conditions for choosing tuning parameters for GPC to ensure closed-loop stability are derived. A neural network-based GPC architecture is proposed for the control of linear and nonlinear uncertain systems. A methodology to account for parametric uncertainty in the system is proposed using on-line training capability of multi-layer neural network. Several simulation examples and results from real-time experiments are given to demonstrate the effectiveness of the proposed methodology.

  12. Fuzzy Regression Prediction and Application Based on Multi-Dimensional Factors of Freight Volume

    NASA Astrophysics Data System (ADS)

    Xiao, Mengting; Li, Cheng

    2018-01-01

    Based on the reality of the development of air cargo, the multi-dimensional fuzzy regression method is used to determine the influencing factors, and the three most important influencing factors of GDP, total fixed assets investment and regular flight route mileage are determined. The system’s viewpoints and analogy methods, the use of fuzzy numbers and multiple regression methods to predict the civil aviation cargo volume. In comparison with the 13th Five-Year Plan for China’s Civil Aviation Development (2016-2020), it is proved that this method can effectively improve the accuracy of forecasting and reduce the risk of forecasting. It is proved that this model predicts civil aviation freight volume of the feasibility, has a high practical significance and practical operation.

  13. Multi-model blending

    DOEpatents

    Hamann, Hendrik F.; Hwang, Youngdeok; van Kessel, Theodore G.; Khabibrakhmanov, Ildar K.; Muralidhar, Ramachandran

    2016-10-18

    A method and a system to perform multi-model blending are described. The method includes obtaining one or more sets of predictions of historical conditions, the historical conditions corresponding with a time T that is historical in reference to current time, and the one or more sets of predictions of the historical conditions being output by one or more models. The method also includes obtaining actual historical conditions, the actual historical conditions being measured conditions at the time T, assembling a training data set including designating the two or more set of predictions of historical conditions as predictor variables and the actual historical conditions as response variables, and training a machine learning algorithm based on the training data set. The method further includes obtaining a blended model based on the machine learning algorithm.

  14. User’s Guide for T.E.S.T. (version 4.2) (Toxicity Estimation Software Tool) A Program to Estimate Toxicity from Molecular Structure

    EPA Science Inventory

    The user's guide describes the methods used by TEST to predict toxicity and physical properties (including the new mode of action based method used to predict acute aquatic toxicity). It describes all of the experimental data sets included in the tool. It gives the prediction res...

  15. Machine learning classification with confidence: application of transductive conformal predictors to MRI-based diagnostic and prognostic markers in depression.

    PubMed

    Nouretdinov, Ilia; Costafreda, Sergi G; Gammerman, Alexander; Chervonenkis, Alexey; Vovk, Vladimir; Vapnik, Vladimir; Fu, Cynthia H Y

    2011-05-15

    There is rapidly accumulating evidence that the application of machine learning classification to neuroimaging measurements may be valuable for the development of diagnostic and prognostic prediction tools in psychiatry. However, current methods do not produce a measure of the reliability of the predictions. Knowing the risk of the error associated with a given prediction is essential for the development of neuroimaging-based clinical tools. We propose a general probabilistic classification method to produce measures of confidence for magnetic resonance imaging (MRI) data. We describe the application of transductive conformal predictor (TCP) to MRI images. TCP generates the most likely prediction and a valid measure of confidence, as well as the set of all possible predictions for a given confidence level. We present the theoretical motivation for TCP, and we have applied TCP to structural and functional MRI data in patients and healthy controls to investigate diagnostic and prognostic prediction in depression. We verify that TCP predictions are as accurate as those obtained with more standard machine learning methods, such as support vector machine, while providing the additional benefit of a valid measure of confidence for each prediction. Copyright © 2010 Elsevier Inc. All rights reserved.

  16. A Sensor Dynamic Measurement Error Prediction Model Based on NAPSO-SVM.

    PubMed

    Jiang, Minlan; Jiang, Lan; Jiang, Dingde; Li, Fei; Song, Houbing

    2018-01-15

    Dynamic measurement error correction is an effective way to improve sensor precision. Dynamic measurement error prediction is an important part of error correction, and support vector machine (SVM) is often used for predicting the dynamic measurement errors of sensors. Traditionally, the SVM parameters were always set manually, which cannot ensure the model's performance. In this paper, a SVM method based on an improved particle swarm optimization (NAPSO) is proposed to predict the dynamic measurement errors of sensors. Natural selection and simulated annealing are added in the PSO to raise the ability to avoid local optima. To verify the performance of NAPSO-SVM, three types of algorithms are selected to optimize the SVM's parameters: the particle swarm optimization algorithm (PSO), the improved PSO optimization algorithm (NAPSO), and the glowworm swarm optimization (GSO). The dynamic measurement error data of two sensors are applied as the test data. The root mean squared error and mean absolute percentage error are employed to evaluate the prediction models' performances. The experimental results show that among the three tested algorithms the NAPSO-SVM method has a better prediction precision and a less prediction errors, and it is an effective method for predicting the dynamic measurement errors of sensors.

  17. A novel hybrid decomposition-and-ensemble model based on CEEMD and GWO for short-term PM2.5 concentration forecasting

    NASA Astrophysics Data System (ADS)

    Niu, Mingfei; Wang, Yufang; Sun, Shaolong; Li, Yongwu

    2016-06-01

    To enhance prediction reliability and accuracy, a hybrid model based on the promising principle of "decomposition and ensemble" and a recently proposed meta-heuristic called grey wolf optimizer (GWO) is introduced for daily PM2.5 concentration forecasting. Compared with existing PM2.5 forecasting methods, this proposed model has improved the prediction accuracy and hit rates of directional prediction. The proposed model involves three main steps, i.e., decomposing the original PM2.5 series into several intrinsic mode functions (IMFs) via complementary ensemble empirical mode decomposition (CEEMD) for simplifying the complex data; individually predicting each IMF with support vector regression (SVR) optimized by GWO; integrating all predicted IMFs for the ensemble result as the final prediction by another SVR optimized by GWO. Seven benchmark models, including single artificial intelligence (AI) models, other decomposition-ensemble models with different decomposition methods and models with the same decomposition-ensemble method but optimized by different algorithms, are considered to verify the superiority of the proposed hybrid model. The empirical study indicates that the proposed hybrid decomposition-ensemble model is remarkably superior to all considered benchmark models for its higher prediction accuracy and hit rates of directional prediction.

  18. A novel method for structure-based prediction of ion channel conductance properties.

    PubMed Central

    Smart, O S; Breed, J; Smith, G R; Sansom, M S

    1997-01-01

    A rapid and easy-to-use method of predicting the conductance of an ion channel from its three-dimensional structure is presented. The method combines the pore dimensions of the channel as measured in the HOLE program with an Ohmic model of conductance. An empirically based correction factor is then applied. The method yielded good results for six experimental channel structures (none of which were included in the training set) with predictions accurate to within an average factor of 1.62 to the true values. The predictive r2 was equal to 0.90, which is indicative of a good predictive ability. The procedure is used to validate model structures of alamethicin and phospholamban. Two genuine predictions for the conductance of channels with known structure but without reported conductances are given. A modification of the procedure that calculates the expected results for the effect of the addition of nonelectrolyte polymers on conductance is set out. Results for a cholera toxin B-subunit crystal structure agree well with the measured values. The difficulty in interpreting such studies is discussed, with the conclusion that measurements on channels of known structure are required. Images FIGURE 1 FIGURE 3 FIGURE 4 FIGURE 6 FIGURE 10 PMID:9138559

  19. [Prediction of the side-cut product yield of atmospheric/vacuum distillation unit by NIR crude oil rapid assay].

    PubMed

    Wang, Yan-Bin; Hu, Yu-Zhong; Li, Wen-Le; Zhang, Wei-Song; Zhou, Feng; Luo, Zhi

    2014-10-01

    In the present paper, based on the fast evaluation technique of near infrared, a method to predict the yield of atmos- pheric and vacuum line was developed, combined with H/CAMS software. Firstly, the near-infrared (NIR) spectroscopy method for rapidly determining the true boiling point of crude oil was developed. With commercially available crude oil spectroscopy da- tabase and experiments test from Guangxi Petrochemical Company, calibration model was established and a topological method was used as the calibration. The model can be employed to predict the true boiling point of crude oil. Secondly, the true boiling point based on NIR rapid assay was converted to the side-cut product yield of atmospheric/vacuum distillation unit by H/CAMS software. The predicted yield and the actual yield of distillation product for naphtha, diesel, wax and residual oil were compared in a 7-month period. The result showed that the NIR rapid crude assay can predict the side-cut product yield accurately. The near infrared analytic method for predicting yield has the advantages of fast analysis, reliable results, and being easy to online operate, and it can provide elementary data for refinery planning optimization and crude oil blending.

  20. Solar Atmosphere to Earth's Surface: Long Lead Time dB/dt Predictions with the Space Weather Modeling Framework

    NASA Astrophysics Data System (ADS)

    Welling, D. T.; Manchester, W.; Savani, N.; Sokolov, I.; van der Holst, B.; Jin, M.; Toth, G.; Liemohn, M. W.; Gombosi, T. I.

    2017-12-01

    The future of space weather prediction depends on the community's ability to predict L1 values from observations of the solar atmosphere, which can yield hours of lead time. While both empirical and physics-based L1 forecast methods exist, it is not yet known if this nascent capability can translate to skilled dB/dt forecasts at the Earth's surface. This paper shows results for the first forecast-quality, solar-atmosphere-to-Earth's-surface dB/dt predictions. Two methods are used to predict solar wind and IMF conditions at L1 for several real-world coronal mass ejection events. The first method is an empirical and observationally based system to estimate the plasma characteristics. The magnetic field predictions are based on the Bz4Cast system which assumes that the CME has a cylindrical flux rope geometry locally around Earth's trajectory. The remaining plasma parameters of density, temperature and velocity are estimated from white-light coronagraphs via a variety of triangulation methods and forward based modelling. The second is a first-principles-based approach that combines the Eruptive Event Generator using Gibson-Low configuration (EEGGL) model with the Alfven Wave Solar Model (AWSoM). EEGGL specifies parameters for the Gibson-Low flux rope such that it erupts, driving a CME in the coronal model that reproduces coronagraph observations and propagates to 1AU. The resulting solar wind predictions are used to drive the operational Space Weather Modeling Framework (SWMF) for geospace. Following the configuration used by NOAA's Space Weather Prediction Center, this setup couples the BATS-R-US global magnetohydromagnetic model to the Rice Convection Model (RCM) ring current model and a height-integrated ionosphere electrodynamics model. The long lead time predictions of dB/dt are compared to model results that are driven by L1 solar wind observations. Both are compared to real-world observations from surface magnetometers at a variety of geomagnetic latitudes. Metrics are calculated to examine how the simulated solar wind drivers impact forecast skill. These results illustrate the current state of long-lead-time forecasting and the promise of this technology for operational use.

  1. Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum

    DOE PAGES

    Chou, Wen-Chi; Ma, Qin; Yang, Shihui; ...

    2015-03-12

    The identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets.more » Moreover, among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available athttps://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.« less

  2. Improved Prediction of Blood-Brain Barrier Permeability Through Machine Learning with Combined Use of Molecular Property-Based Descriptors and Fingerprints.

    PubMed

    Yuan, Yaxia; Zheng, Fang; Zhan, Chang-Guo

    2018-03-21

    Blood-brain barrier (BBB) permeability of a compound determines whether the compound can effectively enter the brain. It is an essential property which must be accounted for in drug discovery with a target in the brain. Several computational methods have been used to predict the BBB permeability. In particular, support vector machine (SVM), which is a kernel-based machine learning method, has been used popularly in this field. For SVM training and prediction, the compounds are characterized by molecular descriptors. Some SVM models were based on the use of molecular property-based descriptors (including 1D, 2D, and 3D descriptors) or fragment-based descriptors (known as the fingerprints of a molecule). The selection of descriptors is critical for the performance of a SVM model. In this study, we aimed to develop a generally applicable new SVM model by combining all of the features of the molecular property-based descriptors and fingerprints to improve the accuracy for the BBB permeability prediction. The results indicate that our SVM model has improved accuracy compared to the currently available models of the BBB permeability prediction.

  3. A New Method for Predicting Patient Survivorship Using Efficient Bayesian Network Learning

    PubMed Central

    Jiang, Xia; Xue, Diyang; Brufsky, Adam; Khan, Seema; Neapolitan, Richard

    2014-01-01

    The purpose of this investigation is to develop and evaluate a new Bayesian network (BN)-based patient survivorship prediction method. The central hypothesis is that the method predicts patient survivorship well, while having the capability to handle high-dimensional data and be incorporated into a clinical decision support system (CDSS). We have developed EBMC_Survivorship (EBMC_S), which predicts survivorship for each year individually. EBMC_S is based on the EBMC BN algorithm, which has been shown to handle high-dimensional data. BNs have excellent architecture for decision support systems. In this study, we evaluate EBMC_S using the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, which concerns breast tumors. A 5-fold cross-validation study indicates that EMBC_S performs better than the Cox proportional hazard model and is comparable to the random survival forest method. We show that EBMC_S provides additional information such as sensitivity analyses, which covariates predict each year, and yearly areas under the ROC curve (AUROCs). We conclude that our investigation supports the central hypothesis. PMID:24558297

  4. A new method for predicting patient survivorship using efficient bayesian network learning.

    PubMed

    Jiang, Xia; Xue, Diyang; Brufsky, Adam; Khan, Seema; Neapolitan, Richard

    2014-01-01

    The purpose of this investigation is to develop and evaluate a new Bayesian network (BN)-based patient survivorship prediction method. The central hypothesis is that the method predicts patient survivorship well, while having the capability to handle high-dimensional data and be incorporated into a clinical decision support system (CDSS). We have developed EBMC_Survivorship (EBMC_S), which predicts survivorship for each year individually. EBMC_S is based on the EBMC BN algorithm, which has been shown to handle high-dimensional data. BNs have excellent architecture for decision support systems. In this study, we evaluate EBMC_S using the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, which concerns breast tumors. A 5-fold cross-validation study indicates that EMBC_S performs better than the Cox proportional hazard model and is comparable to the random survival forest method. We show that EBMC_S provides additional information such as sensitivity analyses, which covariates predict each year, and yearly areas under the ROC curve (AUROCs). We conclude that our investigation supports the central hypothesis.

  5. Systematic Interpolation Method Predicts Antibody Monomer-Dimer Separation by Gradient Elution Chromatography at High Protein Loads.

    PubMed

    Creasy, Arch; Reck, Jason; Pabst, Timothy; Hunter, Alan; Barker, Gregory; Carta, Giorgio

    2018-05-29

    A previously developed empirical interpolation (EI) method is extended to predict highly overloaded multicomponent elution behavior on a cation exchange (CEX) column based on batch isotherm data. Instead of a fully mechanistic model, the EI method employs an empirically modified multicomponent Langmuir equation to correlate two-component adsorption isotherm data at different salt concentrations. Piecewise cubic interpolating polynomials are then used to predict competitive binding at intermediate salt concentrations. The approach is tested for the separation of monoclonal antibody monomer and dimer mixtures by gradient elution on the cation exchange resin Nuvia HR-S. Adsorption isotherms are obtained over a range of salt concentrations with varying monomer and dimer concentrations. Coupled with a lumped kinetic model, the interpolated isotherms predict the column behavior for highly overloaded conditions. Predictions based on the EI method showed good agreement with experimental elution curves for protein loads up to 40 mg/mL column or about 50% of the column binding capacity. The approach can be extended to other chromatographic modalities and to more than two components. This article is protected by copyright. All rights reserved.

  6. Modeling and predicting tumor response in radioligand therapy.

    PubMed

    Kletting, Peter; Thieme, Anne; Eberhardt, Nina; Rinscheid, Andreas; D'Alessandria, Calogero; Allmann, Jakob; Wester, Hans-Jürgen; Tauber, Robert; Beer, Ambros J; Glatting, Gerhard; Eiber, Matthias

    2018-05-10

    The aim of this work was to develop a theranostic method that allows predicting PSMA-positive tumor volume after radioligand therapy (RLT) based on a pre-therapeutic PET/CT measurement and physiologically based pharmacokinetic/dynamic (PBPK/PD) modeling at the example of RLT using 177 Lu-labeled PSMA for imaging and therapy (PSMA I&T). Methods: A recently developed PBPK model for 177 Lu PSMA I&T RLT was extended to account for tumor (exponential) growth and reduction due to irradiation (linear quadratic model). Data of 13 patients with metastatic castration-resistant prostate cancer (mCRPC) were retrospectively analyzed. Pharmacokinetic/dynamic parameters were simultaneously fitted in a Bayesian framework to PET/CT activity concentrations, planar scintigraphy data and tumor volumes prior and post (6 weeks) therapy. The method was validated using the leave-one-out Jackknife method. The tumor volume post therapy was predicted based on pre-therapy PET/CT imaging and PBPK/PD modeling. Results: The relative deviation of the predicted and measured tumor volume for PSMA-positive tumor cells (6 weeks post therapy) was 1±40% excluding one patient (PSA negative) from the population. The radiosensitivity for the PSA positive patients was determined to be 0.0172±0.0084 Gy-1. Conclusion: The proposed method is the first attempt to solely use PET/CT and modeling methods to predict the PSMA-positive tumor volume after radioligand therapy. Internal validation shows that this is feasible with an acceptable accuracy. Improvement of the method and external validation of the model is ongoing. Copyright © 2018 by the Society of Nuclear Medicine and Molecular Imaging, Inc.

  7. Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures.

    PubMed

    Vallat, Brinda; Madrid-Aliste, Carlos; Fiser, Andras

    2015-08-01

    Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly require template-free methods. However, template-free modeling methods are much less reliable and are usually applicable for smaller proteins, leaving much space for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger in size compared to the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ an exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state of the art prediction methods, especially when sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.

  8. Ensemble framework based real-time respiratory motion prediction for adaptive radiotherapy applications.

    PubMed

    Tatinati, Sivanagaraja; Nazarpour, Kianoush; Tech Ang, Wei; Veluvolu, Kalyana C

    2016-08-01

    Successful treatment of tumors with motion-adaptive radiotherapy requires accurate prediction of respiratory motion, ideally with a prediction horizon larger than the latency in radiotherapy system. Accurate prediction of respiratory motion is however a non-trivial task due to the presence of irregularities and intra-trace variabilities, such as baseline drift and temporal changes in fundamental frequency pattern. In this paper, to enhance the accuracy of the respiratory motion prediction, we propose a stacked regression ensemble framework that integrates heterogeneous respiratory motion prediction algorithms. We further address two crucial issues for developing a successful ensemble framework: (1) selection of appropriate prediction methods to ensemble (level-0 methods) among the best existing prediction methods; and (2) finding a suitable generalization approach that can successfully exploit the relative advantages of the chosen level-0 methods. The efficacy of the developed ensemble framework is assessed with real respiratory motion traces acquired from 31 patients undergoing treatment. Results show that the developed ensemble framework improves the prediction performance significantly compared to the best existing methods. Copyright © 2016 IPEM. Published by Elsevier Ltd. All rights reserved.

  9. Prognostics of lithium-ion batteries based on Dempster-Shafer theory and the Bayesian Monte Carlo method

    NASA Astrophysics Data System (ADS)

    He, Wei; Williard, Nicholas; Osterman, Michael; Pecht, Michael

    A new method for state of health (SOH) and remaining useful life (RUL) estimations for lithium-ion batteries using Dempster-Shafer theory (DST) and the Bayesian Monte Carlo (BMC) method is proposed. In this work, an empirical model based on the physical degradation behavior of lithium-ion batteries is developed. Model parameters are initialized by combining sets of training data based on DST. BMC is then used to update the model parameters and predict the RUL based on available data through battery capacity monitoring. As more data become available, the accuracy of the model in predicting RUL improves. Two case studies demonstrating this approach are presented.

  10. Virtual World Currency Value Fluctuation Prediction System Based on User Sentiment Analysis.

    PubMed

    Kim, Young Bin; Lee, Sang Hyeok; Kang, Shin Jin; Choi, Myung Jin; Lee, Jung; Kim, Chang Hun

    2015-01-01

    In this paper, we present a method for predicting the value of virtual currencies used in virtual gaming environments that support multiple users, such as massively multiplayer online role-playing games (MMORPGs). Predicting virtual currency values in a virtual gaming environment has rarely been explored; it is difficult to apply real-world methods for predicting fluctuating currency values or shares to the virtual gaming world on account of differences in domains between the two worlds. To address this issue, we herein predict virtual currency value fluctuations by collecting user opinion data from a virtual community and analyzing user sentiments or emotions from the opinion data. The proposed method is straightforward and applicable to predicting virtual currencies as well as to gaming environments, including MMORPGs. We test the proposed method using large-scale MMORPGs and demonstrate that virtual currencies can be effectively and efficiently predicted with it.

  11. Virtual World Currency Value Fluctuation Prediction System Based on User Sentiment Analysis

    PubMed Central

    Kim, Young Bin; Lee, Sang Hyeok; Kang, Shin Jin; Choi, Myung Jin; Lee, Jung; Kim, Chang Hun

    2015-01-01

    In this paper, we present a method for predicting the value of virtual currencies used in virtual gaming environments that support multiple users, such as massively multiplayer online role-playing games (MMORPGs). Predicting virtual currency values in a virtual gaming environment has rarely been explored; it is difficult to apply real-world methods for predicting fluctuating currency values or shares to the virtual gaming world on account of differences in domains between the two worlds. To address this issue, we herein predict virtual currency value fluctuations by collecting user opinion data from a virtual community and analyzing user sentiments or emotions from the opinion data. The proposed method is straightforward and applicable to predicting virtual currencies as well as to gaming environments, including MMORPGs. We test the proposed method using large-scale MMORPGs and demonstrate that virtual currencies can be effectively and efficiently predicted with it. PMID:26241496

  12. Multiaxial Fatigue Life Prediction Based on Short Crack Propagation Model with Equivalent Strain Parameter

    NASA Astrophysics Data System (ADS)

    Zhao, Xiang-Feng; Shang, De-Guang; Sun, Yu-Juan; Song, Ming-Liang; Wang, Xiao-Wei

    2018-01-01

    The maximum shear strain and the normal strain excursion on the critical plane are regarded as the primary parameters of the crack driving force to establish a new short crack model in this paper. An equivalent strain-based intensity factor is proposed to correlate the short crack growth rate under multiaxial loading. According to the short crack model, a new method is proposed for multiaxial fatigue life prediction based on crack growth analysis. It is demonstrated that the method can be used under proportional and non-proportional loadings. The predicted results showed a good agreement with experimental lives in both high-cycle and low-cycle regions.

  13. A method for modelling GP practice level deprivation scores using GIS

    PubMed Central

    Strong, Mark; Maheswaran, Ravi; Pearson, Tim; Fryers, Paul

    2007-01-01

    Background A measure of general practice level socioeconomic deprivation can be used to explore the association between deprivation and other practice characteristics. An area-based categorisation is commonly chosen as the basis for such a deprivation measure. Ideally a practice population-weighted area-based deprivation score would be calculated using individual level spatially referenced data. However, these data are often unavailable. One approach is to link the practice postcode to an area-based deprivation score, but this method has limitations. This study aimed to develop a Geographical Information Systems (GIS) based model that could better predict a practice population-weighted deprivation score in the absence of patient level data than simple practice postcode linkage. Results We calculated predicted practice level Index of Multiple Deprivation (IMD) 2004 deprivation scores using two methods that did not require patient level data. Firstly we linked the practice postcode to an IMD 2004 score, and secondly we used a GIS model derived using data from Rotherham, UK. We compared our two sets of predicted scores to "gold standard" practice population-weighted scores for practices in Doncaster, Havering and Warrington. Overall, the practice postcode linkage method overestimated "gold standard" IMD scores by 2.54 points (95% CI 0.94, 4.14), whereas our modelling method showed no such bias (mean difference 0.36, 95% CI -0.30, 1.02). The postcode-linked method systematically underestimated the gold standard score in less deprived areas, and overestimated it in more deprived areas. Our modelling method showed a small underestimation in scores at higher levels of deprivation in Havering, but showed no bias in Doncaster or Warrington. The postcode-linked method showed more variability when predicting scores than did the GIS modelling method. Conclusion A GIS based model can be used to predict a practice population-weighted area-based deprivation measure in the absence of patient level data. Our modelled measure generally had better agreement with the population-weighted measure than did a postcode-linked measure. Our model may also avoid an underestimation of IMD scores in less deprived areas, and overestimation of scores in more deprived areas, seen when using postcode linked scores. The proposed method may be of use to researchers who do not have access to patient level spatially referenced data. PMID:17822545

  14. Prediction of high-dimensional states subject to respiratory motion: a manifold learning approach

    NASA Astrophysics Data System (ADS)

    Liu, Wenyang; Sawant, Amit; Ruan, Dan

    2016-07-01

    The development of high-dimensional imaging systems in image-guided radiotherapy provides important pathways to the ultimate goal of real-time full volumetric motion monitoring. Effective motion management during radiation treatment usually requires prediction to account for system latency and extra signal/image processing time. It is challenging to predict high-dimensional respiratory motion due to the complexity of the motion pattern combined with the curse of dimensionality. Linear dimension reduction methods such as PCA have been used to construct a linear subspace from the high-dimensional data, followed by efficient predictions on the lower-dimensional subspace. In this study, we extend such rationale to a more general manifold and propose a framework for high-dimensional motion prediction with manifold learning, which allows one to learn more descriptive features compared to linear methods with comparable dimensions. Specifically, a kernel PCA is used to construct a proper low-dimensional feature manifold, where accurate and efficient prediction can be performed. A fixed-point iterative pre-image estimation method is used to recover the predicted value in the original state space. We evaluated and compared the proposed method with a PCA-based approach on level-set surfaces reconstructed from point clouds captured by a 3D photogrammetry system. The prediction accuracy was evaluated in terms of root-mean-squared-error. Our proposed method achieved consistent higher prediction accuracy (sub-millimeter) for both 200 ms and 600 ms lookahead lengths compared to the PCA-based approach, and the performance gain was statistically significant.

  15. A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features.

    PubMed

    Li, Liqi; Luo, Qifa; Xiao, Weidong; Li, Jinhui; Zhou, Shiwen; Li, Yongsheng; Zheng, Xiaoqi; Yang, Hua

    2017-02-01

    Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to the protein transportation, organelle localization, and functions, therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interaction, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is in urgent need. In this study, a support vector machine (SVM)-based method was proposed through integrating PSI-BLAST profile, physicochemical properties, [Formula: see text]-mer amino acid compositions (AACs), and [Formula: see text]-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and Matthews Correlation Coefficient of 0.9773 for a benchmark dataset. The result indicates the efficiency and accuracy of our method in prediction of palmitoylation sites based on protein sequences.

  16. Comparison of in silico models for prediction of mutagenicity.

    PubMed

    Bakhtyari, Nazanin G; Raitano, Giuseppa; Benfenati, Emilio; Martin, Todd; Young, Douglas

    2013-01-01

    Using a dataset with more than 6000 compounds, the performance of eight quantitative structure activity relationships (QSAR) models was evaluated: ACD/Tox Suite, Absorption, Distribution, Metabolism, Elimination, and Toxicity of chemical substances (ADMET) predictor, Derek, Toxicity Estimation Software Tool (T.E.S.T.), TOxicity Prediction by Komputer Assisted Technology (TOPKAT), Toxtree, CEASAR, and SARpy (SAR in python). In general, the results showed a high level of performance. To have a realistic estimate of the predictive ability, the results for chemicals inside and outside the training set for each model were considered. The effect of applicability domain tools (when available) on the prediction accuracy was also evaluated. The predictive tools included QSAR models, knowledge-based systems, and a combination of both methods. Models based on statistical QSAR methods gave better results.

  17. Predicting flight delay based on multiple linear regression

    NASA Astrophysics Data System (ADS)

    Ding, Yi

    2017-08-01

    Delay of flight has been regarded as one of the toughest difficulties in aviation control. How to establish an effective model to handle the delay prediction problem is a significant work. To solve the problem that the flight delay is difficult to predict, this study proposes a method to model the arriving flights and a multiple linear regression algorithm to predict delay, comparing with Naive-Bayes and C4.5 approach. Experiments based on a realistic dataset of domestic airports show that the accuracy of the proposed model approximates 80%, which is further improved than the Naive-Bayes and C4.5 approach approaches. The result testing shows that this method is convenient for calculation, and also can predict the flight delays effectively. It can provide decision basis for airport authorities.

  18. A comparison of five methods to predict genomic breeding values of dairy bulls from genome-wide SNP markers

    PubMed Central

    2009-01-01

    Background Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle. Methods Genotypes of 7,372 SNP and highly accurate EBV of 1,945 dairy bulls were used to predict MBV for protein percentage (PPT) and a profit index (Australian Selection Index, ASI). Marker effects were estimated by least squares regression (FR-LS), Bayesian regression (Bayes-R), random regression best linear unbiased prediction (RR-BLUP), partial least squares regression (PLSR) and nonparametric support vector regression (SVR) in a training set of 1,239 bulls. Accuracy and bias of MBV prediction were calculated from cross-validation of the training set and tested against a test team of 706 young bulls. Results For both traits, FR-LS using a subset of SNP was significantly less accurate than all other methods which used all SNP. Accuracies obtained by Bayes-R, RR-BLUP, PLSR and SVR were very similar for ASI (0.39-0.45) and for PPT (0.55-0.61). Overall, SVR gave the highest accuracy. All methods resulted in biased MBV predictions for ASI, for PPT only RR-BLUP and SVR predictions were unbiased. A significant decrease in accuracy of prediction of ASI was seen in young test cohorts of bulls compared to the accuracy derived from cross-validation of the training set. This reduction was not apparent for PPT. Combining MBV predictions with pedigree based predictions gave 1.05 - 1.34 times higher accuracies compared to predictions based on pedigree alone. Some methods have largely different computational requirements, with PLSR and RR-BLUP requiring the least computing time. Conclusions The four methods which use information from all SNP namely RR-BLUP, Bayes-R, PLSR and SVR generate similar accuracies of MBV prediction for genomic selection, and their use in the selection of immediate future generations in dairy cattle will be comparable. The use of FR-LS in genomic selection is not recommended. PMID:20043835

  19. Adaptive MPC based on MIMO ARX-Laguerre model.

    PubMed

    Ben Abdelwahed, Imen; Mbarek, Abdelkader; Bouzrara, Kais

    2017-03-01

    This paper proposes a method for synthesizing an adaptive predictive controller using a reduced complexity model. This latter is given by the projection of the ARX model on Laguerre bases. The resulting model is entitled MIMO ARX-Laguerre and it is characterized by an easy recursive representation. The adaptive predictive control law is computed based on multi-step-ahead finite-element predictors, identified directly from experimental input/output data. The model is tuned in each iteration by an online identification algorithms of both model parameters and Laguerre poles. The proposed approach avoids time consuming numerical optimization algorithms associated with most common linear predictive control strategies, which makes it suitable for real-time implementation. The method is used to synthesize and test in numerical simulations adaptive predictive controllers for the CSTR process benchmark. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.

  20. Application of clustering analysis in the prediction of photovoltaic power generation based on neural network

    NASA Astrophysics Data System (ADS)

    Cheng, K.; Guo, L. M.; Wang, Y. K.; Zafar, M. T.

    2017-11-01

    In order to select effective samples in the large number of data of PV power generation years and improve the accuracy of PV power generation forecasting model, this paper studies the application of clustering analysis in this field and establishes forecasting model based on neural network. Based on three different types of weather on sunny, cloudy and rainy days, this research screens samples of historical data by the clustering analysis method. After screening, it establishes BP neural network prediction models using screened data as training data. Then, compare the six types of photovoltaic power generation prediction models before and after the data screening. Results show that the prediction model combining with clustering analysis and BP neural networks is an effective method to improve the precision of photovoltaic power generation.

  1. MQAPRank: improved global protein model quality assessment by learning-to-rank.

    PubMed

    Jing, Xiaoyang; Dong, Qiwen

    2017-05-25

    Protein structure prediction has achieved a lot of progress during the last few decades and a greater number of models for a certain sequence can be predicted. Consequently, assessing the qualities of predicted protein models in perspective is one of the key components of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, which could be roughly divided into three categories: single methods, quasi-single methods and clustering (or consensus) methods. Although these methods achieve much success at different levels, accurate protein model quality assessment is still an open problem. Here, we present the MQAPRank, a global protein model quality assessment program based on learning-to-rank. The MQAPRank first sorts the decoy models by using single method based on learning-to-rank algorithm to indicate their relative qualities for the target protein. And then it takes the first five models as references to predict the qualities of other models by using average GDT_TS scores between reference models and other models. Benchmarked on CASP11 and 3DRobot datasets, the MQAPRank achieved better performances than other leading protein model quality assessment methods. Recently, the MQAPRank participated in the CASP12 under the group name FDUBio and achieved the state-of-the-art performances. The MQAPRank provides a convenient and powerful tool for protein model quality assessment with the state-of-the-art performances, it is useful for protein structure prediction and model quality assessment usages.

  2. Improving consensus contact prediction via server correlation reduction.

    PubMed

    Gao, Xin; Bu, Dongbo; Xu, Jinbo; Li, Ming

    2009-05-06

    Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find out that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them. In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method assuming that all the individual servers are equally important and independent, the newly developed method evaluates their correlation by using maximum likelihood estimation and extracts independent latent servers from them by using principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, which is much better than that of the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06. These methods demonstrate an average accuracy of 13.0%, 10.8%, 25.8% and 21.2%, respectively. Reducing server correlation and optimally combining independent latent servers show a significant improvement over the traditional consensus methods. This approach can hopefully provide a powerful tool for protein structure refinement and prediction use.

  3. Physics-based protein-structure prediction using a hierarchical protocol based on the UNRES force field: assessment in two blind tests.

    PubMed

    Ołdziej, S; Czaplewski, C; Liwo, A; Chinchio, M; Nanias, M; Vila, J A; Khalili, M; Arnautova, Y A; Jagielska, A; Makowski, M; Schafroth, H D; Kaźmierkiewicz, R; Ripoll, D R; Pillardy, J; Saunders, J A; Kang, Y K; Gibson, K D; Scheraga, H A

    2005-05-24

    Recent improvements in the protein-structure prediction method developed in our laboratory, based on the thermodynamic hypothesis, are described. The conformational space is searched extensively at the united-residue level by using our physics-based UNRES energy function and the conformational space annealing method of global optimization. The lowest-energy coarse-grained structures are then converted to an all-atom representation and energy-minimized with the ECEPP/3 force field. The procedure was assessed in two recent blind tests of protein-structure prediction. During the first blind test, we predicted large fragments of alpha and alpha+beta proteins [60-70 residues with C(alpha) rms deviation (rmsd) <6 A]. However, for alpha+beta proteins, significant topological errors occurred despite low rmsd values. In the second exercise, we predicted whole structures of five proteins (two alpha and three alpha+beta, with sizes of 53-235 residues) with remarkably good accuracy. In particular, for the genomic target TM0487 (a 102-residue alpha+beta protein from Thermotoga maritima), we predicted the complete, topologically correct structure with 7.3-A C(alpha) rmsd. So far this protein is the largest alpha+beta protein predicted based solely on the amino acid sequence and a physics-based potential-energy function and search procedure. For target T0198, a phosphate transport system regulator PhoU from T. maritima (a 235-residue mainly alpha-helical protein), we predicted the topology of the whole six-helix bundle correctly within 8 A rmsd, except the 32 C-terminal residues, most of which form a beta-hairpin. These and other examples described in this work demonstrate significant progress in physics-based protein-structure prediction.

  4. Automatic peak selection by a Benjamini-Hochberg-based algorithm.

    PubMed

    Abbas, Ahmed; Kong, Xin-Bing; Liu, Zhi; Jing, Bing-Yi; Gao, Xin

    2013-01-01

    A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming more and more common, although expert-knowledge remains the method of choice to determine how many peaks among thousands of candidate peaks should be taken into consideration to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into [Formula: see text]-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on the state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without using the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and straightforwardly applied to some other prediction selection problems in bioinformatics. The source code, documentation and example data of the proposed method is available at http://sfb.kaust.edu.sa/pages/software.aspx.

  5. Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm

    PubMed Central

    Abbas, Ahmed; Kong, Xin-Bing; Liu, Zhi; Jing, Bing-Yi; Gao, Xin

    2013-01-01

    A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming more and more common, although expert-knowledge remains the method of choice to determine how many peaks among thousands of candidate peaks should be taken into consideration to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into -values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on the state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without using the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and straightforwardly applied to some other prediction selection problems in bioinformatics. The source code, documentation and example data of the proposed method is available at http://sfb.kaust.edu.sa/pages/software.aspx. PMID:23308147

  6. Properties of a Formal Method for Prediction of Emergent Behaviors in Swarm-based Systems

    NASA Technical Reports Server (NTRS)

    Rouff, Christopher; Vanderbilt, Amy; Hinchey, Mike; Truszkowski, Walt; Rash, James

    2004-01-01

    Autonomous intelligent swarms of satellites are being proposed for NASA missions that have complex behaviors and interactions. The emergent properties of swarms make these missions powerful, but at the same time more difficult to design and assure that proper behaviors will emerge. This paper gives the results of research into formal methods techniques for verification and validation of NASA swarm-based missions. Multiple formal methods were evaluated to determine their effectiveness in modeling and assuring the behavior of swarms of spacecraft. The NASA ANTS mission was used as an example of swarm intelligence for which to apply the formal methods. This paper will give the evaluation of these formal methods and give partial specifications of the ANTS mission using four selected methods. We then give an evaluation of the methods and the needed properties of a formal method for effective specification and prediction of emergent behavior in swarm-based systems.

  7. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    NASA Astrophysics Data System (ADS)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can be developed as a pre-processing tool for therapeutic successes. Examination of different drug-drug interaction can be done by drug synergy score. It needs efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to realize requirement as mentioned above. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop ensemble-based machine learning model. These models are Random forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weights to the model with a higher prediction score) of predicted data by selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.

  8. Improving Prediction Accuracy for WSN Data Reduction by Applying Multivariate Spatio-Temporal Correlation

    PubMed Central

    Carvalho, Carlos; Gomes, Danielo G.; Agoulmine, Nazim; de Souza, José Neuman

    2011-01-01

    This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSN). Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, it may not be very accurate. Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher correlation between gathered inputs when compared to time, which is an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate one. In addition to that, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, we believe that we are probably the first to address prediction based on multivariate correlation for WSN data reduction. PMID:22346626

  9. Remaining Useful Life Prediction for Lithium-Ion Batteries Based on Gaussian Processes Mixture

    PubMed Central

    Li, Lingling; Wang, Pengchong; Chao, Kuei-Hsiang; Zhou, Yatong; Xie, Yang

    2016-01-01

    The remaining useful life (RUL) prediction of Lithium-ion batteries is closely related to the capacity degeneration trajectories. Due to the self-charging and the capacity regeneration, the trajectories have the property of multimodality. Traditional prediction models such as the support vector machines (SVM) or the Gaussian Process regression (GPR) cannot accurately characterize this multimodality. This paper proposes a novel RUL prediction method based on the Gaussian Process Mixture (GPM). It can process multimodality by fitting different segments of trajectories with different GPR models separately, such that the tiny differences among these segments can be revealed. The method is demonstrated to be effective for prediction by the excellent predictive result of the experiments on the two commercial and chargeable Type 1850 Lithium-ion batteries, provided by NASA. The performance comparison among the models illustrates that the GPM is more accurate than the SVM and the GPR. In addition, GPM can yield the predictive confidence interval, which makes the prediction more reliable than that of traditional models. PMID:27632176

  10. Predicting and interpreting identification errors in military vehicle training using multidimensional scaling.

    PubMed

    Bohil, Corey J; Higgins, Nicholas A; Keebler, Joseph R

    2014-01-01

    We compared methods for predicting and understanding the source of confusion errors during military vehicle identification training. Participants completed training to identify main battle tanks. They also completed card-sorting and similarity-rating tasks to express their mental representation of resemblance across the set of training items. We expected participants to selectively attend to a subset of vehicle features during these tasks, and we hypothesised that we could predict identification confusion errors based on the outcomes of the card-sort and similarity-rating tasks. Based on card-sorting results, we were able to predict about 45% of observed identification confusions. Based on multidimensional scaling of the similarity-rating data, we could predict more than 80% of identification confusions. These methods also enabled us to infer the dimensions receiving significant attention from each participant. This understanding of mental representation may be crucial in creating personalised training that directs attention to features that are critical for accurate identification. Participants completed military vehicle identification training and testing, along with card-sorting and similarity-rating tasks. The data enabled us to predict up to 84% of identification confusion errors and to understand the mental representation underlying these errors. These methods have potential to improve training and reduce identification errors leading to fratricide.

  11. Failure prediction using machine learning and time series in optical network.

    PubMed

    Wang, Zhilong; Zhang, Min; Wang, Danshi; Song, Chuang; Liu, Min; Li, Jin; Lou, Liqi; Liu, Zhuo

    2017-08-07

    In this paper, we propose a performance monitoring and failure prediction method in optical networks based on machine learning. The primary algorithms of this method are the support vector machine (SVM) and double exponential smoothing (DES). With a focus on risk-aware models in optical networks, the proposed protection plan primarily investigates how to predict the risk of an equipment failure. To the best of our knowledge, this important problem has not yet been fully considered. Experimental results showed that the average prediction accuracy of our method was 95% when predicting the optical equipment failure state. This finding means that our method can forecast an equipment failure risk with high accuracy. Therefore, our proposed DES-SVM method can effectively improve traditional risk-aware models to protect services from possible failures and enhance the optical network stability.

  12. KFC2: a knowledge-based hot spot prediction method based on interface solvation, atomic density, and plasticity features.

    PubMed

    Zhu, Xiaolei; Mitchell, Julie C

    2011-09-01

    Hot spots constitute a small fraction of protein-protein interface residues, yet they account for a large fraction of the binding affinity. Based on our previous method (KFC), we present two new methods (KFC2a and KFC2b) that outperform other methods at hot spot prediction. A number of improvements were made in developing these new methods. First, we created a training data set that contained a similar number of hot spot and non-hot spot residues. In addition, we generated 47 different features, and different numbers of features were used to train the models to avoid over-fitting. Finally, two feature combinations were selected: One (used in KFC2a) is composed of eight features that are mainly related to solvent accessible surface area and local plasticity; the other (KFC2b) is composed of seven features, only two of which are identical to those used in KFC2a. The two models were built using support vector machines (SVM). The two KFC2 models were then tested on a mixed independent test set, and compared with other methods such as Robetta, FOLDEF, HotPoint, MINERVA, and KFC. KFC2a showed the highest predictive accuracy for hot spot residues (True Positive Rate: TPR = 0.85); however, the false positive rate was somewhat higher than for other models. KFC2b showed the best predictive accuracy for hot spot residues (True Positive Rate: TPR = 0.62) among all methods other than KFC2a, and the False Positive Rate (FPR = 0.15) was comparable with other highly predictive methods. Copyright © 2011 Wiley-Liss, Inc.

  13. Comparison on genomic predictions using three GBLUP methods and two single-step blending methods in the Nordic Holstein population

    PubMed Central

    2012-01-01

    Background A single-step blending approach allows genomic prediction using information of genotyped and non-genotyped animals simultaneously. However, the combined relationship matrix in a single-step method may need to be adjusted because marker-based and pedigree-based relationship matrices may not be on the same scale. The same may apply when a GBLUP model includes both genomic breeding values and residual polygenic effects. The objective of this study was to compare single-step blending methods and GBLUP methods with and without adjustment of the genomic relationship matrix for genomic prediction of 16 traits in the Nordic Holstein population. Methods The data consisted of de-regressed proofs (DRP) for 5 214 genotyped and 9 374 non-genotyped bulls. The bulls were divided into a training and a validation population by birth date, October 1, 2001. Five approaches for genomic prediction were used: 1) a simple GBLUP method, 2) a GBLUP method with a polygenic effect, 3) an adjusted GBLUP method with a polygenic effect, 4) a single-step blending method, and 5) an adjusted single-step blending method. In the adjusted GBLUP and single-step methods, the genomic relationship matrix was adjusted for the difference of scale between the genomic and the pedigree relationship matrices. A set of weights on the pedigree relationship matrix (ranging from 0.05 to 0.40) was used to build the combined relationship matrix in the single-step blending method and the GBLUP method with a polygenetic effect. Results Averaged over the 16 traits, reliabilities of genomic breeding values predicted using the GBLUP method with a polygenic effect (relative weight of 0.20) were 0.3% higher than reliabilities from the simple GBLUP method (without a polygenic effect). The adjusted single-step blending and original single-step blending methods (relative weight of 0.20) had average reliabilities that were 2.1% and 1.8% higher than the simple GBLUP method, respectively. In addition, the GBLUP method with a polygenic effect led to less bias of genomic predictions than the simple GBLUP method, and both single-step blending methods yielded less bias of predictions than all GBLUP methods. Conclusions The single-step blending method is an appealing approach for practical genomic prediction in dairy cattle. Genomic prediction from the single-step blending method can be improved by adjusting the scale of the genomic relationship matrix. PMID:22455934

  14. CSmetaPred: a consensus method for prediction of catalytic residues.

    PubMed

    Choudhary, Preeti; Kumar, Shailesh; Bachhawat, Anand Kumar; Pandit, Shashi Bhushan

    2017-12-22

    Knowledge of catalytic residues can play an essential role in elucidating mechanistic details of an enzyme. However, experimental identification of catalytic residues is a tedious and time-consuming task, which can be expedited by computational predictions. Despite significant development in active-site prediction methods, one of the remaining issues is ranked positions of putative catalytic residues among all ranked residues. In order to improve ranking of catalytic residues and their prediction accuracy, we have developed a meta-approach based method CSmetaPred. In this approach, residues are ranked based on the mean of normalized residue scores derived from four well-known catalytic residue predictors. The mean residue score of CSmetaPred is combined with predicted pocket information to improve prediction performance in meta-predictor, CSmetaPred_poc. Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision Recall (PR) curves. The visual and quantitative analysis of ROC and PR curves shows that meta-predictors outperform their constituent methods and CSmetaPred_poc is the best of evaluated methods. For instance, on CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves highest Mean Average Specificity (MAS), a scalar measure for ROC curve, of 0.97 (0.96). Importantly, median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. Considering residues ranked ≤20 classified as true positive in binary classification, CSmetaPred_poc achieves prediction accuracy of 0.94 on CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc predicts all catalytic residues within top 20 ranks for ~73% of enzymes. Furthermore, benchmarking of prediction on comparative modelled structures showed that models result in better prediction than only sequence based predictions. These analyses suggest that CSmetaPred_poc is able to rank putative catalytic residues at lower (better) ranked positions, which can facilitate and expedite their experimental characterization. The benchmarking studies showed that employing meta-approach in combining residue-level scores derived from well-known catalytic residue predictors can improve prediction accuracy as well as provide improved ranked positions of known catalytic residues. Hence, such predictions can assist experimentalist to prioritize residues for mutational studies in their efforts to characterize catalytic residues. Both meta-predictors are available as webserver at: http://14.139.227.206/csmetapred/ .

  15. Kernel-based whole-genome prediction of complex traits: a review.

    PubMed

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  16. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    PubMed Central

    2011-01-01

    Background Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spots prediction becomes increasingly important for well understanding the essence of proteins interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However comparative assessment of these features and furthermore effective and accurate methods are still in pressing need. Results In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complexes formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in Ab+ dataset (all complexes). While in Ab- dataset (antigen-antibody complexes are excluded), there are significant differences in two features between hot pots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots. PMID:21798070

  17. [Effect of stock abundance and environmental factors on the recruitment success of small yellow croaker in the East China Sea].

    PubMed

    Liu, Zun-lei; Yuan, Xing-wei; Yang, Lin-lin; Yan, Li-ping; Zhang, Hui; Cheng, Jia-hua

    2015-02-01

    Multiple hypotheses are available to explain recruitment rate. Model selection methods can be used to identify the best model that supports a particular hypothesis. However, using a single model for estimating recruitment success is often inadequate for overexploited population because of high model uncertainty. In this study, stock-recruitment data of small yellow croaker in the East China Sea collected from fishery dependent and independent surveys between 1992 and 2012 were used to examine density-dependent effects on recruitment success. Model selection methods based on frequentist (AIC, maximum adjusted R2 and P-values) and Bayesian (Bayesian model averaging, BMA) methods were applied to identify the relationship between recruitment and environment conditions. Interannual variability of the East China Sea environment was indicated by sea surface temperature ( SST) , meridional wind stress (MWS), zonal wind stress (ZWS), sea surface pressure (SPP) and runoff of Changjiang River ( RCR). Mean absolute error, mean squared predictive error and continuous ranked probability score were calculated to evaluate the predictive performance of recruitment success. The results showed that models structures were not consistent based on three kinds of model selection methods, predictive variables of models were spawning abundance and MWS by AIC, spawning abundance by P-values, spawning abundance, MWS and RCR by maximum adjusted R2. The recruitment success decreased linearly with stock abundance (P < 0.01), suggesting overcompensation effect in the recruitment success might be due to cannibalism or food competition. Meridional wind intensity showed marginally significant and positive effects on the recruitment success (P = 0.06), while runoff of Changjiang River showed a marginally negative effect (P = 0.07). Based on mean absolute error and continuous ranked probability score, predictive error associated with models obtained from BMA was the smallest amongst different approaches, while that from models selected based on the P-value of the independent variables was the highest. However, mean squared predictive error from models selected based on the maximum adjusted R2 was highest. We found that BMA method could improve the prediction of recruitment success, derive more accurate prediction interval and quantitatively evaluate model uncertainty.

  18. A state-based probabilistic model for tumor respiratory motion prediction

    NASA Astrophysics Data System (ADS)

    Kalet, Alan; Sandison, George; Wu, Huanmei; Schmitz, Ruth

    2010-12-01

    This work proposes a new probabilistic mathematical model for predicting tumor motion and position based on a finite state representation using the natural breathing states of exhale, inhale and end of exhale. Tumor motion was broken down into linear breathing states and sequences of states. Breathing state sequences and the observables representing those sequences were analyzed using a hidden Markov model (HMM) to predict the future sequences and new observables. Velocities and other parameters were clustered using a k-means clustering algorithm to associate each state with a set of observables such that a prediction of state also enables a prediction of tumor velocity. A time average model with predictions based on average past state lengths was also computed. State sequences which are known a priori to fit the data were fed into the HMM algorithm to set a theoretical limit of the predictive power of the model. The effectiveness of the presented probabilistic model has been evaluated for gated radiation therapy based on previously tracked tumor motion in four lung cancer patients. Positional prediction accuracy is compared with actual position in terms of the overall RMS errors. Various system delays, ranging from 33 to 1000 ms, were tested. Previous studies have shown duty cycles for latencies of 33 and 200 ms at around 90% and 80%, respectively, for linear, no prediction, Kalman filter and ANN methods as averaged over multiple patients. At 1000 ms, the previously reported duty cycles range from approximately 62% (ANN) down to 34% (no prediction). Average duty cycle for the HMM method was found to be 100% and 91 ± 3% for 33 and 200 ms latency and around 40% for 1000 ms latency in three out of four breathing motion traces. RMS errors were found to be lower than linear and no prediction methods at latencies of 1000 ms. The results show that for system latencies longer than 400 ms, the time average HMM prediction outperforms linear, no prediction, and the more general HMM-type predictive models. RMS errors for the time average model approach the theoretical limit of the HMM, and predicted state sequences are well correlated with sequences known to fit the data.

  19. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    PubMed

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  20. A collaborative filtering-based approach to biomedical knowledge discovery.

    PubMed

    Lever, Jake; Gakkhar, Sitanshu; Gottlieb, Michael; Rashnavadi, Tahereh; Lin, Santina; Siu, Celia; Smith, Maia; Jones, Martin R; Krzywinski, Martin; Jones, Steven J M; Wren, Jonathan

    2018-02-15

    The increase in publication rates makes it challenging for an individual researcher to stay abreast of all relevant research in order to find novel research hypotheses. Literature-based discovery methods make use of knowledge graphs built using text mining and can infer future associations between biomedical concepts that will likely occur in new publications. These predictions are a valuable resource for researchers to explore a research topic. Current methods for prediction are based on the local structure of the knowledge graph. A method that uses global knowledge from across the knowledge graph needs to be developed in order to make knowledge discovery a frequently used tool by researchers. We propose an approach based on the singular value decomposition (SVD) that is able to combine data from across the knowledge graph through a reduced representation. Using cooccurrence data extracted from published literature, we show that SVD performs better than the leading methods for scoring discoveries. We also show the diminishing predictive power of knowledge discovery as we compare our predictions with real associations that appear further into the future. Finally, we examine the strengths and weaknesses of the SVD approach against another well-performing system using several predicted associations. All code and results files for this analysis can be accessed at https://github.com/jakelever/knowledgediscovery. sjones@bcgsc.ca. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  1. Thermodynamics of reformulated automotive fuels

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zudkevitch, D.; Murthy, A.K.S.; Gmehling, J.

    1995-06-01

    Two methods for predicting Reid vapor pressure (Rvp) and initial vapor emissions of reformulated gasoline blends that contain one or more oxygenated compounds show excellent agreement with experimental data. In the first method, method A, D-86 distillation data for gasoline blends are used for predicting Rvp from a simulation of the mini dry vapor pressure equivalent (Dvpe) experiment. The other method, method B, relies on analytical information (PIANO analyses) of the base gasoline and uses classical thermodynamics for simulating the same Rvp equivalent (Rvpe) mini experiment. Method B also predicts composition and other properties for the fuel`s initial vapor emission.more » Method B, although complex, is more useful in that is can predict properties of blends without a D-86 distillation. An important aspect of method B is its capability to predict composition of initial vapor emissions from gasoline blends. Thus, it offers a powerful tool to planners of gasoline blending. Method B uses theoretically sound formulas, rigorous thermodynamic routines and uses data and correlations of physical properties that are in the public domain. Results indicate that predictions made with both methods agree very well with experimental values of Dvpe. Computer simulation methods were programmed and tested.« less

  2. sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luo, Heng; Ye, Hao; Ng, Hui Wen

    Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. Furthermore, this algorithmmore » can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.« less

  3. sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides

    PubMed Central

    Luo, Heng; Ye, Hao; Ng, Hui Wen; Sakkiah, Sugunadevi; Mendrick, Donna L.; Hong, Huixiao

    2016-01-01

    Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. This algorithm can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system. PMID:27558848

  4. sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides

    DOE PAGES

    Luo, Heng; Ye, Hao; Ng, Hui Wen; ...

    2016-08-25

    Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. Furthermore, this algorithmmore » can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.« less

  5. Probability Based hERG Blocker Classifiers.

    PubMed

    Wang, Zhi; Mussa, Hamse Y; Lowe, Robert; Glen, Robert C; Yan, Aixia

    2012-09-01

    The US Food and Drug Administration (FDA) require in vitro human ether-a-go-go related (hERG) ion channel affinity tests for all drug candidates prior to clinical trials. In this study, probabilistic-based methods were employed to develop prediction models on hERG inhibition prediction, which are different from traditional QSAR models that are mainly based on supervised 'hard point' (HP) classification approaches giving 'yes/no' answers. The obtained models can 'ascertain' whether or not a given set of compounds can block hERG ion channels. The results presented indicate that the proposed probabilistic-based method can be a valuable tool for ranking compounds with respect to their potential cardio-toxicity and will be promising for other toxic property predictions. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    King, R.D.; Srinivasan, A.

    1996-10-01

    The machine learning program Progol was applied to the problem of forming the structure-activity relationship (SAR) for a set of compounds tested for carcinogenicity in rodent bioassays by the U.S. National Toxicology Program (NTP). Progol is the first inductive logic programming (ILP) algorithm to use a fully relational method for describing chemical structure in SARs, based on using atoms and their bond connectivities. Progol is well suited to forming SARs for carcinogenicity as it is designed to produce easily understandable rules (structural alerts) for sets of noncongeneric compounds. The Progol SAR method was tested by prediction of a set ofmore » compounds that have been widely predicted by other SAR methods (the compounds used in the NTP`s first round of carcinogenesis predictions). For these compounds no method (human or machine) was significantly more accurate than Progol. Progol was the most accurate method that did not use data from biological tests on rodents (however, the difference in accuracy is not significant). The Progol predictions were based solely on chemical structure and the results of tests for Salmonella mutagenicity. Using the full NTP database, the prediction accuracy of Progol was estimated to be 63% ({+-}3%) using 5-fold cross validation. A set of structural alerts for carcinogenesis was automatically generated and the chemical rationale for them investigated-these structural alerts are statistically independent of the Salmonella mutagenicity. Carcinogenicity is predicted for the compounds used in the NTP`s second round of carcinogenesis predictions. The results for prediction of carcinogenesis, taken together with the previous successful applications of predicting mutagenicity in nitroaromatic compounds, and inhibition of angiogenesis by suramin analogues, show that Progol has a role to play in understanding the SARs of cancer-related compounds. 29 refs., 2 figs., 4 tabs.« less

  7. Boosting compound-protein interaction prediction by deep learning.

    PubMed

    Tian, Kai; Shao, Mingyu; Wang, Yang; Guan, Jihong; Zhou, Shuigeng

    2016-11-01

    The identification of interactions between compounds and proteins plays an important role in network pharmacology and drug discovery. However, experimentally identifying compound-protein interactions (CPIs) is generally expensive and time-consuming, computational approaches are thus introduced. Among these, machine-learning based methods have achieved a considerable success. However, due to the nonlinear and imbalanced nature of biological data, many machine learning approaches have their own limitations. Recently, deep learning techniques show advantages over many state-of-the-art machine learning methods in some applications. In this study, we aim at improving the performance of CPI prediction based on deep learning, and propose a method called DL-CPI (the abbreviation of Deep Learning for Compound-Protein Interactions prediction), which employs deep neural network (DNN) to effectively learn the representations of compound-protein pairs. Extensive experiments show that DL-CPI can learn useful features of compound-protein pairs by a layerwise abstraction, and thus achieves better prediction performance than existing methods on both balanced and imbalanced datasets. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. A polynomial based model for cell fate prediction in human diseases.

    PubMed

    Ma, Lichun; Zheng, Jie

    2017-12-21

    Cell fate regulation directly affects tissue homeostasis and human health. Research on cell fate decision sheds light on key regulators, facilitates understanding the mechanisms, and suggests novel strategies to treat human diseases that are related to abnormal cell development. In this study, we proposed a polynomial based model to predict cell fate. This model was derived from Taylor series. As a case study, gene expression data of pancreatic cells were adopted to test and verify the model. As numerous features (genes) are available, we employed two kinds of feature selection methods, i.e. correlation based and apoptosis pathway based. Then polynomials of different degrees were used to refine the cell fate prediction function. 10-fold cross-validation was carried out to evaluate the performance of our model. In addition, we analyzed the stability of the resultant cell fate prediction model by evaluating the ranges of the parameters, as well as assessing the variances of the predicted values at randomly selected points. Results show that, within both the two considered gene selection methods, the prediction accuracies of polynomials of different degrees show little differences. Interestingly, the linear polynomial (degree 1 polynomial) is more stable than others. When comparing the linear polynomials based on the two gene selection methods, it shows that although the accuracy of the linear polynomial that uses correlation analysis outcomes is a little higher (achieves 86.62%), the one within genes of the apoptosis pathway is much more stable. Considering both the prediction accuracy and the stability of polynomial models of different degrees, the linear model is a preferred choice for cell fate prediction with gene expression data of pancreatic cells. The presented cell fate prediction model can be extended to other cells, which may be important for basic research as well as clinical study of cell development related diseases.

  9. Measurement versus prediction in the construction of patient-reported outcome questionnaires: can we have our cake and eat it?

    PubMed

    Smits, Niels; van der Ark, L Andries; Conijn, Judith M

    2017-11-02

    Two important goals when using questionnaires are (a) measurement: the questionnaire is constructed to assign numerical values that accurately represent the test taker's attribute, and (b) prediction: the questionnaire is constructed to give an accurate forecast of an external criterion. Construction methods aimed at measurement prescribe that items should be reliable. In practice, this leads to questionnaires with high inter-item correlations. By contrast, construction methods aimed at prediction typically prescribe that items have a high correlation with the criterion and low inter-item correlations. The latter approach has often been said to produce a paradox concerning the relation between reliability and validity [1-3], because it is often assumed that good measurement is a prerequisite of good prediction. To answer four questions: (1) Why are measurement-based methods suboptimal for questionnaires that are used for prediction? (2) How should one construct a questionnaire that is used for prediction? (3) Do questionnaire-construction methods that optimize measurement and prediction lead to the selection of different items in the questionnaire? (4) Is it possible to construct a questionnaire that can be used for both measurement and prediction? An empirical data set consisting of scores of 242 respondents on questionnaire items measuring mental health is used to select items by means of two methods: a method that optimizes the predictive value of the scale (i.e., forecast a clinical diagnosis), and a method that optimizes the reliability of the scale. We show that for the two scales different sets of items are selected and that a scale constructed to meet the one goal does not show optimal performance with reference to the other goal. The answers are as follows: (1) Because measurement-based methods tend to maximize inter-item correlations by which predictive validity reduces. (2) Through selecting items that correlate highly with the criterion and lowly with the remaining items. (3) Yes, these methods may lead to different item selections. (4) For a single questionnaire: Yes, but it is problematic because reliability cannot be estimated accurately. For a test battery: Yes, but it is very costly. Implications for the construction of patient-reported outcome questionnaires are discussed.

  10. Blind prediction of noncanonical RNA structure at atomic accuracy.

    PubMed

    Watkins, Andrew M; Geniesse, Caleb; Kladwang, Wipapat; Zakrevsky, Paul; Jaeger, Luc; Das, Rhiju

    2018-05-01

    Prediction of RNA structure from nucleotide sequence remains an unsolved grand challenge of biochemistry and requires distinct concepts from protein structure prediction. Despite extensive algorithmic development in recent years, modeling of noncanonical base pairs of new RNA structural motifs has not been achieved in blind challenges. We report a stepwise Monte Carlo (SWM) method with a unique add-and-delete move set that enables predictions of noncanonical base pairs of complex RNA structures. A benchmark of 82 diverse motifs establishes the method's general ability to recover noncanonical pairs ab initio, including multistrand motifs that have been refractory to prior approaches. In a blind challenge, SWM models predicted nucleotide-resolution chemical mapping and compensatory mutagenesis experiments for three in vitro selected tetraloop/receptors with previously unsolved structures (C7.2, C7.10, and R1). As a final test, SWM blindly and correctly predicted all noncanonical pairs of a Zika virus double pseudoknot during a recent community-wide RNA-Puzzle. Stepwise structure formation, as encoded in the SWM method, enables modeling of noncanonical RNA structure in a variety of previously intractable problems.

  11. A summary of wind power prediction methods

    NASA Astrophysics Data System (ADS)

    Wang, Yuqi

    2018-06-01

    The deterministic prediction of wind power, the probability prediction and the prediction of wind power ramp events are introduced in this paper. Deterministic prediction includes the prediction of statistical learning based on histor ical data and the prediction of physical models based on NWP data. Due to the great impact of wind power ramp events on the power system, this paper also introduces the prediction of wind power ramp events. At last, the evaluation indicators of all kinds of prediction are given. The prediction of wind power can be a good solution to the adverse effects of wind power on the power system due to the abrupt, intermittent and undulation of wind power.

  12. Debris-flow runout predictions based on the average channel slope (ACS)

    USGS Publications Warehouse

    Prochaska, A.B.; Santi, P.M.; Higgins, J.D.; Cannon, S.H.

    2008-01-01

    Prediction of the runout distance of a debris flow is an important element in the delineation of potentially hazardous areas on alluvial fans and for the siting of mitigation structures. Existing runout estimation methods rely on input parameters that are often difficult to estimate, including volume, velocity, and frictional factors. In order to provide a simple method for preliminary estimates of debris-flow runout distances, we developed a model that provides runout predictions based on the average channel slope (ACS model) for non-volcanic debris flows that emanate from confined channels and deposit on well-defined alluvial fans. This model was developed from 20 debris-flow events in the western United States and British Columbia. Based on a runout estimation method developed for snow avalanches, this model predicts debris-flow runout as an angle of reach from a fixed point in the drainage channel to the end of the runout zone. The best fixed point was found to be the mid-point elevation of the drainage channel, measured from the apex of the alluvial fan to the top of the drainage basin. Predicted runout lengths were more consistent than those obtained from existing angle-of-reach estimation methods. Results of the model compared well with those of laboratory flume tests performed using the same range of channel slopes. The robustness of this model was tested by applying it to three debris-flow events not used in its development: predicted runout ranged from 82 to 131% of the actual runout for these three events. Prediction interval multipliers were also developed so that the user may calculate predicted runout within specified confidence limits. ?? 2008 Elsevier B.V. All rights reserved.

  13. NWP model forecast skill optimization via closure parameter variations

    NASA Astrophysics Data System (ADS)

    Järvinen, H.; Ollinaho, P.; Laine, M.; Solonen, A.; Haario, H.

    2012-04-01

    We present results of a novel approach to tune predictive skill of numerical weather prediction (NWP) models. These models contain tunable parameters which appear in parameterizations schemes of sub-grid scale physical processes. The current practice is to specify manually the numerical parameter values, based on expert knowledge. We developed recently a concept and method (QJRMS 2011) for on-line estimation of the NWP model parameters via closure parameter variations. The method called EPPES ("Ensemble prediction and parameter estimation system") utilizes ensemble prediction infra-structure for parameter estimation in a very cost-effective way: practically no new computations are introduced. The approach provides an algorithmic decision making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model tunable parameters is made by (i) generating an ensemble of predictions so that each member uses different model parameter values, drawn from a proposal distribution, and (ii) feeding-back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In this presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameters values, and leads to an improved forecast skill. Second, results with an ensemble prediction system emulator, based on the ECHAM5 atmospheric GCM show that the model tuning capability of EPPES scales up to realistic models and ensemble prediction systems. Finally, preliminary results of EPPES in the context of ECMWF forecasting system are presented.

  14. Two Stochastic Phases of Tick-wise Price Fluctuation and the Price Prediction Generator

    NASA Astrophysics Data System (ADS)

    Tanaka-Yamawaki, Mieko; Tokuoka, Seiji

    2007-07-01

    We report in this paper the existence of two different stochastic phases in the tick-wise price fluctuations. Based on this observation, we improve our old method of developing the evolutional strategy to predict the direction of the tick-wise price movements. We obtain a stable predictive power even in the region where the old method had a difficulty.

  15. BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes

    PubMed Central

    Jespersen, Martin Closter; Peters, Bjoern

    2017-01-01

    Abstract Antibodies have become an indispensable tool for many biotechnological and clinical applications. They bind their molecular target (antigen) by recognizing a portion of its structure (epitope) in a highly specific manner. The ability to predict epitopes from antigen sequences alone is a complex task. Despite substantial effort, limited advancement has been achieved over the last decade in the accuracy of epitope prediction methods, especially for those that rely on the sequence of the antigen only. Here, we present BepiPred-2.0 (http://www.cbs.dtu.dk/services/BepiPred/), a web server for predicting B-cell epitopes from antigen sequences. BepiPred-2.0 is based on a random forest algorithm trained on epitopes annotated from antibody-antigen protein structures. This new method was found to outperform other available tools for sequence-based epitope prediction both on epitope data derived from solved 3D structures, and on a large collection of linear epitopes downloaded from the IEDB database. The method displays results in a user-friendly and informative way, both for computer-savvy and non-expert users. We believe that BepiPred-2.0 will be a valuable tool for the bioinformatics and immunology community. PMID:28472356

  16. Rainfall Prediction of Indian Peninsula: Comparison of Time Series Based Approach and Predictor Based Approach using Machine Learning Techniques

    NASA Astrophysics Data System (ADS)

    Dash, Y.; Mishra, S. K.; Panigrahi, B. K.

    2017-12-01

    Prediction of northeast/post monsoon rainfall which occur during October, November and December (OND) over Indian peninsula is a challenging task due to the dynamic nature of uncertain chaotic climate. It is imperative to elucidate this issue by examining performance of different machine leaning (ML) approaches. The prime objective of this research is to compare between a) statistical prediction using historical rainfall observations and global atmosphere-ocean predictors like Sea Surface Temperature (SST) and Sea Level Pressure (SLP) and b) empirical prediction based on a time series analysis of past rainfall data without using any other predictors. Initially, ML techniques have been applied on SST and SLP data (1948-2014) obtained from NCEP/NCAR reanalysis monthly mean provided by the NOAA ESRL PSD. Later, this study investigated the applicability of ML methods using OND rainfall time series for 1948-2014 and forecasted up to 2018. The predicted values of aforementioned methods were verified using observed time series data collected from Indian Institute of Tropical Meteorology and the result revealed good performance of ML algorithms with minimal error scores. Thus, it is found that both statistical and empirical methods are useful for long range climatic projections.

  17. Prediction of microbe-disease association from the integration of neighbor and graph with collaborative recommendation model.

    PubMed

    Huang, Yu-An; You, Zhu-Hong; Chen, Xing; Huang, Zhi-An; Zhang, Shanwen; Yan, Gui-Ying

    2017-10-16

    Accumulating clinical researches have shown that specific microbes with abnormal levels are closely associated with the development of various human diseases. Knowledge of microbe-disease associations can provide valuable insights for complex disease mechanism understanding as well as the prevention, diagnosis and treatment of various diseases. However, little effort has been made to predict microbial candidates for human complex diseases on a large scale. In this work, we developed a new computational model for predicting microbe-disease associations by combining two single recommendation methods. Based on the assumption that functionally similar microbes tend to get involved in the mechanism of similar disease, we adopted neighbor-based collaborative filtering and a graph-based scoring method to compute association possibility of microbe-disease pairs. The promising prediction performance could be attributed to the use of hybrid approach based on two single recommendation methods as well as the introduction of Gaussian kernel-based similarity and symptom-based disease similarity. To evaluate the performance of the proposed model, we implemented leave-one-out and fivefold cross validations on the HMDAD database, which is recently built as the first database collecting experimentally-confirmed microbe-disease associations. As a result, NGRHMDA achieved reliable results with AUCs of 0.9023 ± 0.0031 and 0.9111 in the validation frameworks of fivefold CV and LOOCV. In addition, 78.2% microbe samples and 66.7% disease samples are found to be consistent with the basic assumption of our work that microbes tend to get involved in the similar disease clusters, and vice versa. Compared with other methods, the prediction results yielded by NGRHMDA demonstrate its effective prediction performance for microbe-disease associations. It is anticipated that NGRHMDA can be used as a useful tool to search the most potential microbial candidates for various diseases, and therefore boosts the medical knowledge and drug development. The codes and dataset of our work can be downloaded from https://github.com/yahuang1991/NGRHMDA .

  18. Network Location-Aware Service Recommendation with Random Walk in Cyber-Physical Systems.

    PubMed

    Yin, Yuyu; Yu, Fangzheng; Xu, Yueshen; Yu, Lifeng; Mu, Jinglong

    2017-09-08

    Cyber-physical systems (CPS) have received much attention from both academia and industry. An increasing number of functions in CPS are provided in the way of services, which gives rise to an urgent task, that is, how to recommend the suitable services in a huge number of available services in CPS. In traditional service recommendation, collaborative filtering (CF) has been studied in academia, and used in industry. However, there exist several defects that limit the application of CF-based methods in CPS. One is that under the case of high data sparsity, CF-based methods are likely to generate inaccurate prediction results. In this paper, we discover that mining the potential similarity relations among users or services in CPS is really helpful to improve the prediction accuracy. Besides, most of traditional CF-based methods are only capable of using the service invocation records, but ignore the context information, such as network location, which is a typical context in CPS. In this paper, we propose a novel service recommendation method for CPS, which utilizes network location as context information and contains three prediction models using random walking. We conduct sufficient experiments on two real-world datasets, and the results demonstrate the effectiveness of our proposed methods and verify that the network location is indeed useful in QoS prediction.

  19. Toward Biopredictive Dissolution for Enteric Coated Dosage Forms.

    PubMed

    Al-Gousous, J; Amidon, G L; Langguth, P

    2016-06-06

    The aim of this work was to develop a phosphate buffer based dissolution method for enteric-coated formulations with improved biopredictivity for fasted conditions. Two commercially available enteric-coated aspirin products were used as model formulations (Aspirin Protect 300 mg, and Walgreens Aspirin 325 mg). The disintegration performance of these products in a physiological 8 mM pH 6.5 bicarbonate buffer (representing the conditions in the proximal small intestine) was used as a standard to optimize the employed phosphate buffer molarity. To account for the fact that a pH and buffer molarity gradient exists along the small intestine, the introduction of such a gradient was proposed for products with prolonged lag times (when it leads to a release lower than 75% in the first hour post acid stage) in the proposed buffer. This would allow the method also to predict the performance of later-disintegrating products. Dissolution performance using the accordingly developed method was compared to that observed when using two well-established dissolution methods: the United States Pharmacopeia (USP) method and blank fasted state simulated intestinal fluid (FaSSIF). The resulting dissolution profiles were convoluted using GastroPlus software to obtain predicted pharmacokinetic profiles. A pharmacokinetic study on healthy human volunteers was performed to evaluate the predictions made by the different dissolution setups. The novel method provided the best prediction, by a relatively wide margin, for the difference between the lag times of the two tested formulations, indicating its being able to predict the post gastric emptying onset of drug release with reasonable accuracy. Both the new and the blank FaSSIF methods showed potential for establishing in vitro-in vivo correlation (IVIVC) concerning the prediction of Cmax and AUC0-24 (prediction errors not more than 20%). However, these predictions are strongly affected by the highly variable first pass metabolism necessitating the evaluation of an absorption rate metric that is more independent of the first-pass effect. The Cmax/AUC0-24 ratio was selected for this purpose. Regarding this metric's predictions, the new method provided very good prediction of the two products' performances relative to each other (only 1.05% prediction error in this regard), while its predictions for the individual products' values in absolute terms were borderline, narrowly missing the regulatory 20% prediction error limits (21.51% for Aspirin Protect and 22.58% for Walgreens Aspirin). The blank FaSSIF-based method provided good Cmax/AUC0-24 ratio prediction, in absolute terms, for Aspirin Protect (9.05% prediction error), but its prediction for Walgreens Aspirin (33.97% prediction error) was overwhelmingly poor. Thus it gave practically the same average but much higher maximum prediction errors compared to the new method, and it was strongly overdiscriminating as for predicting their performances relative to one another. The USP method, despite not being overdiscriminating, provided poor predictions of the individual products' Cmax/AUC0-24 ratios. This indicates that, overall, the new method is of improved biopredictivity compared to established methods.

  20. K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited

    NASA Astrophysics Data System (ADS)

    Wang, Dong

    2016-03-01

    Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertize to explain gear fault signatures which is usually not easy to be achieved by ordinary users. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. The previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of redundant statistical features is conducted to obtain new significant statistical features. At last, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.

  1. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

    PubMed

    Mizianty, Marcin J; Kurgan, Lukasz

    2009-12-13

    Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozens of modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for membrane proteins class, which is not considered by majority of existing methods, in spite that this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.

  2. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    PubMed Central

    2009-01-01

    Background Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozens of modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for membrane proteins class, which is not considered by majority of existing methods, in spite that this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/. PMID:20003388

  3. A Recursive Partitioning Method for the Prediction of Preference Rankings Based Upon Kemeny Distances.

    PubMed

    D'Ambrosio, Antonio; Heiser, Willem J

    2016-09-01

    Preference rankings usually depend on the characteristics of both the individuals judging a set of objects and the objects being judged. This topic has been handled in the literature with log-linear representations of the generalized Bradley-Terry model and, recently, with distance-based tree models for rankings. A limitation of these approaches is that they only work with full rankings or with a pre-specified pattern governing the presence of ties, and/or they are based on quite strict distributional assumptions. To overcome these limitations, we propose a new prediction tree method for ranking data that is totally distribution-free. It combines Kemeny's axiomatic approach to define a unique distance between rankings with the CART approach to find a stable prediction tree. Furthermore, our method is not limited by any particular design of the pattern of ties. The method is evaluated in an extensive full-factorial Monte Carlo study with a new simulation design.

  4. Hierarchical Ensemble Methods for Protein Function Prediction

    PubMed Central

    2014-01-01

    Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the unbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods that showed significantly better performances than hierarchical-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954

  5. Predictive equation of state method for heavy materials based on the Dirac equation and density functional theory

    NASA Astrophysics Data System (ADS)

    Wills, John M.; Mattsson, Ann E.

    2012-02-01

    Density functional theory (DFT) provides a formally predictive base for equation of state properties. Available approximations to the exchange/correlation functional provide accurate predictions for many materials in the periodic table. For heavy materials however, DFT calculations, using available functionals, fail to provide quantitative predictions, and often fail to be even qualitative. This deficiency is due both to the lack of the appropriate confinement physics in the exchange/correlation functional and to approximations used to evaluate the underlying equations. In order to assess and develop accurate functionals, it is essential to eliminate all other sources of error. In this talk we describe an efficient first-principles electronic structure method based on the Dirac equation and compare the results obtained with this method with other methods generally used. Implications for high-pressure equation of state of relativistic materials are demonstrated in application to Ce and the light actinides. Sandia National Laboratories is a multi-program laboratory managed andoperated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  6. Novel Method of Production Decline Analysis

    NASA Astrophysics Data System (ADS)

    Xie, Shan; Lan, Yifei; He, Lei; Jiao, Yang; Wu, Yong

    2018-02-01

    ARPS decline curves is the most commonly used in oil and gas field due to its minimal data requirements and ease application. And prediction of production decline which is based on ARPS analysis rely on known decline type. However, when coefficient index are very approximate under different decline type, it is difficult to directly recognize decline trend of matched curves. Due to difficulties above, based on simulation results of multi-factor response experiments, a new dynamic decline prediction model is introduced with using multiple linear regression of influence factors. First of all, according to study of effect factors of production decline, interaction experimental schemes are designed. Based on simulated results, annual decline rate is predicted by decline model. Moreover, the new method is applied in A gas filed of Ordos Basin as example to illustrate reliability. The result commit that the new model can directly predict decline tendency without needing recognize decline style. From arithmetic aspect, it also take advantage of high veracity. Finally, the new method improves the evaluation method of gas well production decline in low permeability gas reservoir, which also provides technical support for further understanding of tight gas field development laws.

  7. Development of glucose measurement system based on pulsed laser-induced ultrasonic method

    NASA Astrophysics Data System (ADS)

    Ren, Zhong; Wan, Bin; Liu, Guodong; Xiong, Zhihua

    2016-09-01

    In this study, a kind of glucose measurement system based on pulsed-induced ultrasonic technique was established. In this system, the lateral detection mode was used, the Nd: YAG pumped optical parametric oscillator (OPO) pulsed laser was used as the excitation source, the high sensitivity ultrasonic transducer was used as the signal detector to capture the photoacoustic signals of the glucose. In the experiments, the real-time photoacoustic signals of glucose aqueous solutions with different concentrations were captured by ultrasonic transducer and digital oscilloscope. Moreover, the photoacoustic peak-to-peak values were gotten in the wavelength range from 1300nm to 2300nm. The characteristic absorption wavelengths of glucose were determined via the difference spectral method and second derivative method. In addition, the prediction models of predicting glucose concentrations were established via the multivariable linear regression algorithm and the optimal prediction model of corresponding optimal wavelengths. Results showed that the performance of the glucose system based on the pulsed-induced ultrasonic detection method was feasible. Therefore, the measurement scheme and prediction model have some potential value in the fields of non-invasive monitoring the concentration of the glucose gradient, especially in the food safety and biomedical fields.

  8. Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease.

    PubMed

    Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying

    2011-08-01

    The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets are essential requirements to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.

  9. Adaptive Granulation-Based Prediction for Energy System of Steel Industry.

    PubMed

    Wang, Tianyu; Han, Zhongyang; Zhao, Jun; Wang, Wei

    2018-01-01

    The flow variation tendency of byproduct gas plays a crucial role for energy scheduling in steel industry. An accurate prediction of its future trends will be significantly beneficial for the economic profits of steel enterprise. In this paper, a long-term prediction model for the energy system is proposed by providing an adaptive granulation-based method that considers the production semantics involved in the fluctuation tendency of the energy data, and partitions them into a series of information granules. To fully reflect the corresponding data characteristics of the formed unequal-length temporal granules, a 3-D feature space consisting of the timespan, the amplitude and the linetype is designed as linguistic descriptors. In particular, a collaborative-conditional fuzzy clustering method is proposed to granularize the tendency-based feature descriptors and specifically measure the amplitude variation of industrial data which plays a dominant role in the feature space. To quantify the performance of the proposed method, a series of real-world industrial data coming from the energy data center of a steel plant is employed to conduct the comparative experiments. The experimental results demonstrate that the proposed method successively satisfies the requirements of the practically viable prediction.

  10. Prediction of sonic boom from experimental near-field overpressure data. Volume 2: Data base construction

    NASA Technical Reports Server (NTRS)

    Glatt, C. R.; Reiners, S. J.; Hague, D. S.

    1975-01-01

    A computerized method for storing, updating and augmenting experimentally determined overpressure signatures has been developed. A data base of pressure signatures for a shuttle type vehicle has been stored. The data base has been used for the prediction of sonic boom with the program described in Volume I.

  11. Interlinking backscatter, grain size and benthic community structure

    NASA Astrophysics Data System (ADS)

    McGonigle, Chris; Collier, Jenny S.

    2014-06-01

    The relationship between acoustic backscatter, sediment grain size and benthic community structure is examined using three different quantitative methods, covering image- and angular response-based approaches. Multibeam time-series backscatter (300 kHz) data acquired in 2008 off the coast of East Anglia (UK) are compared with grain size properties, macrofaunal abundance and biomass from 130 Hamon and 16 Clamshell grab samples. Three predictive methods are used: 1) image-based (mean backscatter intensity); 2) angular response-based (predicted mean grain size), and 3) image-based (1st principal component and classification) from Quester Tangent Corporation Multiview software. Relationships between grain size and backscatter are explored using linear regression. Differences in grain size and benthic community structure between acoustically defined groups are examined using ANOVA and PERMANOVA+. Results for the Hamon grab stations indicate significant correlations between measured mean grain size and mean backscatter intensity, angular response predicted mean grain size, and 1st principal component of QTC analysis (all p < 0.001). Results for the Clamshell grab for two of the methods have stronger positive correlations; mean backscatter intensity (r2 = 0.619; p < 0.001) and angular response predicted mean grain size (r2 = 0.692; p < 0.001). ANOVA reveals significant differences in mean grain size (Hamon) within acoustic groups for all methods: mean backscatter (p < 0.001), angular response predicted grain size (p < 0.001), and QTC class (p = 0.009). Mean grain size (Clamshell) shows a significant difference between groups for mean backscatter (p = 0.001); other methods were not significant. PERMANOVA for the Hamon abundance shows benthic community structure was significantly different between acoustic groups for all methods (p ≤ 0.001). Overall these results show considerable promise in that more than 60% of the variance in the mean grain size of the Clamshell grab samples can be explained by mean backscatter or acoustically-predicted grain size. These results show that there is significant predictive capacity for sediment characteristics from multibeam backscatter and that these acoustic classifications can have ecological validity.

  12. Evaluation of predictive capacities of biomarkers based on research synthesis.

    PubMed

    Hattori, Satoshi; Zhou, Xiao-Hua

    2016-11-10

    The objective of diagnostic studies or prognostic studies is to evaluate and compare predictive capacities of biomarkers. Suppose we are interested in evaluation and comparison of predictive capacities of continuous biomarkers for a binary outcome based on research synthesis. In analysis of each study, subjects are often classified into two groups of the high-expression and low-expression groups according to a cut-off value, and statistical analysis is based on a 2 × 2 table defined by the response and the high expression or low expression of the biomarker. Because the cut-off is study specific, it is difficult to interpret a combined summary measure such as an odds ratio based on the standard meta-analysis techniques. The summary receiver operating characteristic curve is a useful method for meta-analysis of diagnostic studies in the presence of heterogeneity of cut-off values to examine discriminative capacities of biomarkers. We develop a method to estimate positive or negative predictive curves, which are alternative to the receiver operating characteristic curve based on information reported in published papers of each study. These predictive curves provide a useful graphical presentation of pairs of positive and negative predictive values and allow us to compare predictive capacities of biomarkers of different scales in the presence of heterogeneity in cut-off values among studies. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  13. [Research on engine remaining useful life prediction based on oil spectrum analysis and particle filtering].

    PubMed

    Sun, Lei; Jia, Yun-xian; Cai, Li-ying; Lin, Guo-yu; Zhao, Jin-song

    2013-09-01

    The spectrometric oil analysis(SOA) is an important technique for machine state monitoring, fault diagnosis and prognosis, and SOA based remaining useful life(RUL) prediction has an advantage of finding out the optimal maintenance strategy for machine system. Because the complexity of machine system, its health state degradation process can't be simply characterized by linear model, while particle filtering(PF) possesses obvious advantages over traditional Kalman filtering for dealing nonlinear and non-Gaussian system, the PF approach was applied to state forecasting by SOA, and the RUL prediction technique based on SOA and PF algorithm is proposed. In the prediction model, according to the estimating result of system's posterior probability, its prior probability distribution is realized, and the multi-step ahead prediction model based on PF algorithm is established. Finally, the practical SOA data of some engine was analyzed and forecasted by the above method, and the forecasting result was compared with that of traditional Kalman filtering method. The result fully shows the superiority and effectivity of the

  14. A method for fast safety screening of explosives in terms of crystal packing and molecular stability.

    PubMed

    Hu, Xiaohua; Chen, Nana; Li, Weichen

    2016-07-01

    Safety prediction is crucial to the molecular design or the material design of explosives, and the predictions based on any single factor alone will cause much inaccuracy, leading to a desire for a method on multi-bases. The presented proposes an improved method for fast screening explosive safety by combining a crystal packing factor and a molecular one, that is, steric hindrance against shear slide in crystal and molecular stability, denoted by intermolecular friction symbol (IFS) and bond dissociation energy (BDE) of trigger linkage respectively. Employing this BDE-IFS combined method, we understand the impact sensitivities of 24 existing explosives, and predict those of two energetic-energetic cocrystals of the observed CL-20/BTF and the supposed HMX/TATB. As a result, a better understanding is implemented by the combined method relative to molecular stability alone, verifying its improvement of more accurate predictions and the feasibility of IFS to graphically reflect molecular stacking in crystals. Also, this work verifies that the explosive safety is strongly related with its crystal stacking, which determines steric hindrance and influences shear slide.

  15. Efficient and accurate two-scale FE-FFT-based prediction of the effective material behavior of elasto-viscoplastic polycrystals

    NASA Astrophysics Data System (ADS)

    Kochmann, Julian; Wulfinghoff, Stephan; Ehle, Lisa; Mayer, Joachim; Svendsen, Bob; Reese, Stefanie

    2018-06-01

    Recently, two-scale FE-FFT-based methods (e.g., Spahn et al. in Comput Methods Appl Mech Eng 268:871-883, 2014; Kochmann et al. in Comput Methods Appl Mech Eng 305:89-110, 2016) have been proposed to predict the microscopic and overall mechanical behavior of heterogeneous materials. The purpose of this work is the extension to elasto-viscoplastic polycrystals, efficient and robust Fourier solvers and the prediction of micromechanical fields during macroscopic deformation processes. Assuming scale separation, the macroscopic problem is solved using the finite element method. The solution of the microscopic problem, which is embedded as a periodic unit cell (UC) in each macroscopic integration point, is found by employing fast Fourier transforms, fixed-point and Newton-Krylov methods. The overall material behavior is defined by the mean UC response. In order to ensure spatially converged micromechanical fields as well as feasible overall CPU times, an efficient but simple solution strategy for two-scale simulations is proposed. As an example, the constitutive behavior of 42CrMo4 steel is predicted during macroscopic three-point bending tests.

  16. Efficient and accurate two-scale FE-FFT-based prediction of the effective material behavior of elasto-viscoplastic polycrystals

    NASA Astrophysics Data System (ADS)

    Kochmann, Julian; Wulfinghoff, Stephan; Ehle, Lisa; Mayer, Joachim; Svendsen, Bob; Reese, Stefanie

    2017-09-01

    Recently, two-scale FE-FFT-based methods (e.g., Spahn et al. in Comput Methods Appl Mech Eng 268:871-883, 2014; Kochmann et al. in Comput Methods Appl Mech Eng 305:89-110, 2016) have been proposed to predict the microscopic and overall mechanical behavior of heterogeneous materials. The purpose of this work is the extension to elasto-viscoplastic polycrystals, efficient and robust Fourier solvers and the prediction of micromechanical fields during macroscopic deformation processes. Assuming scale separation, the macroscopic problem is solved using the finite element method. The solution of the microscopic problem, which is embedded as a periodic unit cell (UC) in each macroscopic integration point, is found by employing fast Fourier transforms, fixed-point and Newton-Krylov methods. The overall material behavior is defined by the mean UC response. In order to ensure spatially converged micromechanical fields as well as feasible overall CPU times, an efficient but simple solution strategy for two-scale simulations is proposed. As an example, the constitutive behavior of 42CrMo4 steel is predicted during macroscopic three-point bending tests.

  17. Genomic selection in plant breeding

    USDA-ARS?s Scientific Manuscript database

    Genomic selection (GS) is a method to predict the genetic value of selection candidates based on the genomic estimated breeding value (GEBV) predicted from high-density markers positioned throughout the genome. Unlike marker-assisted selection, the GEBV is based on all markers including both minor ...

  18. Aerodynamic characteristics of cruciform missiles at high angles of attack

    NASA Technical Reports Server (NTRS)

    Lesieutre, Daniel J.; Mendenhall, Michael R.; Nazario, Susana M.; Hemsch, Michael J.

    1987-01-01

    An aerodynamic prediction method for missile aerodynamic performance and preliminary design has been developed to utilize a newly available systematic fin data base and an improved equivalent angle of attack methodology. The method predicts total aerodynamic loads and individual fin forces and moments for body-tail (wing-body) and canard-body-tail configurations with cruciform fin arrangements. The data base and the prediction method are valid for angles of attack up to 45 deg, arbitrary roll angles, fin deflection angles between -40 deg and 40 deg, Mach numbers between 0.6 and 4.5, and fin aspect ratios between 0.25 and 4.0. The equivalent angle of attack concept is employed to include the effects of vorticity and geometric scaling.

  19. New method for probabilistic traffic demand predictions for en route sectors based on uncertain predictions of individual flight events.

    DOT National Transportation Integrated Search

    2011-06-14

    This paper presents a novel analytical approach to and techniques for translating characteristics of uncertainty in predicting sector entry times and times in sector for individual flights into characteristics of uncertainty in predicting one-minute ...

  20. An Efficient Scheme for Crystal Structure Prediction Based on Structural Motifs

    DOE PAGES

    Zhu, Zizhong; Wu, Ping; Wu, Shunqing; ...

    2017-05-15

    An efficient scheme based on structural motifs is proposed for the crystal structure prediction of materials. The key advantage of the present method comes in two fold: first, the degrees of freedom of the system are greatly reduced, since each structural motif, regardless of its size, can always be described by a set of parameters (R, θ, φ) with five degrees of freedom; second, the motifs could always appear in the predicted structures when the energies of the structures are relatively low. Both features make the present scheme a very efficient method for predicting desired materials. The method has beenmore » applied to the case of LiFePO 4, an important cathode material for lithium-ion batteries. Numerous new structures of LiFePO 4 have been found, compared to those currently available, available, demonstrating the reliability of the present methodology and illustrating the promise of the concept of structural motifs.« less

  1. In silico prediction of cytochrome P450-mediated drug metabolism.

    PubMed

    Zhang, Tao; Chen, Qi; Li, Li; Liu, Limin Angela; Wei, Dong-Qing

    2011-06-01

    The application of combinatorial chemistry and high-throughput screening technique enables the large number of chemicals to be generated and tested simultaneously, which will facilitate the drug development and discovery. At the same time, it brings about a challenge of how to efficiently identify the potential drug candidates from thousands of compounds. A way used to deal with the challenge is to consider the drug pharmacokinetic properties, such as absorption, distribution, metabolism and excretion (ADME), in the early stage of drug development. Among ADME properties, metabolism is of importance due to the strong association with efficacy and safety of drug. The review will focus on in silico approaches for prediction of Cytochrome P450-mediated drug metabolism. We will describe these predictive methods from two aspects, structure-based and data-based. Moreover, the applications and limitations of various methods will be discussed. Finally, we provide further direction toward improving the predictive accuracy of these in silico methods.

  2. Prediction of missing common genes for disease pairs using network based module separation on incomplete human interactome.

    PubMed

    Akram, Pakeeza; Liao, Li

    2017-12-06

    Identification of common genes associated with comorbid diseases can be critical in understanding their pathobiological mechanism. This work presents a novel method to predict missing common genes associated with a disease pair. Searching for missing common genes is formulated as an optimization problem to minimize network based module separation from two subgraphs produced by mapping genes associated with disease onto the interactome. Using cross validation on more than 600 disease pairs, our method achieves significantly higher average receiver operating characteristic ROC Score of 0.95 compared to a baseline ROC score 0.60 using randomized data. Missing common genes prediction is aimed to complete gene set associated with comorbid disease for better understanding of biological intervention. It will also be useful for gene targeted therapeutics related to comorbid diseases. This method can be further considered for prediction of missing edges to complete the subgraph associated with disease pair.

  3. A community computational challenge to predict the activity of pairs of compounds.

    PubMed

    Bansal, Mukesh; Yang, Jichen; Karan, Charles; Menden, Michael P; Costello, James C; Tang, Hao; Xiao, Guanghua; Li, Yajuan; Allen, Jeffrey; Zhong, Rui; Chen, Beibei; Kim, Minsoo; Wang, Tao; Heiser, Laura M; Realubit, Ronald; Mattioli, Michela; Alvarez, Mariano J; Shen, Yao; Gallahan, Daniel; Singer, Dinah; Saez-Rodriguez, Julio; Xie, Yang; Stolovitzky, Gustavo; Califano, Andrea

    2014-12-01

    Recent therapeutic successes have renewed interest in drug combinations, but experimental screening approaches are costly and often identify only small numbers of synergistic combinations. The DREAM consortium launched an open challenge to foster the development of in silico methods to computationally rank 91 compound pairs, from the most synergistic to the most antagonistic, based on gene-expression profiles of human B cells treated with individual compounds at multiple time points and concentrations. Using scoring metrics based on experimental dose-response curves, we assessed 32 methods (31 community-generated approaches and SynGen), four of which performed significantly better than random guessing. We highlight similarities between the methods. Although the accuracy of predictions was not optimal, we find that computational prediction of compound-pair activity is possible, and that community challenges can be useful to advance the field of in silico compound-synergy prediction.

  4. An Efficient Scheme for Crystal Structure Prediction Based on Structural Motifs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Zizhong; Wu, Ping; Wu, Shunqing

    An efficient scheme based on structural motifs is proposed for the crystal structure prediction of materials. The key advantage of the present method comes in two fold: first, the degrees of freedom of the system are greatly reduced, since each structural motif, regardless of its size, can always be described by a set of parameters (R, θ, φ) with five degrees of freedom; second, the motifs could always appear in the predicted structures when the energies of the structures are relatively low. Both features make the present scheme a very efficient method for predicting desired materials. The method has beenmore » applied to the case of LiFePO 4, an important cathode material for lithium-ion batteries. Numerous new structures of LiFePO 4 have been found, compared to those currently available, available, demonstrating the reliability of the present methodology and illustrating the promise of the concept of structural motifs.« less

  5. Protein docking prediction using predicted protein-protein interface.

    PubMed

    Li, Bin; Kihara, Daisuke

    2012-01-10

    Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.

  6. Sorting protein decoys by machine-learning-to-rank

    PubMed Central

    Jing, Xiaoyang; Wang, Kai; Lu, Ruqian; Dong, Qiwen

    2016-01-01

    Much progress has been made in Protein structure prediction during the last few decades. As the predicted models can span a broad range of accuracy spectrum, the accuracy of quality estimation becomes one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods could be roughly divided into three categories: the single-model methods, clustering-based methods and quasi single-model methods. In this study, we develop a single-model method MQAPRank based on the learning-to-rank algorithm firstly, and then implement a quasi single-model method Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 dataset. The five-fold cross-validation on the 3DRobot dataset shows the proposed single model method outperforms other methods whose outputs are taken as features of the proposed method, and the quasi single-model method can further enhance the performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in corresponding categories. In particular, the Quasi-MQAPRank method achieves a considerable performance on the CASP11 Best150 dataset. PMID:27530967

  7. Sorting protein decoys by machine-learning-to-rank.

    PubMed

    Jing, Xiaoyang; Wang, Kai; Lu, Ruqian; Dong, Qiwen

    2016-08-17

    Much progress has been made in Protein structure prediction during the last few decades. As the predicted models can span a broad range of accuracy spectrum, the accuracy of quality estimation becomes one of the key elements of successful protein structure prediction. Over the past years, a number of methods have been developed to address this issue, and these methods could be roughly divided into three categories: the single-model methods, clustering-based methods and quasi single-model methods. In this study, we develop a single-model method MQAPRank based on the learning-to-rank algorithm firstly, and then implement a quasi single-model method Quasi-MQAPRank. The proposed methods are benchmarked on the 3DRobot and CASP11 dataset. The five-fold cross-validation on the 3DRobot dataset shows the proposed single model method outperforms other methods whose outputs are taken as features of the proposed method, and the quasi single-model method can further enhance the performance. On the CASP11 dataset, the proposed methods also perform well compared with other leading methods in corresponding categories. In particular, the Quasi-MQAPRank method achieves a considerable performance on the CASP11 Best150 dataset.

  8. Gross domestic product estimation based on electricity utilization by artificial neural network

    NASA Astrophysics Data System (ADS)

    Stevanović, Mirjana; Vujičić, Slađana; Gajić, Aleksandar M.

    2018-01-01

    The main goal of the paper was to estimate gross domestic product (GDP) based on electricity estimation by artificial neural network (ANN). The electricity utilization was analyzed based on different sources like renewable, coal and nuclear sources. The ANN network was trained with two training algorithms namely extreme learning method and back-propagation algorithm in order to produce the best prediction results of the GDP. According to the results it can be concluded that the ANN model with extreme learning method could produce the acceptable prediction of the GDP based on the electricity utilization.

  9. Prediction of potential disease-associated microRNAs based on random walk.

    PubMed

    Xuan, Ping; Han, Ke; Guo, Yahong; Li, Jin; Li, Xia; Zhong, Yingli; Zhang, Zhaogong; Ding, Jian

    2015-06-01

    Identifying microRNAs associated with diseases (disease miRNAs) is helpful for exploring the pathogenesis of diseases. Because miRNAs fulfill function via the regulation of their target genes and because the current number of experimentally validated targets is insufficient, some existing methods have inferred potential disease miRNAs based on the predicted targets. It is difficult for these methods to achieve excellent performance due to the high false-positive and false-negative rates for the target prediction results. Alternatively, several methods have constructed a network composed of miRNAs based on their associated diseases and have exploited the information within the network to predict the disease miRNAs. However, these methods have failed to take into account the prior information regarding the network nodes and the respective local topological structures of the different categories of nodes. Therefore, it is essential to develop a method that exploits the more useful information to predict reliable disease miRNA candidates. miRNAs with similar functions are normally associated with similar diseases and vice versa. Therefore, the functional similarity between a pair of miRNAs is calculated based on their associated diseases to construct a miRNA network. We present a new prediction method based on random walk on the network. For the diseases with some known related miRNAs, the network nodes are divided into labeled nodes and unlabeled nodes, and the transition matrices are established for the two categories of nodes. Furthermore, different categories of nodes have different transition weights. In this way, the prior information of nodes can be completely exploited. Simultaneously, the various ranges of topologies around the different categories of nodes are integrated. In addition, how far the walker can go away from the labeled nodes is controlled by restarting the walking. This is helpful for relieving the negative effect of noisy data. For the diseases without any known related miRNAs, we extend the walking on a miRNA-disease bilayer network. During the prediction process, the similarity between diseases, the similarity between miRNAs, the known miRNA-disease associations and the topology information of the bilayer network are exploited. Moreover, the importance of information from different layers of network is considered. Our method achieves superior performance for 18 human diseases with AUC values ranging from 0.786 to 0.945. Moreover, case studies on breast neoplasms, lung neoplasms, prostatic neoplasms and 32 diseases further confirm the ability of our method to discover potential disease miRNAs. A web service for the prediction and analysis of disease miRNAs is available at http://bioinfolab.stx.hk/midp/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Predicting bioconcentration of chemicals into vegetation from soil or air using the molecular connectivity index

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dowdy, D.L.; McKone, T.E.; Hsieh, D.P.H.

    1995-12-31

    Bioconcentration factors (BCFs) are the ratio of chemical concentration found in an exposed organism (in this case a plant) to the concentration in an air or soil exposure medium. The authors examine here the use of molecular connectivity indices (MCIs) as quantitative structure-activity relationships (QSARS) for predicting BCFs for organic chemicals between plants and air or soil. The authors compare the reliability of the octanol-air partition coefficient (K{sub oa}) to the MC based prediction method for predicting plant/air partition coefficients. The authors also compare the reliability of the octanol/water partition coefficient (K{sub ow}) to the MC based prediction method formore » predicting plant/soil partition coefficients. The results here indicate that, relative to the use of K{sub ow} or K{sub oa} as predictors of BCFs the MC can substantially increase the reliability with which BCFs can be estimated. The authors find that the MC provides a relatively precise and accurate method for predicting the potential biotransfer of a chemical from environmental media into plants. In addition, the MC is much faster and more cost effective than direct measurements.« less

  11. The Prediction of the Gas Utilization Ratio Based on TS Fuzzy Neural Network and Particle Swarm Optimization

    PubMed Central

    Jiang, Haihe; Yin, Yixin; Xiao, Wendong; Zhao, Baoyong

    2018-01-01

    Gas utilization ratio (GUR) is an important indicator that is used to evaluate the energy consumption of blast furnaces (BFs). Currently, the existing methods cannot predict the GUR accurately. In this paper, we present a novel data-driven model for predicting the GUR. The proposed approach utilized both the TS fuzzy neural network (TS-FNN) and the particle swarm algorithm (PSO) to predict the GUR. The particle swarm algorithm (PSO) is applied to optimize the parameters of the TS-FNN in order to decrease the error caused by the inaccurate initial parameter. This paper also applied the box graph (Box-plot) method to eliminate the abnormal value of the raw data during the data preprocessing. This method can deal with the data which does not obey the normal distribution which is caused by the complex industrial environments. The prediction results demonstrate that the optimization model based on PSO and the TS-FNN approach achieves higher prediction accuracy compared with the TS-FNN model and SVM model and the proposed approach can accurately predict the GUR of the blast furnace, providing an effective way for the on-line blast furnace distribution control. PMID:29461469

  12. The Prediction of the Gas Utilization Ratio based on TS Fuzzy Neural Network and Particle Swarm Optimization.

    PubMed

    Zhang, Sen; Jiang, Haihe; Yin, Yixin; Xiao, Wendong; Zhao, Baoyong

    2018-02-20

    Gas utilization ratio (GUR) is an important indicator that is used to evaluate the energy consumption of blast furnaces (BFs). Currently, the existing methods cannot predict the GUR accurately. In this paper, we present a novel data-driven model for predicting the GUR. The proposed approach utilized both the TS fuzzy neural network (TS-FNN) and the particle swarm algorithm (PSO) to predict the GUR. The particle swarm algorithm (PSO) is applied to optimize the parameters of the TS-FNN in order to decrease the error caused by the inaccurate initial parameter. This paper also applied the box graph (Box-plot) method to eliminate the abnormal value of the raw data during the data preprocessing. This method can deal with the data which does not obey the normal distribution which is caused by the complex industrial environments. The prediction results demonstrate that the optimization model based on PSO and the TS-FNN approach achieves higher prediction accuracy compared with the TS-FNN model and SVM model and the proposed approach can accurately predict the GUR of the blast furnace, providing an effective way for the on-line blast furnace distribution control.

  13. Predicting Welding Distortion in a Panel Structure with Longitudinal Stiffeners Using Inherent Deformations Obtained by Inverse Analysis Method

    PubMed Central

    Liang, Wei; Murakawa, Hidekazu

    2014-01-01

    Welding-induced deformation not only negatively affects dimension accuracy but also degrades the performance of product. If welding deformation can be accurately predicted beforehand, the predictions will be helpful for finding effective methods to improve manufacturing accuracy. Till now, there are two kinds of finite element method (FEM) which can be used to simulate welding deformation. One is the thermal elastic plastic FEM and the other is elastic FEM based on inherent strain theory. The former only can be used to calculate welding deformation for small or medium scale welded structures due to the limitation of computing speed. On the other hand, the latter is an effective method to estimate the total welding distortion for large and complex welded structures even though it neglects the detailed welding process. When the elastic FEM is used to calculate the welding-induced deformation for a large structure, the inherent deformations in each typical joint should be obtained beforehand. In this paper, a new method based on inverse analysis was proposed to obtain the inherent deformations for weld joints. Through introducing the inherent deformations obtained by the proposed method into the elastic FEM based on inherent strain theory, we predicted the welding deformation of a panel structure with two longitudinal stiffeners. In addition, experiments were carried out to verify the simulation results. PMID:25276856

  14. Predicting welding distortion in a panel structure with longitudinal stiffeners using inherent deformations obtained by inverse analysis method.

    PubMed

    Liang, Wei; Murakawa, Hidekazu

    2014-01-01

    Welding-induced deformation not only negatively affects dimension accuracy but also degrades the performance of product. If welding deformation can be accurately predicted beforehand, the predictions will be helpful for finding effective methods to improve manufacturing accuracy. Till now, there are two kinds of finite element method (FEM) which can be used to simulate welding deformation. One is the thermal elastic plastic FEM and the other is elastic FEM based on inherent strain theory. The former only can be used to calculate welding deformation for small or medium scale welded structures due to the limitation of computing speed. On the other hand, the latter is an effective method to estimate the total welding distortion for large and complex welded structures even though it neglects the detailed welding process. When the elastic FEM is used to calculate the welding-induced deformation for a large structure, the inherent deformations in each typical joint should be obtained beforehand. In this paper, a new method based on inverse analysis was proposed to obtain the inherent deformations for weld joints. Through introducing the inherent deformations obtained by the proposed method into the elastic FEM based on inherent strain theory, we predicted the welding deformation of a panel structure with two longitudinal stiffeners. In addition, experiments were carried out to verify the simulation results.

  15. Testing by artificial intelligence: computational alternatives to the determination of mutagenicity.

    PubMed

    Klopman, G; Rosenkranz, H S

    1992-08-01

    In order to develop methods for evaluating the predictive performance of computer-driven structure-activity methods (SAR) as well as to determine the limits of predictivity, we investigated the behavior of two Salmonella mutagenicity data bases: (a) a subset from the Genetox Program and (b) one from the U.S. National Toxicology Program (NTP). For molecules common to the two data bases, the experimental concordance was 76% when "marginals" were included and 81% when they were excluded. Three SAR methods were evaluated: CASE, MULTICASE and CASE/Graph Indices (CASE/GI). The programs "learned" the Genetox data base and used it to predict NTP molecules that were not present in the Genetox compilation. The concordances were 72, 80 and 47% respectively. Obviously, the MULTICASE version is superior and approaches the 85% interlaboratory variability observed for the Salmonella mutagenicity assays when the latter was carried out under carefully controlled conditions.

  16. Detrended cross-correlation coefficient: Application to predict apoptosis protein subcellular localization.

    PubMed

    Liang, Yunyun; Liu, Sanyang; Zhang, Shengli

    2016-12-01

    Apoptosis, or programed cell death, plays a central role in the development and homeostasis of an organism. Obtaining information on subcellular location of apoptosis proteins is very helpful for understanding the apoptosis mechanism. The prediction of subcellular localization of an apoptosis protein is still a challenging task, and existing methods mainly based on protein primary sequences. In this paper, we introduce a new position-specific scoring matrix (PSSM)-based method by using detrended cross-correlation (DCCA) coefficient of non-overlapping windows. Then a 190-dimensional (190D) feature vector is constructed on two widely used datasets: CL317 and ZD98, and support vector machine is adopted as classifier. To evaluate the proposed method, objective and rigorous jackknife cross-validation tests are performed on the two datasets. The results show that our approach offers a novel and reliable PSSM-based tool for prediction of apoptosis protein subcellular localization. Copyright © 2016 Elsevier Inc. All rights reserved.

  17. PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction

    PubMed Central

    Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H.

    2008-01-01

    A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is superior for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu. PMID:18304945

  18. The Energetic Cost of Walking: A Comparison of Predictive Methods

    PubMed Central

    Kramer, Patricia Ann; Sylvester, Adam D.

    2011-01-01

    Background The energy that animals devote to locomotion has been of intense interest to biologists for decades and two basic methodologies have emerged to predict locomotor energy expenditure: those based on metabolic and those based on mechanical energy. Metabolic energy approaches share the perspective that prediction of locomotor energy expenditure should be based on statistically significant proxies of metabolic function, while mechanical energy approaches, which derive from many different perspectives, focus on quantifying the energy of movement. Some controversy exists as to which mechanical perspective is “best”, but from first principles all mechanical methods should be equivalent if the inputs to the simulation are of similar quality. Our goals in this paper are 1) to establish the degree to which the various methods of calculating mechanical energy are correlated, and 2) to investigate to what degree the prediction methods explain the variation in energy expenditure. Methodology/Principal Findings We use modern humans as the model organism in this experiment because their data are readily attainable, but the methodology is appropriate for use in other species. Volumetric oxygen consumption and kinematic and kinetic data were collected on 8 adults while walking at their self-selected slow, normal and fast velocities. Using hierarchical statistical modeling via ordinary least squares and maximum likelihood techniques, the predictive ability of several metabolic and mechanical approaches were assessed. We found that all approaches are correlated and that the mechanical approaches explain similar amounts of the variation in metabolic energy expenditure. Most methods predict the variation within an individual well, but are poor at accounting for variation between individuals. Conclusion Our results indicate that the choice of predictive method is dependent on the question(s) of interest and the data available for use as inputs. Although we used modern humans as our model organism, these results can be extended to other species. PMID:21731693

  19. SHM-Based Probabilistic Fatigue Life Prediction for Bridges Based on FE Model Updating

    PubMed Central

    Lee, Young-Joo; Cho, Soojin

    2016-01-01

    Fatigue life prediction for a bridge should be based on the current condition of the bridge, and various sources of uncertainty, such as material properties, anticipated vehicle loads and environmental conditions, make the prediction very challenging. This paper presents a new approach for probabilistic fatigue life prediction for bridges using finite element (FE) model updating based on structural health monitoring (SHM) data. Recently, various types of SHM systems have been used to monitor and evaluate the long-term structural performance of bridges. For example, SHM data can be used to estimate the degradation of an in-service bridge, which makes it possible to update the initial FE model. The proposed method consists of three steps: (1) identifying the modal properties of a bridge, such as mode shapes and natural frequencies, based on the ambient vibration under passing vehicles; (2) updating the structural parameters of an initial FE model using the identified modal properties; and (3) predicting the probabilistic fatigue life using the updated FE model. The proposed method is demonstrated by application to a numerical model of a bridge, and the impact of FE model updating on the bridge fatigue life is discussed. PMID:26950125

  20. Fusion of multiscale wavelet-based fractal analysis on retina image for stroke prediction.

    PubMed

    Che Azemin, M Z; Kumar, Dinesh K; Wong, T Y; Wang, J J; Kawasaki, R; Mitchell, P; Arjunan, Sridhar P

    2010-01-01

    In this paper, we present a novel method of analyzing retinal vasculature using Fourier Fractal Dimension to extract the complexity of the retinal vasculature enhanced at different wavelet scales. Logistic regression was used as a fusion method to model the classifier for 5-year stroke prediction. The efficacy of this technique has been tested using standard pattern recognition performance evaluation, Receivers Operating Characteristics (ROC) analysis and medical prediction statistics, odds ratio. Stroke prediction model was developed using the proposed system.

  1. Data base for the prediction of inlet external drag

    NASA Technical Reports Server (NTRS)

    Mcmillan, O. J.; Perkins, E. W.; Perkins, S. C., Jr.

    1980-01-01

    Results are presented from a study to define and evaluate the data base for predicting an airframe/propulsion system interference effect shown to be of considerable importance, inlet external drag. The study is focused on supersonic tactical aircraft with highly integrated jet propulsion systems, although some information is included for supersonic strategic aircraft and for transport aircraft designed for high subsonic or low supersonic cruise. The data base for inlet external drag is considered to consist of the theoretical and empirical prediction methods as well as the experimental data identified in an extensive literature search. The state of the art in the subsonic and transonic speed regimes is evaluated. The experimental data base is organized and presented in a series of tables in which the test article, the quantities measured and the ranges of test conditions covered are described for each set of data; in this way, the breadth of coverage and gaps in the existing experimental data are evident. Prediction methods are categorized by method of solution, type of inlet and speed range to which they apply, major features are given, and their accuracy is assessed by means of comparison to experimental data.

  2. A probabilistic neural network based approach for predicting the output power of wind turbines

    NASA Astrophysics Data System (ADS)

    Tabatabaei, Sajad

    2017-03-01

    Finding the authentic predicting tools of eliminating the uncertainty of wind speed forecasts is highly required while wind power sources are strongly penetrating. Recently, traditional predicting models of generating point forecasts have no longer been trustee. Thus, the present paper aims at utilising the concept of prediction intervals (PIs) to assess the uncertainty of wind power generation in power systems. Besides, this paper uses a newly introduced non-parametric approach called lower upper bound estimation (LUBE) to build the PIs since the forecasting errors are unable to be modelled properly by applying distribution probability functions. In the present proposed LUBE method, a PI combination-based fuzzy framework is used to overcome the performance instability of neutral networks (NNs) used in LUBE. In comparison to other methods, this formulation more suitably has satisfied the PI coverage and PI normalised average width (PINAW). Since this non-linear problem has a high complexity, a new heuristic-based optimisation algorithm comprising a novel modification is introduced to solve the aforesaid problems. Based on data sets taken from a wind farm in Australia, the feasibility and satisfying performance of the suggested method have been investigated.

  3. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures.

    PubMed

    Scheid, Anika; Nebel, Markus E

    2012-07-09

    Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case - without sacrificing much of the accuracy of the results. Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms.

  4. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    PubMed Central

    2012-01-01

    Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case – without sacrificing much of the accuracy of the results. Conclusions Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms. PMID:22776037

  5. Drug-target interaction prediction using ensemble learning and dimensionality reduction.

    PubMed

    Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong

    2017-10-01

    Experimental prediction of drug-target interactions is expensive, time-consuming and tedious. Fortunately, computational methods help narrow down the search space for interaction candidates to be further examined via wet-lab techniques. Nowadays, the number of attributes/features for drugs and targets, as well as the amount of their interactions, are increasing, making these computational methods inefficient or occasionally prohibitive. This motivates us to derive a reduced feature set for prediction. In addition, since ensemble learning techniques are widely used to improve the classification performance, it is also worthwhile to design an ensemble learning framework to enhance the performance for drug-target interaction prediction. In this paper, we propose a framework for drug-target interaction prediction leveraging both feature dimensionality reduction and ensemble learning. First, we conducted feature subspacing to inject diversity into the classifier ensemble. Second, we applied three different dimensionality reduction methods to the subspaced features. Third, we trained homogeneous base learners with the reduced features and then aggregated their scores to derive the final predictions. For base learners, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. In our experiments, we utilized AUC (Area under ROC Curve) as an evaluation metric. We compared our proposed methods with various state-of-the-art methods under 5-fold cross validation. Experimental results showed EnsemKRR achieving the highest AUC (94.3%) for predicting drug-target interactions. In addition, dimensionality reduction helped improve the performance of EnsemDT. In conclusion, our proposed methods produced significant improvements for drug-target interaction prediction. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Implementing a novel movement-based approach to inferring parturition and neonate caribou calf survival.

    PubMed

    Bonar, Maegwin; Ellington, E Hance; Lewis, Keith P; Vander Wal, Eric

    2018-01-01

    In ungulates, parturition is correlated with a reduction in movement rate. With advances in movement-based technologies comes an opportunity to develop new techniques to assess reproduction in wild ungulates that are less invasive and reduce biases. DeMars et al. (2013, Ecology and Evolution 3:4149-4160) proposed two promising new methods (individual- and population-based; the DeMars model) that use GPS inter-fix step length of adult female caribou (Rangifer tarandus caribou) to infer parturition and neonate survival. Our objective was to apply the DeMars model to caribou populations that may violate model assumptions for retrospective analysis of parturition and calf survival. We extended the use of the DeMars model after assigning parturition and calf mortality status by examining herd-wide distributions of parturition date, calf mortality date, and survival. We used the DeMars model to estimate parturition and calf mortality events and compared them with the known parturition and calf mortality events from collared adult females (n = 19). We also used the DeMars model to estimate parturition and calf mortality events for collared female caribou with unknown parturition and calf mortality events (n = 43) and instead derived herd-wide estimates of calf survival as well as distributions of parturition and calf mortality dates and compared them to herd-wide estimates generated from calves fitted with VHF collars (n = 134). For our data, the individual-based method was effective at predicting calf mortality, but was not effective at predicting parturition. The population-based method was more effective at predicting parturition but was not effective at predicting calf mortality. At the herd-level, the predicted distributions of parturition date from both methods differed from each other and from the distribution derived from the parturition dates of VHF-collared calves (log-ranked test: χ2 = 40.5, df = 2, p < 0.01). The predicted distributions of calf mortality dates from both methods were similar to the observed distribution derived from VHF-collared calves. Both methods underestimated herd-wide calf survival based on VHF-collared calves, however, a combination of the individual- and population-based methods produced herd-wide survival estimates similar to estimates generated from collared calves. The limitations we experienced when applying the DeMars model could result from the shortcomings in our data violating model assumptions. However despite the differences in our caribou systems, with proper validation techniques the framework in the DeMars model is sufficient to make inferences on parturition and calf mortality.

  7. Implementing a novel movement-based approach to inferring parturition and neonate caribou calf survival

    PubMed Central

    Ellington, E. Hance; Lewis, Keith P.; Vander Wal, Eric

    2018-01-01

    In ungulates, parturition is correlated with a reduction in movement rate. With advances in movement-based technologies comes an opportunity to develop new techniques to assess reproduction in wild ungulates that are less invasive and reduce biases. DeMars et al. (2013, Ecology and Evolution 3:4149–4160) proposed two promising new methods (individual- and population-based; the DeMars model) that use GPS inter-fix step length of adult female caribou (Rangifer tarandus caribou) to infer parturition and neonate survival. Our objective was to apply the DeMars model to caribou populations that may violate model assumptions for retrospective analysis of parturition and calf survival. We extended the use of the DeMars model after assigning parturition and calf mortality status by examining herd-wide distributions of parturition date, calf mortality date, and survival. We used the DeMars model to estimate parturition and calf mortality events and compared them with the known parturition and calf mortality events from collared adult females (n = 19). We also used the DeMars model to estimate parturition and calf mortality events for collared female caribou with unknown parturition and calf mortality events (n = 43) and instead derived herd-wide estimates of calf survival as well as distributions of parturition and calf mortality dates and compared them to herd-wide estimates generated from calves fitted with VHF collars (n = 134). For our data, the individual-based method was effective at predicting calf mortality, but was not effective at predicting parturition. The population-based method was more effective at predicting parturition but was not effective at predicting calf mortality. At the herd-level, the predicted distributions of parturition date from both methods differed from each other and from the distribution derived from the parturition dates of VHF-collared calves (log-ranked test: χ2 = 40.5, df = 2, p < 0.01). The predicted distributions of calf mortality dates from both methods were similar to the observed distribution derived from VHF-collared calves. Both methods underestimated herd-wide calf survival based on VHF-collared calves, however, a combination of the individual- and population-based methods produced herd-wide survival estimates similar to estimates generated from collared calves. The limitations we experienced when applying the DeMars model could result from the shortcomings in our data violating model assumptions. However despite the differences in our caribou systems, with proper validation techniques the framework in the DeMars model is sufficient to make inferences on parturition and calf mortality. PMID:29466451

  8. PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

    PubMed

    Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.

  9. Is there more valuable information in PWI datasets for a voxel-wise acute ischemic stroke tissue outcome prediction than what is represented by typical perfusion maps?

    NASA Astrophysics Data System (ADS)

    Forkert, Nils Daniel; Siemonsen, Susanne; Dalski, Michael; Verleger, Tobias; Kemmling, Andre; Fiehler, Jens

    2014-03-01

    The acute ischemic stroke is a leading cause for death and disability in the industry nations. In case of a present acute ischemic stroke, the prediction of the future tissue outcome is of high interest for the clinicians as it can be used to support therapy decision making. Within this context, it has already been shown that the voxel-wise multi-parametric tissue outcome prediction leads to more promising results compared to single channel perfusion map thresholding. Most previously published multi-parametric predictions employ information from perfusion maps derived from perfusion-weighted MRI together with other image sequences such as diffusion-weighted MRI. However, it remains unclear if the typically calculated perfusion maps used for this purpose really include all valuable information from the PWI dataset for an optimal tissue outcome prediction. To investigate this problem in more detail, two different methods to predict tissue outcome using a k-nearest-neighbor approach were developed in this work and evaluated based on 18 datasets of acute stroke patients with known tissue outcome. The first method integrates apparent diffusion coefficient and perfusion parameter (Tmax, MTT, CBV, CBF) information for the voxel-wise prediction, while the second method employs also apparent diffusion coefficient information but the complete perfusion information in terms of the voxel-wise residue functions instead of the perfusion parameter maps for the voxel-wise prediction. Overall, the comparison of the results of the two prediction methods for the 18 patients using a leave-one-out cross validation revealed no considerable differences. Quantitatively, the parameter-based prediction of tissue outcome led to a mean Dice coefficient of 0.474, while the prediction using the residue functions led to a mean Dice coefficient of 0.461. Thus, it may be concluded from the results of this study that the perfusion parameter maps typically derived from PWI datasets include all valuable perfusion information required for a voxel-based tissue outcome prediction, while the complete analysis of the residue functions does not add further benefits for the voxel-wise tissue outcome prediction and is also computationally more expensive.

  10. Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes.

    PubMed

    Rusinov, I S; Ershova, A S; Karyagina, A S; Spirin, S A; Alexeevski, A V

    2018-02-01

    Many proteins need recognition of specific DNA sequences for functioning. The number of recognition sites and their distribution along the DNA might be of biological importance. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes to decrease the probability of DNA cleavage by restriction endonucleases. We call a sequence an exceptional one if its frequency in a genome significantly differs from one predicted by some mathematical model. An exceptional sequence could be either under- or over-represented, depending on its frequency in comparison with the predicted one. Exceptional sequences could be considered biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods to predict frequency of a short sequence in a genome, based on actual frequencies of certain its subsequences, are used. The most popular are methods based on Markov chain models. But any rigorous comparison of the methods has not previously been performed. We compared three methods for the prediction of short sequence frequencies: the maximum-order Markov chain model-based method, the method that uses geometric mean of extended Markovian estimates, and the method that utilizes frequencies of all subsequences including discontiguous ones. We applied them to restriction sites in complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: lists of 5% of the most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which utilizes all subsequences of the sequence, showed a higher precision than the other two methods both on prokaryotic genomes and randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detection of exceptional sequences in prokaryotic genomes.

  11. Three-dimensional computed tomographic volumetry precisely predicts the postoperative pulmonary function.

    PubMed

    Kobayashi, Keisuke; Saeki, Yusuke; Kitazawa, Shinsuke; Kobayashi, Naohiro; Kikuchi, Shinji; Goto, Yukinobu; Sakai, Mitsuaki; Sato, Yukio

    2017-11-01

    It is important to accurately predict the patient's postoperative pulmonary function. The aim of this study was to compare the accuracy of predictions of the postoperative residual pulmonary function obtained with three-dimensional computed tomographic (3D-CT) volumetry with that of predictions obtained with the conventional segment-counting method. Fifty-three patients scheduled to undergo lung cancer resection, pulmonary function tests, and computed tomography were enrolled in this study. The postoperative residual pulmonary function was predicted based on the segment-counting and 3D-CT volumetry methods. The predicted postoperative values were compared with the results of postoperative pulmonary function tests. Regarding the linear correlation coefficients between the predicted postoperative values and the measured values, those obtained using the 3D-CT volumetry method tended to be higher than those acquired using the segment-counting method. In addition, the variations between the predicted and measured values were smaller with the 3D-CT volumetry method than with the segment-counting method. These results were more obvious in COPD patients than in non-COPD patients. Our findings suggested that the 3D-CT volumetry was able to predict the residual pulmonary function more accurately than the segment-counting method, especially in patients with COPD. This method might lead to the selection of appropriate candidates for surgery among patients with a marginal pulmonary function.

  12. Degradation Prediction Model Based on a Neural Network with Dynamic Windows

    PubMed Central

    Zhang, Xinghui; Xiao, Lei; Kang, Jianshe

    2015-01-01

    Tracking degradation of mechanical components is very critical for effective maintenance decision making. Remaining useful life (RUL) estimation is a widely used form of degradation prediction. RUL prediction methods when enough run-to-failure condition monitoring data can be used have been fully researched, but for some high reliability components, it is very difficult to collect run-to-failure condition monitoring data, i.e., from normal to failure. Only a certain number of condition indicators in certain period can be used to estimate RUL. In addition, some existing prediction methods have problems which block RUL estimation due to poor extrapolability. The predicted value converges to a certain constant or fluctuates in certain range. Moreover, the fluctuant condition features also have bad effects on prediction. In order to solve these dilemmas, this paper proposes a RUL prediction model based on neural network with dynamic windows. This model mainly consists of three steps: window size determination by increasing rate, change point detection and rolling prediction. The proposed method has two dominant strengths. One is that the proposed approach does not need to assume the degradation trajectory is subject to a certain distribution. The other is it can adapt to variation of degradation indicators which greatly benefits RUL prediction. Finally, the performance of the proposed RUL prediction model is validated by real field data and simulation data. PMID:25806873

  13. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

    PubMed

    Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

    2007-12-01

    Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracy-toplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most of other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide

  14. Numerical weather prediction model tuning via ensemble prediction system

    NASA Astrophysics Data System (ADS)

    Jarvinen, H.; Laine, M.; Ollinaho, P.; Solonen, A.; Haario, H.

    2011-12-01

    This paper discusses a novel approach to tune predictive skill of numerical weather prediction (NWP) models. NWP models contain tunable parameters which appear in parameterizations schemes of sub-grid scale physical processes. Currently, numerical values of these parameters are specified manually. In a recent dual manuscript (QJRMS, revised) we developed a new concept and method for on-line estimation of the NWP model parameters. The EPPES ("Ensemble prediction and parameter estimation system") method requires only minimal changes to the existing operational ensemble prediction infra-structure and it seems very cost-effective because practically no new computations are introduced. The approach provides an algorithmic decision making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model tunable parameters is made by (i) generating each member of the ensemble of predictions using different model parameter values, drawn from a proposal distribution, and (ii) feeding-back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In the presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameters values, and leads to an improved forecast skill. Second, results with an atmospheric general circulation model based ensemble prediction system show that the NWP model tuning capacity of EPPES scales up to realistic models and ensemble prediction systems. Finally, a global top-end NWP model tuning exercise with preliminary results is published.

  15. Airfoil self-noise and prediction

    NASA Technical Reports Server (NTRS)

    Brooks, Thomas F.; Pope, D. Stuart; Marcolini, Michael A.

    1989-01-01

    A prediction method is developed for the self-generated noise of an airfoil blade encountering smooth flow. The prediction methods for the individual self-noise mechanisms are semiempirical and are based on previous theoretical studies and data obtained from tests of two- and three-dimensional airfoil blade sections. The self-noise mechanisms are due to specific boundary-layer phenomena, that is, the boundary-layer turbulence passing the trailing edge, separated-boundary-layer and stalled flow over an airfoil, vortex shedding due to laminar boundary layer instabilities, vortex shedding from blunt trailing edges, and the turbulent vortex flow existing near the tip of lifting blades. The predictions are compared successfully with published data from three self-noise studies of different airfoil shapes. An application of the prediction method is reported for a large scale-model helicopter rotor, and the predictions compared well with experimental broadband noise measurements. A computer code of the method is given.

  16. Estimation of the base temperature and growth phase duration in terms of thermal time for four grapevine cultivars

    NASA Astrophysics Data System (ADS)

    Zapata, D.; Salazar, M.; Chaves, B.; Keller, M.; Hoogenboom, G.

    2015-12-01

    Thermal time models have been used to predict the development of many different species, including grapevine ( Vitis vinifera L.). These models normally assume that there is a linear relationship between temperature and plant development. The goal of this study was to estimate the base temperature and duration in terms of thermal time for predicting veraison for four grapevine cultivars. Historical phenological data for four cultivars that were collected in the Pacific Northwest were used to develop the thermal time model. Base temperatures ( T b) of 0 and 10 °C and the best estimated T b using three different methods were evaluated for predicting veraison in grapevine. Thermal time requirements for each individual cultivar were evaluated through analysis of variance, and means were compared using the Fisher's test. The methods that were applied to estimate T b for the development of wine grapes included the least standard deviation in heat units, the regression coefficient, and the development rate method. The estimated T b varied among methods and cultivars. The development rate method provided the lowest T b values for all cultivars. For the three methods, Chardonnay had the lowest T b ranging from 8.7 to 10.7 °C, while the highest T b values were obtained for Riesling and Cabernet Sauvignon with 11.8 and 12.8 °C, respectively. Thermal time also differed among cultivars, when either the fixed or estimated T b was used. Predictions of the beginning of ripening with the estimated temperature resulted in the lowest variation in real days when compared with predictions using T b = 0 or 10 °C, regardless of the method that was used to estimate the T b.

  17. Does rational selection of training and test sets improve the outcome of QSAR modeling?

    PubMed

    Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander

    2012-10-22

    Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and by using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models are comparable.

  18. Automated Deep Learning-Based System to Identify Endothelial Cells Derived from Induced Pluripotent Stem Cells.

    PubMed

    Kusumoto, Dai; Lachmann, Mark; Kunihiro, Takeshi; Yuasa, Shinsuke; Kishino, Yoshikazu; Kimura, Mai; Katsuki, Toshiomi; Itoh, Shogo; Seki, Tomohisa; Fukuda, Keiichi

    2018-06-05

    Deep learning technology is rapidly advancing and is now used to solve complex problems. Here, we used deep learning in convolutional neural networks to establish an automated method to identify endothelial cells derived from induced pluripotent stem cells (iPSCs), without the need for immunostaining or lineage tracing. Networks were trained to predict whether phase-contrast images contain endothelial cells based on morphology only. Predictions were validated by comparison to immunofluorescence staining for CD31, a marker of endothelial cells. Method parameters were then automatically and iteratively optimized to increase prediction accuracy. We found that prediction accuracy was correlated with network depth and pixel size of images to be analyzed. Finally, K-fold cross-validation confirmed that optimized convolutional neural networks can identify endothelial cells with high performance, based only on morphology. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  19. Application of a data assimilation method via an ensemble Kalman filter to reactive urea hydrolysis transport modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Juxiu Tong; Bill X. Hu; Hai Huang

    2014-03-01

    With growing importance of water resources in the world, remediations of anthropogenic contaminations due to reactive solute transport become even more important. A good understanding of reactive rate parameters such as kinetic parameters is the key to accurately predicting reactive solute transport processes and designing corresponding remediation schemes. For modeling reactive solute transport, it is very difficult to estimate chemical reaction rate parameters due to complex processes of chemical reactions and limited available data. To find a method to get the reactive rate parameters for the reactive urea hydrolysis transport modeling and obtain more accurate prediction for the chemical concentrations,more » we developed a data assimilation method based on an ensemble Kalman filter (EnKF) method to calibrate reactive rate parameters for modeling urea hydrolysis transport in a synthetic one-dimensional column at laboratory scale and to update modeling prediction. We applied a constrained EnKF method to pose constraints to the updated reactive rate parameters and the predicted solute concentrations based on their physical meanings after the data assimilation calibration. From the study results we concluded that we could efficiently improve the chemical reactive rate parameters with the data assimilation method via the EnKF, and at the same time we could improve solute concentration prediction. The more data we assimilated, the more accurate the reactive rate parameters and concentration prediction. The filter divergence problem was also solved in this study.« less

  20. Remaining dischargeable time prediction for lithium-ion batteries using unscented Kalman filter

    NASA Astrophysics Data System (ADS)

    Dong, Guangzhong; Wei, Jingwen; Chen, Zonghai; Sun, Han; Yu, Xiaowei

    2017-10-01

    To overcome the range anxiety, one of the important strategies is to accurately predict the range or dischargeable time of the battery system. To accurately predict the remaining dischargeable time (RDT) of a battery, a RDT prediction framework based on accurate battery modeling and state estimation is presented in this paper. Firstly, a simplified linearized equivalent-circuit-model is developed to simulate the dynamic characteristics of a battery. Then, an online recursive least-square-algorithm method and unscented-Kalman-filter are employed to estimate the system matrices and SOC at every prediction point. Besides, a discrete wavelet transform technique is employed to capture the statistical information of past dynamics of input currents, which are utilized to predict the future battery currents. Finally, the RDT can be predicted based on the battery model, SOC estimation results and predicted future battery currents. The performance of the proposed methodology has been verified by a lithium-ion battery cell. Experimental results indicate that the proposed method can provide an accurate SOC and parameter estimation and the predicted RDT can solve the range anxiety issues.

  1. Prioritization of candidate disease genes by topological similarity between disease and protein diffusion profiles.

    PubMed

    Zhu, Jie; Qin, Yufang; Liu, Taigang; Wang, Jun; Zheng, Xiaoqi

    2013-01-01

    Identification of gene-phenotype relationships is a fundamental challenge in human health clinic. Based on the observation that genes causing the same or similar phenotypes tend to correlate with each other in the protein-protein interaction network, a lot of network-based approaches were proposed based on different underlying models. A recent comparative study showed that diffusion-based methods achieve the state-of-the-art predictive performance. In this paper, a new diffusion-based method was proposed to prioritize candidate disease genes. Diffusion profile of a disease was defined as the stationary distribution of candidate genes given a random walk with restart where similarities between phenotypes are incorporated. Then, candidate disease genes are prioritized by comparing their diffusion profiles with that of the disease. Finally, the effectiveness of our method was demonstrated through the leave-one-out cross-validation against control genes from artificial linkage intervals and randomly chosen genes. Comparative study showed that our method achieves improved performance compared to some classical diffusion-based methods. To further illustrate our method, we used our algorithm to predict new causing genes of 16 multifactorial diseases including Prostate cancer and Alzheimer's disease, and the top predictions were in good consistent with literature reports. Our study indicates that integration of multiple information sources, especially the phenotype similarity profile data, and introduction of global similarity measure between disease and gene diffusion profiles are helpful for prioritizing candidate disease genes. Programs and data are available upon request.

  2. Prediction of the future number of wells in production (in Spanish)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coca, B.P.

    1981-01-01

    A method to predict the number of wells that will continue producing at a certain date in the future is presented. The method is applicable to reservoirs of the depletion type and is based on the survival probability concept. This is useful when forecasting by empirical methods. An example of a field in primary production is presented.

  3. Prediction of Protein Configurational Entropy (Popcoen).

    PubMed

    Goethe, Martin; Gleixner, Jan; Fita, Ignacio; Rubi, J Miguel

    2018-03-13

    A knowledge-based method for configurational entropy prediction of proteins is presented; this methodology is extremely fast, compared to previous approaches, because it does not involve any type of configurational sampling. Instead, the configurational entropy of a query fold is estimated by evaluating an artificial neural network, which was trained on molecular-dynamics simulations of ∼1000 proteins. The predicted entropy can be incorporated into a large class of protein software based on cost-function minimization/evaluation, in which configurational entropy is currently neglected for performance reasons. Software of this type is used for all major protein tasks such as structure predictions, proteins design, NMR and X-ray refinement, docking, and mutation effect predictions. Integrating the predicted entropy can yield a significant accuracy increase as we show exemplarily for native-state identification with the prominent protein software FoldX. The method has been termed Popcoen for Prediction of Protein Configurational Entropy. An implementation is freely available at http://fmc.ub.edu/popcoen/ .

  4. Biochemical methane potential prediction of plant biomasses: Comparing chemical composition versus near infrared methods and linear versus non-linear models.

    PubMed

    Godin, Bruno; Mayer, Frédéric; Agneessens, Richard; Gerin, Patrick; Dardenne, Pierre; Delfosse, Philippe; Delcarte, Jérôme

    2015-01-01

    The reliability of different models to predict the biochemical methane potential (BMP) of various plant biomasses using a multispecies dataset was compared. The most reliable prediction models of the BMP were those based on the near infrared (NIR) spectrum compared to those based on the chemical composition. The NIR predictions of local (specific regression and non-linear) models were able to estimate quantitatively, rapidly, cheaply and easily the BMP. Such a model could be further used for biomethanation plant management and optimization. The predictions of non-linear models were more reliable compared to those of linear models. The presentation form (green-dried, silage-dried and silage-wet form) of biomasses to the NIR spectrometer did not influence the performances of the NIR prediction models. The accuracy of the BMP method should be improved to enhance further the BMP prediction models. Copyright © 2014 Elsevier Ltd. All rights reserved.

  5. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography.

    PubMed

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-07

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on a 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.

  6. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

    NASA Astrophysics Data System (ADS)

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on a 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.

  7. Estimation of body density based on hydrostatic weighing without head submersion in young Japanese adults.

    PubMed

    Demura, S; Sato, S; Kitabayashi, T

    2006-06-01

    This study examined a method of predicting body density based on hydrostatic weighing without head submersion (HWwithoutHS). Donnelly and Sintek (1984) developed a method to predict body density based on hydrostatic weight without head submersion. This method predicts the difference (D) between HWwithoutHS and hydrostatic weight with head submersion (HWwithHS) from anthropometric variables (head length and head width), and then calculates body density using D as a correction factor. We developed several prediction equations to estimate D based on head anthropometry and differences between the sexes, and compared their prediction accuracy with Donnelly and Sintek's equation. Thirty-two males and 32 females aged 17-26 years participated in the study. Multiple linear regression analysis was performed to obtain the prediction equations, and the systematic errors of their predictions were assessed by Bland-Altman plots. The best prediction equations obtained were: Males: D(g) = -164.12X1 - 125.81X2 - 111.03X3 + 100.66X4 + 6488.63, where X1 = head length (cm), X2 = head circumference (cm), X3 = head breadth (cm), X4 = head thickness (cm) (R = 0.858, R2 = 0.737, adjusted R2 = 0.687, standard error of the estimate = 224.1); Females: D(g) = -156.03X1 - 14.03X2 - 38.45X3 - 8.87X4 + 7852.45, where X1 = head circumference (cm), X2 = body mass (g), X3 = head length (cm), X4 = height (cm) (R = 0.913, R2 = 0.833, adjusted R2 = 0.808, standard error of the estimate = 137.7). The effective predictors in these prediction equations differed from those of Donnelly and Sintek's equation, and head circumference and head length were included in both equations. The prediction accuracy was improved by statistically selecting effective predictors. Since we did not assess cross-validity, the equations cannot be used to generalize to other populations, and further investigation is required.

  8. A Sensor Dynamic Measurement Error Prediction Model Based on NAPSO-SVM

    PubMed Central

    Jiang, Minlan; Jiang, Lan; Jiang, Dingde; Li, Fei

    2018-01-01

    Dynamic measurement error correction is an effective way to improve sensor precision. Dynamic measurement error prediction is an important part of error correction, and support vector machine (SVM) is often used for predicting the dynamic measurement errors of sensors. Traditionally, the SVM parameters were always set manually, which cannot ensure the model’s performance. In this paper, a SVM method based on an improved particle swarm optimization (NAPSO) is proposed to predict the dynamic measurement errors of sensors. Natural selection and simulated annealing are added in the PSO to raise the ability to avoid local optima. To verify the performance of NAPSO-SVM, three types of algorithms are selected to optimize the SVM’s parameters: the particle swarm optimization algorithm (PSO), the improved PSO optimization algorithm (NAPSO), and the glowworm swarm optimization (GSO). The dynamic measurement error data of two sensors are applied as the test data. The root mean squared error and mean absolute percentage error are employed to evaluate the prediction models’ performances. The experimental results show that among the three tested algorithms the NAPSO-SVM method has a better prediction precision and a less prediction errors, and it is an effective method for predicting the dynamic measurement errors of sensors. PMID:29342942

  9. Reconstruction of metabolic pathways by combining probabilistic graphical model-based and knowledge-based methods

    PubMed Central

    2014-01-01

    Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways can be mapped into an organism-specific ones based on its genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but its accuracy tends to be low leading to a lot of false positives. Here we combine existing pathway knowledge and a new ab initio Bayesian probabilistic graphical model together in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known, individual gene / protein interactions and metabolic reactions extracted from existing reference pathways. Known reactions and interactions were then used as constraints for Bayesian network learning methods to predict metabolic pathways. Using individual reactions and interactions extracted from different pathways of many organisms to guide pathway construction is new and improves both the coverage and accuracy of metabolic pathway construction. We applied this probabilistic knowledge-based approach to construct the metabolic networks from yeast gene expression data and compared its results with 62 known metabolic networks in the KEGG database. The experiment showed that the method improved the coverage of metabolic network construction over the traditional reference pathway mapping method and was more accurate than pure ab initio methods. PMID:25374614

  10. Binary recursive partitioning: background, methods, and application to psychology.

    PubMed

    Merkle, Edgar C; Shaffer, Victoria A

    2011-02-01

    Binary recursive partitioning (BRP) is a computationally intensive statistical method that can be used in situations where linear models are often used. Instead of imposing many assumptions to arrive at a tractable statistical model, BRP simply seeks to accurately predict a response variable based on values of predictor variables. The method outputs a decision tree depicting the predictor variables that were related to the response variable, along with the nature of the variables' relationships. No significance tests are involved, and the tree's 'goodness' is judged based on its predictive accuracy. In this paper, we describe BRP methods in a detailed manner and illustrate their use in psychological research. We also provide R code for carrying out the methods.

  11. Predictive models for Escherichia coli concentrations at inland lake beaches and relationship of model variables to pathogen detection

    EPA Science Inventory

    Methods are needed improve the timeliness and accuracy of recreational water‐quality assessments. Traditional culture methods require 18–24 h to obtain results and may not reflect current conditions. Predictive models, based on environmental and water quality variables, have been...

  12. Parameter prediction based on Improved Process neural network and ARMA error compensation in Evaporation Process

    NASA Astrophysics Data System (ADS)

    Qian, Xiaoshan

    2018-01-01

    The traditional model of evaporation process parameters have continuity and cumulative characteristics of the prediction error larger issues, based on the basis of the process proposed an adaptive particle swarm neural network forecasting method parameters established on the autoregressive moving average (ARMA) error correction procedure compensated prediction model to predict the results of the neural network to improve prediction accuracy. Taking a alumina plant evaporation process to analyze production data validation, and compared with the traditional model, the new model prediction accuracy greatly improved, can be used to predict the dynamic process of evaporation of sodium aluminate solution components.

  13. Machine Learning–Based Differential Network Analysis: A Study of Stress-Responsive Transcriptomes in Arabidopsis[W

    PubMed Central

    Ma, Chuang; Xin, Mingming; Feldmann, Kenneth A.; Wang, Xiangfeng

    2014-01-01

    Machine learning (ML) is an intelligent data mining technique that builds a prediction model based on the learning of prior knowledge to recognize patterns in large-scale data sets. We present an ML-based methodology for transcriptome analysis via comparison of gene coexpression networks, implemented as an R package called machine learning–based differential network analysis (mlDNA) and apply this method to reanalyze a set of abiotic stress expression data in Arabidopsis thaliana. The mlDNA first used a ML-based filtering process to remove nonexpressed, constitutively expressed, or non-stress-responsive “noninformative” genes prior to network construction, through learning the patterns of 32 expression characteristics of known stress-related genes. The retained “informative” genes were subsequently analyzed by ML-based network comparison to predict candidate stress-related genes showing expression and network differences between control and stress networks, based on 33 network topological characteristics. Comparative evaluation of the network-centric and gene-centric analytic methods showed that mlDNA substantially outperformed traditional statistical testing–based differential expression analysis at identifying stress-related genes, with markedly improved prediction accuracy. To experimentally validate the mlDNA predictions, we selected 89 candidates out of the 1784 predicted salt stress–related genes with available SALK T-DNA mutagenesis lines for phenotypic screening and identified two previously unreported genes, mutants of which showed salt-sensitive phenotypes. PMID:24520154

  14. Effluent composition prediction of a two-stage anaerobic digestion process: machine learning and stoichiometry techniques.

    PubMed

    Alejo, Luz; Atkinson, John; Guzmán-Fierro, Víctor; Roeckel, Marlene

    2018-05-16

    Computational self-adapting methods (Support Vector Machines, SVM) are compared with an analytical method in effluent composition prediction of a two-stage anaerobic digestion (AD) process. Experimental data for the AD of poultry manure were used. The analytical method considers the protein as the only source of ammonia production in AD after degradation. Total ammonia nitrogen (TAN), total solids (TS), chemical oxygen demand (COD), and total volatile solids (TVS) were measured in the influent and effluent of the process. The TAN concentration in the effluent was predicted, this being the most inhibiting and polluting compound in AD. Despite the limited data available, the SVM-based model outperformed the analytical method for the TAN prediction, achieving a relative average error of 15.2% against 43% for the analytical method. Moreover, SVM showed higher prediction accuracy in comparison with Artificial Neural Networks. This result reveals the future promise of SVM for prediction in non-linear and dynamic AD processes. Graphical abstract ᅟ.

  15. Analysis of Free Modeling Predictions by RBO Aleph in CASP11

    PubMed Central

    Mabrouk, Mahmoud; Werner, Tim; Schneider, Michael; Putz, Ines; Brock, Oliver

    2015-01-01

    The CASP experiment is a biannual benchmark for assessing protein structure prediction methods. In CASP11, RBO Aleph ranked as one of the top-performing automated servers in the free modeling category. This category consists of targets for which structural templates are not easily retrievable. We analyze the performance of RBO Aleph and show that its success in CASP was a result of its ab initio structure prediction protocol. A detailed analysis of this protocol demonstrates that two components unique to our method greatly contributed to prediction quality: residue–residue contact prediction by EPC-map and contact–guided conformational space search by model-based search (MBS). Interestingly, our analysis also points to a possible fundamental problem in evaluating the performance of protein structure prediction methods: Improvements in components of the method do not necessarily lead to improvements of the entire method. This points to the fact that these components interact in ways that are poorly understood. This problem, if indeed true, represents a significant obstacle to community-wide progress. PMID:26492194

  16. Prediction of competitive diffusion on complex networks

    NASA Astrophysics Data System (ADS)

    Zhao, Jiuhua; Liu, Qipeng; Wang, Lin; Wang, Xiaofan

    2018-10-01

    In this paper, we study the prediction problem of diffusion process on complex networks in competitive circumstances. With this problem solved, the competitors could timely intervene the diffusion process if needed such that an expected outcome might be obtained. We consider a model with two groups of competitors spreading opposite opinions on a network. A prediction method based on the mutual influences among the agents is proposed, called Influence Matrix (IM for short), and simulations on real-world networks show that the proposed IM method has quite high accuracy on predicting both the preference of any normal agent and the final competition result. For comparison purpose, classic centrality measures are also used to predict the competition result. It is shown that PageRank, Degree, Katz Centrality, and the IM method are suitable for predicting the competition result. More precisely, in undirected networks, the IM method performs better than these centrality measures when the competing group contains more than one agent; in directed networks, the IM method performs only second to PageRank.

  17. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

    PubMed Central

    2010-01-01

    Background The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245

  18. Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships.

    PubMed

    Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong

    2010-01-18

    The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.

  19. Knowledge-based grouping of modeled HLA peptide complexes.

    PubMed

    Kangueane, P; Sakharkar, M K; Lim, K S; Hao, H; Lin, K; Chee, R E; Kolatkar, P R

    2000-05-01

    Human leukocyte antigens are the most polymorphic of human genes and multiple sequence alignment shows that such polymorphisms are clustered in the functional peptide binding domains. Because of such polymorphism among the peptide binding residues, the prediction of peptides that bind to specific HLA molecules is very difficult. In recent years two different types of computer based prediction methods have been developed and both the methods have their own advantages and disadvantages. The nonavailability of allele specific binding data restricts the use of knowledge-based prediction methods for a wide range of HLA alleles. Alternatively, the modeling scheme appears to be a promising predictive tool for the selection of peptides that bind to specific HLA molecules. The scoring of the modeled HLA-peptide complexes is a major concern. The use of knowledge based rules (van der Waals clashes and solvent exposed hydrophobic residues) to distinguish binders from nonbinders is applied in the present study. The rules based on (1) number of observed atomic clashes between the modeled peptide and the HLA structure, and (2) number of solvent exposed hydrophobic residues on the modeled peptide effectively discriminate experimentally known binders from poor/nonbinders. Solved crystal complexes show no vdW Clash (vdWC) in 95% cases and no solvent exposed hydrophobic peptide residues (SEHPR) were seen in 86% cases. In our attempt to compare experimental binding data with the predicted scores by this scoring scheme, 77% of the peptides are correctly grouped as good binders with a sensitivity of 71%.

  20. Protein model quality assessment prediction by combining fragment comparisons and a consensus Cα contact potential

    PubMed Central

    Zhou, Hongyi; Skolnick, Jeffrey

    2009-01-01

    In this work, we develop a fully automated method for the quality assessment prediction of protein structural models generated by structure prediction approaches such as fold recognition servers, or ab initio methods. The approach is based on fragment comparisons and a consensus Cα contact potential derived from the set of models to be assessed and was tested on CASP7 server models. The average Pearson linear correlation coefficient between predicted quality and model GDT-score per target is 0.83 for the 98 targets which is better than those of other quality assessment methods that participated in CASP7. Our method also outperforms the other methods by about 3% as assessed by the total GDT-score of the selected top models. PMID:18004783

  1. Prediction of sound fields in acoustical cavities using the boundary element method. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Kipp, C. R.; Bernhard, R. J.

    1985-01-01

    A method was developed to predict sound fields in acoustical cavities. The method is based on the indirect boundary element method. An isoparametric quadratic boundary element is incorporated. Pressure, velocity and/or impedance boundary conditions may be applied to a cavity by using this method. The capability to include acoustic point sources within the cavity is implemented. The method is applied to the prediction of sound fields in spherical and rectangular cavities. All three boundary condition types are verified. Cases with a point source within the cavity domain are also studied. Numerically determined cavity pressure distributions and responses are presented. The numerical results correlate well with available analytical results.

  2. Predicting hot spots in protein interfaces based on protrusion index, pseudo hydrophobicity and electron-ion interaction pseudopotential features

    PubMed Central

    Xia, Junfeng; Yue, Zhenyu; Di, Yunqiang; Zhu, Xiaolei; Zheng, Chun-Hou

    2016-01-01

    The identification of hot spots, a small subset of protein interfaces that accounts for the majority of binding free energy, is becoming more important for the research of drug design and cancer development. Based on our previous methods (APIS and KFC2), here we proposed a novel hot spot prediction method. For each hot spot residue, we firstly constructed a wide variety of 108 sequence, structural, and neighborhood features to characterize potential hot spot residues, including conventional ones and new one (pseudo hydrophobicity) exploited in this study. We then selected 3 top-ranking features that contribute the most in the classification by a two-step feature selection process consisting of minimal-redundancy-maximal-relevance algorithm and an exhaustive search method. We used support vector machines to build our final prediction model. When testing our model on an independent test set, our method showed the highest F1-score of 0.70 and MCC of 0.46 comparing with the existing state-of-the-art hot spot prediction methods. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of our and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spots in protein interfaces. PMID:26934646

  3. Prediction of cancer cell sensitivity to natural products based on genomic and chemical properties.

    PubMed

    Yue, Zhenyu; Zhang, Wenna; Lu, Yongming; Yang, Qiaoyue; Ding, Qiuying; Xia, Junfeng; Chen, Yan

    2015-01-01

    Natural products play a significant role in cancer chemotherapy. They are likely to provide many lead structures, which can be used as templates for the construction of novel drugs with enhanced antitumor activity. Traditional research approaches studied structure-activity relationship of natural products and obtained key structural properties, such as chemical bond or group, with the purpose of ascertaining their effect on a single cell line or a single tissue type. Here, for the first time, we develop a machine learning method to comprehensively predict natural products responses against a panel of cancer cell lines based on both the gene expression and the chemical properties of natural products. The results on two datasets, training set and independent test set, show that this proposed method yields significantly better prediction accuracy. In addition, we also demonstrate the predictive power of our proposed method by modeling the cancer cell sensitivity to two natural products, Curcumin and Resveratrol, which indicate that our method can effectively predict the response of cancer cell lines to these two natural products. Taken together, the method will facilitate the identification of natural products as cancer therapies and the development of precision medicine by linking the features of patient genomes to natural product sensitivity.

  4. A sampling-based method for ranking protein structural models by integrating multiple scores and features.

    PubMed

    Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong

    2011-09-01

    One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained from different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score are synthesized by a sampling method. At last, another integrated RBF model ranks the structural models according to the features of sampling distribution. We tested the proposed method by using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test result shows that our method outperforms any individual scoring function on both best model selection, and overall correlation between the predicted ranking and the actual ranking of structural quality.

  5. Novel hyperspectral prediction method and apparatus

    NASA Astrophysics Data System (ADS)

    Kemeny, Gabor J.; Crothers, Natalie A.; Groth, Gard A.; Speck, Kathy A.; Marbach, Ralf

    2009-05-01

    Both the power and the challenge of hyperspectral technologies is the very large amount of data produced by spectral cameras. While off-line methodologies allow the collection of gigabytes of data, extended data analysis sessions are required to convert the data into useful information. In contrast, real-time monitoring, such as on-line process control, requires that compression of spectral data and analysis occur at a sustained full camera data rate. Efficient, high-speed practical methods for calibration and prediction are therefore sought to optimize the value of hyperspectral imaging. A novel method of matched filtering known as science based multivariate calibration (SBC) was developed for hyperspectral calibration. Classical (MLR) and inverse (PLS, PCR) methods are combined by spectroscopically measuring the spectral "signal" and by statistically estimating the spectral "noise." The accuracy of the inverse model is thus combined with the easy interpretability of the classical model. The SBC method is optimized for hyperspectral data in the Hyper-CalTM software used for the present work. The prediction algorithms can then be downloaded into a dedicated FPGA based High-Speed Prediction EngineTM module. Spectral pretreatments and calibration coefficients are stored on interchangeable SD memory cards, and predicted compositions are produced on a USB interface at real-time camera output rates. Applications include minerals, pharmaceuticals, food processing and remote sensing.

  6. A novel method for landslide displacement prediction by integrating advanced computational intelligence algorithms.

    PubMed

    Zhou, Chao; Yin, Kunlong; Cao, Ying; Ahmed, Bayes; Fu, Xiaolin

    2018-05-08

    Landslide displacement prediction is considered as an essential component for developing early warning systems. The modelling of conventional forecast methods requires enormous monitoring data that limit its application. To conduct accurate displacement prediction with limited data, a novel method is proposed and applied by integrating three computational intelligence algorithms namely: the wavelet transform (WT), the artificial bees colony (ABC), and the kernel-based extreme learning machine (KELM). At first, the total displacement was decomposed into several sub-sequences with different frequencies using the WT. Next each sub-sequence was predicted separately by the KELM whose parameters were optimized by the ABC. Finally the predicted total displacement was obtained by adding all the predicted sub-sequences. The Shuping landslide in the Three Gorges Reservoir area in China was taken as a case study. The performance of the new method was compared with the WT-ELM, ABC-KELM, ELM, and the support vector machine (SVM) methods. Results show that the prediction accuracy can be improved by decomposing the total displacement into sub-sequences with various frequencies and by predicting them separately. The ABC-KELM algorithm shows the highest prediction capacity followed by the ELM and SVM. Overall, the proposed method achieved excellent performance both in terms of accuracy and stability.

  7. Local-search based prediction of medical image registration error

    NASA Astrophysics Data System (ADS)

    Saygili, Görkem

    2018-03-01

    Medical image registration is a crucial task in many different medical imaging applications. Hence, considerable amount of work has been published recently that aim to predict the error in a registration without any human effort. If provided, these error predictions can be used as a feedback to the registration algorithm to further improve its performance. Recent methods generally start with extracting image-based and deformation-based features, then apply feature pooling and finally train a Random Forest (RF) regressor to predict the real registration error. Image-based features can be calculated after applying a single registration but provide limited accuracy whereas deformation-based features such as variation of deformation vector field may require up to 20 registrations which is a considerably high time-consuming task. This paper proposes to use extracted features from a local search algorithm as image-based features to estimate the error of a registration. The proposed method comprises a local search algorithm to find corresponding voxels between registered image pairs and based on the amount of shifts and stereo confidence measures, it predicts the amount of registration error in millimetres densely using a RF regressor. Compared to other algorithms in the literature, the proposed algorithm does not require multiple registrations, can be efficiently implemented on a Graphical Processing Unit (GPU) and can still provide highly accurate error predictions in existence of large registration error. Experimental results with real registrations on a public dataset indicate a substantially high accuracy achieved by using features from the local search algorithm.

  8. Multispectral code excited linear prediction coding and its application in magnetic resonance images.

    PubMed

    Hu, J H; Wang, Y; Cahill, P T

    1997-01-01

    This paper reports a multispectral code excited linear prediction (MCELP) method for the compression of multispectral images. Different linear prediction models and adaptation schemes have been compared. The method that uses a forward adaptive autoregressive (AR) model has been proven to achieve a good compromise between performance, complexity, and robustness. This approach is referred to as the MFCELP method. Given a set of multispectral images, the linear predictive coefficients are updated over nonoverlapping three-dimensional (3-D) macroblocks. Each macroblock is further divided into several 3-D micro-blocks, and the best excitation signal for each microblock is determined through an analysis-by-synthesis procedure. The MFCELP method has been applied to multispectral magnetic resonance (MR) images. To satisfy the high quality requirement for medical images, the error between the original image set and the synthesized one is further specified using a vector quantizer. This method has been applied to images from 26 clinical MR neuro studies (20 slices/study, three spectral bands/slice, 256x256 pixels/band, 12 b/pixel). The MFCELP method provides a significant visual improvement over the discrete cosine transform (DCT) based Joint Photographers Expert Group (JPEG) method, the wavelet transform based embedded zero-tree wavelet (EZW) coding method, and the vector tree (VT) coding method, as well as the multispectral segmented autoregressive moving average (MSARMA) method we developed previously.

  9. Framework for making better predictions by directly estimating variables' predictivity.

    PubMed

    Lo, Adeline; Chernoff, Herman; Zheng, Tian; Lo, Shaw-Hwa

    2016-12-13

    We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the [Formula: see text]-score of the PR method of VS yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the [Formula: see text]-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the [Formula: see text]-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using the partition retention and [Formula: see text]-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired.

  10. Framework for making better predictions by directly estimating variables’ predictivity

    PubMed Central

    Chernoff, Herman; Lo, Shaw-Hwa

    2016-01-01

    We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the I-score of the PR method of VS yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the I-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the I-score on real data to demonstrate the statistic’s predictive performance on sample data. We conjecture that using the partition retention and I-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired. PMID:27911830

  11. Constrained Active Learning for Anchor Link Prediction Across Multiple Heterogeneous Social Networks

    PubMed Central

    Zhu, Junxing; Zhang, Jiawei; Wu, Quanyuan; Jia, Yan; Zhou, Bin; Wei, Xiaokai; Yu, Philip S.

    2017-01-01

    Nowadays, people are usually involved in multiple heterogeneous social networks simultaneously. Discovering the anchor links between the accounts owned by the same users across different social networks is crucial for many important inter-network applications, e.g., cross-network link transfer and cross-network recommendation. Many different supervised models have been proposed to predict anchor links so far, but they are effective only when the labeled anchor links are abundant. However, in real scenarios, such a requirement can hardly be met and most anchor links are unlabeled, since manually labeling the inter-network anchor links is quite costly and tedious. To overcome such a problem and utilize the numerous unlabeled anchor links in model building, in this paper, we introduce the active learning based anchor link prediction problem. Different from the traditional active learning problems, due to the one-to-one constraint on anchor links, if an unlabeled anchor link a=(u,v) is identified as positive (i.e., existing), all the other unlabeled anchor links incident to account u or account v will be negative (i.e., non-existing) automatically. Viewed in such a perspective, asking for the labels of potential positive anchor links in the unlabeled set will be rewarding in the active anchor link prediction problem. Various novel anchor link information gain measures are defined in this paper, based on which several constraint active anchor link prediction methods are introduced. Extensive experiments have been done on real-world social network datasets to compare the performance of these methods with state-of-art anchor link prediction methods. The experimental results show that the proposed Mean-entropy-based Constrained Active Learning (MC) method can outperform other methods with significant advantages. PMID:28771201

  12. Constrained Active Learning for Anchor Link Prediction Across Multiple Heterogeneous Social Networks.

    PubMed

    Zhu, Junxing; Zhang, Jiawei; Wu, Quanyuan; Jia, Yan; Zhou, Bin; Wei, Xiaokai; Yu, Philip S

    2017-08-03

    Nowadays, people are usually involved in multiple heterogeneous social networks simultaneously. Discovering the anchor links between the accounts owned by the same users across different social networks is crucial for many important inter-network applications, e.g., cross-network link transfer and cross-network recommendation. Many different supervised models have been proposed to predict anchor links so far, but they are effective only when the labeled anchor links are abundant. However, in real scenarios, such a requirement can hardly be met and most anchor links are unlabeled, since manually labeling the inter-network anchor links is quite costly and tedious. To overcome such a problem and utilize the numerous unlabeled anchor links in model building, in this paper, we introduce the active learning based anchor link prediction problem. Different from the traditional active learning problems, due to the one-to-one constraint on anchor links, if an unlabeled anchor link a = ( u , v ) is identified as positive (i.e., existing), all the other unlabeled anchor links incident to account u or account v will be negative (i.e., non-existing) automatically. Viewed in such a perspective, asking for the labels of potential positive anchor links in the unlabeled set will be rewarding in the active anchor link prediction problem. Various novel anchor link information gain measures are defined in this paper, based on which several constraint active anchor link prediction methods are introduced. Extensive experiments have been done on real-world social network datasets to compare the performance of these methods with state-of-art anchor link prediction methods. The experimental results show that the proposed Mean-entropy-based Constrained Active Learning (MC) method can outperform other methods with significant advantages.

  13. Prediction and Validation of Disease Genes Using HeteSim Scores.

    PubMed

    Zeng, Xiangxiang; Liao, Yuanlu; Liu, Yuansheng; Zou, Quan

    2017-01-01

    Deciphering the gene disease association is an important goal in biomedical research. In this paper, we use a novel relevance measure, called HeteSim, to prioritize candidate disease genes. Two methods based on heterogeneous networks constructed using protein-protein interaction, gene-phenotype associations, and phenotype-phenotype similarity, are presented. In HeteSim_MultiPath (HSMP), HeteSim scores of different paths are combined with a constant that dampens the contributions of longer paths. In HeteSim_SVM (HSSVM), HeteSim scores are combined with a machine learning method. The 3-fold experiments show that our non-machine learning method HSMP performs better than the existing non-machine learning methods, our machine learning method HSSVM obtains similar accuracy with the best existing machine learning method CATAPULT. From the analysis of the top 10 predicted genes for different diseases, we found that HSSVM avoid the disadvantage of the existing machine learning based methods, which always predict similar genes for different diseases. The data sets and Matlab code for the two methods are freely available for download at http://lab.malab.cn/data/HeteSim/index.jsp.

  14. A prediction model of drug-induced ototoxicity developed by an optimal support vector machine (SVM) method.

    PubMed

    Zhou, Shu; Li, Guo-Bo; Huang, Lu-Yi; Xie, Huan-Zhang; Zhao, Ying-Lan; Chen, Yu-Zong; Li, Lin-Li; Yang, Sheng-Yong

    2014-08-01

    Drug-induced ototoxicity, as a toxic side effect, is an important issue needed to be considered in drug discovery. Nevertheless, current experimental methods used to evaluate drug-induced ototoxicity are often time-consuming and expensive, indicating that they are not suitable for a large-scale evaluation of drug-induced ototoxicity in the early stage of drug discovery. We thus, in this investigation, established an effective computational prediction model of drug-induced ototoxicity using an optimal support vector machine (SVM) method, GA-CG-SVM. Three GA-CG-SVM models were developed based on three training sets containing agents bearing different risk levels of drug-induced ototoxicity. For comparison, models based on naïve Bayesian (NB) and recursive partitioning (RP) methods were also used on the same training sets. Among all the prediction models, the GA-CG-SVM model II showed the best performance, which offered prediction accuracies of 85.33% and 83.05% for two independent test sets, respectively. Overall, the good performance of the GA-CG-SVM model II indicates that it could be used for the prediction of drug-induced ototoxicity in the early stage of drug discovery. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. A novel prediction method about single components of analog circuits based on complex field modeling.

    PubMed

    Zhou, Jingyu; Tian, Shulin; Yang, Chenglin

    2014-01-01

    Few researches pay attention to prediction about analog circuits. The few methods lack the correlation with circuit analysis during extracting and calculating features so that FI (fault indicator) calculation often lack rationality, thus affecting prognostic performance. To solve the above problem, this paper proposes a novel prediction method about single components of analog circuits based on complex field modeling. Aiming at the feature that faults of single components hold the largest number in analog circuits, the method starts with circuit structure, analyzes transfer function of circuits, and implements complex field modeling. Then, by an established parameter scanning model related to complex field, it analyzes the relationship between parameter variation and degeneration of single components in the model in order to obtain a more reasonable FI feature set via calculation. According to the obtained FI feature set, it establishes a novel model about degeneration trend of analog circuits' single components. At last, it uses particle filter (PF) to update parameters for the model and predicts remaining useful performance (RUP) of analog circuits' single components. Since calculation about the FI feature set is more reasonable, accuracy of prediction is improved to some extent. Finally, the foregoing conclusions are verified by experiments.

  16. Prediction of redox-sensitive cysteines using sequential distance and other sequence-based features.

    PubMed

    Sun, Ming-An; Zhang, Qing; Wang, Yejun; Ge, Wei; Guo, Dianjing

    2016-08-24

    Reactive oxygen species can modify the structure and function of proteins and may also act as important signaling molecules in various cellular processes. Cysteine thiol groups of proteins are particularly susceptible to oxidation. Meanwhile, their reversible oxidation is of critical roles for redox regulation and signaling. Recently, several computational tools have been developed for predicting redox-sensitive cysteines; however, those methods either only focus on catalytic redox-sensitive cysteines in thiol oxidoreductases, or heavily depend on protein structural data, thus cannot be widely used. In this study, we analyzed various sequence-based features potentially related to cysteine redox-sensitivity, and identified three types of features for efficient computational prediction of redox-sensitive cysteines. These features are: sequential distance to the nearby cysteines, PSSM profile and predicted secondary structure of flanking residues. After further feature selection using SVM-RFE, we developed Redox-Sensitive Cysteine Predictor (RSCP), a SVM based classifier for redox-sensitive cysteine prediction using primary sequence only. Using 10-fold cross-validation on RSC758 dataset, the accuracy, sensitivity, specificity, MCC and AUC were estimated as 0.679, 0.602, 0.756, 0.362 and 0.727, respectively. When evaluated using 10-fold cross-validation with BALOSCTdb dataset which has structure information, the model achieved performance comparable to current structure-based method. Further validation using an independent dataset indicates it is robust and of relatively better accuracy for predicting redox-sensitive cysteines from non-enzyme proteins. In this study, we developed a sequence-based classifier for predicting redox-sensitive cysteines. The major advantage of this method is that it does not rely on protein structure data, which ensures more extensive application compared to other current implementations. Accurate prediction of redox-sensitive cysteines not only enhances our understanding about the redox sensitivity of cysteine, it may also complement the proteomics approach and facilitate further experimental investigation of important redox-sensitive cysteines.

  17. The clustering-based case-based reasoning for imbalanced business failure prediction: a hybrid approach through integrating unsupervised process with supervised process

    NASA Astrophysics Data System (ADS)

    Li, Hui; Yu, Jun-Ling; Yu, Le-An; Sun, Jie

    2014-05-01

    Case-based reasoning (CBR) is one of the main forecasting methods in business forecasting, which performs well in prediction and holds the ability of giving explanations for the results. In business failure prediction (BFP), the number of failed enterprises is relatively small, compared with the number of non-failed ones. However, the loss is huge when an enterprise fails. Therefore, it is necessary to develop methods (trained on imbalanced samples) which forecast well for this small proportion of failed enterprises and performs accurately on total accuracy meanwhile. Commonly used methods constructed on the assumption of balanced samples do not perform well in predicting minority samples on imbalanced samples consisting of the minority/failed enterprises and the majority/non-failed ones. This article develops a new method called clustering-based CBR (CBCBR), which integrates clustering analysis, an unsupervised process, with CBR, a supervised process, to enhance the efficiency of retrieving information from both minority and majority in CBR. In CBCBR, various case classes are firstly generated through hierarchical clustering inside stored experienced cases, and class centres are calculated out by integrating cases information in the same clustered class. When predicting the label of a target case, its nearest clustered case class is firstly retrieved by ranking similarities between the target case and each clustered case class centre. Then, nearest neighbours of the target case in the determined clustered case class are retrieved. Finally, labels of the nearest experienced cases are used in prediction. In the empirical experiment with two imbalanced samples from China, the performance of CBCBR was compared with the classical CBR, a support vector machine, a logistic regression and a multi-variant discriminate analysis. The results show that compared with the other four methods, CBCBR performed significantly better in terms of sensitivity for identifying the minority samples and generated high total accuracy meanwhile. The proposed approach makes CBR useful in imbalanced forecasting.

  18. Life prediction modeling based on cyclic damage accumulation

    NASA Technical Reports Server (NTRS)

    Nelson, Richard S.

    1988-01-01

    A high temperature, low cycle fatigue life prediction method was developed. This method, Cyclic Damage Accumulation (CDA), was developed for use in predicting the crack initiation lifetime of gas turbine engine materials, where initiation was defined as a 0.030 inch surface length crack. A principal engineering feature of the CDA method is the minimum data base required for implementation. Model constants can be evaluated through a few simple specimen tests such as monotonic loading and rapic cycle fatigue. The method was expanded to account for the effects on creep-fatigue life of complex loadings such as thermomechanical fatigue, hold periods, waveshapes, mean stresses, multiaxiality, cumulative damage, coatings, and environmental attack. A significant data base was generated on the behavior of the cast nickel-base superalloy B1900+Hf, including hundreds of specimen tests under such loading conditions. This information is being used to refine and extend the CDA life prediction model, which is now nearing completion. The model is also being verified using additional specimen tests on wrought INCO 718, and the final version of the model is expected to be adaptable to most any high-temperature alloy. The model is currently available in the form of equations and related constants. A proposed contract addition will make the model available in the near future in the form of a computer code to potential users.

  19. A Comparative study of two RVE modelling methods for chopped carbon fiber SMC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Zhangxing; Li, Yi; Shao, Yimin

    To achieve vehicle light-weighting, the chopped carbon fiber sheet molding compound (SMC) is identified as a promising material to replace metals. However, there are no effective tools and methods to predict the mechanical property of the chopped carbon fiber SMC due to the high complexity in microstructure features and the anisotropic properties. In this paper, the Representative Volume Element (RVE) approach is used to model the SMC microstructure. Two modeling methods, the Voronoi diagram-based method and the chip packing method, are developed for material RVE property prediction. The two methods are compared in terms of the predicted elastic modulus andmore » the predicted results are validated using the Digital Image Correlation (DIC) tensile test results. Furthermore, the advantages and shortcomings of these two methods are discussed in terms of the required input information and the convenience of use in the integrated processing-microstructure-property analysis.« less

  20. Short-term solar flare prediction using image-case-based reasoning

    NASA Astrophysics Data System (ADS)

    Liu, Jin-Fu; Li, Fei; Zhang, Huai-Peng; Yu, Da-Ren

    2017-10-01

    Solar flares strongly influence space weather and human activities, and their prediction is highly complex. The existing solutions such as data based approaches and model based approaches have a common shortcoming which is the lack of human engagement in the forecasting process. An image-case-based reasoning method is introduced to achieve this goal. The image case library is composed of SOHO/MDI longitudinal magnetograms, the images from which exhibit the maximum horizontal gradient, the length of the neutral line and the number of singular points that are extracted for retrieving similar image cases. Genetic optimization algorithms are employed for optimizing the weight assignment for image features and the number of similar image cases retrieved. Similar image cases and prediction results derived by majority voting for these similar image cases are output and shown to the forecaster in order to integrate his/her experience with the final prediction results. Experimental results demonstrate that the case-based reasoning approach has slightly better performance than other methods, and is more efficient with forecasts improved by humans.

Top