Sample records for prediction accuracy

  1. ShinyGPAS: interactive genomic prediction accuracy simulator based on deterministic formulas.

    PubMed

    Morota, Gota

    2017-12-20

    Deterministic formulas for the accuracy of genomic predictions highlight the relationships between prediction accuracy and the factors that influence it, prior to performing computationally intensive cross-validation. Visualizing such deterministic formulas in an interactive manner may lead to a better understanding of how genetic factors control prediction accuracy. The software to simulate deterministic formulas for genomic prediction accuracy was implemented in R and encapsulated as a web-based Shiny application. Shiny genomic prediction accuracy simulator (ShinyGPAS) simulates various deterministic formulas and delivers dynamic scatter plots of prediction accuracy versus the genetic factors impacting it, while requiring only mouse navigation in a web browser. ShinyGPAS is available at: https://chikudaisei.shinyapps.io/shinygpas/ . ShinyGPAS is a Shiny-based interactive genomic prediction accuracy simulator using deterministic formulas. It can be used for interactively exploring potential factors that influence prediction accuracy in genome-enabled prediction, simulating achievable prediction accuracy prior to genotyping individuals, or supporting in-class teaching. ShinyGPAS is open-source software and is hosted online as a freely available web-based resource with an intuitive graphical user interface.
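
    As a worked illustration of the kind of closed-form expression ShinyGPAS visualizes, the sketch below evaluates the well-known Daetwyler et al. formula r = sqrt(N h^2 / (N h^2 + Me)) in Python (the tool itself is written in R/Shiny); the parameter values are illustrative assumptions, not numbers from this record.

      import numpy as np

      def prediction_accuracy(n_train, h2, m_e):
          """Deterministic genomic prediction accuracy from training-set size
          (n_train), heritability (h2) and the number of independent
          chromosome segments (m_e)."""
          return np.sqrt(n_train * h2 / (n_train * h2 + m_e))

      # accuracy rises with training size but saturates below 1
      for n in (500, 1000, 5000, 20000):
          print(n, round(float(prediction_accuracy(n, h2=0.3, m_e=1000)), 3))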

  2. Research on Improved Depth Belief Network-Based Prediction of Cardiovascular Diseases

    PubMed Central

    Zhang, Hongpo

    2018-01-01

    Quantitative analysis and prediction can help to reduce the risk of cardiovascular disease. Quantitative prediction based on traditional models has low accuracy, and the variance of predictions based on shallow neural networks is large. In this paper, a cardiovascular disease prediction model based on an improved deep belief network (DBN) is proposed. Using the reconstruction error, the network depth is determined independently, and unsupervised training and supervised optimization are combined, which ensures prediction accuracy while guaranteeing stability. Thirty experiments were performed independently on the Statlog (Heart) and Heart Disease Database data sets in the UCI database. Experimental results showed that the mean prediction accuracy was 91.26% and 89.78%, respectively. The variance of prediction accuracy was 5.78 and 4.46, respectively. PMID:29854369

  3. Accuracy of Predicted Genomic Breeding Values in Purebred and Crossbred Pigs.

    PubMed

    Hidalgo, André M; Bastiaansen, John W M; Lopes, Marcos S; Harlizius, Barbara; Groenen, Martien A M; de Koning, Dirk-Jan

    2015-05-26

    Genomic selection has been widely implemented in dairy cattle breeding when the aim is to improve performance of purebred animals. In pigs, however, the final product is a crossbred animal. This may affect the efficiency of methods that are currently implemented for dairy cattle. Therefore, the objective of this study was to determine the accuracy of predicted breeding values in crossbred pigs using purebred genomic and phenotypic data. A second objective was to compare the predictive ability of SNPs when training is done in either single or multiple populations for four traits: age at first insemination (AFI); total number of piglets born (TNB); litter birth weight (LBW); and litter variation (LVR). We performed marker-based and pedigree-based predictions. Within-population predictions for the four traits ranged from 0.21 to 0.72. Multi-population prediction yielded accuracies ranging from 0.18 to 0.67. Predictions across purebred populations as well as predicting genetic merit of crossbreds from their purebred parental lines for AFI performed poorly (not significantly different from zero). In contrast, accuracies of across-population predictions and accuracies of purebred to crossbred predictions for LBW and LVR ranged from 0.08 to 0.31 and 0.11 to 0.31, respectively. Accuracy for TNB was zero for across-population prediction, whereas for purebred to crossbred prediction it ranged from 0.08 to 0.22. In general, marker-based outperformed pedigree-based prediction across populations and traits. However, in some cases pedigree-based prediction performed similarly or outperformed marker-based prediction. There was predictive ability when purebred populations were used to predict crossbred genetic merit using an additive model in the populations studied. AFI was the only exception, indicating that predictive ability depends largely on the genetic correlation between purebred (PB) and crossbred (CB) performance, which was 0.31 for AFI. Multi-population prediction was no better than within-population prediction for the purebred validation set. Accuracy of prediction was very trait-dependent. Copyright © 2015 Hidalgo et al.

  4. Evaluating the accuracy of SHAPE-directed RNA secondary structure predictions

    PubMed Central

    Sükösd, Zsuzsanna; Swenson, M. Shel; Kjems, Jørgen; Heitsch, Christine E.

    2013-01-01

    Recent advances in RNA structure determination include using data from high-throughput probing experiments to improve thermodynamic prediction accuracy. We evaluate the extent and nature of improvements in data-directed predictions for a diverse set of 16S/18S ribosomal sequences using a stochastic model of experimental SHAPE data. The average accuracy for 1000 data-directed predictions always improves over the original minimum free energy (MFE) structure. However, the amount of improvement varies with the sequence, exhibiting a correlation with MFE accuracy. Further analysis of this correlation shows that accurate MFE base pairs are typically preserved in a data-directed prediction, whereas inaccurate ones are not. Thus, the positive predictive value of common base pairs is consistently higher than the directed prediction accuracy. Finally, we confirm sequence dependencies in the directability of thermodynamic predictions and investigate the potential for greater accuracy improvements in the worst performing test sequence. PMID:23325843
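
    The sensitivity/PPV bookkeeping behind statements like "the positive predictive value of common base pairs is consistently higher than the directed prediction accuracy" reduces to set operations on base pairs. A minimal Python sketch with toy pairs (not real SHAPE-directed predictions):

      def pair_metrics(reference, predicted):
          """Sensitivity, PPV and F-measure for sets of base pairs (i, j)."""
          ref, pred = set(reference), set(predicted)
          tp = len(ref & pred)                          # correctly predicted pairs
          sens = tp / len(ref) if ref else 0.0
          ppv = tp / len(pred) if pred else 0.0
          f = 2 * sens * ppv / (sens + ppv) if sens + ppv else 0.0
          return sens, ppv, f

      # toy structures: (i, j) index tuples, not data from the paper
      reference = {(1, 20), (2, 19), (3, 18), (5, 15)}
      predicted = {(1, 20), (2, 19), (4, 16), (5, 15)}
      print(pair_metrics(reference, predicted))         # (0.75, 0.75, 0.75)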

  5. The effect of using genealogy-based haplotypes for genomic prediction

    PubMed Central

    2013-01-01

    Background Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971

  6. The effect of using genealogy-based haplotypes for genomic prediction.

    PubMed

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.
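
    Both versions of this record rest on the GBLUP machinery. Below is a minimal sketch of marker-based GBLUP in Python, using a VanRaden-style genomic relationship matrix on simulated toy genotypes (not the 4429 Holstein bulls, and individual markers rather than genealogy-based haplotype covariates):

      import numpy as np
      rng = np.random.default_rng(0)

      n, m = 200, 1000                                  # toy animals x SNPs
      M = rng.binomial(2, 0.3, size=(n, m)).astype(float)
      p = M.mean(axis=0) / 2.0                          # allele frequencies
      W = M - 2.0 * p                                   # centred genotypes
      G = W @ W.T / (2.0 * np.sum(p * (1.0 - p)))       # genomic relationship matrix

      y = rng.normal(size=n)                            # placeholder phenotypes
      h2 = 0.3                                          # assumed heritability
      lam = (1.0 - h2) / h2                             # sigma_e^2 / sigma_g^2
      g_hat = G @ np.linalg.solve(G + lam * np.eye(n), y - y.mean())
      print(g_hat[:5])                                  # genomic breeding values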

  7. Assessing the accuracy of predictive models for numerical data: Not r nor r2, why not? Then what?

    PubMed Central

    2017-01-01

    Assessing the accuracy of predictive models is critical because predictive models have been increasingly used across various disciplines and predictive accuracy determines the quality of resultant predictions. The Pearson product-moment correlation coefficient (r) and the coefficient of determination (r2) are among the most widely used measures for assessing predictive models for numerical data, although they have been argued to be biased, insufficient and misleading. In this study, geometrical graphs were used to illustrate what is actually used in the calculation of r and r2, and simulations were used to demonstrate the behaviour of r and r2 and to compare three accuracy measures under various scenarios. Relevant confusions about r and r2 have been clarified. The calculation of r and r2 is not based on the differences between the predicted and observed values. Variance explained by predictive models based on cross-validation (VEcv) is free of these limitations and is a reliable accuracy measure. Legates and McCabe’s efficiency (E1) is also an alternative accuracy measure. The r and r2 do not measure accuracy and are incorrect accuracy measures, and the existing error measures suffer various limitations of their own. VEcv and E1 are recommended for assessing accuracy. The application of these accuracy measures would encourage the development of accuracy-improved predictive models that generate predictions for evidence-informed decision-making. PMID:28837692
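
    The measures this record recommends are simple to compute side by side. A short sketch contrasting r and r2 with VEcv and E1 (computed here on a toy held-out sample; in the paper VEcv is based on cross-validation):

      import numpy as np

      def accuracy_measures(obs, pred):
          obs, pred = np.asarray(obs, float), np.asarray(pred, float)
          r = np.corrcoef(obs, pred)[0, 1]
          vecv = 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
          e1 = 1.0 - np.sum(np.abs(obs - pred)) / np.sum(np.abs(obs - obs.mean()))
          return {"r": r, "r2": r ** 2, "VEcv_%": 100 * vecv, "E1": e1}

      obs = [3.0, 5.0, 7.0, 9.0]
      pred = [4.0, 6.0, 8.0, 10.0]        # perfectly correlated but biased
      print(accuracy_measures(obs, pred)) # r = r2 = 1, yet VEcv = 80%, E1 = 0.5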

  8. Analysis of spatial distribution of land cover maps accuracy

    NASA Astrophysics Data System (ADS)

    Khatami, R.; Mountrakis, G.; Stehman, S. V.

    2017-12-01

    Land cover maps have become one of the most important products of remote sensing science. However, classification errors will exist in any classified map and affect the reliability of subsequent map usage. Moreover, classification accuracy often varies over different regions of a classified map. These variations of accuracy will affect the reliability of subsequent analyses of different regions based on the classified maps. The traditional approach of map accuracy assessment based on an error matrix does not capture the spatial variation in classification accuracy. Here, per-pixel accuracy prediction methods are proposed based on interpolating accuracy values from a test sample to produce wall-to-wall accuracy maps. Different accuracy prediction methods were developed based on four factors: predictive domain (spatial versus spectral), interpolation function (constant, linear, Gaussian, and logistic), incorporation of class information (interpolating each class separately versus grouping them together), and sample size. This research is the first to incorporate the spectral domain as an explanatory feature space for interpolating classification accuracy. Performance of the prediction methods was evaluated using 26 test blocks, with 10 km × 10 km dimensions, dispersed throughout the United States. The performance of the predictions was evaluated using the area under the curve (AUC) of the receiver operating characteristic. Relative to existing accuracy prediction methods, our proposed methods resulted in improvements of AUC of 0.15 or greater. Evaluation of the four factors comprising the accuracy prediction methods demonstrated that: i) interpolations should be done separately for each class instead of grouping all classes together; ii) if an all-classes approach is used, the spectral domain will result in substantially greater AUC than the spatial domain; iii) for the smaller sample size and per-class predictions, the spectral and spatial domain yielded similar AUC; iv) for the larger sample size (i.e., very dense spatial sample) and per-class predictions, the spatial domain yielded larger AUC; v) increasing the sample size improved accuracy predictions with a greater benefit accruing to the spatial domain; and vi) the function used for interpolation had the smallest effect on AUC.
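
    A toy version of the core idea (interpolating 0/1 correctness from a test sample into a wall-to-wall accuracy surface, then scoring with AUC) can be sketched as follows; the Gaussian kernel, bandwidth, and synthetic sample are illustrative assumptions, not the paper's setup:

      import numpy as np
      rng = np.random.default_rng(1)

      xy = rng.uniform(0, 10, size=(300, 2))                 # sample map coordinates
      correct = (rng.uniform(size=300) < 0.7).astype(float)  # 0/1 correctness

      def gaussian_interp(targets, sample_xy, sample_val, bandwidth=1.0):
          """Per-pixel accuracy as a Gaussian-kernel weighted mean of sample
          correctness (the spatial predictive domain)."""
          d2 = ((targets[:, None, :] - sample_xy[None, :, :]) ** 2).sum(-1)
          w = np.exp(-d2 / (2.0 * bandwidth ** 2))
          return (w * sample_val).sum(axis=1) / w.sum(axis=1)

      def auc(scores, labels):
          """Rank-based AUC (Mann-Whitney), ignoring ties."""
          order = np.argsort(scores)
          ranks = np.empty(len(scores))
          ranks[order] = np.arange(1, len(scores) + 1)
          pos = labels.astype(bool)
          n1, n0 = pos.sum(), (~pos).sum()
          return (ranks[pos].sum() - n1 * (n1 + 1) / 2.0) / (n1 * n0)

      pred = gaussian_interp(xy[250:], xy[:250], correct[:250])
      print(round(auc(pred, correct[250:]), 3))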

  9. Improving transmembrane protein consensus topology prediction using inter-helical interaction.

    PubMed

    Wang, Han; Zhang, Chao; Shi, Xiaohu; Zhang, Li; Zhou, You

    2012-11-01

    Alpha helix transmembrane proteins (αTMPs) represent roughly 30% of all open reading frames (ORFs) in a typical genome and are involved in many critical biological processes. Due to their special physicochemical properties, αTMPs are hard to crystallize and to resolve at high resolution experimentally; thus, sequence-based topology prediction is highly desirable for the study of transmembrane proteins (TMPs), both in structure prediction and function prediction. Various model-based topology prediction methods have been developed, but the accuracy of those individual predictors remains poor due to the limitations of the methods or the features they use. Thus, consensus topology prediction becomes practical for high-accuracy applications by combining the advantages of the individual predictors. Here, based on the observation that inter-helical interactions are commonly found between transmembrane helices (TMHs) and strongly indicate their existence, we present a novel consensus topology prediction method for αTMPs, CNTOP, which incorporates four top leading individual topology predictors and further improves prediction accuracy by using the predicted inter-helical interactions. The method achieved 87% prediction accuracy on a benchmark dataset and 78% accuracy on a non-redundant dataset composed of polytopic αTMPs. Our method achieves higher topology accuracy than any other individual or consensus predictor; at the same time, the TMHs are predicted more accurately in their lengths and locations, with both false positives (FPs) and false negatives (FNs) decreasing dramatically. CNTOP is available at: http://ccst.jlu.edu.cn/JCSB/cntop/CNTOP.html. Copyright © 2012 Elsevier B.V. All rights reserved.
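
    The voting stage of a consensus predictor can be illustrated in a few lines; CNTOP additionally exploits predicted inter-helical interactions, which this per-residue majority-vote sketch (with made-up predictor outputs) deliberately omits:

      from collections import Counter

      def consensus_topology(predictions):
          """Per-residue majority vote over equal-length topology strings,
          using 'i' (inside), 'M' (membrane helix) and 'o' (outside)."""
          return "".join(
              Counter(column).most_common(1)[0][0] for column in zip(*predictions)
          )

      # toy outputs from four hypothetical individual predictors
      preds = [
          "iiiMMMMMMooo",
          "iiMMMMMMMooo",
          "iiiMMMMMoooo",
          "iiiMMMMMMooo",
      ]
      print(consensus_topology(preds))   # iiiMMMMMMooo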

  10. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-27

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physicochemical properties of proteins, and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.

  11. Analysis of energy-based algorithms for RNA secondary structure prediction

    PubMed Central

    2012-01-01

    Background RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Conclusions Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets. PMID:22296803

  12. Analysis of energy-based algorithms for RNA secondary structure prediction.

    PubMed

    Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H

    2012-02-01

    RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.
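
    The first finding in both versions of this record relies on the bootstrap percentile method applied to the mean F-measure over a dataset of RNAs. A minimal sketch with placeholder scores (the Beta-distributed values are an assumption, not the paper's data):

      import numpy as np
      rng = np.random.default_rng(2)

      def bootstrap_ci(values, n_boot=10000, level=0.95):
          """Percentile-bootstrap confidence interval for the mean."""
          values = np.asarray(values, float)
          means = np.array([
              rng.choice(values, size=len(values), replace=True).mean()
              for _ in range(n_boot)
          ])
          lo, hi = np.percentile(means, [100 * (1 - level) / 2,
                                         100 * (1 + level) / 2])
          return values.mean(), lo, hi

      f_measures = rng.beta(5, 2, size=2000)   # placeholder per-RNA F-measures
      print(bootstrap_ci(f_measures))          # mean and 95% CI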

  13. Potential and limits to unravel the genetic architecture and predict the variation of Fusarium head blight resistance in European winter wheat (Triticum aestivum L.).

    PubMed

    Jiang, Y; Zhao, Y; Rodemann, B; Plieske, J; Kollers, S; Korzun, V; Ebmeyer, E; Argillier, O; Hinze, M; Ling, J; Röder, M S; Ganal, M W; Mette, M F; Reif, J C

    2015-03-01

    Genome-wide mapping approaches in diverse populations are powerful tools to unravel the genetic architecture of complex traits. The main goals of our study were to investigate the potential and limits to unravel the genetic architecture and to identify the factors determining the accuracy of prediction of the genotypic variation of Fusarium head blight (FHB) resistance in wheat (Triticum aestivum L.) based on data collected with a diverse panel of 372 European varieties. The wheat lines were phenotyped in multi-location field trials for FHB resistance and genotyped with 782 simple sequence repeat (SSR) markers, and 9k and 90k single-nucleotide polymorphism (SNP) arrays. We applied genome-wide association mapping in combination with fivefold cross-validations and observed surprisingly high accuracies of prediction for marker-assisted selection based on the detected quantitative trait loci (QTLs). Using a random sample of markers not selected for marker-trait associations revealed only a slight decrease in prediction accuracy compared with marker-based selection exploiting the QTL information. The same picture was confirmed in a simulation study, suggesting that relatedness is a main driver of the accuracy of prediction in marker-assisted selection of FHB resistance. When the accuracy of prediction of three genomic selection models was contrasted for the three marker data sets, no significant differences in accuracies among marker platforms and genomic selection models were observed. Marker density impacted the accuracy of prediction only marginally. Consequently, genomic selection of FHB resistance can be implemented most cost-efficiently based on low- to medium-density SNP arrays.

  14. The use of genomic information increases the accuracy of breeding value predictions for sea louse (Caligus rogercresseyi) resistance in Atlantic salmon (Salmo salar).

    PubMed

    Correa, Katharina; Bangera, Rama; Figueroa, René; Lhorente, Jean P; Yáñez, José M

    2017-01-31

    Sea lice infestations caused by Caligus rogercresseyi are a main concern to the salmon farming industry due to associated economic losses. Resistance to this parasite was shown to have low to moderate genetic variation and its genetic architecture was suggested to be polygenic. The aim of this study was to compare accuracies of breeding value predictions obtained with pedigree-based best linear unbiased prediction (P-BLUP) methodology against different genomic prediction approaches: genomic BLUP (G-BLUP), Bayesian Lasso, and Bayes C. To achieve this, 2404 individuals from 118 families were measured for C. rogercresseyi count after a challenge and genotyped using 37K single nucleotide polymorphisms (SNPs). Accuracies were assessed using fivefold cross-validation and SNP densities of 0.5, 1, 5, 10, 25 and 37K. Accuracy of genomic predictions increased with increasing SNP density and was higher than pedigree-based BLUP predictions by up to 22%. Both Bayesian and G-BLUP methods can predict breeding values with higher accuracies than pedigree-based BLUP; however, G-BLUP may be the preferred method because of reduced computation time and ease of implementation. A relatively low marker density (i.e. 10K) is sufficient for maximal increase in accuracy when using G-BLUP or Bayesian methods for genomic prediction of C. rogercresseyi resistance in Atlantic salmon.

  15. Genetic algorithm based adaptive neural network ensemble and its application in predicting carbon flux

    USGS Publications Warehouse

    Xue, Y.; Liu, S.; Hu, Y.; Yang, J.; Chen, Q.

    2007-01-01

    To improve the accuracy in prediction, Genetic Algorithm based Adaptive Neural Network Ensemble (GA-ANNE) is presented. Intersections are allowed between different training sets based on the fuzzy clustering analysis, which ensures the diversity as well as the accuracy of individual Neural Networks (NNs). Moreover, to improve the accuracy of the adaptive weights of individual NNs, GA is used to optimize the cluster centers. Empirical results in predicting carbon flux of Duke Forest reveal that GA-ANNE can predict the carbon flux more accurately than Radial Basis Function Neural Network (RBFNN), Bagging NN ensemble, and ANNE. © 2007 IEEE.

  16. Accuracy of predicting genomic breeding values for residual feed intake in Angus and Charolais beef cattle.

    PubMed

    Chen, L; Schenkel, F; Vinsky, M; Crews, D H; Li, C

    2013-10-01

    In beef cattle, phenotypic data that are difficult and/or costly to measure, such as feed efficiency, and DNA marker genotypes are usually available on a small number of animals of different breeds or populations. To achieve a maximal accuracy of genomic prediction using the phenotype and genotype data, strategies for forming a training population to predict genomic breeding values (GEBV) of the selection candidates need to be evaluated. In this study, we examined the accuracy of predicting GEBV for residual feed intake (RFI) based on 522 Angus and 395 Charolais steers genotyped with the Illumina BovineSNP50 BeadChip for 3 training population forming strategies: within breed, across breed, and by pooling data from the 2 breeds (i.e., combined). Two other scenarios with the training and validation data split by birth year and by sire family within a breed were also investigated to assess the impact of genetic relationships on the accuracy of genomic prediction. Three statistical methods including the best linear unbiased prediction with the relationship matrix defined based on the pedigree (PBLUP), based on the SNP genotypes (GBLUP), and a Bayesian method (BayesB) were used to predict the GEBV. The results showed that the accuracy of the GEBV prediction was the highest when the prediction was within breed and when the validation population had greater genetic relationships with the training population, with a maximum of 0.58 for Angus and 0.64 for Charolais. The within-breed prediction accuracies dropped to 0.29 and 0.38, respectively, when the validation populations had a minimal pedigree link with the training population. When the training population of a different breed was used to predict the GEBV of the validation population, that is, across-breed genomic prediction, the accuracies were further reduced to 0.10 to 0.22, depending on the prediction method used. Pooling data from the 2 breeds to form the training population resulted in accuracies increased to 0.31 and 0.43, respectively, for the Angus and Charolais validation populations. The results suggested that the genetic relationship of selection candidates with the training population has a great impact on the accuracy of GEBV prediction with the Illumina BovineSNP50 BeadChip. Pooling data from different breeds to form the training population will improve the accuracy of across-breed genomic prediction for RFI in beef cattle.

  17. Exploring Mouse Protein Function via Multiple Approaches.

    PubMed

    Huang, Guohua; Chu, Chen; Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning; Cai, Yu-Dong

    2016-01-01

    Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality.

  18. Exploring Mouse Protein Function via Multiple Approaches

    PubMed Central

    Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning

    2016-01-01

    Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality. PMID:27846315

  19. The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes.

    PubMed

    Clark, Samuel A; Hickey, John M; Daetwyler, Hans D; van der Werf, Julius H J

    2012-02-09

    The theory of genomic selection is based on the prediction of the effects of genetic markers in linkage disequilibrium with quantitative trait loci. However, genomic selection also relies on relationships between individuals to accurately predict genetic value. This study aimed to examine the importance of information on relatives versus that of unrelated or more distantly related individuals on the estimation of genomic breeding values. Simulated and real data were used to examine the effects of various degrees of relationship on the accuracy of genomic selection. Genomic Best Linear Unbiased Prediction (gBLUP) was compared to two pedigree-based BLUP methods, one with a shallow one-generation pedigree and the other with a deep ten-generation pedigree. The accuracy of estimated breeding values for different groups of selection candidates that had varying degrees of relationships to a reference data set of 1750 animals was investigated. The gBLUP method predicted breeding values more accurately than BLUP. The most accurate breeding values were estimated using gBLUP for closely related animals. Similarly, the pedigree-based BLUP methods were also accurate for closely related animals; however, when the pedigree-based BLUP methods were used to predict unrelated animals, the accuracy was close to zero. In contrast, gBLUP breeding values for animals that had no pedigree relationship with animals in the reference data set still achieved substantial accuracy. An animal's relationship to the reference data set is an important factor for the accuracy of genomic predictions. Animals that share a close relationship to the reference data set had the highest accuracy from genomic predictions. However, a baseline accuracy, driven by the size of the reference data set and the effective population size, enables gBLUP to estimate a breeding value for unrelated animals within a population (breed), using information previously ignored by pedigree-based BLUP methods.

  20. Impact of fitting dominance and additive effects on accuracy of genomic prediction of breeding values in layers.

    PubMed

    Heidaritabar, M; Wolc, A; Arango, J; Zeng, J; Settar, P; Fulton, J E; O'Sullivan, N P; Bastiaansen, J W M; Fernando, R L; Garrick, D J; Dekkers, J C M

    2016-10-01

    Most genomic prediction studies fit only additive effects in models to estimate genomic breeding values (GEBV). However, if dominance genetic effects are an important source of variation for complex traits, accounting for them may improve the accuracy of GEBV. We investigated the effect of fitting dominance and additive effects on the accuracy of GEBV for eight egg production and quality traits in a purebred line of brown layers using pedigree or genomic information (42K single-nucleotide polymorphism (SNP) panel). Phenotypes were corrected for the effect of hatch date. Additive and dominance genetic variances were estimated using genomic-based [genomic best linear unbiased prediction (GBLUP)-REML and BayesC] and pedigree-based (PBLUP-REML) methods. Breeding values were predicted using a model that included both additive and dominance effects and a model that included only additive effects. The reference population consisted of approximately 1800 animals hatched between 2004 and 2009, while approximately 300 young animals hatched in 2010 were used for validation. Accuracy of prediction was computed as the correlation between phenotypes and estimated breeding values of the validation animals divided by the square root of the estimate of heritability in the whole population. The proportion of dominance variance to total phenotypic variance ranged from 0.03 to 0.22 with PBLUP-REML across traits, from 0 to 0.03 with GBLUP-REML and from 0.01 to 0.05 with BayesC. Accuracies of GEBV ranged from 0.28 to 0.60 across traits. Inclusion of dominance effects did not improve the accuracy of GEBV, and differences in their accuracies between genomic-based methods were small (0.01-0.05), with GBLUP-REML yielding higher prediction accuracies than BayesC for egg production, egg colour and yolk weight, while BayesC yielded higher accuracies than GBLUP-REML for the other traits. In conclusion, fitting dominance effects did not impact accuracy of genomic prediction of breeding values in this population. © 2016 Blackwell Verlag GmbH.
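
    Two ingredients of this study are easy to sketch: a dominance coding of genotypes alongside the additive one, and the validation accuracy defined as the phenotype-GEBV correlation divided by the square root of heritability. The Vitezica-style coding below is a common choice and an assumption on my part, and simulated genotypes replace the layer data:

      import numpy as np

      def dominance_covariates(M):
          """Dominance coding of 0/1/2 genotype counts:
          2 -> -2q^2, 1 -> 2pq, 0 -> -2p^2 (p = allele frequency), which has
          mean zero under Hardy-Weinberg proportions."""
          p = M.mean(axis=0) / 2.0
          q = 1.0 - p
          return np.where(M == 2, -2.0 * q ** 2,
                 np.where(M == 1, 2.0 * p * q, -2.0 * p ** 2))

      def validation_accuracy(y, gebv, h2):
          """Accuracy as defined in the record: cor(y, GEBV) / sqrt(h2)."""
          return np.corrcoef(y, gebv)[0, 1] / np.sqrt(h2)

      rng = np.random.default_rng(4)
      M = rng.binomial(2, 0.3, size=(300, 500)).astype(float)
      print(dominance_covariates(M).mean(axis=0)[:3])   # approximately zero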

  21. Validity of Predictive Equations for Resting Energy Expenditure Developed for Obese Patients: Impact of Body Composition Method

    PubMed Central

    Achamrah, Najate; Jésus, Pierre; Grigioni, Sébastien; Rimbert, Agnès; Petit, André; Déchelotte, Pierre; Folope, Vanessa; Coëffier, Moïse

    2018-01-01

    Predictive equations have been specifically developed for obese patients to estimate resting energy expenditure (REE). Body composition (BC) assessment is needed for some of these equations. We assessed the impact of BC methods on the accuracy of specific predictive equations developed in obese patients. REE was measured (mREE) by indirect calorimetry, and BC was assessed by bioelectrical impedance analysis (BIA) and dual-energy X-ray absorptiometry (DXA). mREE and the percentage of accurate predictions (within ±10% of mREE) were compared. Predictive equations were studied in 2588 obese patients. Mean mREE was 1788 ± 6.3 kcal/24 h. Only the Müller (BIA) and Harris & Benedict (HB) equations provided REE estimates that did not differ from mREE. The Huang, Müller, Horie-Waitzberg, and HB formulas provided accurate predictions in more than 60% of cases. The use of BIA provided better predictions of REE than DXA for the Huang and Müller equations; conversely, the Horie-Waitzberg and Lazzer formulas provided higher accuracy using DXA. Accuracy decreased when the equations were applied to patients with BMI ≥ 40, except for the Horie-Waitzberg and Lazzer (DXA) formulas. The Müller equation based on BIA predicted REE markedly more accurately than equations not based on BC. The value of BC assessment for improving the accuracy of REE predictive equations in obese patients should be confirmed. PMID:29320432
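
    For reference, the classic Harris & Benedict (1919) equations named in this record, together with the ±10% accuracy criterion the study uses, look as follows; the example inputs are invented, and the obesity-specific equations (Huang, Müller, Horie-Waitzberg, Lazzer) are not reproduced here:

      def harris_benedict(weight_kg, height_cm, age_yr, sex):
          """Classic Harris & Benedict (1919) REE equations (kcal/24 h)."""
          if sex == "male":
              return 66.4730 + 13.7516 * weight_kg + 5.0033 * height_cm - 6.7550 * age_yr
          return 655.0955 + 9.5634 * weight_kg + 1.8496 * height_cm - 4.6756 * age_yr

      def accurate(predicted, measured, tol=0.10):
          """Accurate prediction as defined in the study: within +/-10% of mREE."""
          return abs(predicted - measured) / measured <= tol

      pred = harris_benedict(weight_kg=110.0, height_cm=165.0, age_yr=45, sex="female")
      print(round(pred), accurate(pred, measured=1788.0))   # 1802 True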

  22. A review of propeller noise prediction methodology: 1919-1994

    NASA Technical Reports Server (NTRS)

    Metzger, F. Bruce

    1995-01-01

    This report summarizes a review of the literature regarding propeller noise prediction methods. The review is divided into six sections: (1) early methods; (2) more recent methods based on earlier theory; (3) more recent methods based on the Acoustic Analogy; (4) more recent methods based on Computational Acoustics; (5) empirical methods; and (6) broadband methods. The report concludes that there are a large number of noise prediction procedures available which vary markedly in complexity. Deficiencies in accuracy of methods in many cases may be related, not to the methods themselves, but the accuracy and detail of the aerodynamic inputs used to calculate noise. The steps recommended in the report to provide accurate and easy to use prediction methods are: (1) identify reliable test data; (2) define and conduct test programs to fill gaps in the existing data base; (3) identify the most promising prediction methods; (4) evaluate promising prediction methods relative to the data base; (5) identify and correct the weaknesses in the prediction methods, including lack of user friendliness, and include features now available only in research codes; (6) confirm the accuracy of improved prediction methods to the data base; and (7) make the methods widely available and provide training in their use.

  23. Performance of genomic prediction within and across generations in maritime pine.

    PubMed

    Bartholomé, Jérôme; Van Heerwaarden, Joost; Isik, Fikret; Boury, Christophe; Vidal, Marjorie; Plomion, Christophe; Bouffier, Laurent

    2016-08-11

    Genomic selection (GS) is a promising approach for decreasing breeding cycle length in forest trees. Assessment of progeny performance and of the prediction accuracy of GS models over generations is therefore a key issue. A reference population of maritime pine (Pinus pinaster) with an estimated effective inbreeding population size (status number) of 25 was first selected with simulated data. This reference population (n = 818) covered three generations (G0, G1 and G2) and was genotyped with 4436 single-nucleotide polymorphism (SNP) markers. We evaluated the effects on prediction accuracy of both the relatedness between the calibration and validation sets and validation on the basis of progeny performance. Pedigree-based (best linear unbiased prediction, ABLUP) and marker-based (genomic BLUP and Bayesian LASSO) models were used to predict breeding values for three different traits: circumference, height and stem straightness. On average, the ABLUP model outperformed genomic prediction models, with a maximum difference in prediction accuracies of 0.12, depending on the trait and the validation method. A mean difference in prediction accuracy of 0.17 was found between validation methods differing in terms of relatedness. Including the progenitors in the calibration set reduced this difference in prediction accuracy to 0.03. When only genotypes from the G0 and G1 generations were used in the calibration set and genotypes from G2 were used in the validation set (progeny validation), prediction accuracies ranged from 0.70 to 0.85. This study suggests that the training of prediction models on parental populations can predict the genetic merit of the progeny with high accuracy: an encouraging result for the implementation of GS in the maritime pine breeding program.

  24. Posterior Predictive Checks for Conditional Independence between Response Time and Accuracy

    ERIC Educational Resources Information Center

    Bolsinova, Maria; Tijmstra, Jesper

    2016-01-01

    Conditional independence (CI) between response time and response accuracy is a fundamental assumption of many joint models for time and accuracy used in educational measurement. In this study, posterior predictive checks (PPCs) are proposed for testing this assumption. These PPCs are based on three discrepancy measures reflecting different…

  25. Model training across multiple breeding cycles significantly improves genomic prediction accuracy in rye (Secale cereale L.).

    PubMed

    Auinger, Hans-Jürgen; Schönleben, Manfred; Lehermeier, Christina; Schmidt, Malthe; Korzun, Viktor; Geiger, Hartwig H; Piepho, Hans-Peter; Gordillo, Andres; Wilde, Peer; Bauer, Eva; Schön, Chris-Carolin

    2016-11-01

    Genomic prediction accuracy can be significantly increased by model calibration across multiple breeding cycles as long as selection cycles are connected by common ancestors. In hybrid rye breeding, application of genome-based prediction is expected to increase selection gain because of long selection cycles in population improvement and development of hybrid components. Essentially two prediction scenarios arise: (1) prediction of the genetic value of lines from the same breeding cycle in which model training is performed and (2) prediction of lines from subsequent cycles. It is the latter from which a reduction in cycle length and consequently the strongest impact on selection gain is expected. We empirically investigated genome-based prediction of grain yield, plant height and thousand kernel weight within and across four selection cycles of a hybrid rye breeding program. Prediction performance was assessed using genomic and pedigree-based best linear unbiased prediction (GBLUP and PBLUP). A total of 1040 S2 lines were genotyped with 16k SNPs and each year testcrosses of 260 S2 lines were phenotyped in seven or eight locations. The performance gap between GBLUP and PBLUP increased significantly for all traits when model calibration was performed on aggregated data from several cycles. Prediction accuracies obtained from cross-validation were in the order of 0.70 for all traits when data from all cycles (N_CS = 832) were used for model training and exceeded within-cycle accuracies in all cases. As long as selection cycles are connected by a sufficient number of common ancestors and prediction accuracy has not reached a plateau when increasing sample size, aggregating data from several preceding cycles is recommended for predicting genetic values in subsequent cycles despite decreasing relatedness over time.

  26. Effects of sample survey design on the accuracy of classification tree models in species distribution models

    USGS Publications Warehouse

    Edwards, T.C.; Cutler, D.R.; Zimmermann, N.E.; Geiser, L.; Moisen, Gretchen G.

    2006-01-01

    We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on resultant classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Measures of accuracy as gauged by resubstitution rates were similar for each lichen species irrespective of the underlying sample survey form. Cross-validation estimates of prediction accuracies were lower than resubstitution accuracies for all species and both design types, and in all cases were closer to the true prediction accuracies based on the EVALUATION data set. We argue that greater emphasis should be placed on calculating and reporting cross-validation accuracy rates rather than simple resubstitution accuracy rates. Evaluation of the DESIGN and PURPOSIVE tree models on the EVALUATION data set shows significantly lower prediction accuracy for the PURPOSIVE tree models relative to the DESIGN models, indicating that non-probabilistic sample surveys may generate models with limited predictive capability. These differences were consistent across all four lichen species, with 11 of the 12 possible species and sample survey type comparisons having significantly lower accuracy rates. Some differences in accuracy were as large as 50%. The classification tree structures also differed considerably both among and within the modelled species, depending on the sample survey form. Overlap in the predictor variables selected by the DESIGN and PURPOSIVE tree models ranged from only 20% to 38%, indicating the classification trees fit the two evaluated survey forms on different sets of predictor variables. The magnitude of these differences in predictor variables throws doubt on ecological interpretation derived from prediction models based on non-probabilistic sample surveys. © 2006 Elsevier B.V. All rights reserved.
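
    The gap this record emphasizes between resubstitution and cross-validation accuracy is easy to reproduce with any classification tree; a small scikit-learn sketch on synthetic data (standing in for the lichen survey data), using assumed default tree settings:

      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=400, n_features=12, random_state=0)
      tree = DecisionTreeClassifier(random_state=0)

      resub = tree.fit(X, y).score(X, y)              # optimistic; 1.0 for a full tree
      cv = cross_val_score(tree, X, y, cv=10).mean()  # closer to true accuracy
      print(f"resubstitution = {resub:.2f}, 10-fold CV = {cv:.2f}")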

  27. Genomic prediction of reproduction traits for Merino sheep.

    PubMed

    Bolormaa, S; Brown, D J; Swan, A A; van der Werf, J H J; Hayes, B J; Daetwyler, H D

    2017-06-01

    Economically important reproduction traits in sheep, such as number of lambs weaned and litter size, are expressed only in females and later in life after most selection decisions are made, which makes them ideal candidates for genomic selection. Accurate genomic predictions would lead to greater genetic gain for these traits by enabling accurate selection of young rams with high genetic merit. The aim of this study was to design and evaluate the accuracy of a genomic prediction method for female reproduction in sheep using daughter trait deviations (DTD) for sires and ewe phenotypes (when individual ewes were genotyped) for three reproduction traits: number of lambs born (NLB), litter size (LSIZE) and number of lambs weaned. Genomic best linear unbiased prediction (GBLUP), BayesR and pedigree BLUP analyses of the three reproduction traits measured on 5340 sheep (4503 ewes and 837 sires) with real and imputed genotypes for 510 174 SNPs were performed. The prediction of breeding values using both sire and ewe trait records was validated in Merino sheep. Prediction accuracy was evaluated by across sire family and random cross-validations. Accuracies of genomic estimated breeding values (GEBVs) were assessed as the mean Pearson correlation adjusted by the accuracy of the input phenotypes. The addition of sire DTD into the prediction analysis resulted in higher accuracies compared with using only ewe records in genomic predictions or pedigree BLUP. Using GBLUP, the average accuracy based on the combined records (ewes and sire DTD) was 0.43 across traits, but the accuracies varied by trait and type of cross-validations. The accuracies of GEBVs from random cross-validations (range 0.17-0.61) were higher than were those from sire family cross-validations (range 0.00-0.51). The GEBV accuracies of 0.41-0.54 for NLB and LSIZE based on the combined records were amongst the highest in the study. Although BayesR was not significantly different from GBLUP in prediction accuracy, it identified several candidate genes which are known to be associated with NLB and LSIZE. The approach provides a way to make use of all data available in genomic prediction for traits that have limited recording. © 2017 Stichting International Foundation for Animal Genetics.

  28. Physiologically-based, predictive analytics using the heart-rate-to-Systolic-Ratio significantly improves the timeliness and accuracy of sepsis prediction compared to SIRS.

    PubMed

    Danner, Omar K; Hendren, Sandra; Santiago, Ethel; Nye, Brittany; Abraham, Prasad

    2017-04-01

    Enhancing the efficiency of diagnosis and treatment of severe sepsis by using physiologically-based, predictive analytical strategies has not been fully explored. We hypothesized that assessment of the heart-rate-to-systolic ratio significantly increases the timeliness and accuracy of sepsis prediction after emergency department (ED) presentation. We evaluated the records of 53,313 ED patients from a large, urban teaching hospital between January and June 2015. The HR-to-systolic ratio was compared to SIRS criteria for sepsis prediction. There were 884 patients with discharge diagnoses of sepsis, severe sepsis, and/or septic shock. Variations in three presenting variables (heart rate, systolic BP and temperature) were determined to be primary early predictors of sepsis, with 74% (654/884) accuracy compared to 34% (304/884) using SIRS criteria (p < 0.0001) in confirmed septic patients. Physiologically-based predictive analytics improved the accuracy and expediency of sepsis identification via detection of variations in the HR-to-systolic ratio. This approach may lead to earlier sepsis workup and life-saving interventions. Copyright © 2017 Elsevier Inc. All rights reserved.
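
    The screening variable itself is a one-line computation; the sketch below contrasts it with a standard SIRS count. The flag threshold of 0.9 is an illustrative assumption, as the abstract does not state the cut-off used:

      def hr_to_systolic_ratio(heart_rate, systolic_bp):
          return heart_rate / systolic_bp

      def sirs_count(temp_c, heart_rate, resp_rate, wbc_k):
          """Number of SIRS criteria met (>= 2 is SIRS-positive)."""
          return sum([
              temp_c > 38.0 or temp_c < 36.0,
              heart_rate > 90,
              resp_rate > 20,
              wbc_k > 12.0 or wbc_k < 4.0,
          ])

      # invented vitals: tachycardic with borderline-low systolic BP
      ratio = hr_to_systolic_ratio(heart_rate=118, systolic_bp=98)
      print(ratio > 0.9)                            # ratio-based flag fires
      print(sirs_count(37.2, 118, 18, 9.5) >= 2)    # SIRS misses this patient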

  29. Bridge Structure Deformation Prediction Based on GNSS Data Using Kalman-ARIMA-GARCH Model

    PubMed Central

    Li, Xiaoqing; Wang, Yu

    2018-01-01

    Bridges are an essential part of the ground transportation system. Health monitoring is fundamentally important for the safety and service life of bridges. A large amount of structural information is obtained from various sensors using sensing technology, and the data processing has become a challenging issue. To improve the prediction accuracy of bridge structure deformation based on data mining and to accurately evaluate the time-varying characteristics of bridge structure performance evolution, this paper proposes a new method for bridge structure deformation prediction, which integrates the Kalman filter, autoregressive integrated moving average model (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Firstly, the raw deformation data is directly pre-processed using the Kalman filter to reduce the noise. After that, the linear recursive ARIMA model is established to analyze and predict the structure deformation. Finally, the nonlinear recursive GARCH model is introduced to further improve the accuracy of the prediction. Simulation results based on measured sensor data from the Global Navigation Satellite System (GNSS) deformation monitoring system demonstrated that: (1) the Kalman filter is capable of denoising the bridge deformation monitoring data; (2) the prediction accuracy of the proposed Kalman-ARIMA-GARCH model is satisfactory, where the mean absolute error increases only from 3.402 mm to 5.847 mm with the increment of the prediction step; and (3) in comparison to the Kalman-ARIMA model, the Kalman-ARIMA-GARCH model results in superior prediction accuracy as it includes partial nonlinear characteristics (heteroscedasticity); the mean absolute error of five-step prediction using the proposed model is improved by 10.12%. This paper provides a new way for structural behavior prediction based on data processing, which can lay a foundation for the early warning of bridge health monitoring system based on sensor data using sensing technology. PMID:29351254

  10. Bridge Structure Deformation Prediction Based on GNSS Data Using Kalman-ARIMA-GARCH Model.

    PubMed

    Xin, Jingzhou; Zhou, Jianting; Yang, Simon X; Li, Xiaoqing; Wang, Yu

    2018-01-19

    Bridges are an essential part of the ground transportation system. Health monitoring is fundamentally important for the safety and service life of bridges. A large amount of structural information is obtained from various sensors using sensing technology, and the data processing has become a challenging issue. To improve the prediction accuracy of bridge structure deformation based on data mining and to accurately evaluate the time-varying characteristics of bridge structure performance evolution, this paper proposes a new method for bridge structure deformation prediction, which integrates the Kalman filter, autoregressive integrated moving average model (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Firstly, the raw deformation data is directly pre-processed using the Kalman filter to reduce the noise. After that, the linear recursive ARIMA model is established to analyze and predict the structure deformation. Finally, the nonlinear recursive GARCH model is introduced to further improve the accuracy of the prediction. Simulation results based on measured sensor data from the Global Navigation Satellite System (GNSS) deformation monitoring system demonstrated that: (1) the Kalman filter is capable of denoising the bridge deformation monitoring data; (2) the prediction accuracy of the proposed Kalman-ARIMA-GARCH model is satisfactory, where the mean absolute error increases only from 3.402 mm to 5.847 mm with the increment of the prediction step; and (3) in comparison to the Kalman-ARIMA model, the Kalman-ARIMA-GARCH model results in superior prediction accuracy as it includes partial nonlinear characteristics (heteroscedasticity); the mean absolute error of five-step prediction using the proposed model is improved by 10.12%. This paper provides a new way for structural behavior prediction based on data processing, which can lay a foundation for the early warning of bridge health monitoring system based on sensor data using sensing technology.
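
    The three-stage pipeline described above (Kalman denoising, ARIMA point forecast, GARCH on the residuals) maps onto standard Python time-series tooling. The sketch below is a minimal illustration, not the authors' implementation: the random-walk state-space design, the ARIMA(1,1,1) and GARCH(1,1) orders, and the data are all assumptions, and the GARCH stage is shown only forecasting residual variance rather than being folded back into the point forecast.

        import numpy as np
        from statsmodels.tsa.arima.model import ARIMA
        from arch import arch_model

        def kalman_denoise(y, q=1e-4, r=1e-1):
            """1-D random-walk Kalman filter (process var q, measurement var r)."""
            x, p, out = y[0], 1.0, []
            for z in y:
                p += q                      # predict
                k = p / (p + r)             # Kalman gain
                x += k * (z - x)            # update
                p *= (1 - k)
                out.append(x)
            return np.array(out)

        # Synthetic deformation series standing in for GNSS monitoring data
        rng = np.random.default_rng(0)
        deform = np.cumsum(rng.normal(0, 0.5, 500)) + rng.normal(0, 2.0, 500)

        smooth = kalman_denoise(deform)                    # step 1: denoise
        arima = ARIMA(smooth, order=(1, 1, 1)).fit()       # step 2: linear ARIMA part
        resid = arima.resid
        garch = arch_model(resid, vol="GARCH", p=1, q=1).fit(disp="off")  # step 3

        print(arima.forecast(steps=5))                     # 5-step deformation forecast
        print(garch.forecast(horizon=5).variance.iloc[-1]) # residual variance forecast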

  11. Parameter prediction based on Improved Process neural network and ARMA error compensation in Evaporation Process

    NASA Astrophysics Data System (ADS)

    Qian, Xiaoshan

    2018-01-01

    Traditional models of evaporation process parameters suffer from large prediction errors, because the parameters are continuous and the errors accumulate. On this basis, this paper proposes a forecasting method based on a process neural network trained with adaptive particle swarm optimization, combined with an autoregressive moving average (ARMA) error-compensation model that corrects the neural network's predictions to improve prediction accuracy. Validation against production data from the evaporation process of an alumina plant shows that, compared with the traditional model, the new model's prediction accuracy is greatly improved, and it can be used to predict the dynamic composition of sodium aluminate solution during evaporation.
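
    The compensation scheme (fit a predictor, model its residual series with ARMA, then add the residual forecast back onto new predictions) can be sketched as follows. This is a stand-in illustration: sklearn's MLPRegressor replaces the paper's PSO-trained process neural network, and the ARMA(2,1) order and data are assumptions.

        import numpy as np
        from sklearn.neural_network import MLPRegressor
        from statsmodels.tsa.arima.model import ARIMA

        rng = np.random.default_rng(1)
        t = np.arange(600, dtype=float)
        y = np.sin(t / 30) + 0.1 * rng.standard_normal(600)   # stand-in process variable

        X_train, y_train = t[:500, None], y[:500]
        X_test, y_test = t[500:, None], y[500:]

        # Stand-in for the paper's PSO-trained process neural network
        nn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=0)
        nn.fit(X_train, y_train)

        # ARMA(2,1) fitted to the in-sample residual series (order is an assumption)
        resid = y_train - nn.predict(X_train)
        arma = ARIMA(resid, order=(2, 0, 1)).fit()

        # Compensated forecast = NN prediction + ARMA residual forecast
        compensated = nn.predict(X_test) + arma.forecast(steps=len(X_test))
        print("RMSE uncompensated:", np.sqrt(np.mean((nn.predict(X_test) - y_test) ** 2)))
        print("RMSE compensated:  ", np.sqrt(np.mean((compensated - y_test) ** 2)))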

  12. Prediction of welding shrinkage deformation of bridge steel box girder based on wavelet neural network

    NASA Astrophysics Data System (ADS)

    Tao, Yulong; Miao, Yunshui; Han, Jiaqi; Yan, Feiyun

    2018-05-01

    Aiming at the low accuracy of traditional forecasting methods such as linear regression, this paper presents a wavelet neural network method for predicting the welding shrinkage deformation of bridge steel box girders. Compared with traditional forecasting methods, this scheme has better locality and learning ability, which greatly improves its power to predict deformation. Analysis of a case study shows that the wavelet-neural-network-based prediction of box girder deformation is more accurate than traditional prediction methods, is superior to the BP neural network predictions, and conforms to the actual demands of engineering design.

  13. Validity of Teacher-Based Vision Screening and Factors Associated with the Accuracy of Vision Screening in Vietnamese Children.

    PubMed

    Paudel, Prakash; Kovai, Vilas; Naduvilath, Thomas; Phuong, Ha Thanh; Ho, Suit May; Giap, Nguyen Viet

    2016-01-01

    To assess validity of teacher-based vision screening and elicit factors associated with accuracy of vision screening in Vietnam. After brief training, teachers independently measured visual acuity (VA) in 555 children aged 12-15 years in Ba Ria - Vung Tau Province. Teacher VA measurements were compared to those of refractionists. Sensitivity, specificity, positive predictive value and negative predictive value were calculated for uncorrected VA (UVA) and presenting VA (PVA) 20/40 or worse in either eye. Chi-square, Fisher's exact test and multivariate logistic regression were used to assess factors associated with accuracy of vision screening. Level of significance was set at 5%. Trained teachers in Vietnam demonstrated 86.7% sensitivity, 95.7% specificity, 86.7% positive predictive value and 95.7% negative predictive value in identifying children with visual impairment using the UVA measurement. PVA measurement revealed low accuracy for teachers, which was significantly associated with child's age, sex, spectacle wear and myopic status, but UVA measurement showed no such associations. Better accuracy was achieved in measurement of VA and identification of children with visual impairment using UVA measurement compared to PVA. UVA measurement is recommended for teacher-based vision screening programs.

  14. Improving orbit prediction accuracy through supervised machine learning

    NASA Astrophysics Data System (ADS)

    Peng, Hao; Bai, Xiaoli

    2018-05-01

    Due to the lack of information such as the space environment condition and resident space objects' (RSOs') body characteristics, current orbit predictions that are solely grounded on physics-based models may fail to achieve the required accuracy for collision avoidance, and have already led to satellite collisions. This paper presents a methodology to predict RSOs' trajectories with higher accuracy than that of current methods. Inspired by machine learning (ML) theory, in which models are learned from large amounts of observed data and prediction is conducted without explicitly modeling space objects and the space environment, the proposed ML approach integrates physics-based orbit prediction algorithms with a learning-based process that focuses on reducing the prediction errors. Using a simulation-based space catalog environment as the test bed, the paper demonstrates three types of generalization capability for the proposed ML approach: (1) the ML model can be used to improve the same RSO's orbit information that is not available during the learning process but shares the same time interval as the training data; (2) the ML model can be used to improve predictions of the same RSO at future epochs; and (3) the ML model based on one RSO can be applied to other RSOs that share some common features.
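
    The core idea (learn the historical error of a physics-based propagator, then subtract the predicted error from new predictions) can be sketched with any regressor. Everything below, including the feature choice, the synthetic error model, and GradientBoostingRegressor, is an illustrative stand-in rather than the paper's setup:

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(2)

        # Toy stand-in: along-track position error of a physics-based propagator
        # grows with prediction horizon and depends on an unmodelled drag-like term.
        horizon = rng.uniform(0, 48, 2000)             # hours ahead
        drag    = rng.uniform(0.5, 1.5, 2000)          # unmodelled parameter proxy
        true_err = 0.3 * drag * horizon**1.5 + rng.normal(0, 5, 2000)  # km

        X = np.column_stack([horizon, drag])
        model = GradientBoostingRegressor().fit(X[:1500], true_err[:1500])

        # Correct a physics-based prediction by subtracting the learned error
        phys_pred = 7000.0                             # km, some predicted coordinate
        features = np.array([[24.0, 1.2]])             # 24 h ahead, drag proxy 1.2
        corrected = phys_pred - model.predict(features)[0]
        print(f"learned error: {model.predict(features)[0]:.1f} km -> corrected {corrected:.1f} km")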

  15. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness

    PubMed Central

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models. PMID:26890307

  16. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness.

    PubMed

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models.
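
    An AVI-style selection loop can be approximated by averaging random forest importances over repeated fits and refitting on the top-ranked predictors. The sketch below uses synthetic data and assumed settings (5 repeats, top 8 features) rather than the paper's backscatter variables or its exact AVI procedure:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        # Stand-in for point hardness classes vs. continuous multibeam predictors
        X, y = make_classification(n_samples=500, n_features=30, n_informative=6,
                                   random_state=0)

        # Average importances over repeated fits (AVI-like; details are assumed)
        imps = np.mean([RandomForestClassifier(n_estimators=300, random_state=s)
                        .fit(X, y).feature_importances_ for s in range(5)], axis=0)
        top = np.argsort(imps)[::-1][:8]                 # keep the 8 most important

        full  = cross_val_score(RandomForestClassifier(random_state=0), X, y).mean()
        small = cross_val_score(RandomForestClassifier(random_state=0), X[:, top], y).mean()
        print(f"all 30 features: {full:.3f}  |  top 8 by AVI: {small:.3f}")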

  17. Genomic selection accuracies within and between environments and small breeding groups in white spruce.

    PubMed

    Beaulieu, Jean; Doerksen, Trevor K; MacKay, John; Rainville, André; Bousquet, Jean

    2014-12-02

    Genomic selection (GS) may improve selection response over conventional pedigree-based selection if markers capture more detailed information than pedigrees in recently domesticated tree species and/or make it more cost effective. Genomic prediction accuracies using 1748 trees and 6932 SNPs representative of as many distinct gene loci were determined for growth and wood traits in white spruce, within and between environments and breeding groups (BG), each with an effective size of Ne ≈ 20. Marker subsets were also tested. Model fits and/or cross-validation (CV) prediction accuracies for ridge regression (RR) and the least absolute shrinkage and selection operator models approached those of pedigree-based models. With strong relatedness between CV sets, prediction accuracies for RR within environment and BG were high for wood (r = 0.71-0.79) and moderately high for growth (r = 0.52-0.69) traits, in line with trends in heritabilities. For both classes of traits, these accuracies achieved between 83% and 92% of those obtained with phenotypes and pedigree information. Prediction into untested environments remained moderately high for wood (r ≥ 0.61) but dropped significantly for growth (r ≥ 0.24) traits, emphasizing the need to phenotype in all test environments and model genotype-by-environment interactions for growth traits. Removing relatedness between CV sets sharply decreased prediction accuracies for all traits and subpopulations, falling near zero between BGs with no known shared ancestry. For marker subsets, similar patterns were observed but with lower prediction accuracies. Given the need for high relatedness between CV sets to obtain good prediction accuracies, we recommend to build GS models for prediction within the same breeding population only. Breeding groups could be merged to build genomic prediction models as long as the total effective population size does not exceed 50 individuals in order to obtain high prediction accuracy such as that obtained in the present study. A number of markers limited to a few hundred would not negatively impact prediction accuracies, but these could decrease more rapidly over generations. The most promising short-term approach for genomic selection would likely be the selection of superior individuals within large full-sib families vegetatively propagated to implement multiclonal forestry.

  18. A comparison of five methods to predict genomic breeding values of dairy bulls from genome-wide SNP markers

    PubMed Central

    2009-01-01

    Background Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle. Methods Genotypes of 7,372 SNP and highly accurate EBV of 1,945 dairy bulls were used to predict MBV for protein percentage (PPT) and a profit index (Australian Selection Index, ASI). Marker effects were estimated by least squares regression (FR-LS), Bayesian regression (Bayes-R), random regression best linear unbiased prediction (RR-BLUP), partial least squares regression (PLSR) and nonparametric support vector regression (SVR) in a training set of 1,239 bulls. Accuracy and bias of MBV prediction were calculated from cross-validation of the training set and tested against a test team of 706 young bulls. Results For both traits, FR-LS using a subset of SNP was significantly less accurate than all other methods which used all SNP. Accuracies obtained by Bayes-R, RR-BLUP, PLSR and SVR were very similar for ASI (0.39-0.45) and for PPT (0.55-0.61). Overall, SVR gave the highest accuracy. All methods resulted in biased MBV predictions for ASI, for PPT only RR-BLUP and SVR predictions were unbiased. A significant decrease in accuracy of prediction of ASI was seen in young test cohorts of bulls compared to the accuracy derived from cross-validation of the training set. This reduction was not apparent for PPT. Combining MBV predictions with pedigree based predictions gave 1.05 - 1.34 times higher accuracies compared to predictions based on pedigree alone. Some methods have largely different computational requirements, with PLSR and RR-BLUP requiring the least computing time. Conclusions The four methods which use information from all SNP namely RR-BLUP, Bayes-R, PLSR and SVR generate similar accuracies of MBV prediction for genomic selection, and their use in the selection of immediate future generations in dairy cattle will be comparable. The use of FR-LS in genomic selection is not recommended. PMID:20043835

  19. Improving Prediction Accuracy for WSN Data Reduction by Applying Multivariate Spatio-Temporal Correlation

    PubMed Central

    Carvalho, Carlos; Gomes, Danielo G.; Agoulmine, Nazim; de Souza, José Neuman

    2011-01-01

    This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSN). Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, it may not be very accurate. Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher correlation between gathered inputs when compared to time, which is an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate one. In addition to that, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, this is the first work to address prediction based on multivariate correlation for WSN data reduction. PMID:22346626
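
    The paper's central comparison (time-based simple regression versus multivariate regression on co-sensed variables) can be reproduced in miniature. The sensor series below are synthetic stand-ins, not the paper's data:

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import mean_absolute_error

        rng = np.random.default_rng(3)
        n = 300
        t = np.arange(n, dtype=float)
        temp  = 20 + 5 * np.sin(t / 40) + rng.normal(0, 0.3, n)
        light = 100 + 30 * np.sin(t / 40 + 0.2) + rng.normal(0, 3, n)
        humid = 80 - 2.0 * temp - 0.05 * light + rng.normal(0, 0.5, n)   # target

        tr, te = slice(0, 200), slice(200, n)

        # Simple regression on time alone (the common baseline)
        simple = LinearRegression().fit(t[tr, None], humid[tr])
        # Multivariate regression on co-sensed inputs (the proposed direction)
        multi = LinearRegression().fit(np.column_stack([temp, light])[tr], humid[tr])

        print("MAE time-only:    ", mean_absolute_error(humid[te], simple.predict(t[te, None])))
        print("MAE multivariate: ", mean_absolute_error(humid[te], multi.predict(np.column_stack([temp, light])[te])))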

  20. Prediction of Industrial Electric Energy Consumption in Anhui Province Based on GA-BP Neural Network

    NASA Astrophysics Data System (ADS)

    Zhang, Jiajing; Yin, Guodong; Ni, Youcong; Chen, Jinlan

    2018-01-01

    In order to improve the prediction accuracy of industrial electric energy consumption, a prediction model based on a genetic algorithm and a neural network is proposed. The model uses a genetic algorithm to optimize the weights and thresholds of a BP neural network and is applied to predict industrial electric energy consumption in Anhui Province. Comparative experiments between the GA-BP prediction model and a plain BP neural network model show that the GA-BP model is more accurate while using a smaller number of neurons in the hidden layer.
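
    A minimal sketch of the GA-over-weights idea: a population of weight vectors for a tiny fixed-topology network evolves under selection on training error. In practice GA-BP schemes typically use the GA to seed weights that backpropagation then fine-tunes; that refinement step is omitted here, and the data, topology, and GA settings are all assumptions.

        import numpy as np

        rng = np.random.default_rng(4)

        # Toy yearly features -> energy consumption (synthetic stand-in data)
        X = rng.normal(size=(60, 3))
        y = X @ np.array([2.0, -1.0, 0.5]) + np.tanh(X[:, 0]) + rng.normal(0, 0.1, 60)

        def mlp(w, X):
            """Tiny 3-5-1 network; w packs all 26 weights and biases."""
            W1 = w[:15].reshape(3, 5); b1 = w[15:20]
            W2 = w[20:25];             b2 = w[25]
            return np.tanh(X @ W1 + b1) @ W2 + b2

        def fitness(w):
            return -np.mean((mlp(w, X) - y) ** 2)        # GA maximises fitness

        pop = rng.normal(size=(40, 26))
        for gen in range(200):
            scores = np.array([fitness(w) for w in pop])
            parents = pop[np.argsort(scores)[-20:]]      # truncation selection
            cross = rng.integers(0, 2, size=(40, 26)).astype(bool)
            kids = np.where(cross, parents[rng.integers(0, 20, 40)],
                                   parents[rng.integers(0, 20, 40)])  # uniform crossover
            pop = kids + rng.normal(0, 0.05, kids.shape)  # Gaussian mutation

        best = pop[np.argmax([fitness(w) for w in pop])]
        print("GA-optimised training MSE:", -fitness(best))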

  1. Radiomics-based Prognosis Analysis for Non-Small Cell Lung Cancer

    NASA Astrophysics Data System (ADS)

    Zhang, Yucheng; Oikonomou, Anastasia; Wong, Alexander; Haider, Masoom A.; Khalvati, Farzad

    2017-04-01

    Radiomics characterizes tumor phenotypes by extracting large numbers of quantitative features from radiological images. Radiomic features have been shown to provide prognostic value in predicting clinical outcomes in several studies. However, several challenges including feature redundancy, unbalanced data, and small sample sizes have led to relatively low predictive accuracy. In this study, we explore different strategies for overcoming these challenges and improving predictive performance of radiomics-based prognosis for non-small cell lung cancer (NSCLC). CT images of 112 patients (mean age 75 years) with NSCLC who underwent stereotactic body radiotherapy were used to predict recurrence, death, and recurrence-free survival using a comprehensive radiomics analysis. Different feature selection and predictive modeling techniques were used to determine the optimal configuration of prognosis analysis. To address feature redundancy, comprehensive analysis indicated that Random Forest models and Principal Component Analysis were optimum predictive modeling and feature selection methods, respectively, for achieving high prognosis performance. To address unbalanced data, Synthetic Minority Over-sampling technique was found to significantly increase predictive accuracy. A full analysis of variance showed that data endpoints, feature selection techniques, and classifiers were significant factors in affecting predictive accuracy, suggesting that these factors must be investigated when building radiomics-based predictive models for cancer prognosis.
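
    The winning configuration reported above (PCA for feature reduction, SMOTE for the unbalanced endpoints, and a Random Forest classifier) composes directly into an imblearn pipeline, which applies the oversampling only during training folds. A minimal sketch on synthetic stand-in features; the component count and other settings are assumptions:

        from imblearn.over_sampling import SMOTE
        from imblearn.pipeline import Pipeline        # SMOTE-aware pipeline
        from sklearn.datasets import make_classification
        from sklearn.decomposition import PCA
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        # Imbalanced stand-in for radiomic features (e.g., recurrence vs. none)
        X, y = make_classification(n_samples=112, n_features=100, n_informative=10,
                                   weights=[0.8, 0.2], random_state=0)

        pipe = Pipeline([
            ("pca",   PCA(n_components=10)),          # feature reduction
            ("smote", SMOTE(random_state=0)),         # oversample minority class
            ("rf",    RandomForestClassifier(random_state=0)),
        ])
        print("CV AUC:", cross_val_score(pipe, X, y, scoring="roc_auc", cv=5).mean())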

  2. Diagnostic accuracy of liver fibrosis based on red cell distribution width (RDW) to platelet ratio with fibroscan in chronic hepatitis B

    NASA Astrophysics Data System (ADS)

    Sembiring, J.; Jones, F.

    2018-03-01

    The red cell distribution width (RDW) to platelet ratio (RPR) can predict liver fibrosis and cirrhosis in chronic hepatitis B with relatively high accuracy. RPR was superior to other non-invasive methods for predicting liver fibrosis, such as the AST-to-ALT ratio, the AST-to-platelet ratio index, and FIB-4. The aim of this study was to assess the diagnostic accuracy of the RDW-to-platelet ratio for liver fibrosis in chronic hepatitis B patients, compared with Fibroscan. This cross-sectional study was conducted at Adam Malik Hospital from January to June 2015. We examined 34 chronic hepatitis B patients, recording RDW, platelet count, and Fibroscan results. Data were statistically analyzed. In the ROC analysis, RPR had an accuracy of 72.3% (95% CI: 84.1% - 97%). In this study, the RPR had a moderate ability to predict fibrosis degree (p = 0.029 with AUC > 70%). The cutoff value of RPR was 0.0591, sensitivity and specificity were 71.4% and 60%, the positive predictive value (PPV) was 55.6% and the negative predictive value (NPV) was 75%, the positive likelihood ratio was 1.79 and the negative likelihood ratio was 0.48. RPR has the ability to predict the degree of liver fibrosis in chronic hepatitis B patients with moderate accuracy.

  3. Genomic selection models double the accuracy of predicted breeding values for bacterial cold water disease resistance compared to a traditional pedigree-based model in rainbow trout aquaculture.

    PubMed

    Vallejo, Roger L; Leeds, Timothy D; Gao, Guangtu; Parsons, James E; Martin, Kyle E; Evenhuis, Jason P; Fragomeni, Breno O; Wiens, Gregory D; Palti, Yniv

    2017-02-01

    Previously, we have shown that bacterial cold water disease (BCWD) resistance in rainbow trout can be improved using traditional family-based selection, but progress has been limited to exploiting only between-family genetic variation. Genomic selection (GS) is a new alternative that enables exploitation of within-family genetic variation. We compared three GS models [single-step genomic best linear unbiased prediction (ssGBLUP), weighted ssGBLUP (wssGBLUP), and BayesB] to predict genomic-enabled breeding values (GEBV) for BCWD resistance in a commercial rainbow trout population, and compared the accuracy of GEBV to traditional estimates of breeding values (EBV) from a pedigree-based BLUP (P-BLUP) model. We also assessed the impact of sampling design on the accuracy of GEBV predictions. For these comparisons, we used BCWD survival phenotypes recorded on 7893 fish from 102 families, of which 1473 fish from 50 families had genotypes [57 K single nucleotide polymorphism (SNP) array]. Naïve siblings of the training fish (n = 930 testing fish) were genotyped to predict their GEBV and mated to produce 138 progeny testing families. In the following generation, 9968 progeny were phenotyped to empirically assess the accuracy of GEBV predictions made on their non-phenotyped parents. The accuracy of GEBV from all tested GS models was substantially higher than the P-BLUP model EBV. The highest increase in accuracy relative to the P-BLUP model was achieved with BayesB (97.2 to 108.8%), followed by wssGBLUP at iteration 2 (94.4 to 97.1%) and 3 (88.9 to 91.2%) and ssGBLUP (83.3 to 85.3%). Reducing the training sample size to n = ~1000 had no negative impact on the accuracy (0.67 to 0.72), but with n = ~500 the accuracy dropped to 0.53 to 0.61 if the training and testing fish were full-sibs, and even substantially lower, to 0.22 to 0.25, when they were not full-sibs. Using progeny performance data, we showed that the accuracy of genomic predictions is substantially higher than estimates obtained from the traditional pedigree-based BLUP model for BCWD resistance. Overall, we found that using a much smaller training sample size compared to similar studies in livestock, GS can substantially improve the selection accuracy and genetic gains for this trait in a commercial rainbow trout breeding population.

  4. Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic.

    PubMed

    Wang, Ming; Long, Qi

    2016-09-01

    Prediction models for disease risk and prognosis play an important role in biomedical research, and evaluating their predictive accuracy in the presence of censored data is of substantial interest. The standard concordance (c) statistic has been extended to provide a summary measure of predictive accuracy for survival models. Motivated by a prostate cancer study, we address several issues associated with evaluating survival prediction models based on c-statistic with a focus on estimators using the technique of inverse probability of censoring weighting (IPCW). Compared to the existing work, we provide complete results on the asymptotic properties of the IPCW estimators under the assumption of coarsening at random (CAR), and propose a sensitivity analysis under the mechanism of noncoarsening at random (NCAR). In addition, we extend the IPCW approach as well as the sensitivity analysis to high-dimensional settings. The predictive accuracy of prediction models for cancer recurrence after prostatectomy is assessed by applying the proposed approaches. We find that the estimated predictive accuracy for the models in consideration is sensitive to NCAR assumption, and thus identify the best predictive model. Finally, we further evaluate the performance of the proposed methods in both settings of low-dimensional and high-dimensional data under CAR and NCAR through simulations. © 2016, The International Biometric Society.

  5. Improving the Accuracy of Software-Based Energy Analysis for Residential Buildings (Presentation)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Polly, B.

    2011-09-01

    This presentation describes the basic components of software-based energy analysis for residential buildings, explores the concepts of 'error' and 'accuracy' when analysis predictions are compared to measured data, and explains how NREL is working to continuously improve the accuracy of energy analysis methods.

  6. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    NASA Astrophysics Data System (ADS)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction belongs to bioinformatics and is an important research area. In this paper, we propose a new method for protein secondary structure prediction using a Bayes classifier and an autoencoder network. Our experiments cover several algorithmic choices, including the construction of the model and the selection of its parameters. The data set is the typical CB513 protein data set. Accuracy is assessed by 3-fold cross-validation, from which the Q3 accuracy is obtained. The results illustrate that the autoencoder network improves the prediction accuracy of protein secondary structure.

  7. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. SCPRED can accurately find similar structures for sequences that share low identity with sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.

  8. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion SCPRED can accurately find similar structures for sequences that share low identity with sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616

  9. Prediction of drug synergy in cancer using ensemble-based machine learning techniques

    NASA Astrophysics Data System (ADS)

    Singh, Harpreet; Rana, Prashant Singh; Singh, Urvinder

    2018-04-01

    Drug synergy prediction plays a significant role in the medical field for inhibiting specific cancer agents. It can serve as a pre-screening tool for therapeutic success. Different drug-drug interactions can be examined via the drug synergy score. This requires efficient regression-based machine learning approaches to minimize the prediction errors. Numerous machine learning techniques such as neural networks, support vector machines, random forests, LASSO, Elastic Nets, etc., have been used in the past to meet this requirement. However, these techniques individually do not provide significant accuracy in drug synergy score. Therefore, the primary objective of this paper is to design a neuro-fuzzy-based ensembling approach. To achieve this, nine well-known machine learning techniques have been implemented by considering the drug synergy data. Based on the accuracy of each model, four techniques with high accuracy are selected to develop an ensemble-based machine learning model. These models are Random Forest, Fuzzy Rules Using Genetic Cooperative-Competitive Learning method (GFS.GCCL), Adaptive-Network-Based Fuzzy Inference System (ANFIS) and Dynamic Evolving Neural-Fuzzy Inference System method (DENFIS). Ensembling is achieved by evaluating the biased weighted aggregation (i.e. adding more weight to the model with a higher prediction score) of the data predicted by the selected models. The proposed and existing machine learning techniques have been evaluated on drug synergy score data. The comparative analysis reveals that the proposed method outperforms others in terms of accuracy, root mean square error and coefficient of correlation.
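
    The biased weighted aggregation step can be sketched as a validation-score-weighted average of base-model predictions. The sklearn regressors below are stand-ins for the paper's four selected learners (Random Forest, GFS.GCCL, ANFIS, DENFIS), which have no standard sklearn implementations, and the data are synthetic:

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import train_test_split

        X, y = make_regression(n_samples=400, n_features=20, noise=10, random_state=0)
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

        # Stand-ins for the paper's selected base learners
        models = [RandomForestRegressor(random_state=0),
                  GradientBoostingRegressor(random_state=0),
                  Ridge()]
        for m in models:
            m.fit(X_tr, y_tr)

        # Biased weighting: weight each model by its validation R^2 score
        scores = np.array([max(m.score(X_val, y_val), 0) for m in models])
        weights = scores / scores.sum()

        ensemble_pred = sum(w * m.predict(X_val) for w, m in zip(weights, models))
        print("weights:", weights.round(3))
        print("ensemble R^2:", 1 - np.sum((y_val - ensemble_pred) ** 2)
                                 / np.sum((y_val - y_val.mean()) ** 2))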

  10. Bayesian model aggregation for ensemble-based estimates of protein pKa values

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gosink, Luke J.; Hogan, Emilie A.; Pulsipher, Trenton C.

    2014-03-01

    This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pKa predictions. Structure-based pKa calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exist for pKa prediction, ranging from empirical statistical models to ab initio quantum mechanical approaches. However, each of these methods is based on a set of assumptions that have inherent bias and sensitivities that can affect a model's accuracy and generalizability for pKa prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative, and the pKa measurements are based on experimental work conducted by the García-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study, with improvements of 40-70% over other method classes. This work illustrates a new possible mechanism for improving the accuracy of pKa prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.

  11. New insights from cluster analysis methods for RNA secondary structure prediction

    PubMed Central

    Rogers, Emily; Heitsch, Christine

    2016-01-01

    A widening gap exists between the best practices for RNA secondary structure prediction developed by computational researchers and the methods used in practice by experimentalists. Minimum free energy (MFE) predictions, although broadly used, are outperformed by methods which sample from the Boltzmann distribution and data mine the results. In particular, moving beyond the single structure prediction paradigm yields substantial gains in accuracy. Furthermore, the largest improvements in accuracy and precision come from viewing secondary structures not at the base pair level but at lower granularity/higher abstraction. This suggests that random errors affecting precision and systematic ones affecting accuracy are both reduced by this “fuzzier” view of secondary structures. Thus experimentalists who are willing to adopt a more rigorous, multilayered approach to secondary structure prediction by iterating through these levels of granularity will be much better able to capture fundamental aspects of RNA base pairing. PMID:26971529

  12. Assessing genomic selection prediction accuracy in a dynamic barley breeding

    USDA-ARS?s Scientific Manuscript database

    Genomic selection is a method to improve quantitative traits in crops and livestock by estimating breeding values of selection candidates using phenotype and genome-wide marker data sets. Prediction accuracy has been evaluated through simulation and cross-validation, however validation based on prog...

  13. Entropy-based link prediction in weighted networks

    NASA Astrophysics Data System (ADS)

    Xu, Zhongqi; Pu, Cunlai; Ramiz Sharafat, Rajput; Li, Lunbo; Yang, Jian

    2017-01-01

    Information entropy has been proved to be an effective tool to quantify the structural importance of complex networks. In previous work (Xu et al., 2016), we measured the contribution of a path in link prediction with information entropy. In this paper, we further quantify the contribution of a path with both path entropy and path weight, and propose a weighted prediction index based on the contributions of paths, namely Weighted Path Entropy (WPE), to improve the prediction accuracy in weighted networks. Empirical experiments on six weighted real-world networks show that WPE achieves higher prediction accuracy than three typical weighted indices.

  14. Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm.

    PubMed

    Lee, Jae-Hong; Kim, Do-Hyung; Jeong, Seong-Nyum; Choi, Seong-Ho

    2018-04-01

    The aim of the current study was to develop a computer-assisted detection system based on a deep convolutional neural network (CNN) algorithm and to evaluate the potential usefulness and accuracy of this system for the diagnosis and prediction of periodontally compromised teeth (PCT). Combining pretrained deep CNN architecture and a self-trained network, periapical radiographic images were used to determine the optimal CNN algorithm and weights. The diagnostic and predictive accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curve, area under the ROC curve, confusion matrix, and 95% confidence intervals (CIs) were calculated using our deep CNN algorithm, based on a Keras framework in Python. The periapical radiographic dataset was split into training (n=1,044), validation (n=348), and test (n=348) datasets. With the deep learning algorithm, the diagnostic accuracy for PCT was 81.0% for premolars and 76.7% for molars. Using 64 premolars and 64 molars that were clinically diagnosed as severe PCT, the accuracy of predicting extraction was 82.8% (95% CI, 70.1%-91.2%) for premolars and 73.4% (95% CI, 59.9%-84.0%) for molars. We demonstrated that the deep CNN algorithm was useful for assessing the diagnosis and predictability of PCT. Therefore, with further optimization of the PCT dataset and improvements in the algorithm, a computer-aided detection system can be expected to become an effective and efficient method of diagnosing and predicting PCT.
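
    The pattern of combining a pretrained deep CNN with a self-trained head is a few lines in Keras. The sketch below is generic transfer learning, not the authors' network: the VGG16 backbone, input size, and head layers are assumptions, and train_images/train_labels are hypothetical placeholders.

        # Minimal transfer-learning pattern in Keras; VGG16 and all sizes are
        # illustrative assumptions, not the authors' exact architecture.
        from tensorflow.keras.applications import VGG16
        from tensorflow.keras import layers, models

        base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
        base.trainable = False                       # keep pretrained features fixed

        model = models.Sequential([
            base,
            layers.GlobalAveragePooling2D(),
            layers.Dense(64, activation="relu"),     # self-trained head
            layers.Dropout(0.5),
            layers.Dense(1, activation="sigmoid"),   # PCT vs. non-PCT
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        model.summary()
        # model.fit(train_images, train_labels, validation_data=(val_images, val_labels))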

  15. Genomic Prediction Accounting for Residual Heteroskedasticity

    PubMed Central

    Ou, Zhining; Tempelman, Robert J.; Steibel, Juan P.; Ernst, Catherine W.; Bates, Ronald O.; Bello, Nora M.

    2015-01-01

    Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit. PMID:26564950

  16. Bio-knowledge based filters improve residue-residue contact prediction accuracy.

    PubMed

    Wozniak, P P; Pelc, J; Skrzypecki, M; Vriend, G; Kotulska, M

    2018-05-29

    Residue-residue contact prediction through direct coupling analysis has reached impressive accuracy, but yet higher accuracy will be needed to allow for routine modelling of protein structures. One way to improve the prediction accuracy is to filter predicted contacts using knowledge about the particular protein of interest or knowledge about protein structures in general. We focus on the latter and discuss a set of filters that can be used to remove false positive contact predictions. Each filter depends on one or a few cut-off parameters for which the filter performance was investigated. Combining all filters while using default parameters resulted, for a test set of 851 protein domains, in the removal of 29% of the predictions, of which 92% were indeed false positives. All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/FPfilter/. malgorzata.kotulska@pwr.edu.pl. Supplementary data are available at Bioinformatics online.

  17. Variable selection models for genomic selection using whole-genome sequence data and singular value decomposition.

    PubMed

    Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen

    2017-12-27

    Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability π and no effect with probability (1 - π). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.
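
    The SVD machinery underlying the method is easiest to see in the SNP-BLUP special case, where marker effects have a closed form: for a centred genotype matrix Z = U S V', the ridge/SNP-BLUP solution is beta_hat = V diag(s_i / (s_i^2 + lambda)) U'y. The sketch below demonstrates this on simulated genotypes; the BayesC posterior-probability step built on top of these quantities is not reproduced, and lambda and the simulation settings are assumptions.

        import numpy as np

        rng = np.random.default_rng(5)
        n, m = 500, 2000                       # individuals, markers (m >> n)
        Z = rng.integers(0, 3, size=(n, m)).astype(float)   # 0/1/2 genotype codes
        Z -= Z.mean(axis=0)                                  # centre columns
        beta_true = np.zeros(m); beta_true[rng.choice(m, 20)] = rng.normal(0, 1, 20)
        y = Z @ beta_true + rng.normal(0, 1, n)

        lam = 100.0                            # ridge parameter (sigma_e^2 / sigma_b^2)

        # SNP-BLUP effects via thin SVD of the genotype matrix:
        #   beta_hat = V diag(s / (s^2 + lam)) U^T y
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        beta_hat = Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))

        gebv = Z @ beta_hat                    # genomic estimated breeding values
        print("corr(beta_hat, beta_true):", np.corrcoef(beta_hat, beta_true)[0, 1].round(3))
        print("corr(GEBV, y):", np.corrcoef(gebv, y)[0, 1].round(3))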

  18. A hybrid PSO-SVM-based method for predicting the friction coefficient between aircraft tire and coating

    NASA Astrophysics Data System (ADS)

    Zhan, Liwei; Li, Chengwei

    2017-02-01

    A hybrid PSO-SVM-based model is proposed to predict the friction coefficient between aircraft tire and coating. The presented hybrid model combines a support vector machine (SVM) with the particle swarm optimization (PSO) technique. SVM has been adopted to solve regression problems successfully. Its regression accuracy strongly depends on the choice of parameters such as the regularization constant C, the parameter γ of the RBF kernel, and the epsilon parameter ε used in the SVM training procedure. However, SVM-based prediction of the friction coefficient between aircraft tire and coating has yet to be explored. The experiments reveal that drop height and tire rotational speed are the factors affecting the friction coefficient. With this in mind, the friction coefficient can be predicted using the hybrid PSO-SVM-based model from measured friction coefficients between aircraft tire and coating. To compare regression accuracy, a grid search (GS) method and a genetic algorithm (GA) are used to optimize the relevant parameters (C, γ, and ε), respectively. Regression accuracy is measured by the coefficient of determination (R²). The results show that the hybrid PSO-RBF-SVM-based model has better accuracy than the GS-RBF-SVM- and GA-RBF-SVM-based models. The agreement of this model (PSO-RBF-SVM) with experimental data confirms its good performance.
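
    A minimal global-best PSO over log-scaled (C, γ, ε), scored by the cross-validated R² of an RBF SVR, illustrates the hybrid model's tuning loop. The data, swarm settings, and search ranges below are assumptions, not the paper's experimental setup:

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVR

        rng = np.random.default_rng(6)
        # Toy stand-in: friction coefficient vs. drop height and tire speed
        X = rng.uniform([0.5, 50], [3.0, 300], size=(80, 2))
        y = 0.8 - 0.1 * X[:, 0] - 0.001 * X[:, 1] + rng.normal(0, 0.02, 80)

        def score(p):
            C, gamma, eps = 10.0 ** p            # particles live in log10 space
            svr = SVR(kernel="rbf", C=C, gamma=gamma, epsilon=eps)
            return cross_val_score(svr, X, y, cv=5, scoring="r2").mean()

        # Plain global-best PSO over log10(C), log10(gamma), log10(epsilon)
        lo, hi = np.array([-1, -4, -4]), np.array([3, 1, 0])
        pos = rng.uniform(lo, hi, size=(20, 3))
        vel = np.zeros_like(pos)
        pbest, pbest_s = pos.copy(), np.array([score(p) for p in pos])
        gbest = pbest[pbest_s.argmax()]

        for _ in range(15):
            r1, r2 = rng.random((2, 20, 3))
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, lo, hi)
            s = np.array([score(p) for p in pos])
            improved = s > pbest_s
            pbest[improved], pbest_s[improved] = pos[improved], s[improved]
            gbest = pbest[pbest_s.argmax()]

        print("best (C, gamma, epsilon):", (10.0 ** gbest).round(4), "R^2:", pbest_s.max().round(3))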

  19. Alternatives to accuracy and bias metrics based on percentage errors for radiation belt modeling applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morley, Steven Karl

    This report reviews existing literature describing forecast accuracy metrics, concentrating on those based on relative errors and percentage errors. We then review how the most common of these metrics, the mean absolute percentage error (MAPE), has been applied in recent radiation belt modeling literature. Finally, we describe metrics based on the ratios of predicted to observed values (the accuracy ratio) that address the drawbacks inherent in using MAPE. Specifically, we define and recommend the median log accuracy ratio as a measure of bias and the median symmetric accuracy as a measure of accuracy.
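
    The recommended metrics have simple closed forms, following the definitions in this line of work: with the log accuracy ratio Q = ln(predicted/observed), the median of Q measures bias and the median symmetric accuracy is 100(exp(median|Q|) - 1). A sketch, with a toy example showing how MAPE treats over- and under-prediction asymmetrically while the ratio-based metrics do not:

        import numpy as np

        def mape(obs, pred):
            """Mean absolute percentage error (the metric being critiqued)."""
            obs, pred = np.asarray(obs, float), np.asarray(pred, float)
            return 100 * np.mean(np.abs((pred - obs) / obs))

        def median_log_accuracy_ratio(obs, pred):
            """Bias measure: median of ln(pred/obs); 0 = unbiased, sign gives direction."""
            return np.median(np.log(np.asarray(pred, float) / np.asarray(obs, float)))

        def median_symmetric_accuracy(obs, pred):
            """Accuracy measure: 100*(exp(median(|ln(pred/obs)|)) - 1), in percent."""
            q = np.log(np.asarray(pred, float) / np.asarray(obs, float))
            return 100 * (np.exp(np.median(np.abs(q))) - 1)

        obs  = np.array([10.0, 100.0, 1000.0])
        pred = np.array([ 5.0, 200.0, 1000.0])
        # Under-prediction by 2x and over-prediction by 2x contribute unequally
        # to MAPE (50% vs. 100%) but identically to the ratio-based metrics.
        print(mape(obs, pred), median_log_accuracy_ratio(obs, pred),
              median_symmetric_accuracy(obs, pred))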

  20. Effects of urban microcellular environments on ray-tracing-based coverage predictions.

    PubMed

    Liu, Zhongyu; Guo, Lixin; Guan, Xiaowei; Sun, Jiejing

    2016-09-01

    The ray-tracing (RT) algorithm, which is based on geometrical optics and the uniform theory of diffraction, has become a typical deterministic approach of studying wave-propagation characteristics. Under urban microcellular environments, the RT method highly depends on detailed environmental information. The aim of this paper is to provide help in selecting the appropriate level of accuracy required in building databases to achieve good tradeoffs between database costs and prediction accuracy. After familiarization with the operating procedures of the RT-based prediction model, this study focuses on the effect of errors in environmental information on prediction results. The environmental information consists of two parts, namely, geometric and electrical parameters. The geometric information can be obtained from a digital map of a city. To study the effects of inaccuracies in geometry information (building layout) on RT-based coverage prediction, two different artificial erroneous maps are generated based on the original digital map, and systematic analysis is performed by comparing the predictions with the erroneous maps and measurements or the predictions with the original digital map. To make the conclusion more persuasive, the influence of random errors on RMS delay spread results is investigated. Furthermore, given the electrical parameters' effect on the accuracy of the predicted results of the RT model, the dielectric constant and conductivity of building materials are set with different values. The path loss and RMS delay spread under the same circumstances are simulated by the RT prediction model.

  1. Protein docking prediction using predicted protein-protein interface.

    PubMed

    Li, Bin; Kihara, Daisuke

    2012-01-10

    Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pairwise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.

  2. CD-Based Indices for Link Prediction in Complex Network.

    PubMed

    Wang, Tao; Wang, Hongjue; Wang, Xiaoxia

    2016-01-01

    Many similarity-based algorithms have been designed to deal with the problem of link prediction in the past decade. In order to improve prediction accuracy, a novel cosine similarity index CD, based on the distance between nodes and the cosine value between vectors, is proposed in this paper. Firstly, a node coordinate matrix, distinct from the distance matrix, is obtained from the distances between nodes, and the row vectors of this matrix are regarded as the coordinates of the nodes. Then, the cosine value between node coordinates is used as their similarity index. A local community density index LD is also proposed. Then, a series of CD-based indices, including CD-LD-k, CD*LD-k, CD-k and CDI, are presented and applied to ten real networks. Experimental results demonstrate the effectiveness of CD-based indices. The effects of the network clustering coefficient and assortative coefficient on the prediction accuracy of the indices are analyzed. CD-LD-k and CD*LD-k can improve prediction accuracy regardless of whether the assortative coefficient of the network is negative or positive. According to the analysis of the relative precision of each method on each network, the CD-LD-k and CD*LD-k indices have excellent average performance and robustness. CD and CD-k indices perform better on positive assortative networks than on negative assortative networks. For negative assortative networks, we improve and refine the CD index, referred to as the CDI index, combining the advantages of the CD index and the evolutionary mechanism of the network model BA. Experimental results reveal that the CDI index can increase the prediction accuracy of CD on negative assortative networks.

  3. CD-Based Indices for Link Prediction in Complex Network

    PubMed Central

    Wang, Tao; Wang, Hongjue; Wang, Xiaoxia

    2016-01-01

    Many similarity-based algorithms have been designed to deal with the problem of link prediction in the past decade. In order to improve prediction accuracy, a novel cosine similarity index CD, based on the distance between nodes and the cosine value between vectors, is proposed in this paper. Firstly, a node coordinate matrix, distinct from the distance matrix, is obtained from the distances between nodes, and the row vectors of this matrix are regarded as the coordinates of the nodes. Then, the cosine value between node coordinates is used as their similarity index. A local community density index LD is also proposed. Then, a series of CD-based indices, including CD-LD-k, CD*LD-k, CD-k and CDI, are presented and applied to ten real networks. Experimental results demonstrate the effectiveness of CD-based indices. The effects of the network clustering coefficient and assortative coefficient on the prediction accuracy of the indices are analyzed. CD-LD-k and CD*LD-k can improve prediction accuracy regardless of whether the assortative coefficient of the network is negative or positive. According to the analysis of the relative precision of each method on each network, the CD-LD-k and CD*LD-k indices have excellent average performance and robustness. CD and CD-k indices perform better on positive assortative networks than on negative assortative networks. For negative assortative networks, we improve and refine the CD index, referred to as the CDI index, combining the advantages of the CD index and the evolutionary mechanism of the network model BA. Experimental results reveal that the CDI index can increase the prediction accuracy of CD on negative assortative networks. PMID:26752405

  4. Accuracy of genomic prediction using deregressed breeding values estimated from purebred and crossbred offspring phenotypes in pigs.

    PubMed

    Hidalgo, A M; Bastiaansen, J W M; Lopes, M S; Veroneze, R; Groenen, M A M; de Koning, D-J

    2015-07-01

    Genomic selection is applied to dairy cattle breeding to improve the genetic progress of purebred (PB) animals, whereas in pigs and poultry the target is a crossbred (CB) animal for which a different strategy appears to be needed. The source of information used to estimate the breeding values, i.e., using phenotypes of CB or PB animals, may affect the accuracy of prediction. The objective of our study was to assess the direct genomic value (DGV) accuracy of CB and PB pigs using different sources of phenotypic information. Data used were from 3 populations: 2,078 Dutch Landrace-based, 2,301 Large White-based, and 497 crossbreds from an F1 cross between the 2 lines. Two female reproduction traits were analyzed: gestation length (GLE) and total number of piglets born (TNB). Phenotypes used in the analyses originated from offspring of genotyped individuals. Phenotypes collected on CB and PB animals were analyzed as separate traits using a single-trait model. Breeding values were estimated separately for each trait in a pedigree BLUP analysis and subsequently deregressed. Deregressed EBV for each trait originating from different sources (CB or PB offspring) were used to study the accuracy of genomic prediction. Accuracy of prediction was computed as the correlation between DGV and the DEBV of the validation population. Accuracy of prediction within PB populations ranged from 0.43 to 0.62 across GLE and TNB. Accuracies to predict genetic merit of CB animals with one PB population in the training set ranged from 0.12 to 0.28, with the exception of using the CB offspring phenotype of the Dutch Landrace that resulted in an accuracy estimate around 0 for both traits. Accuracies to predict genetic merit of CB animals with both parental PB populations in the training set ranged from 0.17 to 0.30. We conclude that prediction within population and trait had good predictive ability regardless of the trait being the PB or CB performance, whereas using PB population(s) to predict genetic merit of CB animals had zero to moderate predictive ability. We observed that the DGV accuracy of CB animals when training on PB data was greater than or equal to training on CB data. However, when results are corrected for the different levels of reliabilities in the PB and CB training data, we showed that training on CB data does outperform PB data for the prediction of CB genetic merit, indicating that more CB animals should be phenotyped to increase the reliability and, consequently, accuracy of DGV for CB genetic merit.

  5. Effect of genetic architecture on the prediction accuracy of quantitative traits in samples of unrelated individuals.

    PubMed

    Morgante, Fabio; Huang, Wen; Maltecca, Christian; Mackay, Trudy F C

    2018-06-01

    Predicting complex phenotypes from genomic data is a fundamental aim of animal and plant breeding, where we wish to predict genetic merits of selection candidates; and of human genetics, where we wish to predict disease risk. While genomic prediction models work well with populations of related individuals and high linkage disequilibrium (LD) (e.g., livestock), comparable models perform poorly for populations of unrelated individuals and low LD (e.g., humans). We hypothesized that low prediction accuracies in the latter situation may occur when the genetic architecture of the trait departs from the infinitesimal and additive architecture assumed by most prediction models. We used simulated data for 10,000 lines based on sequence data from a population of unrelated, inbred Drosophila melanogaster lines to evaluate this hypothesis. We show that, even in very simplified scenarios meant as a stress test of the commonly used Genomic Best Linear Unbiased Predictor (G-BLUP) method, using all common variants yields low prediction accuracy regardless of the trait genetic architecture. However, prediction accuracy increases when predictions are informed by the genetic architecture inferred from mapping the top variants affecting main effects and interactions in the training data, provided there is sufficient power for mapping. When the true genetic architecture is largely or partially due to epistatic interactions, the additive model may not perform well, while models that account explicitly for interactions generally increase prediction accuracy. Our results indicate that accounting for genetic architecture can improve prediction accuracy for quantitative traits.

  6. State of Jet Noise Prediction-NASA Perspective

    NASA Technical Reports Server (NTRS)

    Bridges, James E.

    2008-01-01

    This presentation covers work primarily done under the Airport Noise Technical Challenge portion of the Supersonics Project in the Fundamental Aeronautics Program. To provide motivation and context, the presentation starts with a brief overview of the Airport Noise Technical Challenge. It then covers the state of NASA's jet noise prediction tools in empirical, RANS-based, and time-resolved categories. The empirical tools require seconds to provide a prediction of noise spectral directivity with an accuracy of a few dB, but only for axisymmetric configurations. The RANS-based tools are able to discern the impact of three-dimensional features, but are currently deficient in predicting noise from heated and high-speed jets, and require hours to produce their predictions. The time-resolved codes are capable of predicting resonances and other time-dependent phenomena, but are very immature, requiring months to deliver predictions of as yet unknown accuracy and dependability. In toto, however, when one considers the progress being made, it appears that aeroacoustic prediction tools will soon approach the level of sophistication and accuracy of aerodynamic engineering tools.

  7. Genome-based prediction of test cross performance in two subsequent breeding cycles.

    PubMed

    Hofheinz, Nina; Borchardt, Dietrich; Weissleder, Knuth; Frisch, Matthias

    2012-12-01

    Genome-based prediction of genetic values is expected to overcome shortcomings that limit the application of QTL mapping and marker-assisted selection in plant breeding. Our goal was to study the genome-based prediction of test cross performance with genetic effects that were estimated using genotypes from the preceding breeding cycle. In particular, our objectives were to employ a ridge regression approach that approximates best linear unbiased prediction of genetic effects, compare cross validation with validation using genetic material of the subsequent breeding cycle, and investigate the prospects of genome-based prediction in sugar beet breeding. We focused on the traits sugar content and standard molasses loss (ML) and used a set of 310 sugar beet lines to estimate genetic effects at 384 SNP markers. In cross validation, correlations >0.8 between observed and predicted test cross performance were observed for both traits. However, in validation with 56 lines from the next breeding cycle, a correlation of 0.8 could only be observed for sugar content; for standard ML the correlation dropped to 0.4. We found that ridge regression based on preliminary estimates of the heritability provided a very good approximation of best linear unbiased prediction and was not accompanied by a loss in prediction accuracy. We conclude that prediction accuracy assessed with cross validation within one cycle of a breeding program cannot be used as an indicator for the accuracy of predicting lines of the next cycle. Prediction of lines of the next cycle seems promising for traits with high heritabilities.
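
    The ridge approximation to BLUP can be sketched as follows (Python, simulated data rather than the sugar beet set). The penalty is set from a preliminary heritability estimate as lambda = m(1 - h^2)/h^2, one common RR-BLUP convention; the paper's exact weighting may differ.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, h2 = 310, 384, 0.8                 # lines, SNP markers, heritability
    X = rng.integers(0, 3, size=(n, m)).astype(float)    # SNP codes 0/1/2
    X -= X.mean(axis=0)                                  # center genotypes
    beta_true = rng.normal(0.0, 1.0, m)
    g = X @ beta_true                                    # true genetic values
    y = g + rng.normal(0.0, np.sqrt(g.var() * (1 - h2) / h2), n)

    lam = m * (1 - h2) / h2                              # heritability-derived penalty
    beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)
    print(np.corrcoef(X @ beta_hat, g)[0, 1])            # in-sample accuracy check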

  8. On the accuracy of ERS-1 orbit predictions

    NASA Technical Reports Server (NTRS)

    Koenig, Rolf; Li, H.; Massmann, Franz-Heinrich; Raimondo, J. C.; Rajasenan, C.; Reigber, C.

    1993-01-01

    Since the launch of ERS-1, the D-PAF (German Processing and Archiving Facility) has regularly provided orbit predictions for the worldwide SLR (Satellite Laser Ranging) tracking network. The weekly distributed orbital elements are so-called tuned IRVs and tuned SAO elements. The tuning procedure, designed to improve the accuracy of the recovery of the orbit at the stations, is discussed based on numerical results. These show that tuning of elements is essential for ERS-1 with the currently applied tracking procedures. The orbital elements are updated by daily distributed time bias functions. The generation of the time bias function is explained, and problems and numerical results are presented. The time bias function increases the prediction accuracy considerably. Finally, the quality assessment of ERS-1 orbit predictions is described. The accuracy is compiled for about 250 days since launch; the average accuracy lies in the range of 50-100 ms and has improved considerably.

  9. Accuracy statistics in predicting Independent Activities of Daily Living (IADL) capacity with comprehensive and brief neuropsychological test batteries.

    PubMed

    Karzmark, Peter; Deutsch, Gayle K

    2018-01-01

    This investigation was designed to determine the predictive accuracy of a comprehensive neuropsychological and a brief neuropsychological test battery with regard to the capacity to perform instrumental activities of daily living (IADLs). Accuracy statistics that included measures of sensitivity, specificity, positive and negative predictive power and positive likelihood ratio were calculated for both types of batteries. The sample was drawn from a general neurological group of adults (n = 117) that included a number of older participants (age >55; n = 38). Standardized neuropsychological assessments were administered to all participants and comprised the Halstead-Reitan Battery and portions of the Wechsler Adult Intelligence Scale-III. The comprehensive test battery yielded a moderate increase over base rate in predictive accuracy that generalized to older individuals. There was only limited support for using a brief battery: although sensitivity was high, specificity was low. We found that a comprehensive neuropsychological test battery provided good classification accuracy for predicting IADL capacity.
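
    For reference, the accuracy statistics named above can be computed from a 2x2 confusion table as in the sketch below (Python; the counts are made up for illustration, not taken from the study).

    def accuracy_stats(tp, fp, fn, tn):
        sens = tp / (tp + fn)                  # sensitivity
        spec = tn / (tn + fp)                  # specificity
        return {
            "sensitivity": sens,
            "specificity": spec,
            "PPV": tp / (tp + fp),             # positive predictive power
            "NPV": tn / (tn + fn),             # negative predictive power
            "LR+": sens / (1 - spec),          # positive likelihood ratio
            "accuracy": (tp + tn) / (tp + fp + fn + tn),
        }

    print(accuracy_stats(tp=24, fp=10, fn=5, tn=78))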

  10. Medium- and Long-term Prediction of LOD Change by the Leap-step Autoregressive Model

    NASA Astrophysics Data System (ADS)

    Wang, Qijie

    2015-08-01

    The accuracy of medium- and long-term prediction of length-of-day (LOD) change based on the combined least-squares and autoregressive (LS+AR) model deteriorates gradually. The leap-step autoregressive (LSAR) model can significantly reduce the edge effect of the observation sequence; in particular, it greatly improves the resolution of the signal's low-frequency components, and can therefore improve prediction efficiency. In this work, LSAR is used to forecast LOD change. The LOD series from EOP 08 C04 provided by the IERS is modeled by both the LSAR and AR models, and the results of the two models are analyzed and compared. When the prediction length is between 10 and 30 days, the accuracy improvement is less than 10%. When the prediction length exceeds 30 days, the accuracy improves markedly, with a maximum gain of around 19%. The results show that the LSAR model has higher prediction accuracy and stability in medium- and long-term prediction.
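
    For orientation, a plain AR forecast on a toy LOD-like series is sketched below using statsmodels; the combined LS+AR fit and the leap-step sampling that define the LSAR model are not reproduced here.

    import numpy as np
    from statsmodels.tsa.ar_model import AutoReg

    rng = np.random.default_rng(1)
    t = np.arange(2000)
    # Toy series: annual plus semi-annual terms with noise, mimicking LOD change.
    lod = (0.3 * np.sin(2 * np.pi * t / 365.25)
           + 0.2 * np.sin(4 * np.pi * t / 365.25)
           + rng.normal(0.0, 0.02, t.size))

    fit = AutoReg(lod, lags=30).fit()
    forecast = fit.predict(start=len(lod), end=len(lod) + 29)   # 30 days ahead
    print(forecast[:5])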

  11. Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression.

    PubMed

    Kim, Soyeon; Baladandayuthapani, Veerabhadran; Lee, J Jack

    2017-06-01

    In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient's biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verifications. These goals are challenging due to the high dimensionality of genomic data. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results. However, selecting the right amount of penalization is critical to simultaneously achieving these two goals. Standard approaches based on cross-validation (CV) typically provide high prediction accuracy with high true positive rates but at the cost of too many false positives. Alternatively, stability selection (SS) controls the number of false positives, but at the cost of yielding too few true positives. To circumvent these issues, we propose prediction-oriented marker selection (PROMISE), which combines SS with CV to conflate the advantages of both methods. Our application of PROMISE with the lasso and elastic net in data analysis shows that, compared to CV, PROMISE produces sparse solutions, few false positives, and small type I + type II error, and maintains good prediction accuracy, with a marginal decrease in the true positive rates. Compared to SS, PROMISE offers better prediction accuracy and true positive rates. In summary, PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize false positives and maximize prediction accuracy.
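
    The cross-validated lasso arm that PROMISE builds on can be sketched with scikit-learn as below, on simulated data; the stability-selection component and the PROMISE combination of the two are not shown.

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(2)
    n, p, k = 100, 1000, 5                    # samples, markers, true markers
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:k] = 2.0
    y = X @ beta + rng.normal(size=n)

    fit = LassoCV(cv=5).fit(X, y)             # CV picks the penalty
    selected = np.flatnonzero(fit.coef_)
    print(len(selected), "selected;", np.sum(selected < k), "true positives")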

  12. Genomic relationships based on X chromosome markers and accuracy of genomic predictions with and without X chromosome markers

    PubMed Central

    2014-01-01

    Background Although the X chromosome is the second largest bovine chromosome, markers on the X chromosome are not used for genomic prediction in some countries and populations. In this study, we presented a method for computing genomic relationships using X chromosome markers, investigated the accuracy of imputation from a low density (7K) to the 54K SNP (single nucleotide polymorphism) panel, and compared the accuracy of genomic prediction with and without using X chromosome markers. Methods The impact of considering X chromosome markers on prediction accuracy was assessed using data from Nordic Holstein bulls and different sets of SNPs: (a) the 54K SNPs for reference and test animals, (b) SNPs imputed from the 7K to the 54K SNP panel for test animals, (c) SNPs imputed from the 7K to the 54K panel for half of the reference animals, and (d) the 7K SNP panel for all animals. Beagle and Findhap were used for imputation. GBLUP (genomic best linear unbiased prediction) models with or without X chromosome markers and with or without a residual polygenic effect were used to predict genomic breeding values for 15 traits. Results Averaged over the two imputation datasets, correlation coefficients between imputed and true genotypes for autosomal markers, pseudo-autosomal markers, and X-specific markers were 0.971, 0.831 and 0.935 when using Findhap, and 0.983, 0.856 and 0.937 when using Beagle. Estimated reliabilities of genomic predictions based on the imputed datasets using Findhap or Beagle were very close to those using the real 54K data. Genomic prediction using all markers gave slightly higher reliabilities than predictions without X chromosome markers. Based on our data which included only bulls, using a G matrix that accounted for sex-linked relationships did not improve prediction, compared with a G matrix that did not account for sex-linked relationships. A model that included a polygenic effect did not recover the loss of prediction accuracy from exclusion of X chromosome markers. Conclusions The results from this study suggest that markers on the X chromosome contribute to accuracy of genomic predictions and should be used for routine genomic evaluation. PMID:25080199

  13. Leuconostoc mesenteroides growth in food products: prediction and sensitivity analysis by adaptive-network-based fuzzy inference systems.

    PubMed

    Wang, Hue-Yu; Wen, Ching-Feng; Chiu, Yu-Hsien; Lee, I-Nong; Kao, Hao-Yun; Lee, I-Chen; Ho, Wen-Hsien

    2013-01-01

    An adaptive-network-based fuzzy inference system (ANFIS) was compared with an artificial neural network (ANN) in terms of accuracy in predicting the combined effects of temperature (10.5 to 24.5°C), pH level (5.5 to 7.5), sodium chloride level (0.25% to 6.25%) and sodium nitrite level (0 to 200 ppm) on the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions. The ANFIS and ANN models were compared in terms of six statistical indices calculated by comparing their prediction results with actual data: mean absolute percentage error (MAPE), root mean square error (RMSE), standard error of prediction percentage (SEP), bias factor (Bf), accuracy factor (Af), and absolute fraction of variance (R²). Graphical plots were also used for model comparison. The learning-based systems obtained encouraging prediction results. Sensitivity analyses of the four environmental factors showed that temperature and, to a lesser extent, NaCl had the most influence on accuracy in predicting the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions. The observed effectiveness of ANFIS for modeling microbial kinetic parameters confirms its potential use as a supplemental tool in predictive microbiology. Comparisons between growth rates predicted by ANFIS and actual experimental data also confirmed the high accuracy of the Gaussian membership function in ANFIS. Comparisons of the six statistical indices under both aerobic and anaerobic conditions also showed that the ANFIS model was better than all ANN models in predicting the four kinetic parameters. Therefore, the ANFIS model is a valuable tool for quickly predicting the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions.
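
    The six indices can be computed as in the sketch below (Python), with the bias and accuracy factors defined as in common predictive-microbiology usage (after Ross); the paper's exact formulas may differ, and the data here are placeholders.

    import numpy as np

    def indices(obs, pred):
        obs, pred = np.asarray(obs, float), np.asarray(pred, float)
        resid = obs - pred
        rmse = np.sqrt(np.mean(resid ** 2))
        return {
            "MAPE": 100 * np.mean(np.abs(resid) / np.abs(obs)),
            "RMSE": rmse,
            "SEP": 100 * rmse / obs.mean(),                        # % standard error
            "Bf": 10 ** np.mean(np.log10(pred / obs)),             # bias factor
            "Af": 10 ** np.mean(np.abs(np.log10(pred / obs))),     # accuracy factor
            "R2": 1 - np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2),
        }

    print(indices(obs=[0.12, 0.25, 0.40], pred=[0.10, 0.27, 0.38]))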

  14. Leuconostoc Mesenteroides Growth in Food Products: Prediction and Sensitivity Analysis by Adaptive-Network-Based Fuzzy Inference Systems

    PubMed Central

    Wang, Hue-Yu; Wen, Ching-Feng; Chiu, Yu-Hsien; Lee, I-Nong; Kao, Hao-Yun; Lee, I-Chen; Ho, Wen-Hsien

    2013-01-01

    Background An adaptive-network-based fuzzy inference system (ANFIS) was compared with an artificial neural network (ANN) in terms of accuracy in predicting the combined effects of temperature (10.5 to 24.5°C), pH level (5.5 to 7.5), sodium chloride level (0.25% to 6.25%) and sodium nitrite level (0 to 200 ppm) on the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions. Methods The ANFIS and ANN models were compared in terms of six statistical indices calculated by comparing their prediction results with actual data: mean absolute percentage error (MAPE), root mean square error (RMSE), standard error of prediction percentage (SEP), bias factor (Bf), accuracy factor (Af), and absolute fraction of variance (R²). Graphical plots were also used for model comparison. Conclusions The learning-based systems obtained encouraging prediction results. Sensitivity analyses of the four environmental factors showed that temperature and, to a lesser extent, NaCl had the most influence on accuracy in predicting the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions. The observed effectiveness of ANFIS for modeling microbial kinetic parameters confirms its potential use as a supplemental tool in predictive microbiology. Comparisons between growth rates predicted by ANFIS and actual experimental data also confirmed the high accuracy of the Gaussian membership function in ANFIS. Comparisons of the six statistical indices under both aerobic and anaerobic conditions also showed that the ANFIS model was better than all ANN models in predicting the four kinetic parameters. Therefore, the ANFIS model is a valuable tool for quickly predicting the growth rate of Leuconostoc mesenteroides under aerobic and anaerobic conditions. PMID:23705023

  15. High accuracy operon prediction method based on STRING database scores.

    PubMed

    Taboada, Blanca; Verde, Cristina; Merino, Enrique

    2010-07-01

    We present a simple and highly accurate computational method for operon prediction, based on intergenic distances and functional relationships between the protein products of contiguous genes, as defined by the STRING database (Jensen,L.J., Kuhn,M., Stark,M., Chaffron,S., Creevey,C., Muller,J., Doerks,T., Julien,P., Roth,A., Simonovic,M. et al. (2009) STRING 8-a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res., 37, D412-D416). These two parameters were used to train a neural network on a subset of experimentally characterized Escherichia coli and Bacillus subtilis operons. Our predictive model was successfully tested on the set of experimentally defined operons in E. coli and B. subtilis, with accuracies of 94.6 and 93.3%, respectively. As far as we know, these are the highest accuracies ever obtained for predicting bacterial operons. Furthermore, in order to evaluate the predictive accuracy of our model when using one organism's data set for the training procedure and a different organism's data set for testing, we repeated the E. coli operon prediction analysis using a neural network trained with B. subtilis data, and a B. subtilis analysis using a neural network trained with E. coli data. Even for these cases, the accuracies reached with our method were outstandingly high, 91.5 and 93%, respectively. These results show the potential use of our method for accurately predicting the operons of any other organism. Our operon predictions for fully-sequenced genomes are available at http://operons.ibt.unam.mx/OperonPredictor/.
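
    A toy version of the two-feature classifier (intergenic distance plus a STRING-style functional score, fed to a small neural network) is sketched below with scikit-learn; the training data are fabricated, whereas the study trains on curated E. coli and B. subtilis operons.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(3)
    n = 500
    # Same-operon gene pairs tend to have short intergenic distances and high scores.
    dist = rng.normal(loc=np.where(rng.random(n) < 0.5, 20.0, 150.0), scale=30.0)
    score = np.clip(rng.normal(np.where(dist < 80, 0.8, 0.2), 0.15), 0.0, 1.0)
    X = np.column_stack([dist, score])
    y = (dist < 80).astype(int)               # toy labels: 1 = same operon

    clf = MLPClassifier(hidden_layer_sizes=(5,), max_iter=2000,
                        random_state=0).fit(X, y)
    print(clf.predict([[35.0, 0.9], [400.0, 0.1]]))   # expected: [1 0]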

  16. Validity, accuracy, and predictive value of urinary tract infection signs and symptoms in individuals with spinal cord injury on intermittent catheterization.

    PubMed

    Massa, Luiz M; Hoffman, Jeanne M; Cardenas, Diana D

    2009-01-01

    To determine the validity, accuracy, and predictive value of the signs and symptoms of urinary tract infection (UTI) for individuals with spinal cord injury (SCI) using intermittent catheterization (IC) and the accuracy of individuals with SCI on IC at predicting their own UTI. Prospective cohort based on data from the first 3 months of a 1-year randomized controlled trial to evaluate UTI prevention effectiveness of hydrophilic and standard catheters. Fifty-six community-based individuals on IC. Presence of UTI as defined as bacteriuria with a colony count of at least 10⁵ colony-forming units/mL and at least 1 sign or symptom of UTI. Analysis of monthly urine culture and urinalysis data combined with analysis of monthly data collected using a questionnaire that asked subjects to self-report on UTI signs and symptoms and whether or not they felt they had a UTI. Overall, "cloudy urine" had the highest accuracy (83.1%), and "leukocytes in the urine" had the highest sensitivity (82.8%). The highest specificity was for "fever" (99.0%); however, it had a very low sensitivity (6.9%). Subjects were able to predict their own UTI with an accuracy of 66.2%, and the negative predictive value (82.8%) was substantially higher than the positive predictive value (32.6%). The UTI signs and symptoms can predict a UTI more accurately than individual subjects can by using subjective impressions of their own signs and symptoms. Subjects were better at predicting when they did not have a UTI than when they did have a UTI.

  17. Effects of number of training generations on genomic prediction for various traits in a layer chicken population.

    PubMed

    Weng, Ziqing; Wolc, Anna; Shen, Xia; Fernando, Rohan L; Dekkers, Jack C M; Arango, Jesus; Settar, Petek; Fulton, Janet E; O'Sullivan, Neil P; Garrick, Dorian J

    2016-03-19

    Genomic estimated breeding values (GEBV) based on single nucleotide polymorphism (SNP) genotypes are widely used in animal improvement programs. It is typically assumed that the larger the number of animals is in the training set, the higher is the prediction accuracy of GEBV. The aim of this study was to quantify genomic prediction accuracy depending on the number of ancestral generations included in the training set, and to determine the optimal number of training generations for different traits in an elite layer breeding line. Phenotypic records for 16 traits on 17,793 birds were used. All parents and some selection candidates from nine non-overlapping generations were genotyped for 23,098 segregating SNPs. An animal model with pedigree relationships (PBLUP) and the BayesB genomic prediction model were applied to predict EBV or GEBV at each validation generation (progeny of the most recent training generation) based on varying numbers of immediately preceding ancestral generations. Prediction accuracy of EBV or GEBV was assessed as the correlation between EBV and phenotypes adjusted for fixed effects, divided by the square root of trait heritability. The optimal number of training generations that resulted in the greatest prediction accuracy of GEBV was determined for each trait. The relationship between optimal number of training generations and heritability was investigated. On average, accuracies were higher with the BayesB model than with PBLUP. Prediction accuracies of GEBV increased as the number of closely-related ancestral generations included in the training set increased, but reached an asymptote or slightly decreased when distant ancestral generations were used in the training set. The optimal number of training generations was 4 or more for high heritability traits but less than that for low heritability traits. For less heritable traits, limiting the training datasets to individuals closely related to the validation population resulted in the best predictions. The effect of adding distant ancestral generations in the training set on prediction accuracy differed between traits and the optimal number of necessary training generations is associated with the heritability of traits.
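
    The accuracy measure used above reduces to a one-liner, sketched here on placeholder numbers: the correlation between (G)EBV and fixed-effect-adjusted phenotypes, divided by the square root of trait heritability.

    import numpy as np

    def prediction_accuracy(gebv, adj_pheno, h2):
        r = np.corrcoef(gebv, adj_pheno)[0, 1]
        return r / np.sqrt(h2)

    rng = np.random.default_rng(4)
    true_bv = rng.normal(size=200)
    gebv = true_bv + rng.normal(0.0, 0.7, 200)        # noisy predictions
    adj_pheno = true_bv + rng.normal(0.0, 1.2, 200)   # adjusted phenotypes
    print(prediction_accuracy(gebv, adj_pheno, h2=0.3))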

  18. Genomic Prediction Accounting for Residual Heteroskedasticity.

    PubMed

    Ou, Zhining; Tempelman, Robert J; Steibel, Juan P; Ernst, Catherine W; Bates, Ronald O; Bello, Nora M

    2015-11-12

    Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit. Copyright © 2016 Ou et al.

  19. Improving Fermi Orbit Determination and Prediction in an Uncertain Atmospheric Drag Environment

    NASA Technical Reports Server (NTRS)

    Vavrina, Matthew A.; Newman, Clark P.; Slojkowski, Steven E.; Carpenter, J. Russell

    2014-01-01

    Orbit determination and prediction of the Fermi Gamma-ray Space Telescope trajectory is strongly impacted by the unpredictability and variability of atmospheric density and the spacecraft's ballistic coefficient. Operationally, Global Positioning System point solutions are processed with an extended Kalman filter for orbit determination, and predictions are generated for conjunction assessment with secondary objects. When these predictions are compared to Joint Space Operations Center radar-based solutions, the close approach distance between the two predictions can greatly differ ahead of the conjunction. This work explores strategies for improving prediction accuracy and helps to explain the prediction disparities. Namely, a tuning analysis is performed to determine atmospheric drag modeling and filter parameters that can improve orbit determination as well as prediction accuracy. A 45% improvement in three-day prediction accuracy is realized by tuning the ballistic coefficient and atmospheric density stochastic models, measurement frequency, and other modeling and filter parameters.

  20. Genomic prediction of piglet response to infection with one of two porcine reproductive and respiratory syndrome virus isolates.

    PubMed

    Waide, Emily H; Tuggle, Christopher K; Serão, Nick V L; Schroyen, Martine; Hess, Andrew; Rowland, Raymond R R; Lunney, Joan K; Plastow, Graham; Dekkers, Jack C M

    2018-02-01

    Genomic prediction of the pig's response to the porcine reproductive and respiratory syndrome (PRRS) virus (PRRSV) would be a useful tool in the swine industry. This study investigated the accuracy of genomic prediction based on porcine SNP60 Beadchip data using training and validation datasets from populations with different genetic backgrounds that were challenged with different PRRSV isolates. Genomic prediction accuracy averaged 0.34 for viral load (VL) and 0.23 for weight gain (WG) following experimental PRRSV challenge, which demonstrates that genomic selection could be used to improve response to PRRSV infection. Training on WG data during infection with a less virulent PRRSV, KS06, resulted in poor accuracy of prediction for WG during infection with a more virulent PRRSV, NVSL. Inclusion of single nucleotide polymorphisms (SNPs) that are in linkage disequilibrium with a major quantitative trait locus (QTL) on chromosome 4 was vital for accurate prediction of VL. Overall, SNPs that were significantly associated with either trait in single SNP genome-wide association analysis were unable to predict the phenotypes with an accuracy as high as that obtained by using all genotyped SNPs across the genome. Inclusion of data from close relatives into the training population increased whole genome prediction accuracy by 33% for VL and by 37% for WG but did not affect the accuracy of prediction when using only SNPs in the major QTL region. Results show that genomic prediction of response to PRRSV infection is moderately accurate and, when using all SNPs on the porcine SNP60 Beadchip, is not very sensitive to differences in virulence of the PRRSV in training and validation populations. Including close relatives in the training population increased prediction accuracy when using the whole genome or SNPs other than those near a major QTL.

  1. Fatigue life prediction of rotor blade composites: Validation of constant amplitude formulations with variable amplitude experiments

    NASA Astrophysics Data System (ADS)

    Westphal, T.; Nijssen, R. P. L.

    2014-12-01

    The effect of Constant Life Diagram (CLD) formulation on the fatigue life prediction under variable amplitude (VA) loading was investigated based on variable amplitude tests using three different load spectra representative for wind turbine loading. Next to the Wisper and WisperX spectra, the recently developed NewWisper2 spectrum was used. Based on these variable amplitude fatigue results the prediction accuracy of 4 CLD formulations is investigated. In the study a piecewise linear CLD based on the S-N curves for 9 load ratios compares favourably in terms of prediction accuracy and conservativeness. For the specific laminate used in this study Boerstra's Multislope model provides a good alternative at reduced test effort.

  2. Predicting human olfactory perception from chemical features of odor molecules.

    PubMed

    Keller, Andreas; Gerkin, Richard C; Guan, Yuanfang; Dhurandhar, Amit; Turu, Gabor; Szalai, Bence; Mainland, Joel D; Ihara, Yusuke; Yu, Chung Wen; Wolfinger, Russ; Vens, Celine; Schietgat, Leander; De Grave, Kurt; Norel, Raquel; Stolovitzky, Gustavo; Cecchi, Guillermo A; Vosshall, Leslie B; Meyer, Pablo

    2017-02-24

    It is still not possible to predict whether a given molecule will have a perceived odor or what olfactory percept it will produce. We therefore organized the crowd-sourced DREAM Olfaction Prediction Challenge. Using a large olfactory psychophysical data set, teams developed machine-learning algorithms to predict sensory attributes of molecules based on their chemoinformatic features. The resulting models accurately predicted odor intensity and pleasantness and also successfully predicted 8 among 19 rated semantic descriptors ("garlic," "fish," "sweet," "fruit," "burnt," "spices," "flower," and "sour"). Regularized linear models performed nearly as well as random forest-based ones, with a predictive accuracy that closely approaches a key theoretical limit. These models help to predict the perceptual qualities of virtually any molecule with high accuracy and also reverse-engineer the smell of a molecule. Copyright © 2017, American Association for the Advancement of Science.

  3. Ensemble-based prediction of RNA secondary structures.

    PubMed

    Aghaeepour, Nima; Hoos, Holger H

    2013-04-24

    Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that has been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.

  4. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains.

    PubMed

    Bulashevska, Alla; Eils, Roland

    2006-06-14

    The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is a need for further research to improve the accuracy of prediction. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six various datasets; among them are a Gram-negative bacteria dataset, data for discriminating outer membrane proteins and an apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies as empirical results indicate. The code for the software is available upon request.
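
    A minimal sketch of one building block, not the full HensBC ensemble: a Bayesian classifier over first-order Markov chains of amino acids, with one Laplace-smoothed transition matrix per location class and prediction by maximum log-likelihood. The tiny training sequences are invented for illustration.

    import numpy as np

    AA = "ACDEFGHIKLMNPQRSTVWY"
    IDX = {a: i for i, a in enumerate(AA)}

    def fit_chain(seqs, alpha=1.0):
        T = np.full((20, 20), alpha)               # Laplace-smoothed transition counts
        for s in seqs:
            for a, b in zip(s, s[1:]):
                T[IDX[a], IDX[b]] += 1
        return np.log(T / T.sum(axis=1, keepdims=True))

    def log_likelihood(seq, logT):
        return sum(logT[IDX[a], IDX[b]] for a, b in zip(seq, seq[1:]))

    train = {"cytoplasm": ["MKKLLPT", "MAGHLLK"],
             "membrane":  ["MLLVVLLAA", "MVVILLGA"]}
    models = {c: fit_chain(s) for c, s in train.items()}
    print(max(models, key=lambda c: log_likelihood("MLLVALLGG", models[c])))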

  5. Family-Based Benchmarking of Copy Number Variation Detection Software.

    PubMed

    Nutsua, Marcel Elie; Fischer, Annegret; Nebel, Almut; Hofmann, Sylvia; Schreiber, Stefan; Krawczak, Michael; Nothnagel, Michael

    2015-01-01

    The analysis of structural variants, in particular of copy-number variations (CNVs), has proven valuable in unraveling the genetic basis of human diseases. Hence, a large number of algorithms have been developed for the detection of CNVs in SNP array signal intensity data. Using the European and African HapMap trio data, we undertook a comparative evaluation of six commonly used CNV detection software tools, namely Affymetrix Power Tools (APT), QuantiSNP, PennCNV, GLAD, R-gada and VEGA, and assessed their level of pair-wise prediction concordance. The tool-specific CNV prediction accuracy was assessed in silico by way of intra-familial validation. Software tools differed greatly in terms of the number and length of the CNVs predicted as well as the number of markers included in a CNV. All software tools predicted substantially more deletions than duplications. Intra-familial validation revealed consistently low levels of prediction accuracy as measured by the proportion of validated CNVs (34-60%). Moreover, up to 20% of apparent family-based validations were found to be due to chance alone. Software using Hidden Markov models (HMM) showed a trend to predict fewer CNVs than segmentation-based algorithms albeit with greater validity. PennCNV yielded the highest prediction accuracy (60.9%). Finally, the pairwise concordance of CNV prediction was found to vary widely with the software tools involved. We recommend HMM-based software, in particular PennCNV, rather than segmentation-based algorithms when validity is the primary concern of CNV detection. QuantiSNP may be used as an additional tool to detect sets of CNVs not detectable by the other tools. Our study also reemphasizes the need for laboratory-based validation, such as qPCR, of CNVs predicted in silico.

  6. A novel hybrid decomposition-and-ensemble model based on CEEMD and GWO for short-term PM2.5 concentration forecasting

    NASA Astrophysics Data System (ADS)

    Niu, Mingfei; Wang, Yufang; Sun, Shaolong; Li, Yongwu

    2016-06-01

    To enhance prediction reliability and accuracy, a hybrid model based on the promising principle of "decomposition and ensemble" and a recently proposed meta-heuristic called grey wolf optimizer (GWO) is introduced for daily PM2.5 concentration forecasting. Compared with existing PM2.5 forecasting methods, this proposed model has improved the prediction accuracy and hit rates of directional prediction. The proposed model involves three main steps, i.e., decomposing the original PM2.5 series into several intrinsic mode functions (IMFs) via complementary ensemble empirical mode decomposition (CEEMD) for simplifying the complex data; individually predicting each IMF with support vector regression (SVR) optimized by GWO; integrating all predicted IMFs for the ensemble result as the final prediction by another SVR optimized by GWO. Seven benchmark models, including single artificial intelligence (AI) models, other decomposition-ensemble models with different decomposition methods and models with the same decomposition-ensemble method but optimized by different algorithms, are considered to verify the superiority of the proposed hybrid model. The empirical study indicates that the proposed hybrid decomposition-ensemble model is remarkably superior to all considered benchmark models for its higher prediction accuracy and hit rates of directional prediction.
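
    The decompose-predict-ensemble structure can be sketched as below (Python, scikit-learn). A crude moving-average split stands in for CEEMD, and the GWO tuning step is omitted in favor of default SVR hyperparameters, so this shows the skeleton of the approach rather than the proposed model itself.

    import numpy as np
    from sklearn.svm import SVR

    def decompose(x, window=24):
        trend = np.convolve(x, np.ones(window) / window, mode="same")
        return [x - trend, trend]            # [high-frequency part, trend]

    def lagged(x, p=8):
        X = np.column_stack([x[i:len(x) - p + i] for i in range(p)])
        return X, x[p:]

    rng = np.random.default_rng(5)
    t = np.arange(600)
    pm25 = 60 + 20 * np.sin(2 * np.pi * t / 180) + rng.normal(0, 5, t.size)

    forecast = 0.0
    for comp in decompose(pm25):             # predict each component separately
        X, y = lagged(comp)
        model = SVR().fit(X[:-1], y[:-1])    # hold out the final step
        forecast += model.predict(X[-1:])[0] # ensemble by summing component forecasts
    print("one-step forecast:", forecast, "actual:", pm25[-1])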

  7. Improving risk prediction accuracy for new soldiers in the U.S. Army by adding self-report survey data to administrative data.

    PubMed

    Bernecker, Samantha L; Rosellini, Anthony J; Nock, Matthew K; Chiu, Wai Tat; Gutierrez, Peter M; Hwang, Irving; Joiner, Thomas E; Naifeh, James A; Sampson, Nancy A; Zaslavsky, Alan M; Stein, Murray B; Ursano, Robert J; Kessler, Ronald C

    2018-04-03

    High rates of mental disorders, suicidality, and interpersonal violence early in the military career have raised interest in implementing preventive interventions with high-risk new enlistees. The Army Study to Assess Risk and Resilience in Servicemembers (STARRS) developed risk-targeting systems for these outcomes based on machine learning methods using administrative data predictors. However, administrative data omit many risk factors, raising the question whether risk targeting could be improved by adding self-report survey data to prediction models. If so, the Army may gain from routinely administering surveys that assess additional risk factors. The STARRS New Soldier Survey was administered to 21,790 Regular Army soldiers who agreed to have survey data linked to administrative records. As reported previously, machine learning models using administrative data as predictors found that small proportions of high-risk soldiers accounted for high proportions of negative outcomes. Other machine learning models using self-report survey data as predictors were developed previously for three of these outcomes: major physical violence and sexual violence perpetration among men and sexual violence victimization among women. Here we examined the extent to which this survey information increases prediction accuracy, over models based solely on administrative data, for those three outcomes. We used discrete-time survival analysis to estimate a series of models predicting first occurrence, assessing how model fit improved and concentration of risk increased when adding the predicted risk score based on survey data to the predicted risk score based on administrative data. The addition of survey data improved prediction significantly for all outcomes. In the most extreme case, the percentage of reported sexual violence victimization among the 5% of female soldiers with highest predicted risk increased from 17.5% using only administrative predictors to 29.4% adding survey predictors, a 67.9% proportional increase in prediction accuracy. Other proportional increases in concentration of risk ranged from 4.8% to 49.5% (median = 26.0%). Data from an ongoing New Soldier Survey could substantially improve accuracy of risk models compared to models based exclusively on administrative predictors. Depending upon the characteristics of interventions used, the increase in targeting accuracy from survey data might offset survey administration costs.

  8. Predict the fatigue life of crack based on extended finite element method and SVR

    NASA Astrophysics Data System (ADS)

    Song, Weizhen; Jiang, Zhansi; Jiang, Hui

    2018-05-01

    The extended finite element method (XFEM) and support vector regression (SVR) are used to predict the fatigue life of a plate crack. Firstly, the XFEM is employed to calculate the stress intensity factors (SIFs) for given crack sizes. Then a prediction model is built from the functional relationship of the SIFs with fatigue life or crack length. Finally, the prediction model is used to predict the SIFs at different crack sizes or different numbers of cycles. Because the accuracy of the forward Euler method is only ensured by a small step size, a new prediction method is presented to resolve this issue. Numerical examples demonstrate that the proposed method allows a larger step size while retaining high accuracy.
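
    For context, the step-size issue can be seen in a plain forward-Euler integration of the Paris law da/dN = C (ΔK)^m, sketched below. The closed-form center-crack SIF stands in for values an XFEM run (or an SVR surrogate) would supply, and the material constants are illustrative only.

    import numpy as np

    C, m = 1e-11, 3.0             # Paris constants (illustrative)
    dsigma = 100.0                # stress range, MPa
    a, a_crit = 1e-3, 2e-2        # initial and critical crack length, m
    dN = 100                      # Euler step in cycles; accuracy degrades if too large

    cycles = 0
    while a < a_crit:
        dK = dsigma * np.sqrt(np.pi * a)   # SIF range, center crack in infinite plate
        a += C * dK ** m * dN              # forward Euler update of crack length
        cycles += dN
    print("estimated life:", cycles, "cycles")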

  9. Health-based risk adjustment: is inpatient and outpatient diagnostic information sufficient?

    PubMed

    Lamers, L M

    Adequate risk adjustment is critical to the success of market-oriented health care reforms in many countries. Currently used risk adjusters based on demographic and diagnostic cost groups (DCGs) do not reflect expected costs accurately. This study examines the simultaneous predictive accuracy of inpatient and outpatient morbidity measures and prior costs. DCGs, pharmacy cost groups (PCGs), and prior year's costs improve the predictive accuracy of the demographic model substantially. DCGs and PCGs seem complementary in their ability to predict future costs. However, this study shows that the combination of DCGs and PCGs still leaves room for cream skimming.

  10. Preoperative prediction of histopathological outcome in basal cell carcinoma: flat surface and multiple small erosions predict superficial basal cell carcinoma in lighter skin types.

    PubMed

    Ahnlide, I; Zalaudek, I; Nilsson, F; Bjellerup, M; Nielsen, K

    2016-10-01

    Prediction of the histopathological subtype of basal cell carcinoma (BCC) is important for tailoring optimal treatment, especially in patients with suspected superficial BCC (sBCC). To assess the accuracy of the preoperative prediction of subtypes of BCC in clinical practice, to evaluate whether dermoscopic examination enhances accuracy and to find dermoscopic criteria for discriminating sBCC from other subtypes. The main presurgical diagnosis was compared with the histopathological, postoperative diagnosis of routinely excised skin tumours in a predominantly fair-skinned patient cohort of northern Europe during a study period of 3 years (2011-13). The study period was split in two: during period 1, dermoscopy was optional (850 cases with a pre- or postoperative diagnosis of BCC), while during period 2 (after an educational dermoscopic update) dermoscopy was mandatory (651 cases). A classification tree based on clinical and dermoscopic features for prediction of sBCC was applied. For a total of 3544 excised skin tumours, the sensitivity for the diagnosis of BCC (any subtype) was 93·3%, specificity 91·8%, and the positive predictive value (PPV) 89·0%. The diagnostic accuracy as well as the PPV and the positive likelihood ratio for sBCC were significantly higher when dermoscopy was mandatory. A flat surface and multiple small erosions predicted sBCC. The study shows a high accuracy for an overall diagnosis of BCC and increased accuracy in prediction of sBCC for the period when dermoscopy was applied in all cases. The most discriminating findings for sBCC, based on clinical and dermoscopic features in this fair-skinned population, were a flat surface and multiple small erosions. © 2016 British Association of Dermatologists.

  11. Predicting School Enrollments Using the Modified Regression Technique.

    ERIC Educational Resources Information Center

    Grip, Richard S.; Young, John W.

    This report is based on a study in which a regression model was constructed to increase accuracy in enrollment predictions. A model, known as the Modified Regression Technique (MRT), was used to examine K-12 enrollment over the past 20 years in 2 New Jersey school districts of similar size and ethnicity. To test the model's accuracy, MRT was…

  12. Predicting Intervention Effectiveness from Reading Accuracy and Rate Measures through the Instructional Hierarchy: Evidence for a Skill-by-Treatment Interaction

    ERIC Educational Resources Information Center

    Szadokierski, Isadora; Burns, Matthew K.; McComas, Jennifer J.

    2017-01-01

    The current study used the learning hierarchy/instructional hierarchy phases of acquisition and fluency to predict intervention effectiveness based on preintervention reading skills. Preintervention reading accuracy (percentage of words read correctly) and rate (number of words read correctly per minute) were assessed for 49 second- and…

  13. Developing Local Oral Reading Fluency Cut Scores for Predicting High-Stakes Test Performance

    ERIC Educational Resources Information Center

    Grapin, Sally L.; Kranzler, John H.; Waldron, Nancy; Joyce-Beaulieu, Diana; Algina, James

    2017-01-01

    This study evaluated the classification accuracy of a second grade oral reading fluency curriculum-based measure (R-CBM) in predicting third grade state test performance. It also compared the long-term classification accuracy of local and publisher-recommended R-CBM cut scores. Participants were 266 students who were divided into a calibration…

  14. Predicting Intervention Effectiveness from Oral Reading Accuracy and Rate Measures through the Learning Hierarchy/Instructional Hierarchy

    ERIC Educational Resources Information Center

    Szadokierski, Isadora Elisabeth

    2012-01-01

    The current study used the Learning Hierarchy/Instructional Hierarchy (LH/IH) to predict intervention effectiveness based on the reading skills of students who are developing reading fluency. Pre-intervention reading accuracy and rate were assessed for 49 second and third grade participants who then participated in a brief experimental analysis…

  15. A comparison of accuracy validation methods for genomic and pedigree-based predictions of swine litter size traits using Large White and simulated data.

    PubMed

    Putz, A M; Tiezzi, F; Maltecca, C; Gray, K A; Knauer, M T

    2018-02-01

    The objective of this study was to compare and determine the optimal validation method when comparing accuracy from single-step GBLUP (ssGBLUP) to traditional pedigree-based BLUP. Field data included six litter size traits. Simulated data included ten replicates designed to mimic the field data in order to determine the method that was closest to the true accuracy. Data were split into training and validation sets. The methods used were as follows: (i) theoretical accuracy derived from the prediction error variance (PEV) of the direct inverse (iLHS), (ii) approximated accuracies from the accf90(GS) program in the BLUPF90 family of programs (Approx), (iii) correlation between predictions and the single-step GEBVs from the full data set (GEBV_Full), (iv) correlation between predictions and the corrected phenotypes of females from the full data set (Y_c), (v) correlation from method iv divided by the square root of the heritability (Y_ch) and (vi) correlation between sire predictions and the average of their daughters' corrected phenotypes (Y_cs). Accuracies from iLHS increased from 0.27 to 0.37 (37%) in the Large White. Approximation accuracies were very consistent and close in absolute value (0.41 to 0.43). Both iLHS and Approx were much less variable than the corrected phenotype methods (ranging from 0.04 to 0.27). On average, simulated data showed an increase in accuracy from 0.34 to 0.44 (29%) using ssGBLUP. Both iLHS and Y_ch approximated the increase well, 0.30 to 0.46 and 0.36 to 0.45, respectively. GEBV_Full performed poorly in both data sets and is not recommended. Results suggest that for within-breed selection, theoretical accuracy using PEV was consistent and accurate. When direct inversion is infeasible to get the PEV, correlating predictions to the corrected phenotypes divided by the square root of heritability is adequate given a large enough validation data set. © 2017 Blackwell Verlag GmbH.

  16. Predicting online ratings based on the opinion spreading process

    NASA Astrophysics Data System (ADS)

    He, Xing-Sheng; Zhou, Ming-Yang; Zhuo, Zhao; Fu, Zhong-Qian; Liu, Jian-Guo

    2015-10-01

    Predicting users' online ratings is a challenging issue and has drawn much attention. In this paper, we present a rating prediction method that combines the user opinion spreading process with the collaborative filtering algorithm, where user similarity is defined by measuring the amount of opinion a user transfers to another based on the primitive user-item rating matrix. The proposed method produces a more precise rating prediction for each unrated user-item pair. In addition, we introduce a tunable parameter λ to regulate the preferential diffusion relevant to the degree of both the opinion sender and receiver. The numerical results for the Movielens and Netflix data sets show that this algorithm has better accuracy than the standard user-based collaborative filtering algorithm using Cosine and Pearson correlation, without increasing computational complexity. By tuning λ, our method can further boost the prediction accuracy when using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as measurements. In the optimal cases, on the Movielens and Netflix data sets, the algorithmic accuracy (MAE and RMSE) is improved by 11.26% and 8.84%, and by 13.49% and 10.52%, respectively, compared to the item average method.
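
    As a baseline for comparison, standard user-based collaborative filtering with cosine similarity is sketched below on a toy rating matrix; the opinion-spreading similarity and the λ-tuned diffusion proposed in the paper are not reproduced.

    import numpy as np

    R = np.array([[5, 3, 0, 1],            # rows: users, columns: items, 0 = unrated
                  [4, 0, 0, 1],
                  [1, 1, 0, 5],
                  [1, 0, 4, 4]], float)

    def predict(R, u, i):
        mask = R > 0
        means = R.sum(1) / np.maximum(mask.sum(1), 1)   # per-user mean rating
        num = den = 0.0
        for v in range(R.shape[0]):
            if v == u or not mask[v, i]:
                continue
            common = mask[u] & mask[v]                  # items rated by both users
            if not common.any():
                continue
            sim = (R[u, common] @ R[v, common]) / (
                np.linalg.norm(R[u, common]) * np.linalg.norm(R[v, common]))
            num += sim * (R[v, i] - means[v])
            den += abs(sim)
        return means[u] + (num / den if den else 0.0)

    print(predict(R, u=1, i=2))            # predicted rating of user 1 for item 2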

  17. Improving the spectral measurement accuracy based on temperature distribution and spectra-temperature relationship

    NASA Astrophysics Data System (ADS)

    Li, Zhe; Feng, Jinchao; Liu, Pengyu; Sun, Zhonghua; Li, Gang; Jia, Kebin

    2018-05-01

    Temperature is usually treated as a source of fluctuation in near-infrared spectral measurement, and chemometric methods have been extensively studied to correct the effect of temperature variations. However, temperature can also be considered a constructive parameter that provides detailed chemical information when systematically changed during the measurement. Our group has researched the relationship between temperature-induced spectral variation (TSVC) and normalized squared temperature. In this study, we focused on the influence of the temperature distribution in the calibration set. The multi-temperature calibration set selection (MTCS) method was proposed to improve prediction accuracy by considering the temperature distribution of the calibration samples. Furthermore, the double-temperature calibration set selection (DTCS) method was proposed based on the MTCS method and the relationship between TSVC and normalized squared temperature. We compared the prediction performance of PLS models based on the random sampling method and the proposed methods. The results from experimental studies showed that prediction performance was improved by using the proposed methods. Therefore, the MTCS and DTCS methods are alternative methods to improve prediction accuracy in near-infrared spectral measurement.

  18. Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties

    PubMed Central

    2011-01-01

    Background Existing methods of predicting DNA-binding proteins used valuable features of physicochemical properties to design support vector machine (SVM) based classifiers. Generally, selection of physicochemical properties and determination of their corresponding feature vectors rely mainly on known properties of the binding mechanism and the experience of designers. However, there exists a troublesome problem for designers: some different physicochemical properties have similar vectors representing the 20 amino acids, and some closely related physicochemical properties have dissimilar vectors. Results This study proposes a systematic approach (named Auto-IDPCPs) to automatically identify a set of physicochemical and biochemical properties in the AAindex database to design SVM-based classifiers for predicting and analyzing DNA-binding domains/proteins. Auto-IDPCPs consists of 1) clustering 531 amino acid indices in AAindex into 20 clusters using a fuzzy c-means algorithm, 2) utilizing an efficient genetic algorithm based optimization method IBCGA to select an informative feature set of size m to represent sequences, and 3) analyzing the selected features to identify related physicochemical properties which may affect the binding mechanism of DNA-binding domains/proteins. The proposed Auto-IDPCPs identified m=22 features of properties belonging to five clusters for predicting DNA-binding domains with a five-fold cross-validation accuracy of 87.12%, which is promising compared with the accuracy of 86.62% of the existing method PSSM-400. For predicting DNA-binding sequences, the accuracy of 75.50% was obtained using m=28 features, where PSSM-400 has an accuracy of 74.22%. Auto-IDPCPs and PSSM-400 have accuracies of 80.73% and 82.81%, respectively, applied to an independent test data set of DNA-binding domains. Some typical physicochemical properties discovered are hydrophobicity, secondary structure, charge, solvent accessibility, polarity, flexibility, normalized Van Der Waals volume, pK (pK-C, pK-N, pK-COOH and pK-a(RCOOH)), etc. Conclusions The proposed approach Auto-IDPCPs would help designers to investigate informative physicochemical and biochemical properties by considering both prediction accuracy and analysis of the binding mechanism simultaneously. The approach is also applicable to predicting and analyzing other protein functions from sequences. PMID:21342579

  19. Short-arc measurement and fitting based on the bidirectional prediction of observed data

    NASA Astrophysics Data System (ADS)

    Fei, Zhigen; Xu, Xiaojie; Georgiadis, Anthimos

    2016-02-01

    To measure a short arc is a notoriously difficult problem. In this study, a bidirectional prediction method based on the Radial Basis Function Neural Network (RBFNN) is proposed for observed data distributed along a short arc, in order to increase the effective arc length and thus improve fitting accuracy. Firstly, the rationality of treating the observed data as a time series is discussed in accordance with the definition of a time series. Secondly, the RBFNN is constructed to predict the observed data, where interpolation is used to enlarge the training set and thereby improve the learning accuracy of the RBFNN's parameters. Finally, in the numerical simulation section, we focus on how the size of the training sample and the noise level influence the learning error and prediction error of the built RBFNN. Specifically, observed data from a 5° short arc are used to evaluate the performance of the Hyper method, known as the 'unbiased circle-fitting method', at different noise levels before and after prediction. A number of simulation experiments reveal that the fitting stability and accuracy of the Hyper method after prediction are far superior to those before prediction.
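
    To make the downstream fitting step concrete, here is a sketch of an algebraic least-squares circle fit on a noisy 5° arc. The Kåsa fit below is a simple stand-in for the Hyper fit used in the paper, and the data are simulated; the point is only that short arcs are ill-conditioned, which is why extending the arc with predicted points helps.

```python
# Least-squares (Kåsa) circle fit: a stand-in for the Hyper fit.
import numpy as np

def fit_circle_kasa(x, y):
    """Solve 2ax + 2by + c = x^2 + y^2 for center (a, b) and radius r."""
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x**2 + y**2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return a, b, np.sqrt(c + a**2 + b**2)

rng = np.random.default_rng(0)
theta = np.deg2rad(np.linspace(0, 5, 60))              # 5-degree short arc
x = 100 * np.cos(theta) + rng.normal(scale=0.05, size=60)
y = 100 * np.sin(theta) + rng.normal(scale=0.05, size=60)
print(fit_circle_kasa(x, y))                           # true circle: (0, 0, 100)
```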

  20. Genotyping by sequencing for genomic prediction in a soybean breeding population.

    PubMed

    Jarquín, Diego; Kocak, Kyle; Posadas, Luis; Hyma, Katie; Jedlicka, Joseph; Graef, George; Lorenz, Aaron

    2014-08-29

    Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive as a way to reduce breeding cycle times and the costs associated with phenotyping. Genomic prediction and selection have been studied in several crop species, but no reports existed in soybean. The objectives of this study were (i) to evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) to evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program was genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations. Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross-validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing-data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100 lines, although accuracy climbed steadily up to the largest size available in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller. Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.
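
    A minimal sketch of the cross-validated prediction step, assuming simulated 0/1/2 genotypes: ridge regression on markers (RR-BLUP, equivalent to additive G-BLUP) with predictive correlation as the accuracy measure. The penalty value and trait architecture are illustrative.

```python
# Ridge-regression marker model (RR-BLUP) with k-fold cross-validation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_lines, n_snps = 301, 2000
X = rng.binomial(2, 0.3, size=(n_lines, n_snps)).astype(float)  # 0/1/2 calls
beta = np.zeros(n_snps)
beta[rng.choice(n_snps, 50, replace=False)] = rng.normal(size=50)
y = X @ beta + rng.normal(scale=3.0, size=n_lines)              # yield-like trait

accs = []
for tr, te in KFold(n_splits=5, shuffle=True, random_state=1).split(X):
    model = Ridge(alpha=n_snps * 0.5).fit(X[tr], y[tr])
    accs.append(np.corrcoef(model.predict(X[te]), y[te])[0, 1])
print(f"mean predictive correlation: {np.mean(accs):.2f}")
```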

  1. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures.

    PubMed

    Scheid, Anika; Nebel, Markus E

    2012-07-09

    Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case - without sacrificing much of the accuracy of the results. Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms.

  2. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    PubMed Central

    2012-01-01

    Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case – without sacrificing much of the accuracy of the results. Conclusions Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms. PMID:22776037
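
    A toy numerical illustration of the disturbance question studied above, under our own simplifying assumption that the sampling probabilities form one categorical distribution: absolute errors of a given magnitude swamp small probabilities, while relative errors of the same magnitude largely preserve the distribution's shape.

```python
# Perturb a probability vector with absolute vs. relative errors and compare.
import numpy as np

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(50))                  # stand-in for exact probabilities

def perturb(p, eps, relative):
    noise = rng.uniform(-eps, eps, size=p.size)
    q = p * (1 + noise) if relative else np.clip(p + noise, 1e-12, None)
    return q / q.sum()                          # renormalize

def total_variation(p, q):
    return 0.5 * np.abs(p - q).sum()

for eps in (0.01, 0.05):
    print(f"eps={eps}",
          "absolute:", round(total_variation(p, perturb(p, eps, False)), 3),
          "relative:", round(total_variation(p, perturb(p, eps, True)), 3))
```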

  3. Risk-adjusted capitation based on the Diagnostic Cost Group Model: an empirical evaluation with health survey information.

    PubMed Central

    Lamers, L M

    1999-01-01

    OBJECTIVE: To evaluate the predictive accuracy of the Diagnostic Cost Group (DCG) model using health survey information. DATA SOURCES/STUDY SETTING: Longitudinal data collected for a sample of members of a Dutch sickness fund. In the Netherlands the sickness funds provide compulsory health insurance coverage for the 60 percent of the population in the lowest income brackets. STUDY DESIGN: A demographic model and DCG capitation models are estimated by means of ordinary least squares, with an individual's annual healthcare expenditures in 1994 as the dependent variable. For subgroups based on health survey information, costs predicted by the models are compared with actual costs. Using stepwise regression procedures a subset of relevant survey variables that could improve the predictive accuracy of the three-year DCG model was identified. Capitation models were extended with these variables. DATA COLLECTION/EXTRACTION METHODS: For the empirical analysis, panel data of sickness fund members were used that contained demographic information, annual healthcare expenditures, and diagnostic information from hospitalizations for each member. In 1993, a mailed health survey was conducted among a random sample of 15,000 persons in the panel data set, with a 70 percent response rate. PRINCIPAL FINDINGS: The predictive accuracy of the demographic model improves when it is extended with diagnostic information from prior hospitalizations (DCGs). A subset of survey variables further improves the predictive accuracy of the DCG capitation models. The predictable profits and losses based on survey information for the DCG models are smaller than for the demographic model. Most persons with predictable losses based on health survey information were not hospitalized in the preceding year. CONCLUSIONS: The use of diagnostic information from prior hospitalizations is a promising option for improving the demographic capitation payment formula. This study suggests that diagnostic information from outpatient utilization is complementary to DCGs in predicting future costs. PMID:10029506
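
    A hedged sketch of the study design with invented variables and data: fit a demographic OLS model and a model extended with a prior-hospitalization (DCG-like) indicator, then compare predicted with actual mean costs in a survey-defined subgroup to expose predictable losses.

```python
# Compare predictable subgroup losses of two OLS capitation models (toy data).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5000
age = rng.integers(18, 90, n).astype(float)
male = rng.integers(0, 2, n).astype(float)
dcg = rng.integers(0, 2, n).astype(float)        # prior hospitalization diagnosis
poor_health = (rng.random(n) < 0.2) | (dcg == 1) # survey-reported health status
cost = 500 + 30 * age + 2000 * dcg + 1500 * poor_health + rng.exponential(800, n)

demo = np.column_stack([age, male])
demo_dcg = np.column_stack([age, male, dcg])
for name, X in [("demographic", demo), ("demographic+DCG", demo_dcg)]:
    pred = LinearRegression().fit(X, cost).predict(X)
    gap = cost[poor_health].mean() - pred[poor_health].mean()
    print(f"{name}: predictable loss for poor-health subgroup = {gap:,.0f}")
```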

  4. Assessment of a remote sensing-based model for predicting malaria transmission risk in villages of Chiapas, Mexico

    NASA Technical Reports Server (NTRS)

    Beck, L. R.; Rodriguez, M. H.; Dister, S. W.; Rodriguez, A. D.; Washino, R. K.; Roberts, D. R.; Spanner, M. A.

    1997-01-01

    A blind test of two remote sensing-based models for predicting adult populations of Anopheles albimanus in villages, an indicator of malaria transmission risk, was conducted in southern Chiapas, Mexico. One model was developed using a discriminant analysis approach, while the other was based on regression analysis. The models were developed in 1992 for an area around Tapachula, Chiapas, using Landsat Thematic Mapper (TM) satellite data and geographic information system functions. Using two remotely sensed landscape elements, the discriminant model was able to successfully distinguish between villages with high and low An. albimanus abundance with an overall accuracy of 90%. To test the predictive capability of the models, multitemporal TM data were used to generate a landscape map of the Huixtla area, northwest of Tapachula, where the models were used to predict risk for 40 villages. The resulting predictions were not disclosed until the end of the test. Independently, An. albimanus abundance data were collected in the 40 randomly selected villages for which the predictions had been made. These data were subsequently used to assess the models' accuracies. The discriminant model accurately predicted 79% of the high-abundance villages and 50% of the low-abundance villages, for an overall accuracy of 70%. The regression model correctly identified seven of the 10 villages with the highest mosquito abundance. This test demonstrated that remote sensing-based models generated for one area can be used successfully in another, comparable area.

  5. A fresh look at the predictors of naming accuracy and errors in Alzheimer's disease.

    PubMed

    Cuetos, Fernando; Rodríguez-Ferreiro, Javier; Sage, Karen; Ellis, Andrew W

    2012-09-01

    In recent years, a considerable number of studies have tried to establish which characteristics of objects and their names predict the responses of patients with Alzheimer's disease (AD) in the picture-naming task. The frequency of use of words and their age of acquisition (AoA) have been implicated as two of the most influential variables, with naming being best preserved for objects with high-frequency, early-acquired names. The present study takes a fresh look at the predictors of naming success in Spanish and English AD patients using a range of measures of word frequency and AoA along with visual complexity, imageability, and word length as predictors. Analyses using generalized linear mixed modelling found that naming accuracy was better predicted by AoA ratings taken from older adults than conventional ratings from young adults. Older frequency measures based on written language samples predicted accuracy better than more modern measures based on the frequencies of words in film subtitles. Replacing adult frequency with an estimate of cumulative (lifespan) frequency did not reduce the impact of AoA. Semantic error rates were predicted by both written word frequency and senior AoA while null response errors were only predicted by frequency. Visual complexity, imageability, and word length did not predict naming accuracy or errors. ©2012 The British Psychological Society.

  6. Development of predictive mapping techniques for soil survey and salinity mapping

    NASA Astrophysics Data System (ADS)

    Elnaggar, Abdelhamid A.

    Conventional soil maps represent a valuable source of information about soil characteristics; however, they are subjective, expensive, and time-consuming to prepare. They also lack explicit information about the conceptual model used in developing them, about their accuracy, and about the error associated with them. Decision tree analysis (DTA) was successfully used to retrieve the expert knowledge embedded in old soil survey data. This knowledge was efficiently used in developing predictive soil maps for the study areas in Benton and Malheur Counties, Oregon, and in assessing their consistency. A soil-landscape model retrieved from a reference area in Harney County was extrapolated to develop a preliminary soil map for the neighboring unmapped part of Malheur County. The resulting map had low prediction accuracy, and only a few soil map units (SMUs) were predicted with significant accuracy, mostly shallow SMUs that either have a lithic contact with the bedrock or developed on a duripan. On the other hand, the soil map developed from field data was predicted with very high accuracy (overall accuracy of about 97%). Salt-affected areas of the Malheur County study area are indicated by their high spectral reflectance and are easily discriminated in the remote sensing data; however, remote sensing data fail to distinguish between the different classes of soil salinity. Using the DTA method, five classes of soil salinity were successfully predicted with an overall accuracy of about 99%. Moreover, the area of salt-affected soil was overestimated when mapped using remote sensing data compared with that predicted using DTA. Hence, DTA can be a very helpful approach for developing soil survey and soil salinity maps in a more objective, effective, less expensive, and quicker way based on field data.
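
    A minimal DTA-style sketch, assuming invented covariate names and simulated field data: a decision tree classifier predicting one of five salinity classes from terrain and spectral covariates, evaluated by overall accuracy.

```python
# Decision tree classification of soil salinity classes (illustrative data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 1000
elevation = rng.uniform(600, 1200, n)
reflectance = rng.uniform(0, 1, n)
water_table = rng.uniform(0.5, 5, n)
salinity = np.digitize(reflectance * 3 - water_table + rng.normal(0, 0.3, n),
                       [-2, -1, 0, 1])           # five salinity classes, 0..4

X = np.column_stack([elevation, reflectance, water_table])
Xtr, Xte, ytr, yte = train_test_split(X, salinity, random_state=1)
tree = DecisionTreeClassifier(max_depth=5).fit(Xtr, ytr)
print("overall accuracy:", round(accuracy_score(yte, tree.predict(Xte)), 3))
```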

  7. Development of a deep convolutional neural network to predict grading of canine meningiomas from magnetic resonance images.

    PubMed

    Banzato, T; Cherubini, G B; Atzori, M; Zotti, A

    2018-05-01

    An established deep neural network (DNN) based on transfer learning and a newly designed DNN were tested to predict the grade of meningiomas from magnetic resonance (MR) images in dogs, and to determine the classification accuracy achieved using pre- and post-contrast T1-weighted (T1W) and T2-weighted (T2W) MR images. The images were randomly assigned to a training set, a validation set and a test set, comprising 60%, 10% and 30% of images, respectively. The combination of DNN and MR sequence displaying the highest discriminating accuracy was used to develop an image classifier to predict the grading of new cases. The algorithm based on transfer learning using the established DNN did not provide satisfactory results, whereas the newly designed DNN had high classification accuracy. On the basis of classification accuracy, an image classifier built on the newly designed DNN using post-contrast T1W images was developed. This image classifier correctly predicted the grading of 8 out of 10 images not included in the data set. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
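
    A sketch of what a small purpose-built CNN for two-class grading of single-channel MR slices might look like. The architecture, layer sizes, and input resolution below are assumptions for illustration, not the authors' network.

```python
# Small custom CNN for two-class tumour grading from MR slices (PyTorch).
import torch
import torch.nn as nn

class GradeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global average pooling
        )
        self.classifier = nn.Linear(64, 2)      # e.g. grade I vs grade II

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = GradeNet()
batch = torch.randn(4, 1, 128, 128)             # 4 post-contrast T1W slices
print(model(batch).shape)                       # torch.Size([4, 2])
```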

  8. A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures

    PubMed Central

    2014-01-01

    Background Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. Results We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. Conclusions Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php. PMID:24884954

  9. Automated detection of brain atrophy patterns based on MRI for the prediction of Alzheimer's disease

    PubMed Central

    Plant, Claudia; Teipel, Stefan J.; Oswald, Annahita; Böhm, Christian; Meindl, Thomas; Mourao-Miranda, Janaina; Bokde, Arun W.; Hampel, Harald; Ewers, Michael

    2010-01-01

    Subjects with mild cognitive impairment (MCI) have an increased risk to develop Alzheimer's disease (AD). Voxel-based MRI studies have demonstrated that widely distributed cortical and subcortical brain areas show atrophic changes in MCI, preceding the onset of AD-type dementia. Here we developed a novel data mining framework in combination with three different classifiers including support vector machine (SVM), Bayes statistics, and voting feature intervals (VFI) to derive a quantitative index of pattern matching for the prediction of the conversion from MCI to AD. MRI was collected in 32 AD patients, 24 MCI subjects and 18 healthy controls (HC). Nine out of 24 MCI subjects converted to AD after an average follow-up interval of 2.5 years. Using feature selection algorithms, brain regions showing the highest accuracy for the discrimination between AD and HC were identified, reaching a classification accuracy of up to 92%. The extracted AD clusters were used as a search region to extract those brain areas that are predictive of conversion to AD within MCI subjects. The most predictive brain areas included the anterior cingulate gyrus and orbitofrontal cortex. The best prediction accuracy, which was cross-validated via train-and-test, was 75% for the prediction of the conversion from MCI to AD. The present results suggest that novel multivariate methods of pattern matching reach a clinically relevant accuracy for the a priori prediction of the progression from MCI to AD. PMID:19961938

  10. Relating indices of knowledge structure coherence and accuracy to skill-based performance: Is there utility in using a combination of indices?

    PubMed

    Schuelke, Matthew J; Day, Eric Anthony; McEntire, Lauren E; Boatman, Jazmine Espejo; Wang, Xiaoqian; Kowollik, Vanessa; Boatman, Paul R

    2009-07-01

    The authors examined the relative criterion-related validity of knowledge structure coherence and two accuracy-based indices (closeness and correlation) as well as the utility of using a combination of knowledge structure indices in the prediction of skill acquisition and transfer. Findings from an aggregation of 5 independent samples (N = 958) whose participants underwent training on a complex computer simulation indicated that coherence and the accuracy-based indices yielded comparable zero-order predictive validities. Support for the incremental validity of using a combination of indices was mixed; the most, albeit small, gain came in pairing coherence and closeness when predicting transfer. After controlling for baseline skill, general mental ability, and declarative knowledge, only coherence explained a statistically significant amount of unique variance in transfer. Overall, the results suggested that the different indices largely overlap in their representation of knowledge organization, but that coherence better reflects adaptable aspects of knowledge organization important to skill transfer.

  11. A new software for prediction of femoral neck fractures.

    PubMed

    Testi, Debora; Cappello, Angelo; Sgallari, Fiorella; Rumpf, Martin; Viceconti, Marco

    2004-08-01

    Femoral neck fractures are an important clinical, social, and economic problem. Although many attempts have been made to improve the accuracy of fracture-risk prediction, retrospective studies have shown that the standard clinical protocol achieves an accuracy of about 65%. A procedure was previously developed that includes not only bone mineral density but also geometric and femoral strength information, achieving an accuracy of about 80% in a retrospective study. The aim of the present work was to re-engineer these research procedures into real-time software for the prediction of femoral fracture risk. The result is an efficient, repeatable, and easy-to-use software tool for evaluating femoral neck fracture risk that can be inserted into daily clinical practice, providing a useful aid for improving fracture prediction.

  12. Predicting Earth orientation changes from global forecasts of atmosphere-hydrosphere dynamics

    NASA Astrophysics Data System (ADS)

    Dobslaw, Henryk; Dill, Robert

    2018-02-01

    Effective Angular Momentum (EAM) functions obtained from global numerical simulations of atmosphere, ocean, and land surface dynamics are routinely processed by the Earth System Modelling group at Deutsches GeoForschungsZentrum. EAM functions are available since January 1976 with up to 3 h temporal resolution, and 6-day EAM forecasts are published daily. Based on hindcast experiments with 305 individual predictions distributed over 15 months, we demonstrate that EAM forecasts improve the prediction accuracy of the Earth Orientation Parameters at all forecast horizons between 1 and 6 days. At day 6, prediction accuracy improves to 1.76 mas for the terrestrial pole offset and 2.6 mas for ΔUT1, which corresponds to an accuracy increase of about 41% over predictions published in Bulletin A by the International Earth Rotation and Reference Systems Service.

  13. Probability of criminal acts of violence: a test of jury predictive accuracy.

    PubMed

    Reidy, Thomas J; Sorensen, Jon R; Cunningham, Mark D

    2013-01-01

    The ability of capital juries to accurately predict future prison violence at the sentencing phase of aggravated murder trials was examined through retrospective review of the disciplinary records of 115 male inmates sentenced to either life (n = 65) or death (n = 50) in Oregon from 1985 through 2008, with a mean post-conviction time at risk of 15.3 years. Violent prison behavior was completely unrelated to predictions made by capital jurors, with bidirectional accuracy simply reflecting the base rate of assaultive misconduct in the group. Rejection of the special issue predicting future violence enjoyed 90% accuracy. Conversely, predictions that future violence was probable had 90% error rates. More than 90% of the assaultive rule violations committed by these offenders resulted in no harm or only minor injuries. Copyright © 2013 John Wiley & Sons, Ltd.

  14. Bridging the gap between marker-assisted and genomic selection of heading time and plant height in hybrid wheat.

    PubMed

    Zhao, Y; Mette, M F; Gowda, M; Longin, C F H; Reif, J C

    2014-06-01

    Based on data from field trials with a large collection of 135 elite winter wheat inbred lines and 1604 F1 hybrids derived from them, we compared the accuracy of prediction of marker-assisted selection and current genomic selection approaches for the model traits heading time and plant height in a cross-validation approach. For heading time, the high accuracy seen with marker-assisted selection severely dropped with genomic selection approaches RR-BLUP (ridge regression best linear unbiased prediction) and BayesCπ, whereas for plant height, accuracy was low with marker-assisted selection as well as RR-BLUP and BayesCπ. Differences in the linkage disequilibrium structure of the functional and single-nucleotide polymorphism markers relevant for the two traits were identified in a simulation study as a likely explanation for the different trends in accuracies of prediction. A new genomic selection approach, weighted best linear unbiased prediction (W-BLUP), designed to treat the effects of known functional markers more appropriately, proved to increase the accuracy of prediction for both traits and thus closes the gap between marker-assisted and genomic selection.

  15. Bridging the gap between marker-assisted and genomic selection of heading time and plant height in hybrid wheat

    PubMed Central

    Zhao, Y; Mette, M F; Gowda, M; Longin, C F H; Reif, J C

    2014-01-01

    Based on data from field trials with a large collection of 135 elite winter wheat inbred lines and 1604 F1 hybrids derived from them, we compared the accuracy of prediction of marker-assisted selection and current genomic selection approaches for the model traits heading time and plant height in a cross-validation approach. For heading time, the high accuracy seen with marker-assisted selection severely dropped with genomic selection approaches RR-BLUP (ridge regression best linear unbiased prediction) and BayesCπ, whereas for plant height, accuracy was low with marker-assisted selection as well as RR-BLUP and BayesCπ. Differences in the linkage disequilibrium structure of the functional and single-nucleotide polymorphism markers relevant for the two traits were identified in a simulation study as a likely explanation for the different trends in accuracies of prediction. A new genomic selection approach, weighted best linear unbiased prediction (W-BLUP), designed to treat the effects of known functional markers more appropriately, proved to increase the accuracy of prediction for both traits and thus closes the gap between marker-assisted and genomic selection. PMID:24518889

  16. Pharmacokinetics of low-dose nedaplatin and validation of AUC prediction in patients with non-small-cell lung carcinoma.

    PubMed

    Niioka, Takenori; Uno, Tsukasa; Yasui-Furukori, Norio; Takahata, Takenori; Shimizu, Mikiko; Sugawara, Kazunobu; Tateishi, Tomonori

    2007-04-01

    The aim of this study was to determine the pharmacokinetics of low-dose nedaplatin combined with paclitaxel and radiation therapy in patients with non-small-cell lung carcinoma and to establish the optimal dosage regimen for low-dose nedaplatin. We also evaluated the predictive accuracy of previously reported formulas for estimating the area under the plasma concentration-time curve (AUC) of low-dose nedaplatin. A total of 19 patients received a constant one-hour intravenous infusion of nedaplatin at 20 mg/m(2) body surface area (BSA), and blood samples were collected at 1, 2, 3, 4, 6, 8, and 19 h after the administration. Plasma concentrations of unbound platinum were measured, and the actual value of platinum AUC (actual AUC) was calculated from these data. The predicted value of platinum AUC (predicted AUC) was determined by three predictive methods reported in previous studies: a Bayesian method, a limited sampling strategy using the plasma concentration at a single time point, and a simple formula method (SFM) requiring no measured plasma concentration. Three error indices, mean prediction error (ME, a measure of bias), mean absolute error (MAE, a measure of accuracy), and root mean squared prediction error (RMSE, a measure of precision), were obtained from the differences between the actual and predicted AUC to compare the accuracy of the three predictive methods. The AUC showed more than threefold inter-patient variation, and there was a favorable correlation between nedaplatin clearance and creatinine clearance (Ccr) (r = 0.832, P < 0.01). Of the three error indices, MAE and RMSE differed significantly between the three AUC predictive methods, and SFM gave the most favorable results, with %ME, %MAE, and %RMSE of 5.5, 10.7, and 15.4, respectively. The dosage regimen of low-dose nedaplatin should therefore be based on Ccr rather than on BSA. Since the prediction accuracy of SFM, which requires no measured plasma concentration, was the most favorable among the three methods evaluated in this study, SFM could be the most practical method for predicting the AUC of low-dose nedaplatin in clinical practice.
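
    For reference, the three error indices reduce to simple formulas over the percentage prediction errors. A short sketch with toy AUC values (the numbers are invented):

```python
# %ME (bias), %MAE (accuracy) and %RMSE (precision) from actual vs predicted AUC.
import numpy as np

def error_indices(actual, predicted):
    pe = (predicted - actual) / actual * 100      # percentage prediction error
    return {"%ME": pe.mean(),
            "%MAE": np.abs(pe).mean(),
            "%RMSE": np.sqrt((pe ** 2).mean())}

actual = np.array([2.1, 3.4, 1.8, 2.9, 4.0])      # toy AUC values
predicted = np.array([2.3, 3.1, 1.9, 3.2, 3.7])
print({k: round(v, 1) for k, v in error_indices(actual, predicted).items()})
```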

  17. Structural reliability analysis under evidence theory using the active learning kriging model

    NASA Astrophysics Data System (ADS)

    Yang, Xufeng; Liu, Yongshou; Ma, Panke

    2017-11-01

    Structural reliability analysis under evidence theory is investigated. It is rigorously proved that a surrogate model providing only correct sign prediction of the performance function can meet the accuracy requirement of evidence-theory-based reliability analysis. Accordingly, a method based on the active learning kriging model which only correctly predicts the sign of the performance function is proposed. Interval Monte Carlo simulation and a modified optimization method based on Karush-Kuhn-Tucker conditions are introduced to make the method more efficient in estimating the bounds of failure probability based on the kriging model. Four examples are investigated to demonstrate the efficiency and accuracy of the proposed method.
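
    A sketch of sign-oriented active learning with a kriging (Gaussian process) surrogate, in the spirit described above: iteratively add the Monte Carlo point whose sign prediction is least certain (smallest U = |mu|/sigma) until all signs are confidently predicted. The performance function, stopping threshold, and sample sizes are illustrative assumptions.

```python
# Active-learning kriging for sign prediction of a performance function.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def g(x):                                        # toy performance function; g < 0 = failure
    return x[:, 0] ** 2 + x[:, 1] - 3

rng = np.random.default_rng(0)
pool = rng.uniform(-2, 2, size=(2000, 2))        # Monte Carlo candidate points
train = pool[rng.choice(2000, 12, replace=False)]

for _ in range(40):
    gp = GaussianProcessRegressor(kernel=RBF(1.0)).fit(train, g(train))
    mu, sd = gp.predict(pool, return_std=True)
    U = np.abs(mu) / np.maximum(sd, 1e-12)       # sign-confidence learning function
    if U.min() > 2.0:                            # all signs predicted reliably
        break
    train = np.vstack([train, pool[U.argmin()]]) # enrich where the sign is uncertain

print("failure probability estimate:", (mu < 0).mean())
```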

  18. Pseudo CT estimation from MRI using patch-based random forest

    NASA Astrophysics Data System (ADS)

    Yang, Xiaofeng; Lei, Yang; Shu, Hui-Kuo; Rossi, Peter; Mao, Hui; Shim, Hyunsuk; Curran, Walter J.; Liu, Tian

    2017-02-01

    MR simulators have recently gained popularity because they avoid the radiation exposure associated with the CT simulators used in radiation therapy planning. We propose a method for pseudo CT estimation from MR images based on a patch-based random forest. Patient-specific anatomical features are extracted from the aligned training images and adopted as signatures for each voxel. The most robust and informative features are identified by feature selection and used to train the random forest. The trained random forest is then used to predict the pseudo CT of a new patient. This prediction technique was tested with human brain images, and prediction accuracy was assessed against the original CT images; peak signal-to-noise ratio (PSNR) and feature similarity (FSIM) indexes were used to quantify the differences between the pseudo and original CT images. The experimental results showed that the proposed method can accurately generate pseudo CT images from MR images. In summary, we have developed a new pseudo CT prediction method based on a patch-based random forest, demonstrated its clinical feasibility, and validated its prediction accuracy. This technique could be a useful tool for MRI-based radiation treatment planning and for attenuation correction in a PET/MRI scanner.
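
    A minimal sketch of the patch-based regression idea, with 2D slices, a fixed patch size, and a synthetic aligned MR/CT pair as simplifying assumptions: each voxel's CT value is predicted from the flattened MR patch around it.

```python
# Patch-based random forest: map MR patches to the CT value at the patch center.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def extract_patches(img, k=2):
    """All (2k+1)x(2k+1) patches, flattened, with their center coordinates."""
    patches, centers = [], []
    for i in range(k, img.shape[0] - k):
        for j in range(k, img.shape[1] - k):
            patches.append(img[i - k:i + k + 1, j - k:j + k + 1].ravel())
            centers.append((i, j))
    return np.array(patches), centers

rng = np.random.default_rng(0)
mr = rng.random((40, 40))                                  # toy MR slice
ct = 1000 * mr ** 2 + rng.normal(scale=10, size=(40, 40))  # toy aligned CT slice

X, centers = extract_patches(mr)
y = np.array([ct[i, j] for i, j in centers])
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print("training R^2:", round(rf.score(X, y), 3))
```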

  19. Metamemory prediction accuracy for simple prospective and retrospective memory tasks in 5-year-old children.

    PubMed

    Kvavilashvili, Lia; Ford, Ruth M

    2014-11-01

    It is well documented that young children greatly overestimate their performance on tests of retrospective memory (RM), but the current investigation is the first to examine children's prediction accuracy for prospective memory (PM). Three studies were conducted, each testing a different group of 5-year-olds. In Study 1 (N=46), participants were asked to predict their success in a simple event-based PM task (remembering to convey a message to a toy mole if they encountered a particular picture during a picture-naming activity). Before naming the pictures, children listened to either a reminder story or a neutral story. Results showed that children were highly accurate in their PM predictions (78% accuracy) and that the reminder story appeared to benefit PM only in children who predicted they would remember the PM response. In Study 2 (N=80), children showed high PM prediction accuracy (69%) regardless of whether the cue was specific or general and despite typical overoptimism regarding their performance on a 10-item RM task using item-by-item prediction. Study 3 (N=35) showed that children were prone to overestimate RM even when asked about their ability to recall a single item-the mole's unusual name. In light of these findings, we consider possible reasons for children's impressive PM prediction accuracy, including the potential involvement of future thinking in performance predictions and PM. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. Dissolved oxygen content prediction in crab culture using a hybrid intelligent method

    PubMed Central

    Yu, Huihui; Chen, Yingyi; Hassan, ShahbazGul; Li, Daoliang

    2016-01-01

    A precise predictive model is needed to obtain a clear understanding of the changing dissolved oxygen content in outdoor crab ponds, to assess how to reduce risk and to optimize water quality management. The uncertainties in the data from multiple sensors are a significant factor when building a dissolved oxygen content prediction model. To increase prediction accuracy, a new hybrid dissolved oxygen content forecasting model based on the radial basis function neural networks (RBFNN) data fusion method and a least squares support vector machine (LSSVM) with an optimal improved particle swarm optimization (IPSO) is developed. In the modelling process, the RBFNN data fusion method is used to improve information accuracy and provide more trustworthy training samples for the IPSO-LSSVM prediction model. The LSSVM is a powerful tool for achieving nonlinear dissolved oxygen content forecasting. In addition, an improved particle swarm optimization algorithm is developed to determine the optimal parameters for the LSSVM with high accuracy and generalizability. In this study, the comparison of the prediction results of different traditional models validates the effectiveness and accuracy of the proposed hybrid RBFNN-IPSO-LSSVM model for dissolved oxygen content prediction in outdoor crab ponds. PMID:27270206

  1. Dissolved oxygen content prediction in crab culture using a hybrid intelligent method.

    PubMed

    Yu, Huihui; Chen, Yingyi; Hassan, ShahbazGul; Li, Daoliang

    2016-06-08

    A precise predictive model is needed to obtain a clear understanding of the changing dissolved oxygen content in outdoor crab ponds, to assess how to reduce risk and to optimize water quality management. The uncertainties in the data from multiple sensors are a significant factor when building a dissolved oxygen content prediction model. To increase prediction accuracy, a new hybrid dissolved oxygen content forecasting model based on the radial basis function neural networks (RBFNN) data fusion method and a least squares support vector machine (LSSVM) with an optimal improved particle swarm optimization (IPSO) is developed. In the modelling process, the RBFNN data fusion method is used to improve information accuracy and provide more trustworthy training samples for the IPSO-LSSVM prediction model. The LSSVM is a powerful tool for achieving nonlinear dissolved oxygen content forecasting. In addition, an improved particle swarm optimization algorithm is developed to determine the optimal parameters for the LSSVM with high accuracy and generalizability. In this study, the comparison of the prediction results of different traditional models validates the effectiveness and accuracy of the proposed hybrid RBFNN-IPSO-LSSVM model for dissolved oxygen content prediction in outdoor crab ponds.
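
    A simplified stand-in for the core regression step, under stated assumptions: an RBF-kernel support vector regressor (a close cousin of the LSSVM) predicts dissolved oxygen from fused sensor readings, with grid search standing in for the IPSO hyperparameter optimization. Features and data are invented.

```python
# RBF-kernel SVR for dissolved oxygen prediction, hyperparameters by grid search.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
n = 400
temp = rng.uniform(15, 35, n)                    # water temperature, deg C
ph = rng.uniform(6.5, 9.0, n)
solar = rng.uniform(0, 1, n)                     # normalized solar radiation
do = 14 - 0.25 * temp + 1.2 * solar + 0.3 * (ph - 7) + rng.normal(0, 0.2, n)

X = np.column_stack([temp, ph, solar])
search = GridSearchCV(SVR(kernel="rbf"),
                      {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1]}, cv=5)
search.fit(X, do)
print(search.best_params_, round(search.best_score_, 3))
```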

  2. Comparison and optimization of in silico algorithms for predicting the pathogenicity of sodium channel variants in epilepsy.

    PubMed

    Holland, Katherine D; Bouley, Thomas M; Horn, Paul S

    2017-07-01

    Variants in neuronal voltage-gated sodium channel α-subunits genes SCN1A, SCN2A, and SCN8A are common in early onset epileptic encephalopathies and other autosomal dominant childhood epilepsy syndromes. However, in clinical practice, missense variants are often classified as variants of uncertain significance when missense variants are identified but heritability cannot be determined. Genetic testing reports often include results of computational tests to estimate pathogenicity and the frequency of that variant in population-based databases. The objective of this work was to enhance clinicians' understanding of results by (1) determining how effectively computational algorithms predict epileptogenicity of sodium channel (SCN) missense variants; (2) optimizing their predictive capabilities; and (3) determining if epilepsy-associated SCN variants are present in population-based databases. This will help clinicians better understand the results of indeterminate SCN test results in people with epilepsy. Pathogenic, likely pathogenic, and benign variants in SCNs were identified using databases of sodium channel variants. Benign variants were also identified from population-based databases. Eight algorithms commonly used to predict pathogenicity were compared. In addition, logistic regression was used to determine if a combination of algorithms could better predict pathogenicity. Based on American College of Medical Genetic Criteria, 440 variants were classified as pathogenic or likely pathogenic and 84 were classified as benign or likely benign. Twenty-eight variants previously associated with epilepsy were present in population-based gene databases. The output provided by most computational algorithms had a high sensitivity but low specificity with an accuracy of 0.52-0.77. Accuracy could be improved by adjusting the threshold for pathogenicity. Using this adjustment, the Mendelian Clinically Applicable Pathogenicity (M-CAP) algorithm had an accuracy of 0.90 and a combination of algorithms increased the accuracy to 0.92. Potentially pathogenic variants are present in population-based sources. Most computational algorithms overestimate pathogenicity; however, a weighted combination of several algorithms increased classification accuracy to >0.90. Wiley Periodicals, Inc. © 2017 International League Against Epilepsy.
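
    A sketch of the combination step described above, with simulated predictor scores: a logistic regression over the outputs of several in-silico tools, and a raised decision threshold to counter the tendency to overcall pathogenicity. In practice the score columns would come from tools such as M-CAP; here they are random stand-ins.

```python
# Logistic regression combining in-silico pathogenicity scores, tuned threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y = np.r_[np.ones(440), np.zeros(84)]            # pathogenic vs benign labels
scores = np.clip(y[:, None] * 0.4 + rng.normal(0.4, 0.25, (y.size, 3)), 0, 1)

clf = LogisticRegression().fit(scores, y)
prob = clf.predict_proba(scores)[:, 1]
for thr in (0.5, 0.7, 0.9):                      # raising the pathogenicity threshold
    print(thr, "accuracy:", round(((prob > thr) == y).mean(), 3))
```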

  3. Post processing of protein-compound docking for fragment-based drug discovery (FBDD): in-silico structure-based drug screening and ligand-binding pose prediction.

    PubMed

    Fukunishi, Yoshifumi

    2010-01-01

    For fragment-based drug development, both hit (active) compound prediction and docking-pose (protein-ligand complex structure) prediction of the hit compound are important, since chemical modification (fragment linking, fragment evolution) subsequent to the hit discovery must be performed based on the protein-ligand complex structure. However, the naïve protein-compound docking calculation shows poor accuracy in terms of docking-pose prediction. Thus, post-processing of the protein-compound docking is necessary. Recently, several methods for the post-processing of protein-compound docking have been proposed. In FBDD, the compounds are smaller than those for conventional drug screening. This makes it difficult to perform the protein-compound docking calculation. A method to avoid this problem has been reported. Protein-ligand binding free energy estimation is useful to reduce the procedures involved in the chemical modification of the hit fragment. Several prediction methods have been proposed for high-accuracy estimation of protein-ligand binding free energy. This paper summarizes the various computational methods proposed for docking-pose prediction and their usefulness in FBDD.

  4. Application and analysis of debris-flow early warning system in Wenchuan earthquake-affected area

    NASA Astrophysics Data System (ADS)

    Liu, D. L.; Zhang, S. J.; Yang, H. J.; Zhao, L. Q.; Jiang, Y. H.; Tang, D.; Leng, X. P.

    2016-02-01

    The activities of debris flow (DF) in the Wenchuan earthquake-affected area significantly increased after the earthquake on 12 May 2008. The safety of the lives and property of local people is threatened by DFs. A physics-based early warning system (EWS) for DF forecasting was developed and applied in this earthquake area. This paper introduces an application of the system in the Wenchuan earthquake-affected area and analyzes the prediction results via a comparison to the DF events triggered by the strong rainfall events reported by the local government. The prediction accuracy and efficiency was first compared with a contribution-factor-based system currently used by the weather bureau of Sichuan province. The storm on 17 August 2012 was used as a case study for this comparison. The comparison shows that the false negative rate and false positive rate of the new system is, respectively, 19 and 21 % lower than the system based on the contribution factors. Consequently, the prediction accuracy is obviously higher than the system based on the contribution factors with a higher operational efficiency. On the invitation of the weather bureau of Sichuan province, the authors upgraded their prediction system of DF by using this new system before the monsoon of Wenchuan earthquake-affected area in 2013. Two prediction cases on 9 July 2013 and 10 July 2014 were chosen to further demonstrate that the new EWS has high stability, efficiency, and prediction accuracy.

  5. Evaluating model accuracy for model-based reasoning

    NASA Technical Reports Server (NTRS)

    Chien, Steve; Roden, Joseph

    1992-01-01

    Described here is an approach to automatically assessing the accuracy of various components of a model. In this approach, actual data from the operation of a target system is used to drive statistical measures to evaluate the prediction accuracy of various portions of the model. We describe how these statistical measures of model accuracy can be used in model-based reasoning for monitoring and design. We then describe the application of these techniques to the monitoring and design of the water recovery system of the Environmental Control and Life Support System (ECLSS) of Space Station Freedom.

  6. Including non-additive genetic effects in Bayesian methods for the prediction of genetic values based on genome-wide markers

    PubMed Central

    2011-01-01

    Background Molecular marker information is a common source to draw inferences about the relationship between genetic and phenotypic variation. Genetic effects are often modelled as additively acting marker allele effects. The true mode of biological action can, of course, be different from this plain assumption. One possibility to better understand the genetic architecture of complex traits is to include intra-locus (dominance) and inter-locus (epistasis) interaction of alleles as well as the additive genetic effects when fitting a model to a trait. Several Bayesian MCMC approaches exist for the genome-wide estimation of genetic effects with high accuracy of genetic value prediction. Including pairwise interaction for thousands of loci would probably go beyond the scope of such a sampling algorithm because then millions of effects are to be estimated simultaneously leading to months of computation time. Alternative solving strategies are required when epistasis is studied. Methods We extended a fast Bayesian method (fBayesB), which was previously proposed for a purely additive model, to include non-additive effects. The fBayesB approach was used to estimate genetic effects on the basis of simulated datasets. Different scenarios were simulated to study the loss of accuracy of prediction, if epistatic effects were not simulated but modelled and vice versa. Results If 23 QTL were simulated to cause additive and dominance effects, both fBayesB and a conventional MCMC sampler BayesB yielded similar results in terms of accuracy of genetic value prediction and bias of variance component estimation based on a model including additive and dominance effects. Applying fBayesB to data with epistasis, accuracy could be improved by 5% when all pairwise interactions were modelled as well. The accuracy decreased more than 20% if genetic variation was spread over 230 QTL. In this scenario, accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model including epistatic effects. Conclusions This simulation study showed that the fBayesB approach is convenient for genetic value prediction. Jointly estimating additive and non-additive effects (especially dominance) has reasonable impact on the accuracy of prediction and the proportion of genetic variation assigned to the additive genetic source. PMID:21867519
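
    To make the non-additive coding concrete, here is a minimal sketch under our own assumptions: each SNP gets an additive covariate (0/1/2) plus a dominance indicator (heterozygote = 1), and both are fitted with Bayesian ridge regression as a lightweight stand-in for fBayesB.

```python
# Additive + dominance marker coding, fitted with a Bayesian linear model.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
n, m = 500, 300
add = rng.binomial(2, 0.4, size=(n, m)).astype(float)   # additive coding 0/1/2
dom = (add == 1).astype(float)                          # dominance indicator

a = np.zeros(m); a[:23] = rng.normal(size=23)           # 23 QTL, as in the study
d = np.zeros(m); d[:23] = rng.normal(scale=0.5, size=23)
y = add @ a + dom @ d + rng.normal(size=n)

X = np.hstack([add, dom])
model = BayesianRidge().fit(X[:400], y[:400])
acc = np.corrcoef(model.predict(X[400:]), y[400:])[0, 1]
print("accuracy of genetic value prediction:", round(acc, 2))
```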

  7. Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins

    PubMed Central

    Yang, Jing; He, Bao-Ji; Jang, Richard; Zhang, Yang; Shen, Hong-Bin

    2015-01-01

    Motivation: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g. >3 bonds, is too low to effectively assist structure assembly simulations. Results: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ Contact: zhng@umich.edu or hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254435

  8. When high working memory capacity is and is not beneficial for predicting nonlinear processes.

    PubMed

    Fischer, Helen; Holt, Daniel V

    2017-04-01

    Predicting the development of dynamic processes is vital in many areas of life. Previous findings are inconclusive as to whether higher working memory capacity (WMC) is always associated with using more accurate prediction strategies, or whether higher WMC can also be associated with using overly complex strategies that do not improve accuracy. In this study, participants predicted a range of systematically varied nonlinear processes based on exponential functions where prediction accuracy could or could not be enhanced using well-calibrated rules. Results indicate that higher WMC participants seem to rely more on well-calibrated strategies, leading to more accurate predictions for processes with highly nonlinear trajectories in the prediction region. Predictions of lower WMC participants, in contrast, point toward an increased use of simple exemplar-based prediction strategies, which perform just as well as more complex strategies when the prediction region is approximately linear. These results imply that with respect to predicting dynamic processes, working memory capacity limits are not generally a strength or a weakness, but that this depends on the process to be predicted.

  9. Connecting clinical and actuarial prediction with rule-based methods.

    PubMed

    Fokkema, Marjolein; Smits, Niels; Kelderman, Henk; Penninx, Brenda W J H

    2015-06-01

    Meta-analyses comparing the accuracy of clinical versus actuarial prediction have shown actuarial methods to outperform clinical methods, on average. However, actuarial methods are still not widely used in clinical practice, and there has been a call for the development of actuarial prediction methods for clinical practice. We argue that rule-based methods may be more useful than the linear main effect models usually employed in prediction studies, from a data and decision analytic as well as a practical perspective. In addition, decision rules derived with rule-based methods can be represented as fast and frugal trees, which, unlike main effects models, can be used in a sequential fashion, reducing the number of cues that have to be evaluated before making a prediction. We illustrate the usability of rule-based methods by applying RuleFit, an algorithm for deriving decision rules for classification and regression problems, to a dataset on prediction of the course of depressive and anxiety disorders from Penninx et al. (2011). The RuleFit algorithm provided a model consisting of 2 simple decision rules, requiring evaluation of only 2 to 4 cues. Predictive accuracy of the 2-rule model was very similar to that of a logistic regression model incorporating 20 predictor variables, originally applied to the dataset. In addition, the 2-rule model required, on average, evaluation of only 3 cues. Therefore, the RuleFit algorithm appears to be a promising method for creating decision tools that are less time consuming and easier to apply in psychological practice, and with accuracy comparable to traditional actuarial methods. (c) 2015 APA, all rights reserved.

  10. Personality and attention: Levels of neuroticism and extraversion can predict attentional performance during a change detection task.

    PubMed

    Hahn, Sowon; Buttaccio, Daniel R; Hahn, Jungwon; Lee, Taehun

    2015-01-01

    The present study demonstrates that levels of extraversion and neuroticism can predict attentional performance during a change detection task. After completing a change detection task built on the flicker paradigm, participants were assessed for personality traits using the Revised Eysenck Personality Questionnaire (EPQ-R). Multiple regression analyses revealed that higher levels of extraversion predict increased change detection accuracies, while higher levels of neuroticism predict decreased change detection accuracies. In addition, neurotic individuals exhibited decreased sensitivity A' and increased fixation dwell times. Hierarchical regression analyses further revealed that eye movement measures mediate the relationship between neuroticism and change detection accuracies. Based on the current results, we propose that neuroticism is associated with decreased attentional control over the visual field, presumably due to decreased attentional disengagement. Extraversion can predict increased attentional performance, but the effect is smaller than the relationship between neuroticism and attention.

  11. Automated Deep Learning-Based System to Identify Endothelial Cells Derived from Induced Pluripotent Stem Cells.

    PubMed

    Kusumoto, Dai; Lachmann, Mark; Kunihiro, Takeshi; Yuasa, Shinsuke; Kishino, Yoshikazu; Kimura, Mai; Katsuki, Toshiomi; Itoh, Shogo; Seki, Tomohisa; Fukuda, Keiichi

    2018-06-05

    Deep learning technology is rapidly advancing and is now used to solve complex problems. Here, we used deep learning in convolutional neural networks to establish an automated method to identify endothelial cells derived from induced pluripotent stem cells (iPSCs), without the need for immunostaining or lineage tracing. Networks were trained to predict whether phase-contrast images contain endothelial cells based on morphology only. Predictions were validated by comparison to immunofluorescence staining for CD31, a marker of endothelial cells. Method parameters were then automatically and iteratively optimized to increase prediction accuracy. We found that prediction accuracy was correlated with network depth and pixel size of images to be analyzed. Finally, K-fold cross-validation confirmed that optimized convolutional neural networks can identify endothelial cells with high performance, based only on morphology. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  12. Short communication: Improving the accuracy of genomic prediction of body conformation traits in Chinese Holsteins using markers derived from high-density marker panels.

    PubMed

    Song, H; Li, L; Ma, P; Zhang, S; Su, G; Lund, M S; Zhang, Q; Ding, X

    2018-06-01

    This study investigated the efficiency of genomic prediction when markers identified by a genome-wide association study (GWAS) were added to the marker panel, using a data set of high-density (HD) markers imputed from 54K markers in Chinese Holsteins. Among 3,056 Chinese Holsteins with imputed HD data, 2,401 individuals born before October 1, 2009, were used for the GWAS and as a reference population for genomic prediction, and the 220 younger cows were used as a validation population. In total, 1,403, 1,536, and 1,383 significant single nucleotide polymorphisms (SNP; false discovery rate at 0.05) associated with conformation final score, mammary system, and feet and legs were identified, respectively. About 2 to 3% of the genetic variance of the 3 traits was explained by these significant SNP. Only a very small proportion of the significant SNP identified by GWAS was included in the 54K marker panel. Three new marker sets (54K+) were therefore produced by adding the significant SNP obtained by the linear mixed model for each trait to the 54K marker panel. Genomic breeding values were predicted using a Bayesian variable selection (BVS) model. The accuracies of genomic breeding values estimated by BVS based on the 54K+ data were 2.0 to 5.2% higher than those based on the 54K data. The imputed HD markers yielded 1.4% higher accuracy on average (BVS) than the 54K data. Both the 54K+ and HD data generated lower bias of genomic prediction, and the 54K+ data yielded the lowest bias in all situations. Our results show that the imputed HD data were not very useful for improving the accuracy of genomic prediction, whereas adding the significant markers derived from the imputed HD marker panel could improve the accuracy and decrease the bias of genomic prediction. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
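
    For readers who want to experiment with the general workflow, a minimal sketch follows, with scikit-learn's BayesianRidge standing in for the paper's Bayesian variable selection model and simulated genotypes in place of the Holstein data; accuracy is measured as the correlation between predicted and simulated true genetic values:

```python
# Genomic prediction sketch with simulated 0/1/2 genotypes. BayesianRidge is
# a stand-in for the paper's Bayesian variable selection (BVS) model.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)
n, m = 400, 2000                          # animals, SNP markers
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)

beta = np.zeros(m)
qtl = rng.choice(m, 50, replace=False)    # 50 causal markers
beta[qtl] = rng.normal(0, 0.5, 50)
g = X @ beta                              # true genetic values
y = g + rng.normal(0, g.std(), n)         # phenotypes, heritability ~ 0.5

train, test = np.arange(300), np.arange(300, 400)
model = BayesianRidge().fit(X[train], y[train])
acc = np.corrcoef(model.predict(X[test]), g[test])[0, 1]
print(f"prediction accuracy (r with true genetic value): {acc:.2f}")
```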

  13. Classification of drug molecules considering their IC50 values using mixed-integer linear programming based hyper-boxes method.

    PubMed

    Armutlu, Pelin; Ozdemir, Muhittin E; Uney-Yuksektepe, Fadime; Kavakli, I Halil; Turkay, Metin

    2008-10-03

    A priori analysis of the activity of drugs on the target protein by computational approaches can be useful in narrowing down drug candidates for further experimental tests. Currently, there are a large number of computational methods that predict the activity of drugs on proteins. In this study, we approach the activity prediction problem as a classification problem and aim to improve the classification accuracy by introducing an algorithm that combines partial least squares regression with a mixed-integer programming based hyper-boxes classification method, where drug molecules are classified as low active or high active according to their binding activity (IC50 values) on target proteins. We also aim to determine the most significant molecular descriptors for the drug molecules. We first apply our approach by analyzing the activities of widely known inhibitor datasets, including Acetylcholinesterase (ACHE), Benzodiazepine Receptor (BZR), Dihydrofolate Reductase (DHFR), and Cyclooxygenase-2 (COX-2), with known IC50 values. The results at this stage showed that our approach consistently gives better classification accuracies compared to 63 other reported classification methods, such as SVM and Naïve Bayes; we were able to predict the experimentally determined IC50 values with a worst-case accuracy of 96%. To further test the applicability of this approach, we created a dataset for Cytochrome P450 C17 inhibitors and then predicted their activities with 100% accuracy. Our results indicate that this approach can be utilized to predict the inhibitory effects of inhibitors based on their molecular descriptors. This approach will not only enhance the drug discovery process, but also save time and resources.
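
    A rough sketch of the two-stage idea is below, with an important caveat: the paper's mixed-integer-programming hyper-box classifier is replaced by plain logistic regression on the PLS latent scores, and the data are synthetic stand-ins for molecular descriptors:

```python
# Two-stage sketch: PLS projects descriptors to a few latent components,
# then a simple classifier labels compounds low/high active. Logistic
# regression replaces the paper's MILP hyper-box classifier.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=50,
                           n_informative=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

pls = PLSRegression(n_components=5).fit(X_tr, y_tr)   # supervised projection
clf = LogisticRegression().fit(pls.transform(X_tr), y_tr)
print("accuracy:", clf.score(pls.transform(X_te), y_te))
```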

  14. Best Practices for Mudweight Window Generation and Accuracy Assessment between Seismic Based Pore Pressure Prediction Methodologies for a Near-Salt Field in Mississippi Canyon, Gulf of Mexico

    NASA Astrophysics Data System (ADS)

    Mannon, Timothy Patrick, Jr.

    Improving well design has been and always will be the primary goal in drilling operations in the oil and gas industry. Oil and gas plays are continuing to move into increasingly hostile drilling environments, including near-salt and/or sub-salt settings. The ability to reduce the risk and uncertainty involved in drilling operations in unconventional geologic settings starts with improving the techniques for mudweight window modeling. To address this issue, an analysis of wellbore stability and well design improvement has been conducted. This study shows a systematic approach to well design by focusing on best practices for mudweight window projection for a field in Mississippi Canyon, Gulf of Mexico. The field includes depleted reservoirs and is in close proximity to salt intrusions. Analysis of offset wells has been conducted in the interest of developing an accurate picture of the subsurface environment by making connections between depth, non-productive time (NPT) events, and mudweights used. Commonly practiced petrophysical methods of pore pressure, fracture pressure, and shear failure gradient prediction have been applied to key offset wells in order to enhance the well design for two proposed wells. For the first time in the literature, the commonly accepted seismic interval velocity based methodology and the relatively new seismic frequency based methodology for pore pressure prediction are qualitatively and quantitatively compared for accuracy. Accuracy standards are based on the agreement of the seismic outputs with pressure data obtained while drilling and with petrophysically based pore pressure outputs for each well. The results show significantly higher accuracy for the seismic frequency based approach in wells in near/sub-salt environments, and higher overall accuracy across all of the wells in the study.

  15. A Final Approach Trajectory Model for Current Operations

    NASA Technical Reports Server (NTRS)

    Gong, Chester; Sadovsky, Alexander

    2010-01-01

    Predicting accurate trajectories with limited intent information is a challenge faced by air traffic management decision support tools in operation today. One such tool is the FAA's Terminal Proximity Alert system, which is intended to assist controllers in maintaining safe separation of arrival aircraft during final approach. In an effort to improve the performance of such tools, two final approach trajectory models are proposed; one based on polynomial interpolation, the other on the Fourier transform. These models were tested against actual traffic data and used to study the effects of the key final approach trajectory modeling parameters of wind, aircraft type, and weight class on trajectory prediction accuracy. Using only the limited intent data available to today's ATM system, both the polynomial interpolation and Fourier transform models showed improved trajectory prediction accuracy over a baseline dead reckoning model. Analysis of actual arrival traffic showed that this improved trajectory prediction accuracy leads to improved inter-arrival separation prediction accuracy for longer look-ahead times. The difference in mean inter-arrival separation prediction error between the Fourier transform and dead reckoning models was 0.2 nmi for a look-ahead time of 120 sec, a 33 percent improvement, with a corresponding 32 percent improvement in standard deviation.
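
    A toy comparison of the polynomial-interpolation model against dead reckoning can be set up in a few lines of NumPy; the quadratic ground track and the sampling interval below are invented for illustration and have no relation to the flight data used in the paper:

```python
# Toy comparison: polynomial-interpolation prediction vs dead reckoning.
# The quadratic ground track and 10 s sampling are invented for illustration.
import numpy as np

t = np.arange(0, 120, 10.0)                  # time (s)
x = 30.0 - 0.25 * t + 0.001 * t**2           # along-track distance (nmi)

t_fit, t_pred = t[:8], t[8:]                 # fit on first 80 s, predict rest
coef = np.polyfit(t_fit, x[:8], deg=2)       # polynomial model
poly_pred = np.polyval(coef, t_pred)

v = (x[7] - x[6]) / 10.0                     # dead reckoning: last velocity
dr_pred = x[7] + v * (t_pred - t_fit[-1])

print("polynomial mean error:", np.abs(poly_pred - x[8:]).mean())
print("dead-reckoning error :", np.abs(dr_pred - x[8:]).mean())
```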

  16. Fuzzy regression modeling for tool performance prediction and degradation detection.

    PubMed

    Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L

    2010-10-01

    In this paper, the viability of using a Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance prediction and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and the rate of convergence. The efficacy of the proposed FRM is tested through a case study, namely predicting the remaining useful life of a ball nose milling cutter during dry machining of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior to conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.

  17. Comparison of Models and Whole-Genome Profiling Approaches for Genomic-Enabled Prediction of Septoria Tritici Blotch, Stagonospora Nodorum Blotch, and Tan Spot Resistance in Wheat.

    PubMed

    Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E

    2017-07-01

    The leaf spotting diseases of wheat, which include Septoria tritici blotch (STB) caused by Zymoseptoria tritici, Stagonospora nodorum blotch (SNB) caused by Parastagonospora nodorum, and tan spot (TS) caused by Pyrenophora tritici-repentis, pose challenges to breeding programs in selecting for resistance. A promising approach that could enable selection prior to phenotyping is genomic selection, which uses genome-wide markers to estimate breeding values (BVs) for quantitative traits. To evaluate this approach for seedling and/or adult plant resistance (APR) to STB, SNB, and TS, we compared the predictive ability of a least-squares (LS) approach with genomic-enabled prediction models including genomic best linear unbiased predictor (GBLUP), Bayesian ridge regression (BRR), Bayes A (BA), Bayes B (BB), Bayes Cπ (BC), Bayesian least absolute shrinkage and selection operator (BL), and reproducing kernel Hilbert spaces using markers (RKHS-M), a pedigree-based model (RKHS-P), and RKHS using markers and pedigree (RKHS-MP). We observed that LS gave the lowest prediction accuracies and RKHS-MP the highest. The genomic-enabled prediction models and RKHS-P gave similar accuracies. The increase in accuracy using genomic prediction models over LS was 48%. The mean genomic prediction accuracies were 0.45 for STB (APR), 0.55 for SNB (seedling), 0.66 for TS (seedling) and 0.48 for TS (APR). We also compared markers from two whole-genome profiling approaches, genotyping by sequencing (GBS) and diversity arrays technology sequencing (DArTseq), for prediction. While GBS markers performed slightly better than DArTseq markers, combining markers from the two approaches did not improve accuracies. We conclude that implementing GS in breeding for these diseases would help to achieve higher accuracies and rapid gains from selection. Copyright © 2017 Crop Science Society of America.

  18. The Upper and Lower Bounds of the Prediction Accuracies of Ensemble Methods for Binary Classification

    PubMed Central

    Wang, Xueyi; Davidson, Nicholas J.

    2011-01-01

    Ensemble methods have been widely used to improve prediction accuracy over individual classifiers. In this paper, we derive several results about the prediction accuracies of ensemble methods for binary classification that have been missed or misinterpreted in the previous literature. First we show the upper and lower bounds of the prediction accuracies (i.e. the best and worst possible prediction accuracies) of ensemble methods. Next we show that an ensemble method can achieve greater than 0.5 prediction accuracy even when the individual classifiers have less than 0.5 prediction accuracies. Furthermore, for individual classifiers with different prediction accuracies, the average of the individual accuracies determines the upper and lower bounds. We perform two experiments to verify the results and show that the upper and lower bound accuracies are hard to achieve with random individual classifiers; better algorithms need to be developed. PMID:21853162
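
    The claim that a majority vote can exceed 0.5 accuracy even when every individual classifier is below 0.5 is easy to verify constructively: concentrate the classifiers' errors on the same samples. The construction below is our own illustration, not the paper's experiment:

```python
# Constructive check: three classifiers, each only 45% accurate, whose
# errors overlap maximally; majority voting then reaches 67.5% accuracy.
import numpy as np

n = 1000
err = np.zeros((3, n), dtype=bool)
err[:, :325] = True                  # all three wrong on the same 32.5%
err[0, 325:550] = True               # disjoint extra errors: 22.5% each
err[1, 550:775] = True
err[2, 775:1000] = True

print("individual accuracies:", 1 - err.mean(axis=1))   # [0.45 0.45 0.45]
vote_wrong = err.sum(axis=0) >= 2    # majority errs only where >= 2 err
print("majority-vote accuracy:", 1 - vote_wrong.mean()) # 0.675
```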

  19. Geopositioning with a quadcopter: Extracted feature locations and predicted accuracy without a priori sensor attitude information

    NASA Astrophysics Data System (ADS)

    Dolloff, John; Hottel, Bryant; Edwards, David; Theiss, Henry; Braun, Aaron

    2017-05-01

    This paper presents an overview of the Full Motion Video-Geopositioning Test Bed (FMV-GTB) developed to investigate algorithm performance and issues related to the registration of motion imagery and subsequent extraction of feature locations along with predicted accuracy. A case study is included corresponding to a video taken from a quadcopter. Registration of the corresponding video frames is performed without the benefit of a priori sensor attitude (pointing) information. In particular, tie points are first measured automatically between adjacent frames using standard optical flow matching techniques from computer vision; an a priori estimate of sensor attitude is then computed based on supplied GPS sensor positions contained in the video metadata and a photogrammetric, search-based structure-from-motion algorithm; finally, a Weighted Least Squares adjustment of all a priori metadata across the frames is performed. Extraction of absolute 3D feature locations, including their predicted accuracy based on the principles of rigorous error propagation, is then performed using a subset of the registered frames. Results are compared to known locations (check points) over a test site. Throughout this entire process, no external control information (e.g. surveyed points) is used other than for evaluation of solution errors and corresponding accuracy.
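
    The tie-point measurement step can be sketched with OpenCV's standard Lucas-Kanade tracker; the synthetic shifted frames below stand in for real video, and nothing here reproduces the paper's attitude estimation or least-squares adjustment:

```python
# Tie-point measurement with the standard Lucas-Kanade tracker in OpenCV.
# Two synthetic frames (a blurred noise texture shifted by 2 px in x and
# 3 px in y) stand in for consecutive video frames.
import cv2
import numpy as np

rng = np.random.default_rng(0)
prev = cv2.GaussianBlur((rng.random((240, 320)) * 255).astype(np.uint8),
                        (5, 5), 0)
nxt = np.roll(prev, shift=(3, 2), axis=(0, 1))

pts = cv2.goodFeaturesToTrack(prev, maxCorners=100,
                              qualityLevel=0.01, minDistance=7)
nxt_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, nxt, pts, None)

good = status.ravel() == 1                      # keep tracked points only
flow = (nxt_pts - pts).reshape(-1, 2)[good]
print("median flow (dx, dy):", np.median(flow, axis=0))   # ~ (2, 3)
```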

  20. Study design requirements for RNA sequencing-based breast cancer diagnostics.

    PubMed

    Mer, Arvind Singh; Klevebring, Daniel; Grönberg, Henrik; Rantalainen, Mattias

    2016-02-01

    Sequencing-based molecular characterization of tumors provides information required for individualized cancer treatment. There are well-defined molecular subtypes of breast cancer that provide improved prognostication compared to routine biomarkers. However, molecular subtyping is not yet implemented in routine breast cancer care. Clinical translation is dependent on subtype prediction models providing high sensitivity and specificity. In this study we evaluate sample size and RNA-sequencing read requirements for breast cancer subtyping to facilitate rational design of translational studies. We applied subsampling to ascertain the effect of training sample size and the number of RNA sequencing reads on classification accuracy of molecular subtype and routine biomarker prediction models (unsupervised and supervised). Subtype classification accuracy improved with increasing sample size up to N = 750 (accuracy = 0.93), although with a modest improvement beyond N = 350 (accuracy = 0.92). Prediction of routine biomarkers achieved accuracy of 0.94 (ER) and 0.92 (Her2) at N = 200. Subtype classification improved with RNA-sequencing library size up to 5 million reads. Development of molecular subtyping models for cancer diagnostics requires well-designed studies. Sample size and the number of RNA sequencing reads directly influence accuracy of molecular subtyping. Results in this study provide key information for rational design of translational studies aiming to bring sequencing-based diagnostics to the clinic.

  1. Hydrometeorological model for streamflow prediction

    USGS Publications Warehouse

    Tangborn, Wendell V.

    1979-01-01

    The hydrometeorological model described in this manual was developed to predict seasonal streamflow from water in storage in a basin using streamflow and precipitation data. The model, as described, applies specifically to the Skokomish, Nisqually, and Cowlitz Rivers in Washington State, and more generally to streams in other regions that derive seasonal runoff from melting snow. Thus the techniques demonstrated for these three drainage basins can be used as a guide for applying this method to other streams. Input to the computer program consists of daily averages of gaged runoff of these streams, and daily values of precipitation collected at Longmire, Kid Valley, and Cushman Dam. Predictions are based on estimates of the absolute storage of water, predominantly as snow: storage is approximately equal to basin precipitation less observed runoff. A pre-forecast test season is used to revise the storage estimate and improve the prediction accuracy. To obtain maximum prediction accuracy for operational applications with this model, a systematic evaluation of several hydrologic and meteorologic variables is first necessary. Six input options to the computer program that control prediction accuracy are developed and demonstrated. Predictions of streamflow can be made at any time and for any length of season, although accuracy is usually poor for early-season predictions (before December 1) or for short seasons (less than 15 days). The coefficient of prediction (CP), the chief measure of accuracy used in this manual, approaches zero during the late autumn and early winter seasons and reaches a maximum of about 0.85 during the spring snowmelt season. (Kosco-USGS)
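
    The storage-based prediction idea, accumulating precipitation minus runoff and then regressing seasonal flow on the storage estimate, can be sketched as follows; all numbers are synthetic, and the single-predictor regression is a simplification of the manual's six input options:

```python
# Storage-based seasonal streamflow prediction: storage ~ accumulated
# precipitation minus runoff, then a regression of spring flow on storage.
# All values are synthetic stand-ins for the gaged data in the manual.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
storage, flow = [], []
for _ in range(20):                                # 20 synthetic years
    P = rng.gamma(2.0, 3.0, 180)                   # daily precipitation (mm)
    R = (0.3 * P + rng.normal(0, 1, 180)).clip(0)  # daily runoff (mm)
    storage.append(P.sum() - R.sum())              # water left in storage
    flow.append(0.8 * storage[-1] + rng.normal(0, 30))  # spring melt flow

S = np.array(storage).reshape(-1, 1)
model = LinearRegression().fit(S[:15], flow[:15])  # train on 15 years
print("predicted spring flow:", model.predict(S[15:]).round(1))
```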

  2. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, prediction accuracy is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play an important role in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). The high prediction accuracy and successful prediction performance suggest that our method can be a useful approach to identify RNA-binding proteins from sequence information.
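
    A simplified version of the pipeline can be put together with scikit-learn, using univariate mutual information as a stand-in for mRMR (which additionally penalizes redundancy) followed by an incremental feature selection loop; the dataset and step size are illustrative:

```python
# Simplified mRMR-IFS pipeline: rank features by mutual information
# (a stand-in for mRMR), add them incrementally, keep the best subset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=60,
                           n_informative=12, random_state=3)

order = np.argsort(mutual_info_classif(X, y, random_state=3))[::-1]
best_k, best_score = 0, 0.0
for k in range(5, 61, 5):                 # incremental feature selection
    score = cross_val_score(RandomForestClassifier(random_state=3),
                            X[:, order[:k]], y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
print(f"best subset: top {best_k} features, CV accuracy {best_score:.3f}")
```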

  3. A comparison between Bayes discriminant analysis and logistic regression for prediction of debris flow in southwest Sichuan, China

    NASA Astrophysics Data System (ADS)

    Xu, Wenbo; Jing, Shaocai; Yu, Wenjuan; Wang, Zhaoxian; Zhang, Guoping; Huang, Jianxi

    2013-11-01

    In this study, the areas of Sichuan Province at high risk of debris flow, Panzhihua and the Liangshan Yi Autonomous Prefecture, were taken as the study areas. Using rainfall and environmental factors as predictors and based on different prior probability combinations of debris flows, predictions of debris flows in these areas were compared using two statistical methods: logistic regression (LR) and Bayes discriminant analysis (BDA). The comprehensive analysis shows that (a) with mid-range prior probabilities, the overall predictive accuracy of BDA is higher than that of LR; (b) with equal and extreme prior probabilities, the overall predictive accuracy of LR is higher than that of BDA; and (c) regional predictive models of debris flows using rainfall factors only perform worse than those that also incorporate environmental factors, and the predictive accuracies for occurrence and nonoccurrence of debris flows change in opposite directions when the supplementary information is added.

  4. Automatic Prediction of Conversion from Mild Cognitive Impairment to Probable Alzheimer’s Disease using Structural Magnetic Resonance Imaging

    PubMed Central

    Nho, Kwangsik; Shen, Li; Kim, Sungeun; Risacher, Shannon L.; West, John D.; Foroud, Tatiana; Jack, Clifford R.; Weiner, Michael W.; Saykin, Andrew J.

    2010-01-01

    Mild Cognitive Impairment (MCI) is thought to be a precursor to the development of early Alzheimer’s disease (AD). For early diagnosis of AD, the development of a model that is able to predict the conversion of amnestic MCI to AD is challenging. Using automatic whole-brain MRI analysis techniques and pattern classification methods, we developed a model to differentiate AD from healthy controls (HC), and then applied it to the prediction of MCI conversion to AD. Classification was performed using support vector machines (SVMs) together with an SVM-based feature selection method, which selected a set of the most discriminating predictors for optimizing prediction accuracy. We obtained 90.5% cross-validation accuracy for classifying AD and HC, and 72.3% accuracy for predicting MCI conversion to AD. These analyses suggest that a classifier trained to separate HC vs. AD has substantial potential for predicting MCI conversion to AD. PMID:21347037

  5. A Real-time Breakdown Prediction Method for Urban Expressway On-ramp Bottlenecks

    NASA Astrophysics Data System (ADS)

    Ye, Yingjun; Qin, Guoyang; Sun, Jian; Liu, Qiyuan

    2018-01-01

    Breakdown occurrence on expressways is considered to be related to various factors. To investigate the association between breakdowns and these factors, a Bayesian network (BN) model is adopted in this paper. Based on the breakdown events identified at 10 urban expressway on-ramps in Shanghai, China, 23 parameters observed before breakdowns are extracted, including dynamic environment conditions aggregated over 5-minute intervals and static geometry features. Data from different time periods are used to predict breakdown. Results indicate that the model using data from 5-10 min prior to breakdown performs best, with prediction accuracies higher than 73%. Moreover, one unified model for all bottlenecks is also built and shows reasonably good prediction performance, with a breakdown classification accuracy of about 75% at best. Additionally, to simplify the model parameter input, the random forests (RF) model is adopted to identify the key variables. Modeling with the 7 selected parameters, the refined BN model can predict breakdown with adequate accuracy.

  6. Predicting the biological condition of streams: Use of geospatial indicators of natural and anthropogenic characteristics of watersheds

    USGS Publications Warehouse

    Carlisle, D.M.; Falcone, J.; Meador, M.R.

    2009-01-01

    We developed and evaluated empirical models to predict biological condition of wadeable streams in a large portion of the eastern USA, with the ultimate goal of prediction for unsampled basins. Previous work had classified (i.e., altered vs. unaltered) the biological condition of 920 streams based on a biological assessment of macroinvertebrate assemblages. Predictor variables were limited to widely available geospatial data, which included land cover, topography, climate, soils, societal infrastructure, and potential hydrologic modification. We compared the accuracy of predictions of biological condition class based on models with continuous and binary responses. We also evaluated the relative importance of specific groups and individual predictor variables, as well as the relationships between the most important predictors and biological condition. Prediction accuracy and the relative importance of predictor variables were different for two subregions for which models were created. Predictive accuracy in the highlands region improved by including predictors that represented both natural and human activities. Riparian land cover and road-stream intersections were the most important predictors. In contrast, predictive accuracy in the lowlands region was best for models limited to predictors representing natural factors, including basin topography and soil properties. Partial dependence plots revealed complex and nonlinear relationships between specific predictors and the probability of biological alteration. We demonstrate a potential application of the model by predicting biological condition in 552 unsampled basins across an ecoregion in southeastern Wisconsin (USA). Estimates of the likelihood of biological condition of unsampled streams could be a valuable tool for screening large numbers of basins to focus targeted monitoring of potentially unaltered or altered stream segments. © Springer Science+Business Media B.V. 2008.

  7. Achievable accuracy of hip screw holding power estimation by insertion torque measurement.

    PubMed

    Erani, Paolo; Baleani, Massimiliano

    2018-02-01

    To ensure stability of proximal femoral fractures, the hip screw must firmly engage into the femoral head. Some studies have suggested that screw holding power in trabecular bone could be evaluated, intraoperatively, through measurement of the screw insertion torque. However, those studies used synthetic bone, instead of trabecular bone, as the host material, or they did not evaluate the accuracy of predictions. We determined prediction accuracy, also assessing the impact of screw design and host material. We measured, under highly repeatable experimental conditions, disregarding clinical procedure complexities, the insertion torque and pullout strength of four screw designs, in 120 synthetic and 80 trabecular bone specimens of variable density. For both host materials, we calculated the root-mean-square error and the mean-absolute-percentage error of predictions based on the best-fitting model of torque-pullout data, in both single-screw and merged datasets. Predictions based on screw-specific regression models were the most accurate. Host material impacts prediction accuracy: the replacement of synthetic with trabecular bone decreased both root-mean-square errors, from 0.54–0.76 kN to 0.21–0.40 kN, and mean-absolute-percentage errors, from 14–21% to 10–12%. However, holding power predicted from low insertion torques remained inaccurate, with errors up to 40% for torques below 1 Nm. In poor-quality trabecular bone, tissue inhomogeneities likely affect pullout strength and insertion torque to different extents, limiting the predictive power of the latter. This bias decreases when the screw engages good-quality bone. Under this condition, predictions become more accurate, although this result must be confirmed by close in-vitro simulation of the clinical procedure. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Rapid race perception despite individuation and accuracy goals.

    PubMed

    Kubota, Jennifer T; Ito, Tiffany

    2017-08-01

    Perceivers rapidly process social category information and form stereotypic impressions of unfamiliar others. However, a goal to individuate a target or to accurately predict their behavior can result in individuated impressions. It is unknown how the combination of both accuracy and individuation goals affects perceptual category processing. To explore this, participants were given both the goal to individuate targets and accurately predict behavior. We then recorded event-related brain potentials while participants viewed photos of black and white males along with four pieces of individuating information in the form of descriptions of past behavior. Even with explicit individuation and accuracy task goals, participants rapidly differentiated targets by race within 200 ms. Importantly, this rapid categorical processing did not influence behavioral outcomes as participants made individuated predictions. These findings indicate that individuals engage in category processing even when provided with individuation and accuracy goals, but that this processing does not necessarily result in category-based judgments.

  9. Improved Prediction of Blood-Brain Barrier Permeability Through Machine Learning with Combined Use of Molecular Property-Based Descriptors and Fingerprints.

    PubMed

    Yuan, Yaxia; Zheng, Fang; Zhan, Chang-Guo

    2018-03-21

    Blood-brain barrier (BBB) permeability of a compound determines whether the compound can effectively enter the brain. It is an essential property which must be accounted for in drug discovery with a target in the brain. Several computational methods have been used to predict BBB permeability. In particular, support vector machine (SVM), a kernel-based machine learning method, has been widely used in this field. For SVM training and prediction, compounds are characterized by molecular descriptors. Some SVM models were based on the use of molecular property-based descriptors (including 1D, 2D, and 3D descriptors) or fragment-based descriptors (known as the fingerprints of a molecule). The selection of descriptors is critical for the performance of an SVM model. In this study, we aimed to develop a generally applicable new SVM model by combining all of the features of the molecular property-based descriptors and fingerprints to improve the accuracy of BBB permeability prediction. The results indicate that our SVM model has improved accuracy compared to the currently available models of BBB permeability prediction.
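
    Combining the two descriptor families amounts to concatenating the feature blocks before training; the sketch below uses random stand-ins for the property descriptors and fingerprint bits (e.g. MACCS-like 166-bit vectors), not a real BBB dataset:

```python
# SVM on combined feature blocks: continuous property descriptors are
# scaled, binary fingerprint bits are used as-is, then both are stacked.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(7)
n = 300
desc = rng.normal(size=(n, 12))              # property-based descriptors
fps = rng.integers(0, 2, size=(n, 166))      # fingerprint bits (MACCS-like)
y = (desc[:, 0] + 0.5 * fps[:, :5].sum(1)    # synthetic permeability label
     + rng.normal(0, 1, n) > 1.5).astype(int)

X = np.hstack([StandardScaler().fit_transform(desc), fps])
print("CV accuracy:", cross_val_score(SVC(kernel="rbf", C=1.0),
                                      X, y, cv=5).mean())
```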

  10. An interpolation method for stream habitat assessments

    USGS Publications Warehouse

    Sheehan, Kenneth R.; Welsh, Stuart A.

    2015-01-01

    Interpolation of stream habitat can be very useful for habitat assessment. Using a small number of habitat samples to predict the habitat of larger areas can reduce time and labor costs as long as it provides accurate estimates of habitat. The spatial correlation of stream habitat variables such as substrate and depth improves the accuracy of interpolated data. Several geographical information system interpolation methods (natural neighbor, inverse distance weighted, ordinary kriging, spline, and universal kriging) were used to predict substrate and depth within a 210.7-m² section of a second-order stream based on 2.5% and 5.0% sampling of the total area. Depth and substrate were recorded for the entire study site and compared with the interpolated values to determine the accuracy of the predictions. In all instances, the 5% interpolations were more accurate for both depth and substrate than the 2.5% interpolations, which achieved accuracies up to 95% and 92%, respectively. Interpolations of depth based on 2.5% sampling attained accuracies of 49–92%, whereas those based on 5% sampling attained accuracies of 57–95%. Natural neighbor interpolation was more accurate than that using the inverse distance weighted, ordinary kriging, spline, and universal kriging approaches. Our findings demonstrate the effective use of minimal amounts of small-scale data for the interpolation of habitat over large areas of a stream channel. Use of this method will provide time and cost savings in the assessment of large sections of rivers as well as functional maps to aid the habitat-based management of aquatic species.
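
    Of the interpolators compared, inverse distance weighting is the simplest to write down; the sketch below estimates depth over a grid from a 5% sample, with an invented depth surface standing in for the stream data:

```python
# Inverse-distance-weighted (IDW) interpolation of depth from a 5% sample.
# The depth surface is invented; it stands in for the surveyed stream data.
import numpy as np

def idw(sample_xy, sample_z, query_xy, power=2.0):
    """IDW estimate at each query point from the sampled points."""
    d = np.linalg.norm(query_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-9)                  # avoid division by zero
    w = 1.0 / d ** power
    return (w @ sample_z) / w.sum(axis=1)

rng = np.random.default_rng(1)
true_depth = lambda xy: 0.5 + 0.02 * xy[:, 0] + 0.3 * np.sin(xy[:, 1] / 3)

grid = np.stack(np.meshgrid(np.arange(30), np.arange(7)), -1)
grid = grid.reshape(-1, 2).astype(float)     # 30 x 7 grid of cells
idx = rng.choice(len(grid), size=int(0.05 * len(grid)), replace=False)
est = idw(grid[idx], true_depth(grid[idx]), grid)
print("mean abs error (m):", np.abs(est - true_depth(grid)).mean().round(3))
```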

  11. Alternative evaluation metrics for risk adjustment methods.

    PubMed

    Park, Sungchul; Basu, Anirban

    2018-06-01

    Risk adjustment is instituted to counter risk selection by accurately equating payments with expected expenditures. Traditional risk-adjustment methods are designed to estimate accurate payments at the group level. However, this generates residual risks at the individual level, especially for high-expenditure individuals, thereby inducing health plans to avoid those with high residual risks. To identify an optimal risk-adjustment method, we perform a comprehensive comparison of prediction accuracies at the group level, at the tail distributions, and at the individual level across 19 estimators: 9 parametric regression, 7 machine learning, and 3 distributional estimators. Using the 2013-2014 MarketScan database, we find that no one estimator performs best in all prediction accuracies. Generally, machine learning and distribution-based estimators achieve higher group-level prediction accuracy than parametric regression estimators. However, parametric regression estimators show higher tail distribution prediction accuracy and individual-level prediction accuracy, especially at the tails of the distribution. This suggests that there is a trade-off in selecting an appropriate risk-adjustment method between estimating accurate payments at the group level and lower residual risks at the individual level. Our results indicate that an optimal method cannot be determined solely on the basis of statistical metrics but rather needs to account for simulating plans' risk selective behaviors. Copyright © 2018 John Wiley & Sons, Ltd.

  12. A polynomial based model for cell fate prediction in human diseases.

    PubMed

    Ma, Lichun; Zheng, Jie

    2017-12-21

    Cell fate regulation directly affects tissue homeostasis and human health. Research on cell fate decisions sheds light on key regulators, facilitates understanding of the mechanisms, and suggests novel strategies to treat human diseases that are related to abnormal cell development. In this study, we propose a polynomial based model to predict cell fate. This model was derived from the Taylor series. As a case study, gene expression data of pancreatic cells were adopted to test and verify the model. As numerous features (genes) are available, we employed two kinds of feature selection methods, i.e. correlation based and apoptosis pathway based. Then polynomials of different degrees were used to refine the cell fate prediction function. 10-fold cross-validation was carried out to evaluate the performance of our model. In addition, we analyzed the stability of the resultant cell fate prediction model by evaluating the ranges of the parameters, as well as assessing the variances of the predicted values at randomly selected points. Results show that, for both of the considered gene selection methods, the prediction accuracies of polynomials of different degrees show little difference. Interestingly, the linear polynomial (degree 1 polynomial) is more stable than the others. When comparing the linear polynomials based on the two gene selection methods, although the accuracy of the linear polynomial that uses correlation analysis outcomes is a little higher (achieving 86.62%), the one based on genes of the apoptosis pathway is much more stable. Considering both the prediction accuracy and the stability of polynomial models of different degrees, the linear model is a preferred choice for cell fate prediction with gene expression data of pancreatic cells. The presented cell fate prediction model can be extended to other cells, which may be important for basic research as well as clinical studies of cell development related diseases.
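
    The degree comparison can be reproduced in spirit with a polynomial feature expansion and cross-validation; the eight "genes" and labels below are simulated, and the logistic-regression classifier is our stand-in for the paper's fitted polynomial prediction function:

```python
# Degree comparison for a polynomial cell-fate classifier; the 8 "genes"
# and labels are simulated, not the pancreatic-cell data from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 8))                # expression of 8 selected genes
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(int)

for degree in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree),
                          LogisticRegression(max_iter=2000))
    acc = cross_val_score(model, X, y, cv=10).mean()
    print(f"degree {degree}: mean 10-fold CV accuracy {acc:.3f}")
```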

  13. Modeling additive and non-additive effects in a hybrid population using genome-wide genotyping: prediction accuracy implications

    PubMed Central

    Bouvet, J-M; Makouanzi, G; Cros, D; Vigneron, Ph

    2016-01-01

    Hybrids are broadly used in plant breeding, and accurate estimation of variance components is crucial for optimizing genetic gain. Genome-wide information may be used to explore models designed to assess the extent of additive and non-additive variance and to test their prediction accuracy for genomic selection. Ten linear mixed models, involving pedigree- and marker-based relationship matrices among parents, were developed to estimate additive (A), dominance (D) and epistatic (AA, AD and DD) effects. Five complementary models, involving the gametic phase to estimate marker-based relationships among hybrid progenies, were developed to assess the same effects. The models were compared using tree height and 3303 single-nucleotide polymorphism markers from 1130 cloned individuals obtained via controlled crosses of 13 Eucalyptus urophylla females with 9 Eucalyptus grandis males. Akaike information criterion (AIC), variance ratios, asymptotic correlation matrices of estimates, goodness-of-fit, prediction accuracy and mean square error (MSE) were used for the comparisons. The variance components and variance ratios differed according to the model. Models with a parent marker-based relationship matrix performed better than those that were pedigree-based, that is, an absence of singularities, lower AIC, higher goodness-of-fit and accuracy, and smaller MSE. However, AD and DD variances were estimated with high standard errors. Using the same criteria, progeny gametic phase-based models performed better in fitting the observations and predicting genetic values. However, DD variance could not be separated from the dominance variance, and null estimates were obtained for AA and AD effects. This study highlighted the advantages of progeny models using genome-wide information. PMID:26328760

  14. Effects of field plot size on prediction accuracy of aboveground biomass in airborne laser scanning-assisted inventories in tropical rain forests of Tanzania.

    PubMed

    Mauya, Ernest William; Hansen, Endre Hofstad; Gobakken, Terje; Bollandsås, Ole Martin; Malimbwi, Rogers Ernest; Næsset, Erik

    2015-12-01

    Airborne laser scanning (ALS) has recently emerged as a promising tool to acquire auxiliary information for improving aboveground biomass (AGB) estimation in sample-based forest inventories. Under design-based and model-assisted inferential frameworks, the estimation relies on a model that relates the auxiliary ALS metrics to AGB estimated on ground plots. The size of the field plots has been identified as one source of model uncertainty because of the so-called boundary effects, which increase with decreasing plot size. Recent research in tropical forests has aimed to quantify the boundary effects on model prediction accuracy, but evidence of the consequences for the final AGB estimates is lacking. In this study we analyzed the effect of field plot size on model prediction accuracy and its implication when used in a model-assisted inferential framework. The results showed that the prediction accuracy of the model improved as the plot size increased. The adjusted R² increased from 0.35 to 0.74, while the relative root mean square error decreased from 63.6 to 29.2%. Indicators of boundary effects were identified and confirmed to have significant effects on the model residuals. Variance estimates of model-assisted mean AGB relative to corresponding variance estimates of pure field-based AGB decreased with increasing plot size in the range from 200 to 3,000 m². The variance ratio of field-based estimates relative to model-assisted variance ranged from 1.7 to 7.7. This study showed that the relative improvement in precision of AGB estimation when increasing field-plot size was greater for an ALS-assisted inventory than for a pure field-based inventory.

  15. Research on light rail electric load forecasting based on ARMA model

    NASA Astrophysics Data System (ADS)

    Huang, Yifan

    2018-04-01

    This article compares a variety of time series models in light of the characteristics of electric load forecasting, and establishes a light rail load forecasting model based on the ARMA model. The model is then applied to forecast the load of a light rail system. The prediction results show that the accuracy of the model is high.
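
    With statsmodels, an ARMA(p, q) model is fit via the ARIMA class with zero differencing; the AR(2) series below is synthetic, and the order (2, 0, 1) is an arbitrary choice rather than the article's:

```python
# ARMA forecasting sketch with statsmodels; the series is a synthetic
# AR(2) process standing in for a daily light-rail traction-load series.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
e = rng.normal(0, 1, 400)
load = np.zeros(400)
for t in range(2, 400):
    load[t] = 0.6 * load[t - 1] + 0.3 * load[t - 2] + e[t]

fit = ARIMA(load[:380], order=(2, 0, 1)).fit()   # ARMA(2,1) = ARIMA(2,0,1)
print(fit.forecast(steps=20).round(2))           # 20-step-ahead prediction
```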

  16. Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion.

    PubMed

    Zafar, Raheel; Dass, Sarat C; Malik, Aamir Saeed

    2017-01-01

    Decoding human brain activity from the electroencephalogram (EEG) is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain-computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, a convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features, and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time series, an approach also known as multivariate pattern analysis. A comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with currently recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method, the most popular currently used feature extraction and prediction method, showed an accuracy of 65.7%, whereas the proposed method predicts novel data with an improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction methods.

  17. Respiratory motion estimation in x-ray angiography for improved guidance during coronary interventions

    NASA Astrophysics Data System (ADS)

    Baka, N.; Lelieveldt, B. P. F.; Schultz, C.; Niessen, W.; van Walsum, T.

    2015-05-01

    During percutaneous coronary interventions (PCI) catheters and arteries are visualized by x-ray angiography (XA) sequences, using brief contrast injections to show the coronary arteries. If we could continue visualizing the coronary arteries after the contrast agent passed (thus in non-contrast XA frames), we could potentially lower contrast use, which is advantageous due to the toxicity of the contrast agent. This paper explores the possibility of such visualization in mono-plane XA acquisitions with a special focus on respiratory based coronary artery motion estimation. We use the patient specific coronary artery centerlines from pre-interventional 3D CTA images to project on the XA sequence for artery visualization. To achieve this, a framework for registering the 3D centerlines with the mono-plane 2D + time XA sequences is presented. During the registration the patient specific cardiac and respiratory motion is learned. We investigate several respiratory motion estimation strategies with respect to accuracy, plausibility and ease of use for motion prediction in XA frames with and without contrast. The investigated strategies include diaphragm motion based prediction, and respiratory motion extraction from the guiding catheter tip motion. We furthermore compare translational and rigid respiratory based heart motion. We validated the accuracy of the 2D/3D registration and the respiratory and cardiac motion estimations on XA sequences of 12 interventions. The diaphragm based motion model and the catheter tip derived motion achieved 1.58 mm and 1.83 mm median 2D accuracy, respectively. On a subset of four interventions we evaluated the artery visualization accuracy for non-contrast cases. Both diaphragm, and catheter tip based prediction performed similarly, with about half of the cases providing satisfactory accuracy (median error < 2 mm).

  18. A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction.

    PubMed

    Deng, Lei; Fan, Chao; Zeng, Zhiwen

    2017-12-28

    Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restraints in modeling protein folding and protein 3D structure. Thus, accurately predicting these features is a critical step for 3D protein structure building. In this study, we present DeepSacon, a computational method that can effectively predict protein solvent accessibility and contact number using a deep neural network built on stacked autoencoders and a dropout method. The results demonstrate that our proposed DeepSacon achieves a significant improvement in prediction quality compared with the state-of-the-art methods. We obtain 0.70 three-state accuracy for solvent accessibility, and 0.33 15-state accuracy and 0.74 Pearson Correlation Coefficient (PCC) for the contact number, on the 5729 monomeric soluble globular protein dataset. We also evaluated the performance on the CASP11 benchmark dataset; DeepSacon achieves 0.68 three-state accuracy and 0.69 PCC for solvent accessibility and contact number, respectively. We have shown that DeepSacon can reliably predict solvent accessibility and contact number with stacked sparse autoencoders and a dropout approach.

  19. An evaluation of selected (Q)SARs/expert systems for predicting skin sensitisation potential.

    PubMed

    Fitzpatrick, J M; Roberts, D W; Patlewicz, G

    2018-06-01

    Predictive testing to characterise substances for their skin sensitisation potential has historically been based on animal models such as the Local Lymph Node Assay (LLNA) and the Guinea Pig Maximisation Test (GPMT). In recent years, EU regulations have provided a strong incentive to develop non-animal alternatives, such as expert systems software. Here we selected three different types of expert systems: VEGA (statistical), Derek Nexus (knowledge-based) and TIMES-SS (hybrid), and evaluated their performance using two large sets of animal data: one set of 1249 substances from eChemportal and a second set of 515 substances from NICEATM. A model was considered successful at predicting skin sensitisation potential if it had at least the same balanced accuracy as the LLNA and the GPMT achieved in predicting each other's outcomes, which ranged from 79% to 86%. We found that the highest balanced accuracy of any of the expert systems evaluated was 65% when making global predictions. For substances within the domain of TIMES-SS, however, balanced accuracies for the two datasets were found to be 79% and 82%. In those cases where a chemical was within the TIMES-SS domain, the TIMES-SS skin sensitisation hazard prediction had the same confidence as the result from the LLNA or GPMT.

  20. Improvement of PM concentration predictability using WRF-CMAQ-DLM coupled system and its applications

    NASA Astrophysics Data System (ADS)

    Lee, Soon Hwan; Kim, Ji Sun; Lee, Kang Yeol; Shon, Keon Tae

    2017-04-01

    Air quality in Korea is deteriorating due to increasing particulate matter (PM). At present, PM forecasts are issued based on the PM concentrations predicted by numerical air quality models. However, forecast accuracy is not as high as expected, owing to various uncertainties in the physical and chemical characteristics of PM. The purpose of this study was to develop a numerical-statistical ensemble model to improve the accuracy of PM10 concentration predictions. The numerical models used in this study are the three-dimensional atmospheric Weather Research and Forecasting (WRF) model and the Community Multiscale Air Quality (CMAQ) model. The target areas for the PM forecast are the Seoul, Busan, Daegu, and Daejeon metropolitan areas in Korea. The data used in the model development are observed PM concentrations and CMAQ predictions, covering a period of 3 months (March 1 - May 31, 2014). The dynamic-statistical technique for reducing the systematic error of the CMAQ predictions was implemented as a dynamic linear model (DLM) based on Bayesian Kalman filtering. Applying the corrections generated by the DLM to the PM concentration forecasts improved accuracy, especially at high PM concentrations, where the damage is relatively large.
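
    The DLM bias-correction step can be illustrated with a hand-rolled local-level Kalman filter that tracks the slowly varying systematic error of the raw forecasts; the bias magnitude and noise variances below are invented:

```python
# Local-level dynamic linear model: track the slowly varying bias b_t of
# raw PM10 forecasts with a scalar Kalman filter, then subtract it from
# each new forecast. All magnitudes are invented for illustration.
import numpy as np

rng = np.random.default_rng(4)
truth = 40 + 10 * np.sin(np.arange(90) / 9.0)        # observed PM10
cmaq = truth + 12 + rng.normal(0, 4, 90)             # +12 systematic bias

b, P = 0.0, 25.0          # bias state estimate and its variance
Q, R = 0.1, 16.0          # process and observation noise variances
corrected = np.empty(90)
for t in range(90):
    P += Q                                   # predict step
    corrected[t] = cmaq[t] - b               # correct with bias learned so far
    K = P / (P + R)                          # Kalman gain
    b += K * ((cmaq[t] - truth[t]) - b)      # update once the obs arrives
    P *= (1 - K)

print("raw RMSE      :", np.sqrt(((cmaq - truth) ** 2).mean()).round(2))
print("corrected RMSE:", np.sqrt(((corrected - truth) ** 2).mean()).round(2))
```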

  1. Pillars of judgment: how memory abilities affect performance in rule-based and exemplar-based judgments.

    PubMed

    Hoffmann, Janina A; von Helversen, Bettina; Rieskamp, Jörg

    2014-12-01

    Making accurate judgments is an essential skill in everyday life. Although how different memory abilities relate to categorization and judgment processes has been hotly debated, the question is far from resolved. We contribute to the solution by investigating how individual differences in memory abilities affect judgment performance in 2 tasks that induced rule-based or exemplar-based judgment strategies. In a study with 279 participants, we investigated how working memory and episodic memory affect judgment accuracy and strategy use. As predicted, participants switched strategies between tasks. Furthermore, structural equation modeling showed that the ability to solve rule-based tasks was predicted by working memory, whereas episodic memory predicted judgment accuracy in the exemplar-based task. Last, the probability of choosing an exemplar-based strategy was related to better episodic memory, but strategy selection was unrelated to working memory capacity. In sum, our results suggest that different memory abilities are essential for successfully adopting different judgment strategies. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  2. Static bending deflection and free vibration analysis of moderate thick symmetric laminated plates using multidimensional wave digital filters

    NASA Astrophysics Data System (ADS)

    Tseng, Chien-Hsun

    2018-06-01

    This paper aims to develop a multidimensional wave digital filtering network for predicting the static and dynamic behaviors of composite laminates based on the first-order shear deformation theory (FSDT). The resultant network is thus an integrated platform that can perform not only free vibration analysis but also bending deflection analysis of moderately thick symmetric laminated plates with low plate side-to-thickness ratios (≤20). Safeguarded by the Courant-Friedrichs-Lewy stability condition with the least restriction in terms of optimization technique, the present method offers high numerical accuracy, stability and efficiency across a wide range of modulus ratios for FSDT laminated plates. Instead of using a constant shear correction factor (SCF), which limits numerical accuracy for the bending deflection, an optimum SCF is sought by looking for a minimum ratio of change in the transverse shear energy. In this way, the method can predict bending deflection with comparably good accuracy in certain cases. Extensive simulation results for the prediction of maximum bending deflection demonstrate that the present method outperforms those based on the higher-order shear deformation and layerwise plate theories. To the best of our knowledge, this is the first work to show that an optimal selection of the SCF can significantly increase the accuracy of FSDT-based laminate models, especially compared with the higher-order theory, which disclaims any correction. The overall solution accuracy is benchmarked against the 3D elasticity equilibrium solution.

  3. Flight Test Results: CTAS Cruise/Descent Trajectory Prediction Accuracy for En route ATC Advisories

    NASA Technical Reports Server (NTRS)

    Green, S.; Grace, M.; Williams, D.

    1999-01-01

    The Center/TRACON Automation System (CTAS), under development at NASA Ames Research Center, is designed to assist controllers with the management and control of air traffic transitioning to/from congested airspace. This paper focuses on the transition from the en route environment to high-density terminal airspace, under a time-based arrival-metering constraint. Two flight tests were conducted at the Denver Air Route Traffic Control Center (ARTCC) to study trajectory-prediction accuracy, the key to accurate Decision Support Tool advisories such as conflict detection/resolution and fuel-efficient metering conformance. In collaboration with NASA Langley Research Center, these tests were part of an overall effort to research systems and procedures for the integration of CTAS and flight management systems (FMS). The Langley Transport Systems Research Vehicle Boeing 737 airplane flew a combined total of 58 cruise-arrival trajectory runs while following CTAS clearance advisories. Actual trajectories of the airplane were compared to CTAS and FMS predictions to measure trajectory-prediction accuracy and identify the primary sources of error for both. The research airplane was used to evaluate several levels of cockpit automation ranging from conventional avionics to a performance-based vertical navigation (VNAV) FMS. Trajectory prediction accuracy was analyzed with respect to both ARTCC radar tracking and GPS-based aircraft measurements. This paper presents detailed results describing the trajectory accuracy and error sources. Although differences were found in both accuracy and error sources, CTAS accuracy was comparable to the FMS in terms of both meter-fix arrival-time performance (in support of metering) and 4D-trajectory prediction (key to conflict prediction). Overall arrival time errors (mean plus standard deviation) were measured to be approximately 24 seconds during the first flight test (23 runs) and 15 seconds during the second flight test (25 runs). The major source of error during these tests was found to be the predicted winds aloft used by CTAS. Position and velocity estimates of the airplane provided to CTAS by the ATC Host radar tracker were found to be a relatively insignificant error source for the trajectory conditions evaluated. Airplane performance modeling errors within CTAS were found to not significantly affect arrival time errors when the constrained descent procedures were used. The most significant effect related to the flight guidance was observed to be the cross-track and turn-overshoot errors associated with conventional VOR guidance. Lateral navigation (LNAV) guidance significantly reduced both the cross-track and turn-overshoot error. Pilot procedures and VNAV guidance were found to significantly reduce the vertical profile errors associated with atmospheric and aircraft performance model errors.

  4. The effect of stimulus strength on the speed and accuracy of a perceptual decision.

    PubMed

    Palmer, John; Huk, Alexander C; Shadlen, Michael N

    2005-05-02

    Both the speed and the accuracy of a perceptual judgment depend on the strength of the sensory stimulation. When stimulus strength is high, accuracy is high and response time is fast; when stimulus strength is low, accuracy is low and response time is slow. Although the psychometric function is well established as a tool for analyzing the relationship between accuracy and stimulus strength, the corresponding chronometric function for the relationship between response time and stimulus strength has not received as much consideration. In this article, we describe a theory of perceptual decision making based on a diffusion model. In it, a decision is based on the additive accumulation of sensory evidence over time to a bound. Combined with simple scaling assumptions, the proportional-rate and power-rate diffusion models predict simple analytic expressions for both the chronometric and psychometric functions. In a series of psychophysical experiments, we show that this theory accounts for response time and accuracy as a function of both stimulus strength and speed-accuracy instructions. In particular, the results demonstrate a close coupling between response time and accuracy. The theory is also shown to subsume the predictions of Piéron's Law, a power function dependence of response time on stimulus strength. The theory's analytic chronometric function allows one to extend theories of accuracy to response time.
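
    In the notation of the proportional-rate diffusion model, with stimulus strength x, decision bound ±A, sensitivity k, and residual (non-decision) time t_R, the psychometric and chronometric functions take the closed forms below, following the statement of the model in Palmer, Huk, and Shadlen (2005):

```latex
% Proportional-rate diffusion model: drift rate k*x, bounds at +/-A.
\begin{align}
  P_C(x)       &= \frac{1}{1 + e^{-2kAx}}       && \text{(psychometric function)} \\
  \bar{t}_T(x) &= \frac{A}{kx}\,\tanh(kAx) + t_R && \text{(chronometric function)}
\end{align}
```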

  5. Development and validation of classifiers and variable subsets for predicting nursing home admission.

    PubMed

    Nuutinen, Mikko; Leskelä, Riikka-Leena; Suojalehto, Ella; Tirronen, Anniina; Komssi, Vesa

    2017-04-13

    In previous years a substantial number of studies have identified statistically important predictors of nursing home admission (NHA). However, as far as we know, these analyses have been done at the population level. No prior research has analysed the prediction accuracy of an NHA model for individuals. This study is an analysis of 3056 longer-term home care customers in the city of Tampere, Finland. Data were collected from the records of social and health service usage and the RAI-HC (Resident Assessment Instrument - Home Care) assessment system between January 2011 and September 2015. The aim was to find the most efficient variable subsets for predicting NHA for individuals and to validate their accuracy. Variable subsets for predicting NHA were searched for using the sequential forward selection (SFS) method, a variable ranking metric, and the classifiers of logistic regression (LR), support vector machine (SVM) and Gaussian naive Bayes (GNB). Validation of the results was ensured using randomly balanced data sets and cross-validation. The primary performance metrics for the classifiers were the prediction accuracy and AUC (average area under the curve). The LR and GNB classifiers achieved 78% accuracy for predicting NHA. The most important variables were RAI MAPLE (Method for Assigning Priority Levels), functional impairment (RAI IADL, Activities of Daily Living), cognitive impairment (RAI CPS, Cognitive Performance Scale), memory disorders (diagnoses G30-G32 and F00-F03), and the use of community-based health services and prior hospital use (emergency visits and periods of care). The accuracy of the classifier for individuals was high enough to convince the officials of the city of Tampere to integrate the predictive model based on the findings of this study into the home care information system. Further work needs to be done to evaluate variables that are modifiable and responsive to interventions.
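
    scikit-learn's SequentialFeatureSelector provides an off-the-shelf SFS implementation; the sketch below wraps it around logistic regression on synthetic stand-ins for the RAI-HC and service-use variables (the real study's variables and balancing procedure are not reproduced):

```python
# Sequential forward selection around logistic regression; the features
# are synthetic stand-ins for RAI-HC and service-use variables.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           weights=[0.5, 0.5], random_state=6)

lr = LogisticRegression(max_iter=1000)
sfs = SequentialFeatureSelector(lr, n_features_to_select=5,
                                direction="forward", cv=5)
sfs.fit(X, y)
X_sel = sfs.transform(X)
print("selected columns:", np.flatnonzero(sfs.get_support()))
print("CV accuracy with 5 variables:",
      cross_val_score(lr, X_sel, y, cv=5).mean().round(3))
```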

  6. Aircraft noise prediction program validation

    NASA Technical Reports Server (NTRS)

    Shivashankara, B. N.

    1980-01-01

    A modular computer program (ANOPP) for predicting aircraft flyover and sideline noise was developed. A high quality flyover noise data base for aircraft that are representative of the U.S. commercial fleet was assembled. The accuracy of ANOPP with respect to the data base was determined. The data for source and propagation effects were analyzed and suggestions for improvements to the prediction methodology are given.

  7. Mean Expected Error in Prediction of Total Body Water: A True Accuracy Comparison between Bioimpedance Spectroscopy and Single Frequency Regression Equations

    PubMed Central

    Abtahi, Shirin; Abtahi, Farhad; Ellegård, Lars; Johannsson, Gudmundur; Bosaeus, Ingvar

    2015-01-01

For several decades electrical bioimpedance (EBI) has been used to assess body fluid distribution and body composition. Despite the development of several different approaches for assessing total body water (TBW), it remains uncertain whether bioimpedance spectroscopy (BIS) approaches are more accurate than single-frequency regression equations. The main objective of this study was to answer this question by calculating the expected accuracy of a single measurement for different EBI methods. The results of this study showed that all methods produced similarly high correlation and concordance coefficients, indicating good accuracy as a method. Even the limits of agreement produced from the Bland-Altman analysis indicated that the performance of the single-frequency Sun prediction equations at the population level was close to the performance of both BIS methods; however, comparing the Mean Absolute Percentage Error (MAPE) between the single-frequency prediction equations and the BIS methods yielded a significant difference, indicating slightly better accuracy for the BIS methods. Despite the higher accuracy of BIS methods over 50 kHz prediction equations at both the population and individual level, the magnitude of the improvement was small. Such a slight improvement in the accuracy of BIS methods is suggested to be insufficient to warrant their clinical use where the most accurate predictions of TBW are required, for example, when assessing fluid overload in dialysis. To reach expected errors below 4-5%, novel and individualized approaches must be developed to improve the accuracy of bioimpedance-based methods for the advent of innovative personalized health monitoring applications. PMID:26137489
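
    The two headline metrics of the comparison, Mean Absolute Percentage Error and Bland-Altman limits of agreement, can be computed as below; this is a generic sketch against a reference TBW measurement, not the study's code.

        import numpy as np

        def mape(reference, predicted):
            """Mean Absolute Percentage Error (%) against a reference method."""
            reference, predicted = np.asarray(reference), np.asarray(predicted)
            return 100.0 * np.mean(np.abs((predicted - reference) / reference))

        def bland_altman_limits(reference, predicted):
            """Bias and 95% limits of agreement between two methods."""
            diff = np.asarray(predicted) - np.asarray(reference)
            bias, sd = diff.mean(), diff.std(ddof=1)
            return bias, (bias - 1.96 * sd, bias + 1.96 * sd)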

  8. Canopy Temperature and Vegetation Indices from High-Throughput Phenotyping Improve Accuracy of Pedigree and Genomic Selection for Grain Yield in Wheat

    PubMed Central

    Rutkoski, Jessica; Poland, Jesse; Mondal, Suchismita; Autrique, Enrique; Pérez, Lorena González; Crossa, José; Reynolds, Matthew; Singh, Ravi

    2016-01-01

Genomic selection can be applied prior to phenotyping, enabling shorter breeding cycles and greater rates of genetic gain relative to phenotypic selection. Traits measured using high-throughput phenotyping based on proximal or remote sensing could be useful for improving pedigree and genomic prediction model accuracies for traits not yet possible to phenotype directly. We tested whether using aerial measurements of canopy temperature, and green and red normalized difference vegetation index, as secondary traits in pedigree and genomic best linear unbiased prediction models could increase accuracy for grain yield in wheat, Triticum aestivum L., using 557 lines in five environments. Secondary traits on the training and test sets, and grain yield on the training set, were modeled as multivariate, and compared to univariate models with grain yield on the training set only. Cross-validation accuracies were estimated within- and across-environment, with and without replication, and with and without correcting for days to heading. We observed that, within environment, with unreplicated secondary trait data, and without correcting for days to heading, secondary traits increased accuracies for grain yield by 56% in pedigree, and 70% in genomic prediction models, on average. Secondary traits increased accuracy slightly more when replicated, and considerably less when models corrected for days to heading. In across-environment prediction, trends were similar but less consistent. These results show that secondary traits measured in high-throughput could be used in pedigree and genomic prediction to improve accuracy. This approach could improve selection in wheat during early stages if validated in early-generation breeding plots. PMID:27402362

  9. E-nose based rapid prediction of early mouldy grain using probabilistic neural networks

    PubMed Central

    Ying, Xiaoguo; Liu, Wei; Hui, Guohua; Fu, Jun

    2015-01-01

In this paper, a rapid prediction method for early mouldy grain using a probabilistic neural network (PNN) and an electronic nose (e-nose) was studied. E-nose responses to rice, red bean, and oat samples of different qualities were measured and recorded. The e-nose data were analyzed using principal component analysis (PCA), a back-propagation (BP) network, and a PNN, respectively. Results indicated that PCA and the BP network could not clearly discriminate grain samples with different mouldy status and showed poor prediction accuracy. The PNN discriminated the grain samples satisfactorily, with an accuracy of 93.75%. An e-nose combined with a PNN is effective for early mouldy grain prediction. PMID:25714125
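
    A probabilistic neural network is essentially a Parzen-window kernel-density classifier: each class is scored by the average Gaussian kernel over its training samples, and the highest-scoring class wins. A minimal NumPy sketch, with a smoothing parameter sigma chosen arbitrarily rather than taken from the paper:

        import numpy as np

        def pnn_predict(X_train, y_train, X_test, sigma=0.5):
            """Parzen-window PNN: average Gaussian kernels per class, pick the max."""
            classes = np.unique(y_train)
            preds = []
            for x in X_test:
                scores = []
                for c in classes:
                    Xc = X_train[y_train == c]
                    d2 = np.sum((Xc - x) ** 2, axis=1)
                    scores.append(np.mean(np.exp(-d2 / (2.0 * sigma ** 2))))
                preds.append(classes[int(np.argmax(scores))])
            return np.array(preds)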

  10. Prediction of composite fatigue life under variable amplitude loading using artificial neural network trained by genetic algorithm

    NASA Astrophysics Data System (ADS)

    Rohman, Muhamad Nur; Hidayat, Mas Irfan P.; Purniawan, Agung

    2018-04-01

Neural networks (NN) have been widely used in fatigue life prediction. For polymeric-based composites, an NN model must be developed with respect to the limited fatigue data available, and it must be applicable to predicting fatigue life under varying stress amplitudes at different stress ratios. In the present paper, a Multilayer Perceptron (MLP) neural network model is developed, and a genetic algorithm is employed to optimize the network weights for prediction of the fatigue life of polymeric-based composite materials under variable amplitude loading. Simulations on two composite systems, E-glass fabrics/epoxy (layup [(±45)/(0)2]S) and E-glass/polyester (layup [90/0/±45/0]S), show that an NN model trained with fatigue data from only two stress ratios (representing limited fatigue data) can predict fatigue life at another four and seven stress ratios, respectively, with high accuracy. The accuracy of the NN predictions was quantified by a small mean square error (MSE). When 33% of the total fatigue data was used for training, the NN model produced high accuracy for all stress ratios. Even with less fatigue data during training (22% of the total fatigue data), the NN model still produced a high coefficient of determination between the predicted and experimental results.

  11. An Experimental Study in Determining Energy Expenditure from Treadmill Walking using Hip-Worn Inertial Sensors

    PubMed Central

    Vathsangam, Harshvardhan; Emken, Adar; Schroeder, E. Todd; Spruijt-Metz, Donna; Sukhatme, Gaurav S.

    2011-01-01

This paper describes an experimental study in estimating energy expenditure from treadmill walking using a single hip-mounted triaxial inertial sensor comprised of a triaxial accelerometer and a triaxial gyroscope. Typical physical activity characterization using accelerometer-generated counts suffers from two drawbacks: imprecision (due to proprietary counts) and incompleteness (due to incomplete movement description). We address these problems in the context of steady-state walking by directly estimating energy expenditure with data from a hip-mounted inertial sensor. We represent the cyclic nature of walking with a Fourier transform of the sensor streams and show how one can map this representation to energy expenditure (as measured by VO2 consumption, mL/min) using three regression techniques: Least Squares Regression (LSR), Bayesian Linear Regression (BLR) and Gaussian Process Regression (GPR). We perform a comparative analysis of the accuracy of sensor streams in predicting energy expenditure (measured by RMS prediction accuracy). Triaxial information is more accurate than uniaxial information. LSR-based approaches are prone to outlier sensitivity and overfitting. Gyroscopic information showed equivalent if not better prediction accuracy compared with accelerometers. Combining accelerometer and gyroscopic information provided better accuracy than using either sensor alone. We also analyze the best algorithmic approach among linear and nonlinear methods as measured by RMS prediction accuracy and run time. Nonlinear regression methods showed better prediction accuracy but required an order of magnitude more run time. This paper emphasizes the role of probabilistic techniques, in conjunction with joint modeling of triaxial accelerations and rotational rates, in improving energy expenditure prediction for steady-state treadmill walking. PMID:21690001
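
    A sketch of the modeling recipe described above: Fourier-magnitude features from windows of six-channel inertial data are mapped to VO2 with LSR, BLR, and GPR. All data here are random stand-ins, and the window length, number of harmonics, and kernel are assumptions, not the study's settings.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.linear_model import BayesianRidge, LinearRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        # Hypothetical windows of a triaxial accel + gyro stream: 6 channels x 256 samples.
        windows = rng.standard_normal((200, 6, 256))
        vo2 = rng.uniform(200, 900, 200)      # stand-in for measured VO2 (mL/min)

        # Fourier-magnitude features capture the cyclic structure of walking.
        feats = np.abs(np.fft.rfft(windows, axis=2))[:, :, 1:9].reshape(200, -1)

        for name, model in [("LSR", LinearRegression()),
                            ("BLR", BayesianRidge()),
                            ("GPR", GaussianProcessRegressor())]:
            rmse = -cross_val_score(model, feats, vo2, cv=5,
                                    scoring="neg_root_mean_squared_error").mean()
            print(name, round(float(rmse), 1))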

  12. Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat.

    PubMed

    Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Huerta-Espino, Julio; Lan, Caixia; Bhavani, Sridhar; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E

    2017-07-01

Genomic prediction for seedling and adult plant resistance to wheat rusts was compared to prediction using a few markers as fixed effects in a least-squares approach and to pedigree-based prediction. The unceasing plant-pathogen arms race and the ephemeral nature of some rust resistance genes have been challenging for wheat (Triticum aestivum L.) breeding programs and farmers. Hence, it is important to devise strategies for effective evaluation and exploitation of quantitative rust resistance. One promising approach that could accelerate gain from selection for rust resistance is 'genomic selection', which utilizes dense genome-wide markers to estimate the breeding values (BVs) for quantitative traits. Our objective was to compare genomic prediction models, including genomic best linear unbiased prediction (GBLUP), GBLUP A (GBLUP with selected loci as fixed effects), reproducing kernel Hilbert spaces with markers (RKHS-M), with pedigree (RKHS-P), and with markers and pedigree (RKHS-MP), as well as a least-squares (LS) approach, to determine the BVs for seedling and/or adult plant resistance (APR) to leaf rust (LR), stem rust (SR), and stripe rust (YR). The 333 lines in the 45th IBWSN and the 313 lines in the 46th IBWSN were genotyped using genotyping-by-sequencing and phenotyped in replicated trials. The mean prediction accuracies ranged from 0.31-0.74 for LR seedling, 0.12-0.56 for LR APR, 0.31-0.65 for SR APR, 0.70-0.78 for YR seedling, and 0.34-0.71 for YR APR. For most datasets, the RKHS-MP model gave the highest accuracies, while LS gave the lowest. The GBLUP, GBLUP A, RKHS-M, and RKHS-P models gave similar accuracies. Using genome-wide marker-based models resulted in an average 42% increase in accuracy over LS. We conclude that GS is a promising approach for improvement of quantitative rust resistance and can be implemented in the breeding pipeline.
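
    For readers unfamiliar with GBLUP, the sketch below builds a VanRaden-style genomic relationship matrix and predicts breeding values for unphenotyped lines as a kernel regression. The heritability used to set the shrinkage parameter is an arbitrary assumption, and this is not the authors' implementation.

        import numpy as np

        def vanraden_G(M):
            """Genomic relationship matrix from a (lines x markers) 0/1/2 matrix."""
            p = M.mean(axis=0) / 2.0
            Z = M - 2.0 * p
            return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

        def gblup(G, y, train, test, h2=0.5):
            """Predict BVs for test lines from training phenotypes; lambda=(1-h2)/h2."""
            lam = (1.0 - h2) / h2
            Gtt = G[np.ix_(train, train)] + lam * np.eye(len(train))
            return G[np.ix_(test, train)] @ np.linalg.solve(
                Gtt, y[train] - y[train].mean())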

  13. Comparison of Adjacency and Distance-Based Approaches for Spatial Analysis of Multimodal Traffic Crash Data

    NASA Astrophysics Data System (ADS)

    Gill, G.; Sakrani, T.; Cheng, W.; Zhou, J.

    2017-09-01

Many studies have utilized the spatial correlations among traffic crash data to develop crash prediction models with the aim of investigating the influential factors or predicting crash counts at different sites. The spatial correlation has been observed to account for heterogeneity through different forms of weight matrices, which improves the estimation performance of models. But rarely have the weight matrices been compared for their accuracy in predicting crash counts. This study targeted the comparison of two different approaches for modelling the spatial correlations among crash data at the macro level (county). Multivariate full Bayesian crash prediction models were developed using Decay-50 (distance-based) and Queen-1 (adjacency-based) weight matrices for simultaneous estimation of crash counts for four different modes: vehicle, motorcycle, bike, and pedestrian. The goodness-of-fit and different criteria for accuracy at predicting crash counts revealed the superiority of Decay-50 over Queen-1. Decay-50 differed essentially from Queen-1 in its selection of neighbors and its more robust spatial weight structure, which rendered the flexibility to accommodate the spatially correlated crash data. The consistently better performance of Decay-50 at prediction accuracy further bolstered its superiority. Although the data collection effort to gather centroid distances among counties for Decay-50 may appear to be a downside, the model has a significant edge in fitting the crash data without losing the simplicity of computing the estimated crash counts.
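
    The two weighting schemes compared above can be illustrated as follows; the exact Decay-50 specification (here, an inverse-distance weight with a 50-unit cutoff) is an assumption for illustration, not taken from the paper.

        import numpy as np

        def decay_weights(dist, cutoff=50.0):
            """Distance-decay weights: neighbors within cutoff, weight 1/d,
            row-standardized. The precise Decay-50 form is assumed."""
            W = np.where((dist > 0) & (dist <= cutoff),
                         1.0 / np.maximum(dist, 1e-9), 0.0)
            rs = W.sum(axis=1, keepdims=True)
            return np.divide(W, rs, out=np.zeros_like(W), where=rs > 0)

        def adjacency_weights(adj):
            """First-order (Queen-1) binary contiguity, row-standardized."""
            W = adj.astype(float)
            rs = W.sum(axis=1, keepdims=True)
            return np.divide(W, rs, out=np.zeros_like(W), where=rs > 0)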

  14. NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation.

    PubMed

    Sakthivel, Seethalakshmi; S K M, Habeeb

    2015-01-01

The secondary structural states predicted by existing servers are not cross-validated, so information on the level of accuracy for every sequence is not reported. This is overcome by NNvPDB, which not only reports a greater Q3 but also validates every prediction against homologous PDB entries. NNvPDB is based on a neural network, with a new approach of training the network every time with five PDB structures that are similar to the query sequence. The average accuracy is 76% for helix, 71% for beta sheet, and 66% overall (helix, sheet and coil). http://bit.srmuniv.ac.in/cgi-bin/bit/cfpdb/nnsecstruct.pl.

  15. Software reliability studies

    NASA Technical Reports Server (NTRS)

    Hoppa, Mary Ann; Wilson, Larry W.

    1994-01-01

    There are many software reliability models which try to predict future performance of software based on data generated by the debugging process. Our research has shown that by improving the quality of the data one can greatly improve the predictions. We are working on methodologies which control some of the randomness inherent in the standard data generation processes in order to improve the accuracy of predictions. Our contribution is twofold in that we describe an experimental methodology using a data structure called the debugging graph and apply this methodology to assess the robustness of existing models. The debugging graph is used to analyze the effects of various fault recovery orders on the predictive accuracy of several well-known software reliability algorithms. We found that, along a particular debugging path in the graph, the predictive performance of different models can vary greatly. Similarly, just because a model 'fits' a given path's data well does not guarantee that the model would perform well on a different path. Further we observed bug interactions and noted their potential effects on the predictive process. We saw that not only do different faults fail at different rates, but that those rates can be affected by the particular debugging stage at which the rates are evaluated. Based on our experiment, we conjecture that the accuracy of a reliability prediction is affected by the fault recovery order as well as by fault interaction.

  16. R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter.

    PubMed

    Yang, Jing; Jin, Qi-Yu; Zhang, Biao; Shen, Hong-Bin

    2016-08-15

Inter-residue contacts in proteins dictate the topology of protein structures. They are crucial for protein folding and structural stability. Accurate prediction of residue contacts, especially long-range contacts, is important to the quality of ab initio structure modeling, since they can enforce strong restraints on structure assembly. In this paper, we present a new Residue-Residue Contact predictor called R2C that combines machine learning-based and correlated mutation analysis-based methods, together with a two-dimensional Gaussian noise filter, to enhance long-range residue contact prediction. Our results show that the outputs from the machine learning-based method are concentrated, with better performance on short-range contacts, while for the correlated mutation analysis-based approach the predictions are widespread, with higher accuracy on long-range contacts. An effective query-driven dynamic fusion strategy proposed here takes full advantage of the two different methods, resulting in an impressive overall accuracy improvement. We also show that the contact map directly from the prediction model contains an interesting Gaussian noise, which has not been discovered before. Different from recent studies that tried to further enhance the quality of the contact map by removing its transitive noise, we designed a new two-dimensional Gaussian noise filter, which was especially helpful for reinforcing long-range residue contact prediction. Tested on recent CASP10/11 datasets, the overall top L/5 accuracy of our final R2C predictor is 17.6%/15.5% higher than the pure machine learning-based method and 7.8%/8.3% higher than the correlated mutation analysis-based approach for long-range residue contact prediction. Availability: http://www.csbio.sjtu.edu.cn/bioinf/R2C/. Contact: hbshen@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
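
    The paper's two-dimensional Gaussian noise filter is specific to R2C, but the basic operation of smoothing a predicted contact map with a 2D Gaussian kernel can be sketched as follows (sigma is an arbitrary illustrative choice):

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def smooth_contact_map(raw_map, sigma=1.0):
            """Smooth a residue-residue contact probability map with a 2D Gaussian
            kernel, then re-symmetrize. Illustrative only, not the R2C filter."""
            smoothed = gaussian_filter(raw_map, sigma=sigma)
            return 0.5 * (smoothed + smoothed.T)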

  17. VWPS: A Ventilator Weaning Prediction System with Artificial Intelligence

    NASA Astrophysics Data System (ADS)

    Chen, Austin H.; Chen, Guan-Ting

How to wean patients efficiently off mechanical ventilation continues to be a challenge for medical professionals. In this paper we have described a novel approach to the study of a ventilator weaning prediction system (VWPS). Firstly, we have developed and written three Artificial Neural Network (ANN) algorithms to predict the weaning success rate based on clinical data. Secondly, we have implemented two user-friendly weaning success rate prediction systems: the VWPS system and the BWAP system. Both systems can be used to help doctors objectively and effectively predict whether weaning is appropriate for a patient based on the patient's clinical data. Our system utilizes the powerful processing abilities of MATLAB. Thirdly, we have calculated performance measures such as sensitivity and accuracy for these three algorithms. The results show a very high sensitivity (around 80%) and accuracy (around 70%). To our knowledge, this is the first design approach of its kind to be used in the study of ventilator weaning success rate prediction.

  18. EEG Beta Oscillations in the Temporoparietal Area Related to the Accuracy in Estimating Others' Preference

    PubMed Central

    Park, Jonghyeok; Kim, Hackjin; Sohn, Jeong-Woo; Choi, Jong-ryul; Kim, Sung-Phil

    2018-01-01

Humans often attempt to predict what others prefer based on a narrow slice of experience, called thin-slicing. According to the theoretical bases for how humans can predict the preference of others, one tends to estimate the other's preference using a perceived difference between the other and self. Previous neuroimaging studies have revealed that the network of dorsal medial prefrontal cortex (dmPFC) and right temporoparietal junction (rTPJ) is related to the ability to predict others' preference. However, the temporal patterns of the neural activities underlying the prediction of others' preference through thin-slicing remain unknown. To investigate such temporal aspects of neural activities, we examined human electroencephalography (EEG) recorded during the task of predicting the preference of others while only a facial picture of the other was provided. Twenty participants (all female, average age: 21.86) participated in the study. In each trial of the task, participants were shown a picture of either a target person or self for 3 s, followed by the presentation of a movie poster over which participants predicted the target person's preference as liking or disliking. Time-frequency EEG analysis was employed to analyze temporal changes in the amplitudes of brain oscillations. Participants could predict others' preference for movies with an accuracy of 56.89 ± 3.16%, and 10 out of 20 participants exhibited prediction accuracy higher than a chance level (95% interval). There was a significant difference in the power of the parietal alpha (10~13 Hz) oscillation 0.6~0.8 s after the onset of poster presentation between the cases when participants predicted others' preference and when they reported self-preference (p < 0.05). The power of brain oscillations at any frequency band and time period during the trial did not show a significant correlation with individual prediction accuracy. However, when we measured differences in power between the trials of predicting others' preference and reporting self-preference, the right temporal beta oscillations 1.6~1.8 s after the onset of facial picture presentation exhibited a significant correlation with individual accuracy. Our results suggest that right temporoparietal beta oscillations may be correlated with one's ability to predict what others prefer with minimal information. PMID:29479312

  20. Exploring the genetic architecture and improving genomic prediction accuracy for mastitis and milk production traits in dairy cattle by mapping variants to hepatic transcriptomic regions responsive to intra-mammary infection.

    PubMed

    Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter

    2017-05-12

    A better understanding of the genetic architecture of complex traits can contribute to improve genomic prediction. We hypothesized that genomic variants associated with mastitis and milk production traits in dairy cattle are enriched in hepatic transcriptomic regions that are responsive to intra-mammary infection (IMI). Genomic markers [e.g. single nucleotide polymorphisms (SNPs)] from those regions, if included, may improve the predictive ability of a genomic model. We applied a genomic feature best linear unbiased prediction model (GFBLUP) to implement the above strategy by considering the hepatic transcriptomic regions responsive to IMI as genomic features. GFBLUP, an extension of GBLUP, includes a separate genomic effect of SNPs within a genomic feature, and allows differential weighting of the individual marker relationships in the prediction equation. Since GFBLUP is computationally intensive, we investigated whether a SNP set test could be a computationally fast way to preselect predictive genomic features. The SNP set test assesses the association between a genomic feature and a trait based on single-SNP genome-wide association studies. We applied these two approaches to mastitis and milk production traits (milk, fat and protein yield) in Holstein (HOL, n = 5056) and Jersey (JER, n = 1231) cattle. We observed that a majority of genomic features were enriched in genomic variants that were associated with mastitis and milk production traits. Compared to GBLUP, the accuracy of genomic prediction with GFBLUP was marginally improved (3.2 to 3.9%) in within-breed prediction. The highest increase (164.4%) in prediction accuracy was observed in across-breed prediction. The significance of genomic features based on the SNP set test were correlated with changes in prediction accuracy of GFBLUP (P < 0.05). GFBLUP provides a framework for integrating multiple layers of biological knowledge to provide novel insights into the biological basis of complex traits, and to improve the accuracy of genomic prediction. The SNP set test might be used as a first-step to improve GFBLUP models. Approaches like GFBLUP and SNP set test will become increasingly useful, as the functional annotations of genomes keep accumulating for a range of species and traits.

1. A study on the theoretical and practical accuracy of conoscopic holography-based surface measurements: toward image registration in minimally invasive surgery

    PubMed Central

    Burgner, J.; Simpson, A. L.; Fitzpatrick, J. M.; Lathrop, R. A.; Herrell, S. D.; Miga, M. I.; Webster, R. J.

    2013-01-01

Background: Registered medical images can assist with surgical navigation and enable image-guided therapy delivery. In soft tissues, surface-based registration is often used and can be facilitated by laser surface scanning. Tracked conoscopic holography (which provides distance measurements) has recently been proposed as a minimally invasive way to obtain surface scans. Moving this technique from concept to clinical use requires a rigorous accuracy evaluation, which is the purpose of our paper. Methods: We adapt recent non-homogeneous and anisotropic point-based registration results to provide a theoretical framework for predicting the accuracy of tracked distance measurement systems. Experiments are conducted on objects of defined geometry, an anthropomorphic kidney phantom, and a human cadaver kidney. Results: Experiments agree with model predictions, producing point RMS errors consistently < 1 mm, surface-based registration with mean closest-point error < 1 mm in the phantom, and an RMS target registration error of 0.8 mm in the human cadaver kidney. Conclusions: Tracked conoscopic holography is clinically viable; it enables minimally invasive surface scan accuracy comparable to current clinical methods that require open surgery. PMID:22761086

  2. Application of local binary pattern and human visual Fibonacci texture features for classification different medical images

    NASA Astrophysics Data System (ADS)

    Sanghavi, Foram; Agaian, Sos

    2017-05-01

The goal of this paper is to (a) test a nuclei-based computer-aided cancer detection system that uses human-visual-system-based features on histopathology images and (b) compare its results with the Local Binary Pattern and modified Fibonacci-p pattern systems. System performance is evaluated using parameters such as accuracy, specificity, sensitivity, positive predictive value, and negative predictive value on 251 prostate histopathology images. An accuracy of 96.69% was observed for cancer detection using the proposed human-visual-based system, compared to 87.42% and 94.70% for the Local Binary Pattern and modified Fibonacci-p pattern systems, respectively.
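
    A minimal sketch of extracting Local Binary Pattern texture features, one of the baselines compared above, using scikit-image; the neighborhood parameters P and R and the histogram settings are illustrative assumptions.

        import numpy as np
        from skimage.feature import local_binary_pattern

        def lbp_histogram(gray_image, P=8, R=1.0):
            """Uniform LBP codes pooled into a normalized histogram feature vector."""
            codes = local_binary_pattern(gray_image, P, R, method="uniform")
            hist, _ = np.histogram(codes.ravel(), bins=P + 2, range=(0, P + 2),
                                   density=True)
            return hist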

  3. [GSH fermentation process modeling using entropy-criterion based RBF neural network model].

    PubMed

    Tan, Zuoping; Wang, Shitong; Deng, Zhaohong; Du, Guocheng

    2008-05-01

The prediction accuracy and generalization of GSH fermentation process modeling are often degraded by noise in the corresponding experimental data. To avoid this problem, we present a novel RBF neural network modeling approach based on an entropy criterion. Unlike traditional MSE-criterion-based parameter learning, it considers the whole distribution structure of the training data set during parameter learning, and thus effectively avoids weak generalization and over-learning. The proposed approach is then applied to GSH fermentation process modeling. Our results demonstrate that the proposed method has better prediction accuracy, generalization, and robustness, and thus offers potential application merit for GSH fermentation process modeling.

  4. Prediction of β-turns in proteins from multiple alignment using neural network

    PubMed Central

    Kaur, Harpreet; Raghava, Gajendra Pal Singh

    2003-01-01

A neural network-based method has been developed for the prediction of β-turns in proteins by using multiple sequence alignment. Two feed-forward back-propagation networks with a single hidden layer are used: the first (sequence-to-structure) network is trained on the multiple sequence alignment in the form of PSI-BLAST-generated position-specific scoring matrices. The initial predictions from the first network, together with the PSIPRED-predicted secondary structure, are used as input to the second (structure-to-structure) network to refine the predictions obtained from the first net. A significant improvement in prediction accuracy has been achieved by using the evolutionary information contained in the multiple sequence alignment. The final network yields an overall prediction accuracy of 75.5% when tested by sevenfold cross-validation on a set of 426 nonhomologous protein chains. The corresponding Qpred, Qobs, and Matthews correlation coefficient values are 49.8%, 72.3%, and 0.43, respectively, and are the best among all previously published β-turn prediction methods. The web server BetaTPred2 (http://www.imtech.res.in/raghava/betatpred2/) has been developed based on this approach. PMID:12592033

  5. A Mechanism-Based Model for the Prediction of the Metabolic Sites of Steroids Mediated by Cytochrome P450 3A4.

    PubMed

    Dai, Zi-Ru; Ai, Chun-Zhi; Ge, Guang-Bo; He, Yu-Qi; Wu, Jing-Jing; Wang, Jia-Yue; Man, Hui-Zi; Jia, Yan; Yang, Ling

    2015-06-30

Early prediction of xenobiotic metabolism is essential for drug discovery and development. As the most important human drug-metabolizing enzyme, cytochrome P450 3A4 has a large active cavity and metabolizes a broad spectrum of substrates. The poor substrate specificity of CYP3A4 makes it a huge challenge to predict the metabolic site(s) on its substrates. This study aimed to develop a mechanism-based prediction model built on two key parameters, the binding conformation and the reaction activity of ligands, which could reveal the process of the real metabolic reaction(s) and the site(s) of modification. The newly established model was applied to predict the metabolic site(s) of steroids, a class of CYP3A4-preferred substrates. 38 steroids and 12 non-steroids were randomly divided into training and test sets. Two major metabolic reactions, aliphatic hydroxylation and N-dealkylation, were involved in this study. At least one of the top three predicted metabolic sites was validated by the experimental data. The overall accuracies for the training and test sets were 82.14% and 86.36%, respectively. In summary, a mechanism-based prediction model was established for the first time, which can be used to predict the metabolic site(s) of CYP3A4 on steroids with high predictive accuracy.

  6. Group-regularized individual prediction: theory and application to pain.

    PubMed

    Lindquist, Martin A; Krishnan, Anjali; López-Solà, Marina; Jepma, Marieke; Woo, Choong-Wan; Koban, Leonie; Roy, Mathieu; Atlas, Lauren Y; Schmidt, Liane; Chang, Luke J; Reynolds Losin, Elizabeth A; Eisenbarth, Hedwig; Ashar, Yoni K; Delk, Elizabeth; Wager, Tor D

    2017-01-15

Multivariate pattern analysis (MVPA) has become an important tool for identifying brain representations of psychological processes and clinical outcomes using fMRI and related methods. Such methods can be used to predict or 'decode' psychological states in individual subjects. Single-subject MVPA approaches, however, are limited by the amount and quality of individual-subject data. In spite of higher spatial resolution, predictive accuracy from single-subject data often does not exceed what can be accomplished using coarser, group-level maps, because single-subject patterns are trained on limited amounts of often-noisy data. Here, we present a method that combines population-level priors, in the form of biomarker patterns developed on prior samples, with single-subject MVPA maps to improve single-subject prediction. Theoretical results and simulations motivate a weighting based on the relative variances of biomarker-based prediction (based on population-level predictive maps from prior groups) and individual-subject, cross-validated prediction. Empirical results predicting pain using brain activity on a trial-by-trial basis (single-trial prediction) across 6 studies (N=180 participants) confirm the theoretical predictions. Regularization based on a population-level biomarker (in this case, the Neurologic Pain Signature, NPS) improved single-subject prediction accuracy compared with idiographic maps based on the individual's data alone. The regularization scheme that we propose, which we term group-regularized individual prediction (GRIP), can be applied broadly to within-person MVPA-based prediction. We also show how GRIP can be used to evaluate data quality and provide benchmarks for the appropriateness of population-level maps like the NPS for a given individual or study. Copyright © 2015 Elsevier Inc. All rights reserved.
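
    At its core, the weighting the authors motivate is an inverse-variance combination of the population-level prediction and the individual cross-validated prediction. A minimal sketch of that combination rule (not the published GRIP code):

        import numpy as np

        def grip_combine(pred_pop, var_pop, pred_ind, var_ind):
            """Inverse-variance weighting of a population biomarker prediction and
            an individual cross-validated prediction (illustrative only)."""
            w_pop = (1.0 / var_pop) / (1.0 / var_pop + 1.0 / var_ind)
            return w_pop * np.asarray(pred_pop) + (1.0 - w_pop) * np.asarray(pred_ind)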

  7. Increased genomic prediction accuracy in wheat breeding using a large Australian panel.

    PubMed

    Norman, Adam; Taylor, Julian; Tanaka, Emi; Telfer, Paul; Edwards, James; Martinant, Jean-Pierre; Kuchel, Haydn

    2017-12-01

Genomic prediction accuracy within a large panel was found to be substantially higher than that previously observed in smaller populations, and also higher than QTL-based prediction. In recent years, genomic selection for wheat breeding has been widely studied, but typically in population sizes under 1000 individuals. To assess its efficacy in germplasm representative of commercial breeding programmes, we used a panel of 10,375 Australian wheat breeding lines to investigate the accuracy of genomic prediction for grain yield, physical grain quality and other physiological traits. To achieve this, the complete panel was phenotyped in a dedicated field trial and genotyped using a custom Axiom (Affymetrix) SNP array. A high-quality consensus map was also constructed, allowing the linkage disequilibrium present in the germplasm to be investigated. Using the complete SNP array, genomic prediction accuracies were found to be substantially higher than those previously observed in smaller populations, and also more accurate than prediction approaches using a finite number of selected quantitative trait loci. Multi-trait genetic correlations were also assessed at the additive and residual genetic levels, identifying a negative genetic correlation between grain yield and protein as well as a positive genetic correlation between grain size and test weight.

  8. Comparison of Marker-Based Genomic Estimated Breeding Values and Phenotypic Evaluation for Selection of Bacterial Spot Resistance in Tomato.

    PubMed

    Liabeuf, Debora; Sim, Sung-Chur; Francis, David M

    2018-03-01

Bacterial spot affects tomato crops (Solanum lycopersicum) grown under humid conditions. Major genes and quantitative trait loci (QTL) for resistance have been described, and multiple loci from diverse sources need to be combined to improve disease control. We investigated genomic selection (GS) prediction models for resistance to Xanthomonas euvesicatoria and experimentally evaluated the accuracy of these models. The training population consisted of 109 families combining resistance from four sources and directionally selected from a population of 1,100 individuals. The families were evaluated on a plot basis in replicated inoculated trials and genotyped with single nucleotide polymorphisms (SNPs). We compared the prediction ability of models developed with 14 to 387 SNPs. Genomic estimated breeding values (GEBV) were derived using Bayesian least absolute shrinkage and selection operator regression (BL) and ridge regression (RR). Evaluations were based on leave-one-out cross-validation and on empirical observations in replicated field trials using the next generation of inbred progeny and a hybrid population resulting from selections in the training population. Prediction ability was evaluated based on correlations between GEBV and phenotypes (r_g), the percentage of coselection between genomic and phenotypic selection, and the relative efficiency of selection (r_g/r_p). Results were similar with BL and RR models. Models using only markers previously identified as significantly associated with resistance but weighted based on GEBV, and mixed models with markers associated with resistance treated as fixed effects and markers distributed across the genome treated as random effects, offered greater accuracy and a high percentage of coselection. The accuracy of these models to predict the performance of progeny and hybrids exceeded the accuracy of phenotypic selection.

  9. Prediction of Protein Structure by Template-Based Modeling Combined with the UNRES Force Field.

    PubMed

    Krupa, Paweł; Mozolewska, Magdalena A; Joo, Keehyoung; Lee, Jooyoung; Czaplewski, Cezary; Liwo, Adam

    2015-06-22

A new approach to the prediction of protein structures is proposed that uses distance and backbone virtual-bond dihedral angle restraints derived from template-based models and simulations with the united residue (UNRES) force field. The approach combines the accuracy and reliability of template-based methods for segments of the target sequence that are highly similar to sequences of known structure with the ability of UNRES to pack the domains correctly. Multiplexed replica-exchange molecular dynamics with restraints derived from template-based models of a given target, in which each restraint is weighted according to the accuracy of the prediction of the corresponding section of the molecule, is used to search the conformational space, and the weighted histogram analysis method and cluster analysis are applied to determine the families of the most probable conformations, from which candidate predictions are selected. To test the capability of the method to recover template-based models from restraints, five single-domain proteins with structures that have been well predicted by template-based methods were used; it was found that the resulting structures were of the same quality as the best of the original models. To assess whether the new approach can improve template-based predictions with incorrectly predicted domain packing, four such targets were selected from the CASP10 targets; for three of them the new approach resulted in significantly better predictions compared with the original template-based models. The new approach can be used to predict the structures of proteins for which good templates can be found for sections of the sequence, or for which an overall good template can be found for the entire sequence but the prediction quality is remarkably weaker in putative domain-linker regions.

  10. RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model.

    PubMed

    Jabbari, Hosna; Wark, Ian; Montemagno, Carlo

    2018-01-01

RNA is a biopolymer with various applications inside the cell and in biotechnology. The structure of an RNA molecule largely determines its function and is essential to guide nanostructure design. Since experimental structure determination is time-consuming and expensive, accurate computational prediction of RNA structure is of great importance. Prediction of RNA secondary structure is relatively simpler than prediction of its tertiary structure and provides information about the tertiary structure; therefore, RNA secondary structure prediction has received attention in the past decades. Numerous methods with different folding approaches have been developed for RNA secondary structure prediction. While methods for prediction of RNA pseudoknot-free structures (structures with no crossing base pairs) have greatly improved in accuracy, methods for prediction of RNA pseudoknotted secondary structures (structures with crossing base pairs) still have room for improvement. A long-standing question for improving the prediction accuracy of RNA pseudoknotted secondary structure is whether to focus on the prediction algorithm or the underlying energy model, as there is a trade-off between the computational cost of the prediction algorithm and the generality of the method. The aim of this work is to argue that, when comparing different methods for RNA pseudoknotted structure prediction, the combination of algorithm and energy model should be considered, and a method should not be judged superior or inferior to others unless they use the same scoring model. We demonstrate that while the folding approach is important in structure prediction, it is not the only important factor in the prediction accuracy of a given method, as the underlying energy model is also of great value. We therefore encourage researchers to pay particular attention when comparing methods with different energy models.
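
    To make the algorithm-versus-energy-model distinction concrete, the sketch below pairs the simplest possible scoring model (maximize the number of base pairs) with a standard pseudoknot-free dynamic-programming algorithm in the Nussinov style; real predictors keep a similar algorithmic skeleton but replace this scoring with a thermodynamic energy model.

        def nussinov_pairs(seq, min_loop=3):
            """Base-pair maximization for pseudoknot-free structures (DP).
            Returns the maximum number of nested base pairs."""
            pair = {("A", "U"), ("U", "A"), ("G", "C"),
                    ("C", "G"), ("G", "U"), ("U", "G")}
            n = len(seq)
            dp = [[0] * n for _ in range(n)]
            for span in range(min_loop + 1, n):
                for i in range(n - span):
                    j = i + span
                    best = dp[i + 1][j]                     # i left unpaired
                    for k in range(i + min_loop + 1, j + 1):
                        if (seq[i], seq[k]) in pair:        # i pairs with k
                            right = dp[k + 1][j] if k + 1 <= j else 0
                            best = max(best, 1 + dp[i + 1][k - 1] + right)
                    dp[i][j] = best
            return dp[0][n - 1]

        print(nussinov_pairs("GGGAAAUCC"))   # maximum number of base pairs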

  11. Bayesian modeling and inference for diagnostic accuracy and probability of disease based on multiple diagnostic biomarkers with and without a perfect reference standard.

    PubMed

    Jafarzadeh, S Reza; Johnson, Wesley O; Gardner, Ian A

    2016-03-15

    The area under the receiver operating characteristic (ROC) curve (AUC) is used as a performance metric for quantitative tests. Although multiple biomarkers may be available for diagnostic or screening purposes, diagnostic accuracy is often assessed individually rather than in combination. In this paper, we consider the interesting problem of combining multiple biomarkers for use in a single diagnostic criterion with the goal of improving the diagnostic accuracy above that of an individual biomarker. The diagnostic criterion created from multiple biomarkers is based on the predictive probability of disease, conditional on given multiple biomarker outcomes. If the computed predictive probability exceeds a specified cutoff, the corresponding subject is allocated as 'diseased'. This defines a standard diagnostic criterion that has its own ROC curve, namely, the combined ROC (cROC). The AUC metric for cROC, namely, the combined AUC (cAUC), is used to compare the predictive criterion based on multiple biomarkers to one based on fewer biomarkers. A multivariate random-effects model is proposed for modeling multiple normally distributed dependent scores. Bayesian methods for estimating ROC curves and corresponding (marginal) AUCs are developed when a perfect reference standard is not available. In addition, cAUCs are computed to compare the accuracy of different combinations of biomarkers for diagnosis. The methods are evaluated using simulations and are applied to data for Johne's disease (paratuberculosis) in cattle. Copyright © 2015 John Wiley & Sons, Ltd.
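
    A rough frequentist stand-in for the idea of a combined diagnostic criterion: the predictive probability of disease given two biomarkers (here from a logistic model rather than the paper's Bayesian multivariate random-effects model, and assuming a perfect reference standard) is scored by its AUC against each biomarker alone.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        # Two synthetic biomarkers with known disease status (perfect reference).
        X, d = make_classification(n_samples=500, n_features=2, n_informative=2,
                                   n_redundant=0, random_state=1)

        # Predictive probability of disease given both biomarkers = combined test.
        p = LogisticRegression().fit(X, d).predict_proba(X)[:, 1]

        print("AUC biomarker 1:", round(roc_auc_score(d, X[:, 0]), 3))
        print("AUC biomarker 2:", round(roc_auc_score(d, X[:, 1]), 3))
        print("cAUC (combined):", round(roc_auc_score(d, p), 3))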

  12. Clinical Inquiry: What's the best way to predict the success of a trial of labor after a previous C-section?

    PubMed

    Warren, Johanna B; Hamilton, Andrew

    2015-12-01

Seven validated prospective scoring systems, and one unvalidated system, predict a successful trial of labor after cesarean (TOLAC) based on a variety of clinical factors. The systems use different outcome statistics, so their predictive accuracy can't be directly compared.

  13. An evidential link prediction method and link predictability based on Shannon entropy

    NASA Astrophysics Data System (ADS)

    Yin, Likang; Zheng, Haoyang; Bian, Tian; Deng, Yong

    2017-09-01

Predicting missing links is of both theoretical value and practical interest in network science. In this paper, we empirically investigate a new link prediction method based on similarity and compare nine well-known local similarity measures on nine real networks. Most previous studies focus on accuracy; however, it is crucial to consider link predictability as an intrinsic property of the network itself. Hence, this paper proposes a new link prediction approach called the evidential measure (EM), based on Dempster-Shafer theory, together with a new method to measure link predictability via local information and Shannon entropy.
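
    Classical local similarity measures of the kind compared in the paper can be computed directly with NetworkX; this illustrates the baselines, not the proposed evidential measure.

        import networkx as nx

        G = nx.karate_club_graph()
        pairs = list(nx.non_edges(G))[:5]   # candidate missing links

        for u, v, s in nx.jaccard_coefficient(G, pairs):
            print(f"Jaccard({u},{v}) = {s:.3f}")
        for u, v, s in nx.adamic_adar_index(G, pairs):
            print(f"Adamic-Adar({u},{v}) = {s:.3f}")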

  14. A Technical Analysis Information Fusion Approach for Stock Price Analysis and Modeling

    NASA Astrophysics Data System (ADS)

    Lahmiri, Salim

In this paper, we address the problem of technical analysis information fusion for improving stock market index-level prediction. We present an approach for analyzing stock market price behavior based on different categories of technical analysis metrics and a multiple predictive system. Each category of technical analysis measures is used to characterize stock market price movements. The presented predictive system is based on an ensemble of neural networks (NN) coupled with particle swarm intelligence for parameter optimization, where each single neural network is trained with a specific category of technical analysis measures. Experimental evaluation on three international stock market indices and three individual stocks shows that the presented ensemble-based technical indicator fusion system significantly improves forecasting accuracy in comparison with a single NN. It also outperforms the classical neural network trained with index-level lagged values and an NN trained with stationary wavelet transform details and approximation coefficients. As a result, technical information fusion in an NN ensemble architecture helps improve prediction accuracy.

  15. Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction.

    PubMed

    Zhou, Yao; Vales, M Isabel; Wang, Aoxue; Zhang, Zhiwu

    2017-09-01

Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates correlations for each fold and reports prediction accuracy as the mean of the correlations across folds. The other approach, Hold accuracy, predicts all phenotypes across all folds and calculates the correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more folds are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
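
    The downward bias of the two formulas can be checked by simulation: with predictors unrelated to the phenotype, the expected accuracy is zero, yet cross-validated estimates drift negative. A sketch under these null assumptions (exact magnitudes depend on the simulation settings):

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import KFold

        rng = np.random.default_rng(7)
        n, k, reps = 100, 10, 500
        inst, hold = [], []
        for _ in range(reps):
            X = rng.standard_normal((n, 5))   # predictors unrelated to phenotype
            y = rng.standard_normal(n)        # so expected accuracy is ~0
            yhat = np.empty(n)
            fold_r = []
            for tr, te in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
                yhat[te] = LinearRegression().fit(X[tr], y[tr]).predict(X[te])
                fold_r.append(np.corrcoef(y[te], yhat[te])[0, 1])
            inst.append(np.mean(fold_r))              # Instant: mean across folds
            hold.append(np.corrcoef(y, yhat)[0, 1])   # Hold: one final correlation

        print("Instant:", round(float(np.mean(inst)), 3),
              " Hold:", round(float(np.mean(hold)), 3))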

  16. Medium- and Long-term Prediction of LOD Change with the Leap-step Autoregressive Model

    NASA Astrophysics Data System (ADS)

    Liu, Q. B.; Wang, Q. J.; Lei, M. F.

    2015-09-01

It is known that the accuracy of medium- and long-term prediction of changes in the length of day (LOD) based on the combined least-squares and autoregressive (LS+AR) model decreases gradually. The leap-step autoregressive (LSAR) model is more accurate and stable in medium- and long-term prediction; therefore it is used to forecast LOD changes in this work. The LOD series from EOP 08 C04, provided by the IERS (International Earth Rotation and Reference Systems Service), is then used to compare the effectiveness of the LSAR and traditional AR methods. The predicted series resulting from the two models show that the prediction accuracy of the LSAR model is better than that of the AR model in medium- and long-term prediction.

  17. Data Prediction for Public Events in Professional Domains Based on Improved RNN- LSTM

    NASA Astrophysics Data System (ADS)

    Song, Bonan; Fan, Chunxiao; Wu, Yuexin; Sun, Juanjuan

    2018-02-01

Traditional data services for predicting emergency or non-periodic events usually cannot generate satisfying results or fulfill the intended prediction purpose. However, these events are influenced by external causes, which means that certain a priori information about them can generally be collected through the Internet. This paper studied the above problems and proposes an improved model: an LSTM (Long Short-Term Memory) dynamic prediction and a priori information sequence generation model that combines RNN-LSTM with a priori information about public events. In prediction tasks, the model is capable of determining trends, and its accuracy is validated; it produces better performance and prediction results than the previous model. Using a priori information increases prediction accuracy; LSTM can better adapt to changes in a time sequence; and LSTM can be widely applied to the same type of prediction task and to other prediction tasks related to time sequences.

  18. The wisdom of crowds in action: Forecasting epidemic diseases with a web-based prediction market system.

    PubMed

    Li, Eldon Y; Tung, Chen-Yuan; Chang, Shu-Hsun

    2016-08-01

The quest for an effective system capable of monitoring and predicting the trends of epidemic diseases is a critical issue for communities worldwide. With the prevalence of Internet access, more and more researchers today are using data from both search engines and social media to improve prediction accuracy. In particular, a prediction market system (PMS) exploits the wisdom of crowds on the Internet to achieve relatively high accuracy. This study presents the architecture of a PMS and demonstrates the matching mechanism of logarithmic market scoring rules. The system was implemented to predict infectious diseases in Taiwan with the wisdom of crowds in order to improve the accuracy of epidemic forecasting. The PMS architecture contains three design components: database clusters, a market engine, and Web applications. The system accumulated knowledge from 126 health professionals for 31 weeks to predict five disease indicators: the confirmed cases of dengue fever, the confirmed cases of severe and complicated influenza, the rate of enterovirus infections, the rate of influenza-like illnesses, and the confirmed cases of severe and complicated enterovirus infection. Based on the winning ratio, the PMS predicts the trends of three out of five disease indicators more accurately than does the existing system that uses the five-year average values of historical data for the same weeks. In addition, the PMS with the matching mechanism of logarithmic market scoring rules is easy to understand for health professionals and applicable to predicting all five disease indicators. The PMS architecture of this study allows organizations and individuals to implement it for various purposes in our society. The system can continuously update the data and improve prediction accuracy in monitoring and forecasting the trends of epidemic diseases. Future researchers could replicate and apply the PMS demonstrated in this study to more infectious diseases and wider geographical areas, especially the under-developed countries across Asia and Africa. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
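
    The matching mechanism mentioned above, the logarithmic market scoring rule, has a standard closed form: a trade is charged the change in the cost function C(q) = b * log(sum_i exp(q_i / b)), and the instantaneous prices act as the crowd's probability estimates. A minimal sketch with an arbitrary liquidity parameter b:

        import numpy as np

        def lmsr_cost(q, b=100.0):
            """LMSR cost function C(q) = b * log(sum(exp(q_i / b)))."""
            return b * np.log(np.sum(np.exp(np.asarray(q) / b)))

        def lmsr_prices(q, b=100.0):
            """Instantaneous prices, interpretable as event probabilities."""
            e = np.exp(np.asarray(q) / b)
            return e / e.sum()

        q = np.array([0.0, 0.0])                          # shares for two outcomes
        charge = lmsr_cost(q + [10, 0]) - lmsr_cost(q)    # cost of 10 'yes' shares
        print(round(charge, 3), lmsr_prices(q + [10, 0]))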

  1. Predicting metabolic syndrome using decision tree and support vector machine methods.

    PubMed

    Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh

    2016-05-01

    Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is considered a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have been highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques to predict metabolic syndrome; specifically, it aims to employ decision tree and support vector machine (SVM) methods to predict the 7-year incidence of metabolic syndrome. This is a practical study in which data from 2107 participants of the Isfahan Cohort Study were utilized. Subjects without metabolic syndrome according to the ATPIII criteria were selected. The features used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high density lipoprotein-cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on ATPIII criteria, and the two methods of decision tree and SVM were used to predict it. The criteria of sensitivity, specificity and accuracy were used for validation. Sensitivity, specificity and accuracy were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) for the SVM (decision tree) method. The results show that the SVM method is more efficient than the decision tree in terms of sensitivity, specificity and accuracy. The results of the decision tree method show that TG is the most important feature in predicting metabolic syndrome. According to this study, in cases where only the final result of the decision is regarded as significant, the SVM method can be used with acceptable accuracy in medical decision making. This method has not been implemented in previous research.
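
    As a concrete illustration of the study's evaluation protocol, the sketch below trains an SVM and a depth-limited decision tree on synthetic data and reports sensitivity, specificity, and accuracy from the confusion matrix. The feature matrix, outcome rule, and hyperparameters are simulated stand-ins, not the cohort data or the paper's settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Synthetic stand-in for the cohort: 2107 subjects, 21 clinical features.
rng = np.random.default_rng(0)
X = rng.normal(size=(2107, 21))
y = (X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=2107) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("SVM", SVC(kernel="rbf")),
                    ("Decision tree", DecisionTreeClassifier(max_depth=5))]:
    pred = model.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    print(f"{name}: sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```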

  2. Decision curve analysis assessing the clinical benefit of NMP22 in the detection of bladder cancer: secondary analysis of a prospective trial.

    PubMed

    Barbieri, Christopher E; Cha, Eugene K; Chromecki, Thomas F; Dunning, Allison; Lotan, Yair; Svatek, Robert S; Scherr, Douglas S; Karakiewicz, Pierre I; Sun, Maxine; Mazumdar, Madhu; Shariat, Shahrokh F

    2012-03-01

    • To employ decision curve analysis to determine the impact of nuclear matrix protein 22 (NMP22) on clinical decision making in the detection of bladder cancer using data from a prospective trial. • The study included 1303 patients at risk for bladder cancer who underwent cystoscopy, urine cytology and measurement of urinary NMP22 levels. • We constructed several prediction models to estimate risk of bladder cancer. The base model was generated using patient characteristics (age, gender, race, smoking and haematuria); cytology and NMP22 were added to the base model to determine effects on predictive accuracy. • Clinical net benefit was calculated by summing the benefits and subtracting the harms and weighting these by the threshold probability at which a patient or clinician would opt for cystoscopy. • In all, 72 patients were found to have bladder cancer (5.5%). In univariate analyses, NMP22 was the strongest predictor of bladder cancer presence (predictive accuracy 71.3%), followed by age (67.5%) and cytology (64.3%). • In multivariable prediction models, NMP22 improved the predictive accuracy of the base model by 8.2% (area under the curve 70.2-78.4%) and of the base model plus cytology by 4.2% (area under the curve 75.9-80.1%). • Decision curve analysis revealed that adding NMP22 to other models increased clinical benefit, particularly at higher threshold probabilities. • NMP22 is a strong, independent predictor of bladder cancer. • Addition of NMP22 improves the accuracy of standard predictors by a statistically and clinically significant margin. • Decision curve analysis suggests that integration of NMP22 into clinical decision making helps avoid unnecessary cystoscopies, with minimal increased risk of missing a cancer. © 2011 THE AUTHORS. BJU INTERNATIONAL © 2011 BJU INTERNATIONAL.
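
    The net benefit used in decision curve analysis has a standard form: at threshold probability pt, NB = TP/n - (FP/n) * pt/(1 - pt), so harms are weighted by the odds of the chosen threshold. The sketch below computes it on fabricated risks and outcomes; the prevalence mirrors the reported 5.5%, but the risk scores are an invented stand-in, not the NMP22 trial data.

```python
import numpy as np

def net_benefit(y_true, risk, threshold):
    """Net benefit at threshold pt: NB = TP/n - (FP/n) * pt / (1 - pt)."""
    predicted_pos = risk >= threshold
    n = len(y_true)
    tp = np.sum(predicted_pos & (y_true == 1))
    fp = np.sum(predicted_pos & (y_true == 0))
    return tp / n - fp / n * threshold / (1 - threshold)

# Toy cohort: 1303 patients, ~5.5% prevalence, one hypothetical risk model.
rng = np.random.default_rng(1)
y = (rng.random(1303) < 0.055).astype(int)
risk = np.clip(0.055 + rng.normal(scale=0.03, size=1303) + 0.05 * y, 0, 1)

for pt in (0.05, 0.10, 0.20):
    print(f"pt={pt:.2f}  NB(model)={net_benefit(y, risk, pt):+.4f}  "
          f"NB(cystoscopy for all)={net_benefit(y, np.ones(1303), pt):+.4f}")
```

    Plotting these values across a range of thresholds, for models with and without NMP22, is exactly what produces a decision curve.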

  3. Video image analysis in the Australian meat industry - precision and accuracy of predicting lean meat yield in lamb carcasses.

    PubMed

    Hopkins, D L; Safari, E; Thompson, J M; Smith, C R

    2004-06-01

    A wide selection of lamb types of mixed sex (ewes and wethers) was slaughtered at a commercial abattoir, and during this process images of 360 carcasses were obtained online using the VIAScan® system developed by Meat and Livestock Australia. Soft tissue depth at the GR site (thickness of tissue over the 12th rib, 110 mm from the midline) was measured by an abattoir employee using the AUS-MEAT sheep probe (PGR). Another measure of this thickness was taken in the chiller using a GR knife (NGR). Each carcass was subsequently broken down to a range of trimmed boneless retail cuts and the lean meat yield determined. The current industry model for predicting meat yield uses hot carcass weight (HCW) and tissue depth at the GR site. A low level of accuracy and precision was found when HCW and PGR were used to predict lean meat yield (R(2)=0.19, r.s.d.=2.80%), which could be improved markedly when PGR was replaced by NGR (R(2)=0.41, r.s.d.=2.39%). If the GR measures were replaced by 8 VIAScan® measures, greater prediction accuracy could be achieved (R(2)=0.52, r.s.d.=2.17%). A similar result was achieved when the model was based on principal components (PCs) computed from the 8 VIAScan® measures (R(2)=0.52, r.s.d.=2.17%). The use of PCs also improved the stability of the model compared to a regression model based on HCW and NGR. The transportability of the models was tested by randomly dividing the data set and comparing coefficients and the level of accuracy and precision. The models based on PCs were superior to those based on regression. It is demonstrated that, with appropriate modeling, the VIAScan® system offers a workable method for predicting lean meat yield automatically.
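
    Principal components regression of the kind used here, regressing yield on the leading components of the correlated measurements, is straightforward to sketch. The example below uses simulated stand-ins for the 8 VIAScan® measures and hot carcass weight; the coefficients and noise levels are invented, and r.s.d. is approximated by the root mean squared residual.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic stand-in: 8 correlated VIAScan-style measures plus hot carcass
# weight (HCW) predicting lean meat yield (%).
rng = np.random.default_rng(2)
via = rng.normal(size=(360, 8)) @ rng.normal(size=(8, 8))  # correlated columns
hcw = rng.normal(22.0, 3.0, size=360)
yield_pct = (55.0 + 0.3 * via[:, 0] - 0.2 * via[:, 1] + 0.1 * hcw
             + rng.normal(scale=2.0, size=360))

X = np.column_stack([hcw, via])
model = make_pipeline(PCA(n_components=4), LinearRegression()).fit(X, yield_pct)
resid = yield_pct - model.predict(X)
print(f"R^2 = {model.score(X, yield_pct):.2f}, r.s.d. ~= {np.sqrt(np.mean(resid**2)):.2f}%")
```

    Fitting the regression on a few components rather than the raw correlated measures is what gives the PC models their reported stability.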

  4. Calm water resistance prediction of a bulk carrier using Reynolds averaged Navier-Stokes based solver

    NASA Astrophysics Data System (ADS)

    Rahaman, Md. Mashiur; Islam, Hafizul; Islam, Md. Tariqul; Khondoker, Md. Reaz Hasan

    2017-12-01

    Maneuverability and resistance prediction with suitable accuracy is essential for optimum ship design and propulsion power prediction. This paper aims at providing some of the maneuverability characteristics of a Japanese bulk carrier model, JBC, in calm water using two computational fluid dynamics solvers, SHIP Motion and OpenFOAM. The solvers are based on the Reynolds-averaged Navier-Stokes (RANS) method and solve structured grids using the finite volume method (FVM). This paper compares the numerical results of the calm water test for the JBC model with available experimental results. The calm water test results include the total drag coefficient, average sinkage, and trim data. Visualization data for the pressure distribution on the hull surface and the free water surface have also been included. The paper concludes that the presented solvers predict the resistance and maneuverability characteristics of the bulk carrier with reasonable accuracy while utilizing minimal computational resources.

  5. Robust prediction of consensus secondary structures using averaged base pairing probability matrices.

    PubMed

    Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi

    2007-02-15

    Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
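
    As a rough illustration of the averaging idea (not the McCaskill-MEA dynamic program itself), the sketch below averages per-sequence base-pairing probability matrices that are assumed to be already mapped to alignment columns, then greedily keeps non-conflicting column pairs whose averaged probability exceeds 1/(gamma + 1), a cutoff commonly associated with maximum-expected-accuracy estimators. The random matrices are placeholders for real McCaskill partition-function output, and the greedy selection is a simplification of the actual algorithm.

```python
import numpy as np

def consensus_pairs(bpp_matrices, gamma=2.0):
    """Average base-pairing probability matrices over an alignment and
    greedily select mutually compatible column pairs whose averaged
    probability exceeds 1/(gamma + 1)."""
    avg = np.mean(bpp_matrices, axis=0)
    L = avg.shape[0]
    pairs, used = [], set()
    # Visit candidate pairs (minimum hairpin loop of 3) by probability.
    candidates = sorted(((i, j) for i in range(L) for j in range(i + 4, L)),
                        key=lambda ij: -avg[ij])
    for i, j in candidates:
        if avg[i, j] > 1.0 / (gamma + 1.0) and i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs

# Toy input: three symmetric random "probability" matrices of length 12.
rng = np.random.default_rng(3)
mats = []
for _ in range(3):
    m = rng.random((12, 12)) * 0.4
    mats.append((m + m.T) / 2)   # symmetric, as pairing probabilities are
print(consensus_pairs(mats, gamma=2.0))
```

    The gamma parameter plays the sensitivity/specificity role described in the abstract: larger values lower the acceptance threshold and predict more base pairs.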

  6. Mathematical-Artificial Neural Network Hybrid Model to Predict Roll Force during Hot Rolling of Steel

    NASA Astrophysics Data System (ADS)

    Rath, S.; Sengupta, P. P.; Singh, A. P.; Marik, A. K.; Talukdar, P.

    2013-07-01

    Accurate prediction of roll force during hot strip rolling is essential for model-based operation of hot strip mills. Traditionally, mathematical models based on the theory of plastic deformation have been used for prediction of roll force. In the last decade, data-driven models such as artificial neural networks have been tried for prediction of roll force. Pure mathematical models have accuracy limitations, whereas data-driven models have difficulty in convergence when applied to industrial conditions. Hybrid models that integrate traditional mathematical formulations and data-driven methods are being developed in different parts of the world. This paper discusses the methodology of development of an innovative hybrid mathematical-artificial neural network model. In the mathematical model, the most important factor influencing accuracy is the flow stress of steel. Coefficients of a standard flow stress equation, calculated by a parameter estimation technique, have been used in the model. The hybrid model has been trained and validated with input and output data collected from the finishing stands of the Hot Strip Mill, Bokaro Steel Plant, India. It has been found that model accuracy is improved with the hybrid model over the traditional mathematical model.
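
    The paper's specific formulation is not reproduced here, but a common pattern for such hybrids is residual correction: compute the physics-based force from a flow stress relation, then train a neural network on the gap between measured and computed force. The sketch below follows that pattern with invented coefficients, variable names, and synthetic process data; it is an assumption-laden illustration, not the Bokaro mill model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def physics_roll_force(flow_stress, width, contact_length, q_factor):
    """Simplified physics-based roll force F = sigma * w * L * Q
    (an illustrative stand-in for the full mathematical mill model)."""
    return flow_stress * width * contact_length * q_factor

rng = np.random.default_rng(4)
n = 500
X = np.column_stack([rng.uniform(0.1, 0.5, n),        # strain
                     rng.uniform(5.0, 50.0, n),       # strain rate (1/s)
                     rng.uniform(900.0, 1100.0, n)])  # temperature (deg C)
# Invented flow stress law standing in for the fitted standard equation.
sigma = 150.0 * X[:, 0] ** 0.2 * X[:, 1] ** 0.1 * np.exp(2000.0 / (X[:, 2] + 273.0))
f_math = physics_roll_force(sigma, width=1.2, contact_length=0.05, q_factor=1.1)
f_meas = f_math * rng.normal(1.05, 0.03, n)           # synthetic "measured" force

# Hybrid step: the ANN learns the residual between measurement and physics.
ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                                 random_state=0))
ann.fit(X, f_meas - f_math)
f_hybrid = f_math + ann.predict(X)

print(f"mean abs error, physics only: {np.mean(np.abs(f_meas - f_math)):.2f}")
print(f"mean abs error, hybrid:       {np.mean(np.abs(f_meas - f_hybrid)):.2f}")
```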

  7. DIRProt: a computational approach for discriminating insecticide resistant proteins from non-resistant proteins.

    PubMed

    Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Banchariya, Anjali; Rao, Atmakuri Ramakrishna

    2017-03-24

    Insecticide resistance is a major challenge for control programs of insect pests in the fields of crop protection and human and animal health. Resistance to different insecticides is conferred by proteins encoded by certain classes of genes of the insects. No computational tool is available to date to distinguish insecticide resistant proteins from non-resistant proteins. Thus, development of such a computational tool will be helpful in predicting insecticide resistant proteins, which can be targeted for developing appropriate insecticides. Five different feature sets, viz. amino acid composition (AAC), di-peptide composition (DPC), pseudo amino acid composition (PAAC), composition-transition-distribution (CTD) and auto-correlation function (ACF), were used to map the protein sequences into numeric feature vectors. The encoded numeric vectors were then used as input to a support vector machine (SVM) for classification of insecticide resistant and non-resistant proteins. Higher accuracies were obtained under the RBF kernel than under other kernels. Further, accuracies were observed to be higher for the DPC feature set than for the others. The proposed approach achieved an overall accuracy of >90% in discriminating resistant from non-resistant proteins. Further, the two classes of resistant proteins, i.e., detoxification-based and target-based, were discriminated from non-resistant proteins with >95% accuracy. Besides, >95% accuracy was also observed for discrimination of proteins involved in detoxification- and target-based resistance mechanisms. The proposed approach not only outperformed the Blastp, PSI-Blast and Delta-Blast algorithms, but also achieved >92% accuracy when assessed using an independent dataset of 75 insecticide resistant proteins. This paper presents the first computational approach for discriminating insecticide resistant proteins from non-resistant proteins. Based on the proposed approach, an online prediction server, DIRProt, has also been developed for computational prediction of insecticide resistant proteins, which is accessible at http://cabgrid.res.in:8080/dirprot/ . The proposed approach is believed to supplement the efforts needed to develop dynamic insecticides in the wet-lab by targeting the insecticide resistant proteins.
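
    Of the five feature sets, di-peptide composition (the best performer here) is simple to compute: the frequency of each of the 400 ordered residue pairs along the sequence. A minimal sketch follows; the example sequence is arbitrary, and in the actual approach the resulting 400-dimensional vectors are fed to an RBF-kernel SVM.

```python
from itertools import product
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

def dpc(seq):
    """Di-peptide composition: frequency of each ordered residue pair."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:   # skips pairs containing non-standard residues
            counts[pair] += 1
    total = max(len(seq) - 1, 1)
    return np.array([counts[d] / total for d in DIPEPTIDES])

x = dpc("MKTAYIAKQRQISFVKSHFSRQ")   # arbitrary example sequence
print(x.shape, x.sum())             # (400,) with frequencies summing to 1.0
```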

  8. The Application of FIA-based Data to Wildlife Habitat Modeling: A Comparative Study

    Treesearch

    Thomas C., Jr. Edwards; Gretchen G. Moisen; Tracey S. Frescino; Randall J. Schultz

    2005-01-01

    We evaluated the capability of two types of models, one based on spatially explicit variables derived from FIA data and one using so-called traditional habitat evaluation methods, for predicting the presence of cavity-nesting bird habitat in Fishlake National Forest, Utah. Both models performed equally well, in measures of predictive accuracy, with the FIA-based model...

  9. HIV-1 protease cleavage site prediction based on two-stage feature selection method.

    PubMed

    Niu, Bing; Yuan, Xiao-Cheng; Roeper, Preston; Su, Qiang; Peng, Chun-Rong; Yin, Jing-Yuan; Ding, Juan; Li, HaiPeng; Lu, Wen-Cong

    2013-03-01

    Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. Searching for an accurate, robust, and rapid method to correctly predict the cleavage sites in proteins is crucial when searching for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with a genetic algorithm. Thirty important biochemical features were found based on a jackknife test from the original data set containing 4,248 features. By using the AdaBoost method with the thirty selected features, the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, with accuracy increased over the original dataset by 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.

  10. Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion

    PubMed Central

    2017-01-01

    Electroencephalogram (EEG)-based decoding of human brain activity is challenging, owing to the low spatial resolution of EEG. However, EEG is an important technique, especially for brain–computer interface applications. In this study, a novel algorithm is proposed to decode brain activity associated with different types of images. In this hybrid algorithm, a convolutional neural network is modified for the extraction of features, a t-test is used for the selection of significant features, and likelihood ratio-based score fusion is used for the prediction of brain activity. The proposed algorithm takes input data from multichannel EEG time-series, which is also known as multivariate pattern analysis. A comprehensive analysis was conducted using data from 30 participants. The results from the proposed method are compared with currently recognized feature extraction and classification/prediction techniques. The wavelet transform-support vector machine method, the most popular feature extraction and prediction method in current use, showed an accuracy of 65.7%; the proposed method predicts novel data with an improved accuracy of 79.9%. In conclusion, the proposed algorithm outperformed the current feature extraction and prediction methods. PMID:28558002

  11. Bayesian averaging over Decision Tree models for trauma severity scoring.

    PubMed

    Schetinin, V; Jakaite, L; Krzanowski, W

    2018-01-01

    Health care practitioners analyse possible risks of misleading decisions and need to estimate and quantify uncertainty in predictions. We have examined the "gold" standard of screening a patient's conditions for predicting survival probability, based on logistic regression modelling, which is used in trauma care for clinical purposes and quality audit. This methodology is based on theoretical assumptions about the data and their uncertainties. Models induced within such an approach have exposed a number of problems, showing unexplained fluctuation of predicted survival and low accuracy in estimating the uncertainty intervals within which predictions are made. The Bayesian method, which in theory is capable of providing accurate predictions and uncertainty estimates, has been adopted in our study using Decision Tree models. Our approach has been tested on a large set of patients registered in the US National Trauma Data Bank and has outperformed the standard method in terms of prediction accuracy, thereby providing practitioners with accurate estimates of the predictive posterior densities of interest that are required for making risk-aware decisions. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Lurking systematics in predicting galaxy cold gas masses using dust luminosities and star formation rates

    NASA Astrophysics Data System (ADS)

    Janowiecki, Steven; Cortese, Luca; Catinella, Barbara; Goodwin, Adelle J.

    2018-05-01

    We use galaxies from the Herschel Reference Survey to evaluate commonly used indirect predictors of cold gas masses. We calibrate predictions for cold neutral atomic and molecular gas using infrared dust emission and gas depletion time methods that are self-consistent and have ~20 per cent accuracy (with the highest accuracy in the prediction of total cold gas mass). However, modest systematic residual dependences are found in all calibrations that depend on the partition between molecular and atomic gas, and can over/underpredict gas masses by up to 0.3 dex. As expected, dust-based estimates are best at predicting the total gas mass while depletion time-based estimates are only able to predict the (star-forming) molecular gas mass. Additionally, we advise caution when applying these predictions to high-z galaxies, as significant (0.5 dex or more) errors can arise when incorrect assumptions are made about the dominant gas phase. Any scaling relations derived using predicted gas masses may be more closely related to the calibrations used than to the actual galaxies observed.

  13. Sequence-based predictive modeling to identify cancerlectins

    PubMed Central

    Lai, Hong-Yan; Chen, Xin-Xin; Chen, Wei; Tang, Hua; Lin, Hao

    2017-01-01

    Lectins are a diverse class of glycoproteins or carbohydrate-binding proteins with a wide distribution across various species. They can specifically identify and exclusively bind to certain kinds of saccharide groups. Cancerlectins are a group of lectins that are closely related to cancer and play a major role in the initiation, survival, growth, metastasis and spread of tumors. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promotes the study of the pathogenic mechanisms and clinical treatment of cancer. However, the predictive accuracies of most of these techniques are very limited. In this work, by constructing a benchmark dataset based on the CancerLectinDB database, a new amino acid sequence-based strategy for feature description was developed, and the binomial distribution was then applied to screen the optimal feature set. Ultimately, an SVM-based predictor was applied to distinguish cancerlectins from non-cancerlectins, and achieved an accuracy of 77.48% with an AUC of 85.52% in jackknife cross-validation. The results revealed that our prediction model performs better compared with published predictive tools. PMID:28423655

  14. SSGP: SNP-set based genomic prediction to incorporate biological information

    USDA-ARS?s Scientific Manuscript database

    Genomic prediction has emerged as an effective approach in plant and animal breeding and in precision medicine. Much research has been devoted to an improved accuracy in genomic prediction, and one of the potential ways is to incorporate biological information. Due to the statistical and computation...

  15. Development of machine learning models for diagnosis of glaucoma.

    PubMed

    Kim, Seong Jae; Cho, Kyong Jin; Oh, Sejong

    2017-01-01

    The study aimed to develop machine learning models that have strong prediction power and interpretability for the diagnosis of glaucoma based on retinal nerve fiber layer (RNFL) thickness and visual field (VF). We collected various candidate features from examinations of RNFL thickness and VF, and developed synthesized features from the original features. We then selected the best features for classification (diagnosis) through feature evaluation. We used 100 cases of data as a test dataset and 399 cases of data as a training and validation dataset. To develop the glaucoma prediction model, we considered four machine learning algorithms: C5.0, random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). We repeatedly composed a learning model using the training dataset and evaluated it using the validation dataset. Finally, we selected the learning model that produced the highest validation accuracy. We analyzed the quality of the models using several measures. The random forest model shows the best performance, and the C5.0, SVM, and KNN models show similar accuracy. In the random forest model, the classification accuracy is 0.98, sensitivity is 0.983, specificity is 0.975, and AUC is 0.979. The developed prediction models show high accuracy, sensitivity, specificity, and AUC in classifying between glaucomatous and healthy eyes, and can be used to predict glaucoma for unseen examination records. Clinicians may reference the prediction results and be able to make better decisions. Multiple learning models may be combined to increase prediction accuracy. The C5.0 model includes decision rules for prediction, and can be used to explain the reasons for specific predictions.

  16. Accuracy of risk scales for predicting repeat self-harm and suicide: a multicentre, population-level cohort study using routine clinical data.

    PubMed

    Steeg, Sarah; Quinlivan, Leah; Nowland, Rebecca; Carroll, Robert; Casey, Deborah; Clements, Caroline; Cooper, Jayne; Davies, Linda; Knipe, Duleeka; Ness, Jennifer; O'Connor, Rory C; Hawton, Keith; Gunnell, David; Kapur, Nav

    2018-04-25

    Risk scales are used widely in the management of patients presenting to hospital following self-harm. However, there is evidence that their diagnostic accuracy in predicting repeat self-harm is limited. Their predictive accuracy in population settings, and in identifying those at highest risk of suicide is not known. We compared the predictive accuracy of the Manchester Self-Harm Rule (MSHR), ReACT Self-Harm Rule (ReACT), SAD PERSONS Scale (SPS) and Modified SAD PERSONS Scale (MSPS) in an unselected sample of patients attending hospital following self-harm. Data on 4000 episodes of self-harm presenting to Emergency Departments (ED) between 2010 and 2012 were obtained from four established monitoring systems in England. Episodes were assigned a risk category for each scale and followed up for 6 months. The episode-based repeat rate was 28% (1133/4000) and the incidence of suicide was 0.5% (18/3962). The MSHR and ReACT performed with high sensitivity (98% and 94% respectively) and low specificity (15% and 23%). The SPS and the MSPS performed with relatively low sensitivity (24-29% and 9-12% respectively) and high specificity (76-77% and 90%). The area under the curve was 71% for both MSHR and ReACT, 51% for SPS and 49% for MSPS. Differences in predictive accuracy by subgroup were small. The scales were less accurate at predicting suicide than repeat self-harm. The scales failed to accurately predict repeat self-harm and suicide. The findings support existing clinical guidance not to use risk classification scales alone to determine treatment or predict future risk.

  17. In vivo real-time assessment of colorectal polyp histology using an optical biopsy forceps system based on laser-induced fluorescence spectroscopy.

    PubMed

    Rath, Timo; Tontini, Gian E; Vieth, Michael; Nägel, Andreas; Neurath, Markus F; Neumann, Helmut

    2016-06-01

    In order to reduce time, costs, and risks associated with resection of diminutive colorectal polyps, the American Society for Gastrointestinal Endoscopy (ASGE) recently proposed performance thresholds that new technologies should meet for the accurate real-time assessment of histology of colorectal polyps. In this study, we prospectively assessed whether laser-induced fluorescence spectroscopy (LIFS), using the new WavSTAT4 optical biopsy system, can meet the ASGE criteria. 27 patients undergoing screening or surveillance colonoscopy were included. The histology of 137 diminutive colorectal polyps was predicted in real time using LIFS and findings were compared with the results of conventional histopathological examination. The accuracy of predicting polyp histology with WavSTAT4 was assessed according to the ASGE criteria. The overall accuracy of LIFS using WavSTAT4 for predicting polyp histology was 84.7 % with sensitivity, specificity, and negative predictive value (NPV) of 81.8 %, 85.2 %, and 96.1 %. When only distal colorectal diminutive polyps were considered, the NPV for excluding adenomatous histology increased to 100 % (accuracy 82.4 %, sensitivity 100 %, specificity 80.6 %). On-site, LIFS correctly predicted the recommended surveillance intervals with an accuracy of 88.9 % (24/27 patients) when compared with histology-based United States guideline recommendations; in the 3 patients for whom LIFS- and histopathology-based recommended surveillance intervals differed, LIFS predicted shorter surveillance intervals. From the data of this pilot study, LIFS using the WavSTAT4 system appears accurate enough to allow distal colorectal polyps to be left in place and nearly reaches the threshold to "resect and discard" them without pathologic assessment. WavSTAT4 therefore has the potential to reduce costs and risks associated with the removal of diminutive colorectal polyps. © Georg Thieme Verlag KG Stuttgart · New York.

  18. A signal detection model predicts the effects of set size on visual search accuracy for feature, conjunction, triple conjunction, and disjunction displays

    NASA Technical Reports Server (NTRS)

    Eckstein, M. P.; Thomas, J. P.; Palmer, J.; Shimozaki, S. S.

    2000-01-01

    Recently, quantitative models based on signal detection theory have been successfully applied to the prediction of human accuracy in visual search for a target that differs from distractors along a single attribute (feature search). The present paper extends these models for visual search accuracy to multidimensional search displays in which the target differs from the distractors along more than one feature dimension (conjunction, disjunction, and triple conjunction displays). The model assumes that each element in the display elicits a noisy representation for each of the relevant feature dimensions. The observer combines the representations across feature dimensions to obtain a single decision variable, and the stimulus with the maximum value determines the response. The model accurately predicts human experimental data on visual search accuracy in conjunctions and disjunctions of contrast and orientation. The model accounts for performance degradation without resorting to a limited-capacity spatially localized and temporally serial mechanism by which to bind information across feature dimensions.
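
    The model's core decision rule, choosing the display element with the maximum noisy internal response, is easy to simulate for the single-dimension (feature search) case. The Monte Carlo sketch below is a simplified illustration of that rule; the d' value and trial count are arbitrary, and extending it to conjunctions would require combining responses across feature dimensions as the paper describes.

```python
import numpy as np

def search_accuracy(set_size, d_prime, n_trials=200_000, seed=5):
    """Monte Carlo accuracy under the max rule: the target elicits
    N(d', 1), each distractor N(0, 1), and the observer chooses the
    element with the largest internal response."""
    rng = np.random.default_rng(seed)
    target = rng.normal(d_prime, 1.0, n_trials)
    distractors = rng.normal(0.0, 1.0, (n_trials, set_size - 1))
    return np.mean(target > distractors.max(axis=1))

# Accuracy degrades with set size without any serial, limited-capacity stage.
for m in (2, 4, 8, 16):
    print(f"set size {m:2d}: accuracy = {search_accuracy(m, d_prime=2.0):.3f}")
```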

  19. Adaptive time-variant models for fuzzy-time-series forecasting.

    PubMed

    Wong, Wai-Keung; Bai, Enjian; Chu, Alice Wai-Ching

    2010-12-01

    A fuzzy time series has been applied to the prediction of enrollment, temperature, stock indices, and other domains. Related studies mainly focus on three factors, namely, the partition of discourse, the content of forecasting rules, and the methods of defuzzification, all of which greatly influence the prediction accuracy of forecasting models. These studies use fixed analysis window sizes for forecasting. In this paper, an adaptive time-variant fuzzy-time-series forecasting model (ATVF) is proposed to improve forecasting accuracy. The proposed model automatically adapts the analysis window size of fuzzy time series based on the prediction accuracy in the training phase and uses heuristic rules to generate forecasting values in the testing phase. The performance of the ATVF model is tested using both simulated and actual time series including the enrollments at the University of Alabama, Tuscaloosa, and the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX). The experiment results show that the proposed ATVF model achieves a significant improvement in forecasting accuracy as compared to other fuzzy-time-series forecasting models.

  20. Accuracy of genomic selection in European maize elite breeding populations.

    PubMed

    Zhao, Yusheng; Gowda, Manje; Liu, Wenxin; Würschum, Tobias; Maurer, Hans P; Longin, Friedrich H; Ranc, Nicolas; Reif, Jochen C

    2012-03-01

    Genomic selection is a promising breeding strategy for rapid improvement of complex traits. The objective of our study was to investigate the prediction accuracy of genomic breeding values through cross validation. The study was based on experimental data of six segregating populations from a half-diallel mating design with 788 testcross progenies from an elite maize breeding program. The plants were intensively phenotyped in multi-location field trials and fingerprinted with 960 SNP markers. We used random regression best linear unbiased prediction in combination with fivefold cross validation. The prediction accuracy across populations was higher for grain moisture (0.90) than for grain yield (0.58). The accuracy of genomic selection realized for grain yield corresponds to the precision of phenotyping in unreplicated field trials at 3-4 locations. As up to three generations per year are feasible for maize, selection gain per unit time is high and, consequently, genomic selection holds great promise for maize breeding programs.
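
    Random regression BLUP with a fixed shrinkage level is closely related to ridge regression on the marker matrix, so a compact way to mimic the study's cross-validation is the sketch below. The genotypes, marker effects, noise level, and ridge penalty are all simulated assumptions; accuracy is computed, as is common, as the correlation between predicted and observed values in the held-out fold.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Simulated stand-in: 788 testcross progenies x 960 SNP markers (0/1/2).
rng = np.random.default_rng(6)
snps = rng.integers(0, 3, size=(788, 960)).astype(float)
effects = rng.normal(scale=0.1, size=960)
pheno = snps @ effects + rng.normal(scale=2.0, size=788)

# Fivefold cross validation; accuracy = r(predicted, observed) per fold.
accuracies = []
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(snps):
    model = Ridge(alpha=100.0).fit(snps[train], pheno[train])  # RR-BLUP-like
    accuracies.append(np.corrcoef(model.predict(snps[test]), pheno[test])[0, 1])
print(f"mean prediction accuracy: {np.mean(accuracies):.2f}")
```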

  1. Ab-initio conformational epitope structure prediction using genetic algorithm and SVM for vaccine design.

    PubMed

    Moghram, Basem Ameen; Nabil, Emad; Badr, Amr

    2018-01-01

    T-cell epitope structure identification is a significantly challenging immunoinformatic problem within epitope-based vaccine design. Epitopes, or antigenic peptides, are sets of amino acids that bind with Major Histocompatibility Complex (MHC) molecules and are presented by Antigen Presenting Cells to be inspected by T-cells. MHC-molecule-binding epitopes are responsible for triggering the immune response to antigens. The epitope's three-dimensional (3D) molecular structure (i.e., tertiary structure) reflects its proper function. Therefore, the identification of MHC class-II epitope structures is a significant step towards epitope-based vaccine design and understanding of the immune system. In this paper, we propose a new technique using a Genetic Algorithm for Predicting the Epitope Structure (GAPES) to predict the structure of MHC class-II epitopes based on their sequence. The proposed elitist-based genetic algorithm for predicting the epitope's tertiary structure is based on the Ab-Initio Empirical Conformational Energy Program for Peptides (ECEPP) Force Field Model. The developed secondary structure prediction technique relies on the Ramachandran Plot. We used two alignment algorithms: ROSS alignment and TM-Score alignment. We applied four different alignment approaches to calculate the similarity scores of the dataset under test. We utilized the support vector machine (SVM) classifier to evaluate the prediction performance. The prediction accuracy and the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) were calculated as measures of performance. The calculations were performed on twelve similarity-reduced datasets of the Immune Epitope Database (IEDB) and a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES is reliable and very accurate. We achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 on the IEDB datasets. Also, we achieved an accuracy of 95.125% and an AUC of 0.987 on the HLA-DRB1*0101 allele of the Wang benchmark dataset. The results indicate that the proposed prediction technique "GAPES" is a promising technique that will help researchers and scientists to predict protein structure and will assist them in the intelligent design of new epitope-based vaccines. Copyright © 2017 Elsevier B.V. All rights reserved.

  2. Maximizing lipocalin prediction through balanced and diversified training set and decision fusion.

    PubMed

    Nath, Abhigyan; Subbiah, Karthikeyan

    2015-12-01

    Lipocalins are short in sequence length and perform several important biological functions. These proteins have less than 20% sequence similarity among paralogs. Experimentally identifying them is an expensive and time-consuming process. Computational methods based on sequence similarity for allocating putative members to this family also remain elusive owing to the low sequence similarity existing among the members of this family. Consequently, machine learning methods become a viable alternative for their prediction by using the underlying sequence/structurally derived features as the input. Ideally, any machine learning based prediction method must be trained with all possible variations in the input feature vector (all the sub-class input patterns) to achieve perfect learning. Near perfect learning can be achieved by training the model with diverse types of input instances belonging to the different regions of the entire input space. Furthermore, the prediction performance can be improved by balancing the training set, as imbalanced data sets tend to produce a prediction bias towards the majority class and its sub-classes. This paper aims to achieve (i) high generalization ability without any classification bias, through diversified and balanced training sets, as well as (ii) enhanced prediction accuracy, by combining the results of individual classifiers with an appropriate fusion scheme. Instead of creating the training set randomly, we first used the unsupervised Kmeans clustering algorithm to create diversified clusters of input patterns and created the diversified and balanced training set by selecting an equal number of patterns from each of these clusters. Finally, a probability based classifier fusion scheme was applied to a boosted random forest algorithm (which produced greater sensitivity) and a K nearest neighbour algorithm (which produced greater specificity) to achieve enhanced predictive performance over the individual base classifiers. The performance of the learned models trained on the Kmeans preprocessed training set is far better than that of models trained on randomly generated training sets. The proposed method achieved a sensitivity of 90.6%, specificity of 91.4% and accuracy of 91.0% on the first test set, and a sensitivity of 92.9%, specificity of 96.2% and accuracy of 94.7% on the second blind test set. These results establish that diversifying the training set improves the performance of predictive models through superior generalization ability, and that balancing the training set improves prediction accuracy. For smaller data sets, unsupervised Kmeans based sampling can be a more effective technique for increasing generalization than the usual random splitting method. Copyright © 2015 Elsevier Ltd. All rights reserved.
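
    The diversified-and-balanced sampling step can be sketched directly with scikit-learn: cluster each class separately with K-means, then draw the same number of instances from every cluster. The cluster count, draw size, and toy data below are assumptions made for illustration, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

def diversified_balanced_sample(X, y, per_cluster, n_clusters=5, seed=0):
    """Cluster each class separately with K-means, then draw an equal
    number of instances from every cluster so the training set covers
    diverse regions of the input space and both classes equally."""
    rng = np.random.default_rng(seed)
    chosen = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        assignments = km.fit_predict(X[idx])
        for c in range(n_clusters):
            members = idx[assignments == c]
            take = min(per_cluster, len(members))
            chosen.extend(rng.choice(members, size=take, replace=False))
    return np.array(chosen)

# Toy imbalanced data: 900 negatives, 100 positives in 10 dimensions.
rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 10))
y = np.array([0] * 900 + [1] * 100)
train_idx = diversified_balanced_sample(X, y, per_cluster=15)
print(len(train_idx), np.bincount(y[train_idx]))  # roughly balanced classes
```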

  3. Accuracy of active chirp linearization for broadband frequency modulated continuous wave ladar.

    PubMed

    Barber, Zeb W; Babbitt, Wm Randall; Kaylor, Brant; Reibel, Randy R; Roos, Peter A

    2010-01-10

    As the bandwidth and linearity of frequency modulated continuous wave chirp ladar increase, the resulting range resolution, precision, and accuracy improve correspondingly. An analysis of a very broadband (several THz) and linear (<1 ppm) chirped ladar system based on active chirp linearization is presented. Residual chirp nonlinearity and material dispersion are analyzed as to their effects on the dynamic range, precision, and accuracy of the system. Measurement precision and accuracy approaching the part-per-billion level are predicted.

  4. Prediction accuracy of direct and indirect approaches, and their relationships with prediction ability of calibration models.

    PubMed

    Belay, T K; Dagnachew, B S; Boison, S A; Ådnøy, T

    2018-03-28

    Milk infrared spectra are routinely used for phenotyping traits of interest through links developed between the traits and spectra. Predicted individual traits are then used in genetic analyses for estimated breeding value (EBV) or for phenotypic predictions using a single-trait mixed model; this approach is referred to as indirect prediction (IP). An alternative approach [direct prediction (DP)] is a direct genetic analysis of (a reduced dimension of) the spectra using a multitrait model to predict multivariate EBV of the spectral components and, ultimately, also to predict the univariate EBV or phenotype for the traits of interest. We simulated 3 traits under different genetic (low: 0.10 to high: 0.90) and residual (zero to high: ±0.90) correlation scenarios between the 3 traits and assumed the first trait is a linear combination of the other 2 traits. The aim was to compare the IP and DP approaches for predictions of EBV and phenotypes under the different correlation scenarios. We also evaluated relationships between performances of the 2 approaches and the accuracy of calibration equations. Moreover, the effect of using different regression coefficients estimated from simulated phenotypes (βp), true breeding values (βg), and residuals (βr) on performance of the 2 approaches was evaluated. The simulated data contained 2,100 parents (100 sires and 2,000 cows) and 8,000 offspring (4 offspring per cow). Of the 8,000 observations, 2,000 were randomly selected and used to develop links between the first and the other 2 traits using partial least square (PLS) regression analysis. The different PLS regression coefficients, such as βp, βg, and βr, were used in subsequent predictions following the IP and DP approaches. We used BLUP analyses for the remaining 6,000 observations using the true (co)variance components that had been used for the simulation. Accuracy of prediction (of EBV and phenotype) was calculated as the correlation between predicted and true values from the simulations. The results showed that accuracies of EBV prediction were higher in the DP than in the IP approach. The reverse was true for accuracy of phenotypic prediction when using βp but not when using βg and βr, where accuracy of phenotypic prediction in the DP was slightly higher than in the IP approach. Within the DP approach, accuracies of EBV when using βg were higher than when using βp only in the low genetic correlation scenario. However, we found no differences in EBV prediction accuracy between βp and βg in the IP approach. Accuracy of the calibration models increased with an increase in genetic and residual correlations between the traits. Performance of both approaches increased with an increase in accuracy of the calibration models. In conclusion, the DP approach is a good strategy for EBV prediction but not for phenotypic prediction, where the classical PLS regression-based equations or the IP approach provided better results. The Authors. Published by FASS Inc. and Elsevier Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

  5. Adjustment of regional regression models of urban-runoff quality using data for Chattanooga, Knoxville, and Nashville, Tennessee

    USGS Publications Warehouse

    Hoos, Anne B.; Patel, Anant R.

    1996-01-01

    Model-adjustment procedures were applied to the combined databases of storm-runoff quality for Chattanooga, Knoxville, and Nashville, Tennessee, to improve predictive accuracy for storm-runoff quality for urban watersheds in these three cities and throughout Middle and East Tennessee. Data for 45 storms at 15 different sites (five sites in each city) constitute the database. Comparison of observed values of storm-runoff load and event-mean concentration to the predicted values from the regional regression models for 10 constituents shows prediction errors as large as 806,000 percent. Model-adjustment procedures, which combine the regional model predictions with local data, are applied to improve predictive accuracy. The standard error of estimate after model adjustment ranges from 67 to 322 percent. Calibration results may be biased due to sampling error in the Tennessee database. The relatively large values of standard error of estimate for some of the constituent models, although representing a significant reduction (at least 50 percent) in prediction error compared to estimation with unadjusted regional models, may be unacceptable for some applications. The user may wish to collect additional local data for these constituents and repeat the analysis, or calibrate an independent local regression model.

  6. Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition.

    PubMed

    Tamura, Takeyuki; Akutsu, Tatsuya

    2007-11-30

    Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. It is the problem of predicting to which part of a cell a given protein is transported, where the amino acid sequence of the protein is given as input. This problem is becoming more important because information on subcellular location is helpful for the annotation of proteins and genes, and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracy. In this paper, we propose a novel and general prediction method that combines techniques for sequence alignment with feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. Through fivefold cross validation tests, the obtained overall accuracy and average MCC were 0.9096 and 0.8655, respectively. We also applied our method to other datasets, including that of WoLF PSORT. Although there is a predictor that uses gene ontology information and yields higher accuracy than ours, our accuracies are higher than those of existing predictors that use only sequence information. Since information such as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for subcellular location prediction of newly discovered proteins. Furthermore, the idea of combining alignment and amino acid frequency is novel and general, so it may be applied to other problems in bioinformatics. Our method for plants is also implemented as a web-system, available at http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.

  7. Prediction of soil properties using imaging spectroscopy: Considering fractional vegetation cover to improve accuracy

    NASA Astrophysics Data System (ADS)

    Franceschini, M. H. D.; Demattê, J. A. M.; da Silva Terra, F.; Vicente, L. E.; Bartholomeus, H.; de Souza Filho, C. R.

    2015-06-01

    Spectroscopic techniques have become attractive for assessing soil properties because they are fast, require little labor and may reduce the amount of laboratory waste produced compared to conventional methods. Imaging spectroscopy (IS) can have further advantages over laboratory or field proximal spectroscopic approaches, such as providing spatially continuous information with a high density. However, the accuracy of IS-derived predictions decreases when spectral mixture of soil with other targets occurs. This paper evaluates the use of spectral data obtained by an airborne hyperspectral sensor (ProSpecTIR-VS - Aisa dual sensor) for prediction of physical and chemical properties of Brazilian highly weathered soils (i.e., Oxisols). A methodology to assess the soil spectral mixture is adapted, and a progressive spectral dataset selection procedure, based on bare soil fractional cover, is proposed and tested. Satisfactory performances are obtained especially for the quantification of clay, sand and CEC using airborne sensor data (R2 of 0.77, 0.79 and 0.54; RPD of 2.14, 2.22 and 1.50, respectively) after spectral data selection is performed, although results obtained for laboratory data are more accurate (R2 of 0.92, 0.85 and 0.75; RPD of 3.52, 2.62 and 2.04, for clay, sand and CEC, respectively). Most importantly, predictions based on airborne-derived spectra for which the bare soil fractional cover is not taken into account show considerably lower accuracy, for example for clay, sand and CEC (RPD of 1.52, 1.64 and 1.16, respectively). Therefore, hyperspectral remotely sensed data can be used to predict topsoil properties of highly weathered soils, although the spectral mixture of bare soil with vegetation must be considered in order to achieve improved prediction accuracy.

  8. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes.

    PubMed

    Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M

    2018-02-01

    Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that pooling animals with original and imputed 40K SNP genotypes substantially increased genomic prediction accuracies for the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) ranged from 12.60% to 31.27%, and the relative gains in genomic prediction accuracies on de-regressed EBV were somewhat smaller (i.e., 0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals at the stage of training for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and the accuracies were slightly lower than the GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.

  9. Prediction Accuracy of Error Rates for MPTB Space Experiment

    NASA Technical Reports Server (NTRS)

    Buchner, S. P.; Campbell, A. B.; Davis, D.; McMorrow, D.; Petersen, E. L.; Stassinopoulos, E. G.; Ritter, J. C.

    1998-01-01

    This paper addresses the accuracy of radiation-induced upset-rate predictions in space using the results of ground-based measurements together with standard environmental and device models. The study is focused on two part types - 16 Mb NEC DRAMs (UPD4216) and 1 Kb SRAMs (AMD93L422) - both of which are currently in space on board the Microelectronics and Photonics Test Bed (MPTB). To date, ground-based measurements of proton-induced single event upset (SEU) cross sections as a function of energy have been obtained and combined with models of the proton environment to predict proton-induced error rates in space. The role played by uncertainties in the environmental models will be determined by comparing the modeled radiation environment with the actual environment measured aboard MPTB. Heavy-ion induced upsets have also been obtained from MPTB and will be compared with the "predicted" error rate following ground testing that will be done in the near future. These results should help identify sources of uncertainty in predictions of SEU rates in space.

  10. Glucose Prediction Algorithms from Continuous Monitoring Data: Assessment of Accuracy via Continuous Glucose Error-Grid Analysis.

    PubMed

    Zanderigo, Francesca; Sparacino, Giovanni; Kovatchev, Boris; Cobelli, Claudio

    2007-09-01

    The aim of this article was to use continuous glucose error-grid analysis (CG-EGA) to assess the accuracy of two time-series modeling methodologies recently developed to predict glucose levels ahead of time using continuous glucose monitoring (CGM) data. We considered subcutaneous time series of glucose concentration monitored every 3 minutes for 48 hours by the minimally invasive CGM sensor Glucoday® (Menarini Diagnostics, Florence, Italy) in 28 type 1 diabetic volunteers. Two prediction algorithms, based on first-order polynomial and autoregressive (AR) models, respectively, were considered with prediction horizons of 30 and 45 minutes and forgetting factors (ff) of 0.2, 0.5, and 0.8. CG-EGA was used on the predicted profiles to assess their point and dynamic accuracies using original CGM profiles as reference. Continuous glucose error-grid analysis showed that the accuracy of both prediction algorithms is overall very good and that their performance is similar from a clinical point of view. However, the AR model seems preferable for hypoglycemia prevention. CG-EGA also suggests that, irrespective of the time-series model, the use of ff = 0.8 yields the highest accurate readings in all glucose ranges. For the first time, CG-EGA is proposed as a tool to assess clinically relevant performance of a prediction method separately at hypoglycemia, euglycemia, and hyperglycemia. In particular, we have shown that CG-EGA can be helpful in comparing different prediction algorithms, as well as in optimizing their parameters.
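
    One of the two algorithms, the AR model with a forgetting factor, can be sketched compactly: fit a first-order autoregression by weighted least squares in which a sample k steps in the past receives weight ff^k, then iterate the model over the prediction horizon. The sketch below is a simplified single-coefficient version with fabricated CGM readings; the actual study used recursive estimation on 3-minute Glucoday® data.

```python
import numpy as np

def ar1_forecast(glucose, horizon_steps, ff=0.8):
    """Fit y_t = a * y_{t-1} by weighted least squares with exponential
    forgetting (weight ff**k for a sample k steps in the past), then
    iterate the fitted model `horizon_steps` ahead."""
    y = np.asarray(glucose, dtype=float)
    x_prev, x_next = y[:-1], y[1:]
    k = np.arange(len(x_prev))[::-1]      # age of each sample (0 = newest)
    w = ff ** k
    a = np.sum(w * x_prev * x_next) / np.sum(w * x_prev ** 2)
    pred = y[-1]
    for _ in range(horizon_steps):
        pred = a * pred
    return pred

# With 3-minute sampling, a 30-minute horizon is 10 steps ahead.
cgm = [112, 115, 119, 124, 130, 137, 143, 150, 156, 161]  # fabricated readings
print(f"predicted glucose in 30 min: {ar1_forecast(cgm, 10, ff=0.8):.0f} mg/dL")
```

    A forgetting factor closer to 1 (such as the 0.8 favored by the CG-EGA results) smooths the estimate over more history, while smaller values react faster to recent trends at the cost of noise.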

  11. Artificial neural network modeling using clinical and knowledge independent variables predicts salt intake reduction behavior

    PubMed Central

    Isma’eel, Hussain A.; Sakr, George E.; Almedawar, Mohamad M.; Fathallah, Jihan; Garabedian, Torkom; Eddine, Savo Bou Zein

    2015-01-01

    Background High dietary salt intake is directly linked to hypertension and cardiovascular diseases (CVDs). Predicting behaviors regarding salt intake habits is vital to guide interventions and increase their effectiveness. We aim to compare the accuracy of an artificial neural network (ANN) based tool that predicts behavior from key knowledge questions along with clinical data in a high cardiovascular risk cohort against the least squares models (LSM) method. Methods We collected knowledge, attitude and behavior data on 115 patients. A behavior score was calculated to classify patients’ behavior towards reducing salt intake. The accuracy comparison between ANN and regression analysis was calculated using the bootstrap technique with 200 iterations. Results Starting from a 69-item questionnaire, a reduced model was developed that included the eight knowledge items found to result in the highest accuracy of 62% CI (58-67%). The best prediction accuracy in the full and reduced models was attained by ANN, at 66% and 62%, respectively, compared to the full and reduced LSM at 40% and 34%, respectively. The average relative increase in accuracy overall in the full and reduced models is 82% and 102%, respectively. Conclusions Using ANN modeling, we can predict salt reduction behaviors with 66% accuracy. The statistical model has been implemented in an online calculator and can be used in clinics to estimate the patient’s behavior. This will help implementation in future research to further prove the clinical utility of this tool to guide therapeutic salt reduction interventions in high cardiovascular risk individuals. PMID:26090333

  12. Protein structure refinement using a quantum mechanics-based chemical shielding predictor.

    PubMed

    Bratholm, Lars A; Jensen, Jan H

    2017-03-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years, but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure, and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1-0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included, the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å differences for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods is slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improve the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors, we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches, or the interpretation of protein structural dynamics from QM-derived chemical shifts.

  13. SVM-PB-Pred: SVM based protein block prediction method using sequence profiles and secondary structures.

    PubMed

    Suresh, V; Parthasarathy, S

    2014-01-01

    We developed a support vector machine-based web server called SVM-PB-Pred to predict the protein block for any given amino acid sequence. The input features of SVM-PB-Pred include (i) sequence profiles (PSSM) and (ii) actual secondary structures (SS) from the DSSP method or predicted secondary structures from the NPS@ and GOR4 methods. Three combined input features, PSSM+SS(DSSP), PSSM+SS(NPS@) and PSSM+SS(GOR4), were used to train and test the SVM models, and four datasets, RS90, DB433, LI1264 and SP1577, were used to develop them. The four SVM models were evaluated using three benchmarking tests: (i) self-consistency, (ii) seven-fold cross-validation and (iii) independent-case testing. The maximum prediction accuracy of ~70% was observed in the self-consistency test for the SVM models of the LI1264 and SP1577 datasets when the PSSM+SS(DSSP) input features were used. In the independent-case test, the prediction accuracies for the same two datasets dropped to ~53% for PSSM+SS(NPS@) and ~43% for PSSM+SS(GOR4). Using our method, it is possible to predict the protein block letters for any query protein sequence with ~53% accuracy when the SP1577 dataset and secondary structures predicted by the NPS@ server are used. The SVM-PB-Pred server can be freely accessed at http://bioinfo.bdu.ac.in/~svmpbpred.
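
    The general shape of the training setup, an SVM over concatenated profile and secondary-structure features, can be sketched as follows. The arrays are synthetic placeholders (a windowed PSSM plus coded SS states, with 16 protein-block classes); the real server's feature encoding and parameters may differ.

```python
# Illustrative sketch, not the SVM-PB-Pred pipeline itself: an SVM trained on
# concatenated sequence-profile (PSSM) and secondary-structure (SS) features
# to assign one of the 16 protein-block letters per residue. Synthetic data.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
n_residues, window = 2000, 5
pssm = rng.normal(size=(n_residues, window * 20))  # 20 profile scores per window position
ss = rng.randint(0, 3, size=(n_residues, window))  # H/E/C codes for the same window
X = np.hstack([pssm, ss])                          # PSSM+SS combined input features
y = pssm[:, :16].argmax(axis=1)                    # fake but learnable 16-class labels

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
print("seven-fold CV accuracy:", cross_val_score(clf, X, y, cv=7).mean())
```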

  14. Testing the applicability of artificial intelligence techniques to the subject of erythemal ultraviolet solar radiation. Part two: an intelligent system based on multi-classifier technique.

    PubMed

    Elminir, Hamdy K; Own, Hala S; Azzam, Yosry A; Riad, A M

    2008-03-28

    The problem we address here describes the ongoing research effort to shed light on the applicability of artificial intelligence techniques to predicting local noon erythemal UV irradiance in the plain areas of Egypt. We use the bootstrap aggregating (bagging) algorithm to improve the prediction accuracy reported by a multi-layer perceptron (MLP) network. The results showed that the overall prediction accuracy of the MLP network alone was only 80.9%. When the bagging algorithm was used, the accuracy reached 94.8%, an improvement of about 13.9 percentage points. This improvement demonstrates the efficiency of the bagging procedure, which may serve as a promising tool, at least for the plain areas of Egypt.
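
    The bagging step reported above maps directly onto standard tooling; the sketch below aggregates bootstrap-trained MLPs and compares them against a single MLP. The six inputs are invented stand-ins for the meteorological predictors, so the numbers it prints are illustrative only.

```python
# Minimal sketch of the bagging idea: aggregate many MLPs trained on
# bootstrap resamples, then compare against a single MLP. Inputs are
# synthetic stand-ins for predictors of noon erythemal UV irradiance.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.uniform(size=(600, 6))                 # e.g. ozone, cloud, solar zenith...
y = 10 * X[:, 0] - 5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000, random_state=0)
bagged = BaggingRegressor(mlp, n_estimators=25, random_state=0)

print("single MLP R^2:", mlp.fit(X_tr, y_tr).score(X_te, y_te))
print("bagged MLP R^2:", bagged.fit(X_tr, y_tr).score(X_te, y_te))
```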

  15. Contingency Table Browser - prediction of early stage protein structure.

    PubMed

    Kalinowska, Barbara; Krzykalski, Artur; Roterman, Irena

    2015-01-01

    The Early Stage (ES) intermediate represents the starting structure in protein folding simulations based on the Fuzzy Oil Drop (FOD) model. The accuracy of FOD predictions is greatly dependent on the accuracy of the chosen intermediate. A suitable intermediate can be constructed using the sequence-structure relationship information contained in the so-called contingency table; this table expresses the likelihood of encountering various structural motifs for each tetrapeptide fragment in the amino acid sequence. The limited accuracy with which such structures could previously be predicted provided the motivation for a more in-depth study of the contingency table itself. The Contingency Table Browser is a tool which can visualize, search and analyze the table. Our work presents possible applications of the Contingency Table Browser, among them the analysis of specific protein sequences from the point of view of their structural ambiguity.

  16. Operationalizing the Diagnostic Criteria for Mild Cognitive Impairment: The Salience of Objective Measures in Predicting Incident Dementia.

    PubMed

    Brodaty, Henry; Aerts, Liesbeth; Crawford, John D; Heffernan, Megan; Kochan, Nicole A; Reppermund, Simone; Kang, Kristan; Maston, Kate; Draper, Brian; Trollor, Julian N; Sachdev, Perminder S

    2017-05-01

    Mild cognitive impairment (MCI) is considered an intermediate stage between normal aging and dementia. It is diagnosed in the presence of subjective cognitive decline and objective cognitive impairment without significant functional impairment, although there are no standard operationalizations for each of these criteria. The objective of this study was to determine which operationalization of the MCI criteria is most accurate at predicting dementia. The setting was a six-year, community-based longitudinal study, part of the Sydney Memory and Ageing Study, of 873 community-dwelling, dementia-free adults between 70 and 90 years of age; persons from a non-English-speaking background were excluded. Seven different operationalizations for subjective cognitive decline and eight measures of objective cognitive impairment (resulting in 56 different MCI operational algorithms) were applied. The accuracy of each algorithm in predicting progression to dementia over 6 years was examined for 618 individuals. Baseline MCI prevalence varied between 0.4% and 30.2%, and dementia conversion between 15.9% and 61.9%, across the different algorithms. The predictive accuracy for progression to dementia was poor. The highest accuracy was achieved based on objective cognitive impairment alone; inclusion of subjective cognitive decline or mild functional impairment did not improve dementia prediction accuracy. Not MCI, but objective cognitive impairment alone, is the best predictor of progression to dementia in a community sample. Nevertheless, clinical assessment procedures need to be refined to improve the identification of pre-dementia individuals. Copyright © 2016 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.

  17. Integrated Strategy Improves the Prediction Accuracy of miRNA in Large Dataset

    PubMed Central

    Lipps, David; Devineni, Sree

    2016-01-01

    MiRNAs are short non-coding RNAs of about 22 nucleotides that play critical roles in gene expression regulation. The biogenesis of miRNAs is largely determined by the sequence and structural features of their parental RNA molecules. Based on these features, multiple computational tools have been developed to predict whether RNA transcripts contain miRNAs. Although very successful, these predictors have started to face multiple challenges in recent years. Many were optimized using datasets of hundreds of miRNA samples, far smaller than the number of known miRNAs, so their prediction accuracy on large datasets is unknown and needs to be re-tested. In addition, many predictors were optimized for either high sensitivity or high specificity, optimization strategies that can impose serious limitations in applications. Moreover, to meet rising expectations of these computational tools, improving prediction accuracy has become extremely important. In this study, a meta-predictor, mirMeta, was developed by integrating a set of non-linear transformations with a meta-strategy. More specifically, the outputs of five individual predictors were first preprocessed using non-linear transformations and then fed into an artificial neural network to make the meta-prediction. The prediction accuracy of the meta-predictor was validated using both multi-fold cross-validation and an independent dataset. On a newly designed large dataset, the accuracy of the meta-predictor improved by 7%, to 93%. The meta-predictor is also shown to be less dataset-dependent and to strike a better balance between sensitivity and specificity. This study is important in two respects. First, it shows that the combination of non-linear transformations and artificial neural networks improves the prediction accuracy of individual predictors. Second, a new miRNA predictor with significantly improved prediction accuracy is made available to the community for identifying novel miRNAs and the complete set of miRNAs. Source code is available at: https://github.com/xueLab/mirMeta PMID:28002428
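
    A compact sketch of the meta-strategy is shown below: scores from five base predictors are passed through a non-linear transformation and fed to a small neural network. The base-predictor scores are simulated, and tanh is an assumed example transformation; mirMeta's actual transformations are described in the paper's methods.

```python
# Hedged sketch of a meta-predictor: non-linearly transformed outputs of five
# base predictors feed a small neural network. Base scores are simulated.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
n = 3000
y = rng.randint(0, 2, size=n)                  # miRNA (1) vs non-miRNA (0)
# five base predictors with different error characteristics
scores = np.column_stack([
    y + rng.normal(scale=s, size=n) for s in (0.6, 0.8, 1.0, 1.2, 1.5)
])
X = np.tanh(scores)                            # assumed non-linear transformation

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
meta = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
print("meta-predictor accuracy:", meta.fit(X_tr, y_tr).score(X_te, y_te))
```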

  18. COMSAT: Residue contact prediction of transmembrane proteins based on support vector machines and mixed integer linear programming.

    PubMed

    Zhang, Huiling; Huang, Qingsheng; Bei, Zhendong; Wei, Yanjie; Floudas, Christodoulos A

    2016-03-01

    In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM, which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP, which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on a contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross-validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and an MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM proteins increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/. © 2016 Wiley Periodicals, Inc.

  19. In silico platform for predicting and initiating β-turns in a protein at desired locations.

    PubMed

    Singh, Harinder; Singh, Sandeep; Raghava, Gajendra P S

    2015-05-01

    Numerous studies have been performed on the analysis and prediction of β-turns in proteins. This study focuses on the analysis, prediction, and design of β-turns to understand the preference of amino acids in β-turn formation. We analyzed around 20,000 PDB chains to understand the preference of residues, or pairs of residues, at different positions in β-turns. Based on the results, a propensity-based method has been developed for predicting β-turns with an accuracy of 82%. We introduce a new approach, termed the "turn-level prediction method," which predicts the complete β-turn rather than focusing on the individual residues in a β-turn. Finally, we developed BetaTPred3, a Random Forest-based method for predicting β-turns that utilizes various features of the four residues present in β-turns. BetaTPred3 achieved an accuracy of 79% with an MCC of 0.51 on the BT426 dataset, which is comparable to or better than existing methods. Additionally, models were developed to predict β-turn types with better performance than other methods available in the literature. In order to improve the quality of turn prediction, we developed prediction models on a large and recent dataset of 6376 non-redundant protein chains. Based on this study, a web server has been developed for the prediction of β-turns and their types in proteins. This web server also predicts the minimum number of mutations required to initiate or break a β-turn at a specified location in a protein. © 2015 Wiley Periodicals, Inc.

  20. Weather-based prediction of Plasmodium falciparum malaria in epidemic-prone regions of Ethiopia II. Weather-based prediction systems perform comparably to early detection systems in identifying times for interventions.

    PubMed

    Teklehaimanot, Hailay D; Schwartz, Joel; Teklehaimanot, Awash; Lipsitch, Marc

    2004-11-19

    Timely and accurate information about the onset of malaria epidemics is essential for effective control activities in epidemic-prone regions. Early warning methods that provide earlier alerts (usually by the use of weather variables) may permit control measures to interrupt transmission earlier in the epidemic, perhaps at the expense of some level of accuracy. Expected case numbers were modeled using a Poisson regression with lagged weather factors in a 4th-degree polynomial distributed lag model. For each week, the number of malaria cases was predicted using coefficients obtained using all years except the one for which the prediction was being made. The effectiveness of alerts generated by the prediction system was compared against that of alerts based on observed cases, and the usefulness of the prediction system was evaluated in cold and hot districts. The system predicts the overall pattern of cases well, yet underestimates the height of the largest peaks. Relative to alerts triggered by observed cases, the alerts triggered by the predicted number of cases performed slightly worse, within 5% of the detection system. The prediction-based alerts were able to prevent 10-25% more cases at a given sensitivity in cold districts than in hot ones. Prediction using lagged weather performed well in identifying periods of increased malaria incidence. Weather-derived predictions identified epidemics with reasonable accuracy and better timeliness than early detection systems; the prediction of malaria epidemics using weather is therefore a plausible alternative to early detection systems.
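
    The modeling step can be sketched with a Poisson GLM on lagged weather covariates. For brevity this uses plain fixed lags rather than the paper's 4th-degree polynomial distributed lag, and the weekly series are simulated.

```python
# Illustrative Poisson regression with lagged weather covariates, in the
# spirit of the distributed-lag model described above. All data simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.RandomState(0)
weeks = 260
rain = rng.gamma(2.0, 10.0, size=weeks)
temp = 20 + 5 * np.sin(np.arange(weeks) * 2 * np.pi / 52) + rng.normal(size=weeks)

def lag(x, k):
    """Return x shifted so that element t holds the value at week t - k."""
    out = np.full_like(x, np.nan)
    out[k:] = x[:-k]
    return out

X = np.column_stack([lag(rain, 4), lag(rain, 8), lag(temp, 4)])
keep = ~np.isnan(X).any(axis=1)
eta = 0.5 + 0.01 * X[keep, 0] + 0.005 * X[keep, 1] + 0.05 * X[keep, 2]
cases = rng.poisson(np.exp(eta))          # synthetic weekly malaria counts

model = sm.GLM(cases, sm.add_constant(X[keep]), family=sm.families.Poisson())
print(model.fit().params)                 # recovered lag coefficients
```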

  1. Automatically identifying health outcome information in MEDLINE records.

    PubMed

    Demner-Fushman, Dina; Few, Barbara; Hauser, Susan E; Thoma, George

    2006-01-01

    Understanding the effect of a given intervention on the patient's health outcome is one of the key elements in providing optimal patient care. This study presents a methodology for automatic identification of outcomes-related information in medical text and evaluates its potential in satisfying clinical information needs related to health care outcomes. An annotation scheme based on an evidence-based medicine model for critical appraisal of evidence was developed and used to annotate 633 MEDLINE citations. Textual, structural, and meta-information features essential to outcome identification were learned from the created collection and used to develop an automatic system. Accuracy of automatic outcome identification was assessed in an intrinsic evaluation and in an extrinsic evaluation, in which ranking of MEDLINE search results obtained using PubMed Clinical Queries relied on identified outcome statements. The accuracy and positive predictive value of outcome identification were calculated. Effectiveness of the outcome-based ranking was measured using mean average precision and precision at rank 10. Automatic outcome identification achieved 88% to 93% accuracy. The positive predictive value of individual sentences identified as outcomes ranged from 30% to 37%. Outcome-based ranking improved retrieval accuracy, tripling mean average precision and achieving 389% improvement in precision at rank 10. Preliminary results in outcome-based document ranking show potential validity of the evidence-based medicine-model approach in timely delivery of information critical to clinical decision support at the point of service.

  2. Numerical simulation of turbulence flow in a Kaplan turbine -Evaluation on turbine performance prediction accuracy-

    NASA Astrophysics Data System (ADS)

    Ko, P.; Kurosawa, S.

    2014-03-01

    The understanding and accurate prediction of flow behaviour related to cavitation and pressure fluctuation in a Kaplan turbine are important to design work enhancing turbine performance, including extension of the operational life span and improvement of turbine efficiency. In this paper, a high-accuracy turbine and cavitation performance prediction method based on the entire flow passage of a Kaplan turbine is presented and evaluated. The two-phase flow field is predicted by solving the Reynolds-Averaged Navier-Stokes equations with a volume-of-fluid method tracking the free surface, combined with a Reynolds Stress turbulence model. The growth and collapse of cavitation bubbles are modelled by the modified Rayleigh-Plesset equation. The prediction accuracy is evaluated by comparison with model test results for an Ns 400 Kaplan model turbine. As a result, the experimentally measured data, including turbine efficiency, cavitation performance, and pressure fluctuation, were accurately predicted. Furthermore, the occurrence of cavitation on the runner blade surface and its influence on the hydraulic loss of the flow passage are discussed. The evaluated prediction method for turbine flow and performance is introduced to facilitate future design and research work on Kaplan-type turbines.

  3. Genomic Prediction of Testcross Performance in Canola (Brassica napus)

    PubMed Central

    Jan, Habib U.; Abbadi, Amine; Lücke, Sophie; Nichols, Richard A.; Snowdon, Rod J.

    2016-01-01

    Genomic selection (GS) is a modern breeding approach where genome-wide single-nucleotide polymorphism (SNP) marker profiles are simultaneously used to estimate performance of untested genotypes. In this study, genomic selection methods were applied to predict testcross performance for hybrid canola breeding across various agronomic traits, based on genome-wide marker profiles. A total of 475 genetically diverse spring-type canola pollinator lines were genotyped at 24,403 single-copy, genome-wide SNP loci. In parallel, the 950 F1 testcross combinations between the pollinators and two representative testers were evaluated for a number of important agronomic traits, including seedling emergence, days to flowering, lodging, oil yield and seed yield, along with essential seed quality characters including seed oil content and seed glucosinolate content. A ridge-regression best linear unbiased prediction (RR-BLUP) model was applied in combination with 500 cross-validations for each trait to predict testcross performance, both across the whole population and within individual subpopulations or clusters, based solely on SNP profiles. Subpopulations were determined using multidimensional scaling and K-means clustering. Genomic prediction accuracy across the whole population was highest for seed oil content (0.81), followed by oil yield (0.75), and lowest for seedling emergence (0.29). For seed yield, seed glucosinolate, lodging resistance and days to onset of flowering (DTF), prediction accuracies were 0.45, 0.61, 0.39 and 0.56, respectively. Prediction accuracies could be increased for some traits by treating subpopulations separately, although this strategy led to only moderate improvements for some traits with low heritability, like seedling emergence. No useful or consistent increase in accuracy was obtained by inclusion of a population substructure covariate in the model. Testcross performance prediction using genome-wide SNP markers shows considerable potential for pre-selection of promising hybrid combinations prior to resource-intensive field testing over multiple locations and years. PMID:26824924
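
    The RR-BLUP prediction-accuracy protocol (ridge regression on genome-wide markers, with accuracy measured as the correlation between predicted and observed testcross values under cross-validation) can be sketched as follows. Marker data, trait architecture, and the shrinkage parameter are all simulated assumptions.

```python
# Small sketch of RR-BLUP-style genomic prediction with cross-validation.
# Marker matrix coded 0/1/2; accuracy = correlation between predicted and
# observed phenotypes, as in the study above. All data simulated.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.RandomState(0)
n_lines, n_snps = 475, 2000                  # dimensions loosely mirror the study
M = rng.binomial(2, 0.3, size=(n_lines, n_snps)).astype(float)
beta = np.zeros(n_snps)
beta[rng.choice(n_snps, 50, replace=False)] = rng.normal(size=50)
g = M @ beta                                 # true genetic values
y = g + rng.normal(scale=np.std(g), size=n_lines)   # heritability ~0.5

acc = []
for tr, te in KFold(5, shuffle=True, random_state=0).split(M):
    pred = Ridge(alpha=n_snps * 0.5).fit(M[tr], y[tr]).predict(M[te])  # heuristic shrinkage
    acc.append(np.corrcoef(pred, y[te])[0, 1])
print("mean prediction accuracy:", np.mean(acc))
```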

  4. Multi-centre diagnostic classification of individual structural neuroimaging scans from patients with major depressive disorder.

    PubMed

    Mwangi, Benson; Ebmeier, Klaus P; Matthews, Keith; Steele, J Douglas

    2012-05-01

    Quantitative abnormalities of brain structure in patients with major depressive disorder have been reported at a group level for decades. However, these structural differences appear subtle in comparison with conventional radiologically defined abnormalities, with considerable inter-subject variability. Consequently, it has not been possible to readily identify scans from patients with major depressive disorder at an individual level. Recently, machine learning techniques such as relevance vector machines and support vector machines have been applied to predictive classification of individual scans with variable success. Here we describe a novel hybrid method, which combines machine learning with feature selection and characterization, with the latter aimed at maximizing the accuracy of machine learning prediction. The method was tested using a multi-centre dataset of T(1)-weighted 'structural' scans. A total of 62 patients with major depressive disorder and matched controls were recruited from referred secondary care clinical populations in Aberdeen and Edinburgh, UK. The generalization ability and predictive accuracy of the classifiers was tested using data left out of the training process. High prediction accuracy was achieved (~90%). While feature selection was important for maximizing high predictive accuracy with machine learning, feature characterization contributed only a modest improvement to relevance vector machine-based prediction (~5%). Notably, while the only information provided for training the classifiers was T(1)-weighted scans plus a categorical label (major depressive disorder versus controls), both relevance vector machine and support vector machine 'weighting factors' (used for making predictions) correlated strongly with subjective ratings of illness severity. These results indicate that machine learning techniques have the potential to inform clinical practice and research, as they can make accurate predictions about brain scan data from individual subjects. Furthermore, machine learning weighting factors may reflect an objective biomarker of major depressive disorder illness severity, based on abnormalities of brain structure.

  5. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction

    PubMed Central

    Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian

    2017-01-01

    Motivation: Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations, and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. Results: We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Availability and Implementation: Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28453681

  6. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.

    PubMed

    Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian; Deane, Charlotte M

    2017-05-01

    Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations, and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  7. Integrative Chemical-Biological Read-Across Approach for Chemical Hazard Classification

    PubMed Central

    Low, Yen; Sedykh, Alexander; Fourches, Denis; Golbraikh, Alexander; Whelan, Maurice; Rusyn, Ivan; Tropsha, Alexander

    2013-01-01

    Traditional read-across approaches typically rely on the chemical similarity principle to predict chemical toxicity; however, the accuracy of such predictions is often inadequate due to the underlying complex mechanisms of toxicity. Here we report on the development of a hazard classification and visualization method that draws upon both chemical structural similarity and comparisons of biological responses to chemicals measured in multiple short-term assays ("biological" similarity). The Chemical-Biological Read-Across (CBRA) approach infers each compound's toxicity from those of both chemical and biological analogs whose similarities are determined by the Tanimoto coefficient. Classification accuracy of CBRA was compared to that of classical RA and other methods using chemical descriptors alone, or in combination with biological data. Different types of adverse effects (hepatotoxicity, hepatocarcinogenicity, mutagenicity, and acute lethality) were classified using several biological data types (gene expression profiling and cytotoxicity screening). CBRA-based hazard classification exhibited consistently high external classification accuracy and applicability to diverse chemicals. Transparency of the CBRA approach is aided by the use of radial plots that show the relative contribution of analogous chemical and biological neighbors. Identification of both chemical and biological features that give rise to the high accuracy of CBRA-based toxicity prediction facilitates mechanistic interpretation of the models. PMID:23848138
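
    The neighbour-averaging idea behind read-across can be sketched directly: a query compound's hazard probability is the mean label of its top-k neighbours under a weighted combination of chemical and biological Tanimoto similarities. Fingerprints, assay responses, and the 50/50 weighting below are invented placeholders, not CBRA's actual data or tuning.

```python
# Hypothetical read-across sketch combining chemical and biological
# Tanimoto similarities, echoing the CBRA idea. All data synthetic.
import numpy as np

def tanimoto(a, b):
    inter = np.sum(a & b)
    union = np.sum(a | b)
    return inter / union if union else 0.0

rng = np.random.RandomState(0)
fps = rng.randint(0, 2, size=(100, 64))   # binary chemical fingerprints
bio = rng.randint(0, 2, size=(100, 12))   # binarized short-term assay responses
tox = rng.randint(0, 2, size=100)         # known hazard labels

def read_across(i, k=5, w=0.5):
    """Mean hazard label of the k nearest analogs of compound i."""
    others = [j for j in range(len(tox)) if j != i]
    sims = np.array([w * tanimoto(fps[i], fps[j]) + (1 - w) * tanimoto(bio[i], bio[j])
                     for j in others])
    top = np.argsort(sims)[::-1][:k]      # k most similar analogs
    return np.mean([tox[others[t]] for t in top])

print("predicted hazard probability for compound 0:", read_across(0))
```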

  8. Haplotype-Based Genome-Wide Prediction Models Exploit Local Epistatic Interactions Among Markers

    PubMed Central

    Jiang, Yong; Schmidt, Renate H.; Reif, Jochen C.

    2018-01-01

    Genome-wide prediction approaches represent versatile tools for the analysis and prediction of complex traits. Mostly they rely on marker-based information, but scenarios have been reported in which models capitalizing on closely-linked markers that were combined into haplotypes outperformed marker-based models. Detailed comparisons were undertaken to reveal under which circumstances haplotype-based genome-wide prediction models are superior to marker-based models. Specifically, it was of interest to analyze whether and how haplotype-based models may take local epistatic effects between markers into account. Assuming that populations consisted of fully homozygous individuals, a marker-based model in which local epistatic effects inside haplotype blocks were exploited (LEGBLUP) was linearly transformable into a haplotype-based model (HGBLUP). This theoretical derivation formally revealed that haplotype-based genome-wide prediction models capitalize on local epistatic effects among markers. Simulation studies corroborated this finding. Due to its computational efficiency, the HGBLUP model promises to be an interesting tool for studies of ultra-high-density SNP data sets. Applying the HGBLUP model to empirical data sets revealed higher prediction accuracies than for marker-based models for both traits studied using a mouse panel. In contrast, only a small subset of the traits analyzed in crop populations showed such a benefit. Cases in which higher prediction accuracies are observed for HGBLUP than for marker-based models are expected to be of immediate relevance for breeders: due to tight linkage, a beneficial haplotype will be preserved for many generations. In this respect the inheritance of local epistatic effects very much resembles that of additive effects. PMID:29549092

  9. Haplotype-Based Genome-Wide Prediction Models Exploit Local Epistatic Interactions Among Markers.

    PubMed

    Jiang, Yong; Schmidt, Renate H; Reif, Jochen C

    2018-05-04

    Genome-wide prediction approaches represent versatile tools for the analysis and prediction of complex traits. Mostly they rely on marker-based information, but scenarios have been reported in which models capitalizing on closely-linked markers that were combined into haplotypes outperformed marker-based models. Detailed comparisons were undertaken to reveal under which circumstances haplotype-based genome-wide prediction models are superior to marker-based models. Specifically, it was of interest to analyze whether and how haplotype-based models may take local epistatic effects between markers into account. Assuming that populations consisted of fully homozygous individuals, a marker-based model in which local epistatic effects inside haplotype blocks were exploited (LEGBLUP) was linearly transformable into a haplotype-based model (HGBLUP). This theoretical derivation formally revealed that haplotype-based genome-wide prediction models capitalize on local epistatic effects among markers. Simulation studies corroborated this finding. Due to its computational efficiency, the HGBLUP model promises to be an interesting tool for studies of ultra-high-density SNP data sets. Applying the HGBLUP model to empirical data sets revealed higher prediction accuracies than for marker-based models for both traits studied using a mouse panel. In contrast, only a small subset of the traits analyzed in crop populations showed such a benefit. Cases in which higher prediction accuracies are observed for HGBLUP than for marker-based models are expected to be of immediate relevance for breeders: due to tight linkage, a beneficial haplotype will be preserved for many generations. In this respect the inheritance of local epistatic effects very much resembles that of additive effects. Copyright © 2018 Jiang et al.

  10. Forensic individual age estimation with DNA: From initial approaches to methylation tests.

    PubMed

    Freire-Aradas, A; Phillips, C; Lareu, M V

    2017-07-01

    Individual age estimation is a key factor in forensic science analysis that can provide very useful information applicable to criminal, legal, and anthropological investigations. Forensic age inference was initially based on morphological inspection or radiography and only later began to adopt molecular approaches. However, a lack of accuracy or technical problems hampered the introduction of these DNA-based methodologies into casework analysis. A turning point occurred when the epigenetic signature of DNA methylation was observed to change gradually during an individual's lifespan. In the last four years, the number of publications reporting age-correlated changes in DNA methylation has risen steadily, and the forensic community now has a range of age methylation tests applicable to forensic casework. Most forensic age predictor models have been developed based on blood DNA samples, but additional tissues are now also being explored. This review assesses the most widely adopted genes harboring methylation sites, detection technologies, statistical age-predictive analyses, and potential causes of variation in age estimates. Despite the need for further work to improve predictive accuracy and to establish a broader range of tissues for which tests can analyze the most appropriate methylation sites, several forensic age predictors have now been reported that provide consistent prediction accuracies (predictive error of ±4 years); this makes them compelling tools with the potential to contribute key information to help guide criminal investigations. Copyright © 2017 Central Police University.

  11. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding

    PubMed Central

    2013-01-01

    Background In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. Results The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best. Conclusions The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies. PMID:24314298

  12. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding.

    PubMed

    Ould Estaghvirou, Sidi Boubacar; Ogutu, Joseph O; Schulz-Streeck, Torben; Knaak, Carsten; Ouzunova, Milena; Gordillo, Andres; Piepho, Hans-Peter

    2013-12-06

    In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best. The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies.
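
    The indirect estimate both versions of this record describe (dividing predictive ability by the square root of heritability) is easy to verify numerically; a small simulated check, with invented variances, is shown below.

```python
# Tiny numeric illustration: predictive ability r(y, y_hat) divided by the
# square root of heritability approximates predictive accuracy r(g, y_hat).
# All values simulated.
import numpy as np

rng = np.random.RandomState(0)
n, h2 = 500, 0.4
g = rng.normal(size=n)                                    # true breeding values
y = g + rng.normal(scale=np.sqrt((1 - h2) / h2), size=n)  # phenotypes with heritability h2
y_hat = g + rng.normal(scale=0.7, size=n)                 # some genomic prediction

ability = np.corrcoef(y, y_hat)[0, 1]                     # observable from data
accuracy = np.corrcoef(g, y_hat)[0, 1]                    # needs the true g
print("indirect estimate:", ability / np.sqrt(h2), " true accuracy:", accuracy)
```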

  13. Transition index maps for urban growth simulation: application of artificial neural networks, weight of evidence and fuzzy multi-criteria evaluation.

    PubMed

    Shafizadeh-Moghadam, Hossein; Tayyebi, Amin; Helbich, Marco

    2017-06-01

    Transition index maps (TIMs) are key products in urban growth simulation models. However, their operationalization is still conflicting. Our aim was to compare the prediction accuracy of three TIM-based spatially explicit land cover change (LCC) models in the mega city of Mumbai, India. These LCC models include two data-driven approaches, namely artificial neural networks (ANNs) and weight of evidence (WOE), and one knowledge-based approach which integrates an analytical hierarchical process with fuzzy membership functions (FAHP). Using the relative operating characteristics (ROC), the performance of these three LCC models were evaluated. The results showed 85%, 75%, and 73% accuracy for the ANN, FAHP, and WOE. The ANN was clearly superior compared to the other LCC models when simulating urban growth for the year 2010; hence, ANN was used to predict urban growth for 2020 and 2030. Projected urban growth maps were assessed using statistical measures, including figure of merit, average spatial distance deviation, producer accuracy, and overall accuracy. Based on our findings, we recomend ANNs as an and accurate method for simulating future patterns of urban growth.

  14. [Research on partial least squares for determination of impurities in the presence of high concentration of matrix by ICP-AES].

    PubMed

    Wang, Yan-peng; Gong, Qi; Yu, Sheng-rong; Liu, You-yan

    2012-04-01

    A method for determining trace impurities in a high-concentration matrix by ICP-AES based on partial least squares (PLS) was established. The research showed that PLS could effectively correct the interference caused by high matrix concentrations and could withstand higher concentrations of matrix than multicomponent spectral fitting (MSF). When the mass ratios of matrix to impurities ranged from 1 000 : 1 to 20 000 : 1, the standard-addition recoveries obtained by PLS were between 95% and 105%. For systems in which the interference effect is nonlinearly correlated with matrix concentration, the prediction accuracy of the normal PLS method was poor, but it could be improved greatly by using LIN-PPLS, which is based on a matrix transformation of the sample concentrations. The contents of Co, Pb and Ga in stream sediment (GBW07312) were determined by MSF, PLS and LIN-PPLS, respectively. The results showed that the prediction accuracy of LIN-PPLS was better than that of PLS, and that of PLS was better than that of MSF.
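
    A generic PLS calibration of the kind the abstract describes can be sketched on simulated spectra, with a broad matrix band overlapping a narrow trace-analyte line. The band shapes, concentrations, and component count are assumptions for illustration, not the paper's ICP-AES data.

```python
# Illustrative PLS calibration for a trace impurity under a strong matrix.
# Spectra are simulated as overlapping matrix and impurity bands plus noise.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.RandomState(0)
wavelengths = np.linspace(0, 1, 200)
matrix_band = np.exp(-((wavelengths - 0.50) / 0.10) ** 2)    # broad matrix emission
impurity_band = np.exp(-((wavelengths - 0.52) / 0.01) ** 2)  # overlapped analyte line

c_matrix = rng.uniform(500, 1000, size=60)                   # high matrix levels
c_imp = rng.uniform(0.05, 1.0, size=60)                      # trace impurity levels
spectra = (np.outer(c_matrix, matrix_band) + np.outer(c_imp, impurity_band)
           + rng.normal(scale=0.01, size=(60, 200)))

pls = PLSRegression(n_components=3).fit(spectra[:45], c_imp[:45])
pred = pls.predict(spectra[45:]).ravel()
print("recoveries (%):", np.round(100 * pred / c_imp[45:], 1))
```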

  15. Wind Prediction Accuracy for Air Traffic Management Decision Support Tools

    NASA Technical Reports Server (NTRS)

    Cole, Rod; Green, Steve; Jardin, Matt; Schwartz, Barry; Benjamin, Stan

    2000-01-01

    The performance of Air Traffic Management and flight deck decision support tools depends in large part on the accuracy of the supporting 4D trajectory predictions. This is particularly relevant to conflict prediction and active advisories for the resolution of conflicts and conformance with traffic-flow management flow-rate constraints (e.g., arrival metering / required time of arrival). Flight test results have indicated that wind prediction errors may represent the largest source of trajectory prediction error. The tests also discovered that relatively large errors (e.g., greater than 20 knots), existing in pockets of space and time critical to ATM DST performance (one or more sectors, greater than 20 minutes), are inadequately represented by the classic RMS aggregate prediction-accuracy studies of the past. To facilitate the identification and reduction of DST-critical wind-prediction errors, NASA has led a collaborative research and development activity with MIT Lincoln Laboratories and the Forecast Systems Lab of the National Oceanographic and Atmospheric Administration (NOAA). This activity, begun in 1996, has focussed on the development of key metrics for ATM DST performance, assessment of wind-prediction skill for state-of-the-art systems, and development/validation of system enhancements to improve skill. A 13-month study was conducted for the Denver Center airspace in 1997. Two complementary wind-prediction systems were analyzed and compared to the forecast performance of the then-standard 60 km Rapid Update Cycle - version 1 (RUC-1). One system, developed by NOAA, was the prototype 40-km RUC-2 that became operational at NCEP in 1999. RUC-2 introduced a faster cycle (1 hr vs. 3 hr) and improved mesoscale physics. The second system, Augmented Winds (AW), is a prototype en route wind application developed by MITLL based on the Integrated Terminal Wind System (ITWS). AW is run at a local facility (Center) level and updates RUC predictions based on an optimal interpolation of the latest ACARS reports since the RUC run. This paper presents an overview of the study's results, including the identification and use of new wind-prediction accuracy metrics that are key to ATM DST performance.

  16. Local-search based prediction of medical image registration error

    NASA Astrophysics Data System (ADS)

    Saygili, Görkem

    2018-03-01

    Medical image registration is a crucial task in many different medical imaging applications. Hence, a considerable amount of work has been published recently aiming to predict the error in a registration without any human effort. If provided, these error predictions can be used as feedback to the registration algorithm to further improve its performance. Recent methods generally start by extracting image-based and deformation-based features, then apply feature pooling and finally train a Random Forest (RF) regressor to predict the real registration error. Image-based features can be calculated after applying a single registration but provide limited accuracy, whereas deformation-based features, such as the variation of the deformation vector field, may require up to 20 registrations, which is a considerably time-consuming task. This paper proposes to use features extracted by a local search algorithm as image-based features to estimate the error of a registration. The proposed method comprises a local search algorithm to find corresponding voxels between registered image pairs; based on the amount of shift and on stereo confidence measures, it densely predicts the registration error in millimetres using an RF regressor. Compared to other algorithms in the literature, the proposed algorithm does not require multiple registrations, can be efficiently implemented on a Graphics Processing Unit (GPU), and can still provide highly accurate error predictions in the presence of large registration errors. Experimental results with real registrations on a public dataset indicate that substantially high accuracy is achieved by using features from the local search algorithm.

  17. Response of data-driven artificial neural network-based TEC models to neutral wind for different locations, seasons, and solar activity levels from the Indian longitude sector

    NASA Astrophysics Data System (ADS)

    Sur, D.; Haldar, S.; Ray, S.; Paul, A.

    2017-07-01

    The perturbations imposed on transionospheric signals by the ionosphere are a major concern for navigation. The dynamic nature of the ionosphere in the low-latitude equatorial region and the Indian longitude sector has specific characteristics, such as sharp temporal and latitudinal variation of total electron content (TEC); TEC in the Indian longitude sector also undergoes seasonal variations. The large magnitude and sharp variation of TEC cause large and variable range errors throughout the day for satellite-based navigation systems such as the Global Positioning System (GPS). For accurate navigation using satellite-based augmentation systems, proper prediction of TEC under given geophysical conditions is necessary in the equatorial region. It has been reported in the literature that the prediction accuracy of TEC is improved by measured data-driven artificial neural network (ANN)-based vertical TEC (VTEC) models compared to standard ionospheric models. A set of observations carried out in the Indian longitude sector is reported in this paper in order to quantify the improvement in the prediction accuracy of an ANN-based VTEC model after incorporating neutral wind as a model input. The variation of this improvement with latitude, longitude, season, and solar activity is also reported.

  18. Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data

    DOE PAGES

    Hsu, David

    2015-09-27

    Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.
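
    The two-stage baseline and the Jaccard stability check are straightforward to sketch; the jointly fitted clusterwise (latent class) regression itself needs specialized routines and is not reproduced here. Data below are synthetic three-cluster stand-ins for the building dataset.

```python
# Sketch of the two-stage approach (K-means, then per-cluster regression)
# plus a simple Jaccard stability check on a perturbed re-clustering.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc, 0.5, size=(100, 3)) for loc in (0, 3, 6)])
slopes = np.repeat([1.0, -2.0, 0.5], 100)            # cluster-specific responses
y = (X * slopes[:, None]).sum(axis=1) + rng.normal(scale=0.3, size=300)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for c in range(3):                                   # stage two: regression per cluster
    m = km.labels_ == c
    r2 = LinearRegression().fit(X[m], y[m]).score(X[m], y[m])
    print(f"cluster {c}: n={m.sum()}, R^2={r2:.2f}")

# stability: Jaccard overlap of each cluster with its best match after perturbation
km2 = KMeans(n_clusters=3, n_init=10, random_state=1).fit(
    X + rng.normal(scale=0.05, size=X.shape))
for c in range(3):
    a = set(np.where(km.labels_ == c)[0])
    jac = max(len(a & set(np.where(km2.labels_ == d)[0])) /
              len(a | set(np.where(km2.labels_ == d)[0])) for d in range(3))
    print(f"cluster {c} Jaccard stability: {jac:.2f}")
```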

  19. OH-PRED: prediction of protein hydroxylation sites by incorporating adapted normal distribution bi-profile Bayes feature extraction and physicochemical properties of amino acids.

    PubMed

    Jia, Cang-Zhi; He, Wen-Ying; Yao, Yu-Hua

    2017-03-01

    Hydroxylation of proline or lysine residues in proteins is a common post-translational modification event, and such modifications are found in many physiological and pathological processes. Nonetheless, the exact molecular mechanism of hydroxylation remains under investigation. Because experimental identification of hydroxylation is time-consuming and expensive, bioinformatics tools with high accuracy represent desirable alternatives for large-scale, rapid identification of protein hydroxylation sites. In view of this, we developed a support vector machine-based tool, OH-PRED, for the prediction of protein hydroxylation sites using adapted normal distribution bi-profile Bayes feature extraction in combination with the physicochemical property indexes of the amino acids. In a jackknife cross-validation, OH-PRED yields an accuracy of 91.88% and a Matthews correlation coefficient (MCC) of 0.838 for the prediction of hydroxyproline sites, and an accuracy of 97.42% and an MCC of 0.949 for the prediction of hydroxylysine sites. These results demonstrate that OH-PRED significantly increased the prediction accuracy for hydroxyproline and hydroxylysine sites, by 7.37% and 14.09% respectively, when compared with the latest predictor PredHydroxy. In independent tests, OH-PRED also outperforms previously published methods.

  20. Building a knowledge-based statistical potential by capturing high-order inter-residue interactions and its applications in protein secondary structure assessment.

    PubMed

    Li, Yaohang; Liu, Hui; Rata, Ionel; Jakobsson, Eric

    2013-02-25

    The rapidly increasing number of protein crystal structures available in the Protein Data Bank (PDB) has naturally made statistical analyses feasible in studying complex high-order inter-residue correlations. In this paper, we report a context-based secondary structure potential (CSSP) for assessing the quality of predicted protein secondary structures generated by various prediction servers. CSSP is a sequence-position-specific knowledge-based potential generated using the potentials-of-mean-force approach, where high-order inter-residue interactions are taken into consideration. The CSSP potential is effective in identifying secondary structure predictions of good quality. For 56% of the targets in the CB513 benchmark, the optimal CSSP potential scores best either the native secondary structure or a prediction with Q3 accuracy higher than 90% among the predicted secondary structures generated by 10 popularly used secondary structure prediction servers. In more than 80% of the CB513 targets, the predicted secondary structures with the lowest CSSP potential values yield higher than 80% Q3 accuracy. Similar performance of CSSP is found on the CASP9 targets as well. Moreover, our computational results also show that the CSSP potential using triplets outperforms the CSSP potential using doublets and is currently better than the CSSP potential using quartets.

  1. Research on cardiovascular disease prediction based on distance metric learning

    NASA Astrophysics Data System (ADS)

    Ni, Zhuang; Liu, Kui; Kang, Guixia

    2018-04-01

    Distance metric learning algorithms have been widely applied to medical diagnosis and have exhibited their strengths in classification problems. The k-nearest neighbour (KNN) classifier is an efficient method that treats each feature equally. Large margin nearest neighbour classification (LMNN) improves the accuracy of KNN by learning a global distance metric, but it does not consider the locality of the data distribution. In this paper, we propose a new distance metric learning algorithm, COS-SUBLMNN, which combines a cosine metric with LMNN and pays closer attention to the local features of the data, overcoming this shortcoming of LMNN and improving classification accuracy. The proposed methodology is verified on CVD patient vectors derived from real-world medical data. The experimental results show that our method provides higher accuracy than KNN and LMNN, which demonstrates the effectiveness of the COS-SUBLMNN-based risk prediction model for CVDs.
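
    As a rough, hypothetical baseline related to the comparison above, the sketch below contrasts KNN under Euclidean and cosine distances on synthetic clinical-style features. COS-SUBLMNN additionally learns a local metric, which this snippet does not attempt to reproduce.

```python
# Baseline sketch: k-nearest-neighbour classification under Euclidean vs.
# cosine distance on a synthetic CVD-like feature matrix. Illustrative only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.normal(size=(400, 13))                 # e.g. 13 clinical attributes
w = rng.normal(size=13)
y = (X @ w + rng.normal(scale=1.0, size=400) > 0).astype(int)

for metric in ("euclidean", "cosine"):
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
    print(metric, cross_val_score(knn, X, y, cv=5).mean())
```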

  2. Parent-based diagnosis of ADHD is as accurate as a teacher-based diagnosis of ADHD.

    PubMed

    Bied, Adam; Biederman, Joseph; Faraone, Stephen

    2017-04-01

    To review the literature evaluating the psychometric properties of parent and teacher informants relative to a gold-standard ADHD diagnosis in pediatric populations. We included studies with both a parent and a teacher informant, a gold-standard diagnosis, and diagnostic accuracy metrics; potential confounds were evaluated. We also assessed the 'OR' and the 'AND' rules for combining informant reports. Eight articles met the inclusion criteria. The diagnostic accuracy for predicting gold-standard ADHD diagnoses did not differ between parents and teachers. Sample size, sample type, participant drop-out, participant age, participant gender, geographic area of the study, and date of study publication were assessed as potential confounds. Parents and teachers both yielded moderate to good diagnostic accuracy for ADHD diagnoses, and parent reports were statistically indistinguishable from those of teachers. The predictive features of the 'OR' and 'AND' rules are useful in evaluating approaches to better integrate information from these informants.

  3. Thermodynamics and proton activities of protic ionic liquids with quantum cluster equilibrium theory

    NASA Astrophysics Data System (ADS)

    Ingenmey, Johannes; von Domaros, Michael; Perlt, Eva; Verevkin, Sergey P.; Kirchner, Barbara

    2018-05-01

    We applied the binary Quantum Cluster Equilibrium (bQCE) method to a number of alkylammonium-based protic ionic liquids in order to predict boiling points, vaporization enthalpies, and proton activities. The theory combines statistical thermodynamics of van-der-Waals-type clusters with ab initio quantum chemistry and yields the partition functions (and associated thermodynamic potentials) of binary mixtures over a wide range of thermodynamic phase points. Unlike conventional cluster approaches that are limited to the prediction of thermodynamic properties, dissociation reactions can be effortlessly included into the bQCE formalism, giving access to ionicities, as well. The method is open to quantum chemical methods at any level of theory, but combination with low-cost composite density functional theory methods and the proposed systematic approach to generate cluster sets provides a computationally inexpensive and mostly parameter-free way to predict such properties at good-to-excellent accuracy. Boiling points can be predicted within an accuracy of 50 K, reaching excellent accuracy for ethylammonium nitrate. Vaporization enthalpies are predicted within an accuracy of 20 kJ mol-1 and can be systematically interpreted on a molecular level. We present the first theoretical approach to predict proton activities in protic ionic liquids, with results fitting well into the experimentally observed correlation. Furthermore, enthalpies of vaporization were measured experimentally for some alkylammonium nitrates and an excellent linear correlation with vaporization enthalpies of their respective parent amines is observed.

  4. Advanced turboprop noise prediction based on recent theoretical results

    NASA Technical Reports Server (NTRS)

    Farassat, F.; Padula, S. L.; Dunn, M. H.

    1987-01-01

    The development of a high-speed propeller noise prediction code at Langley Research Center is described. The code utilizes two recent acoustic formulations in the time domain for subsonic and supersonic sources. The structure and capabilities of the code are discussed. A grid-size study of accuracy and execution speed is also presented. The code is tested against an earlier Langley code, and considerable increases in accuracy and execution speed are observed. Some examples of noise prediction for a high-speed propeller for which acoustic test data are available are given. A brief derivation of the formulations used is given in an appendix.

  5. SeqRate: sequence-based protein folding type classification and rates prediction

    PubMed Central

    2010-01-01

    Background Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding the protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as input, and most do not distinguish the different kinetic natures (two-state versus multi-state folding) of proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from protein sequence alone with support vector machines. Results We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (sec-1) on the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification, and its folding rate prediction accuracy is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. Conclusions Both the web server and the software for predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html. PMID:20438647

  6. A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China

    PubMed Central

    Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian

    2016-01-01

    In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is first used in our method to segment study areas into a series of prediction regions of appropriate size. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are used for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model achieves better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model achieves an overall prediction accuracy of 91.10%, which is 7.8%–19.1% higher than that of the traditional SVM-based models. In addition, the landslide susceptibility map obtained by our model demonstrates a strong correlation between the classified very-high-susceptibility zone and the previously investigated landslides. PMID:27187430

  7. A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China.

    PubMed

    Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian

    2016-05-11

    In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is first used in our method to segment study areas into a series of prediction regions of appropriate size. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are used for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model achieves better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model achieves an overall prediction accuracy of 91.10%, which is 7.8%-19.1% higher than that of the traditional SVM-based models. In addition, the landslide susceptibility map obtained by our model demonstrates a strong correlation between the classified very-high-susceptibility zone and the previously investigated landslides.

  8. Analysis of the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) in Assessing Rounding Model

    NASA Astrophysics Data System (ADS)

    Wang, Weijie; Lu, Yanmin

    2018-03-01

    Most existing Collaborative Filtering (CF) algorithms predict a rating as the preference of an active user toward a given item, which is always a decimal fraction, whereas the actual ratings in most data sets are integers. In this paper, we discuss and demonstrate why rounding can influence these two metrics differently, and we show that rounding is a necessary post-processing step for predicted ratings that eliminates model prediction bias and improves prediction accuracy. In addition, we propose two new rounding approaches based on the predicted rating probability distribution, which round the predicted rating to an optimal integer rating and achieve better prediction accuracy than the basic rounding approach. Extensive experiments on different data sets validate the correctness of our analysis and the effectiveness of our proposed rounding approaches.
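
    As a concrete illustration of the point above, the following minimal numpy sketch (synthetic ratings, not the paper's data or algorithm) compares MAE and RMSE before and after basic rounding of decimal predictions; the two metrics typically respond differently, which is the effect the authors analyze.

        import numpy as np

        np.random.seed(0)
        true = np.random.randint(1, 6, size=1000).astype(float)   # integer ratings 1-5
        pred = np.clip(true + np.random.normal(0, 0.6, 1000), 1, 5)  # decimal CF predictions

        def mae(y, p):  return np.mean(np.abs(y - p))
        def rmse(y, p): return np.sqrt(np.mean((y - p) ** 2))

        rounded = np.rint(pred)                                   # basic rounding to nearest integer
        print(f"MAE  raw {mae(true, pred):.3f}  rounded {mae(true, rounded):.3f}")
        print(f"RMSE raw {rmse(true, pred):.3f}  rounded {rmse(true, rounded):.3f}")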

  9. Experimental and computational prediction of glass transition temperature of drugs.

    PubMed

    Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S

    2014-12-22

    Glass transition temperature (Tg) is an important inherent property of an amorphous solid material and is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for the prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein, support vector regression gave the best result, with an RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict the Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, even before compound synthesis.
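
    A minimal sketch of the two modeling routes described above, using synthetic data in place of the 71-compound set; the Tm values and the four descriptors here are placeholders, not the study's: a linear Tg-from-Tm model next to a support vector regression on calculated descriptors.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVR

        rng = np.random.default_rng(1)
        Tm = rng.uniform(350, 550, 71)                 # melting temperatures in K (synthetic)
        X  = rng.normal(size=(71, 4))                  # four calculated descriptors (placeholders)
        Tg = 0.7 * Tm + 10 * X[:, 0] + rng.normal(0, 15, 71)   # synthetic ground truth

        lin = LinearRegression().fit(Tm.reshape(-1, 1), Tg)            # route 1: Tg from Tm alone
        svr = make_pipeline(StandardScaler(), SVR(C=10.0)).fit(X, Tg)  # route 2: SVR on descriptors

        rmse = lambda y, p: np.sqrt(np.mean((y - p) ** 2))
        print(rmse(Tg, lin.predict(Tm.reshape(-1, 1))), rmse(Tg, svr.predict(X)))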

  10. Developing an in silico minimum inhibitory concentration panel test for Klebsiella pneumoniae

    DOE PAGES

    Nguyen, Marcus; Brettin, Thomas; Long, S. Wesley; ...

    2018-01-11

    Here, antimicrobial resistant infections are a serious public health threat worldwide. Whole genome sequencing approaches to rapidly identify pathogens and predict antibiotic resistance phenotypes are becoming more feasible and may offer a way to reduce clinical test turnaround times compared to conventional culture-based methods and, in turn, improve patient outcomes. In this study, we use whole genome sequence data from 1668 clinical isolates of Klebsiella pneumoniae to develop an XGBoost-based machine learning model that accurately predicts minimum inhibitory concentrations (MICs) for 20 antibiotics. The overall accuracy of the model, within ±1 two-fold dilution factor, is 92%. Individual accuracies are ≥90% for 15/20 antibiotics. We show that the MICs predicted by the model correlate with known antimicrobial resistance genes. Importantly, the genome-wide approach described in this study offers a way to predict MICs for isolates without knowledge of the underlying gene content. This study shows that machine learning can be used to build a complete in silico MIC prediction panel for K. pneumoniae and provides a framework for building MIC prediction models for other pathogenic bacteria.
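
    A hedged sketch of the evaluation idea: an XGBoost regressor trained on genome-derived features against log2(MIC) labels, scored by the within-±1-two-fold-dilution criterion quoted above. The feature and label construction below is a synthetic placeholder, not the authors' pipeline.

        import numpy as np
        import xgboost as xgb
        from sklearn.model_selection import train_test_split

        # X: genome-derived features (e.g., k-mer counts); y: log2(MIC) labels -- placeholders here
        rng = np.random.default_rng(0)
        X = rng.poisson(2.0, size=(500, 200)).astype(float)
        y = (X[:, :5].sum(axis=1) > 12).astype(float) * 4 + rng.integers(-1, 2, 500)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        model = xgb.XGBRegressor(n_estimators=200, max_depth=4).fit(X_tr, y_tr)

        # accuracy within +/-1 two-fold dilution == predicted log2(MIC) within 1 of the truth
        within_one = np.abs(np.rint(model.predict(X_te)) - y_te) <= 1
        print(f"accuracy within +/-1 dilution: {within_one.mean():.2%}")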

  11. Developing an in silico minimum inhibitory concentration panel test for Klebsiella pneumoniae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nguyen, Marcus; Brettin, Thomas; Long, S. Wesley

    Here, antimicrobial resistant infections are a serious public health threat worldwide. Whole genome sequencing approaches to rapidly identify pathogens and predict antibiotic resistance phenotypes are becoming more feasible and may offer a way to reduce clinical test turnaround times compared to conventional culture-based methods and, in turn, improve patient outcomes. In this study, we use whole genome sequence data from 1668 clinical isolates of Klebsiella pneumoniae to develop an XGBoost-based machine learning model that accurately predicts minimum inhibitory concentrations (MICs) for 20 antibiotics. The overall accuracy of the model, within ±1 two-fold dilution factor, is 92%. Individual accuracies are ≥90% for 15/20 antibiotics. We show that the MICs predicted by the model correlate with known antimicrobial resistance genes. Importantly, the genome-wide approach described in this study offers a way to predict MICs for isolates without knowledge of the underlying gene content. This study shows that machine learning can be used to build a complete in silico MIC prediction panel for K. pneumoniae and provides a framework for building MIC prediction models for other pathogenic bacteria.

  12. Towards an Online Seizure Advisory System-An Adaptive Seizure Prediction Framework Using Active Learning Heuristics.

    PubMed

    Karuppiah Ramachandran, Vignesh Raja; Alblas, Huibert J; Le, Duc V; Meratnia, Nirvana

    2018-05-24

    In the last decade, seizure prediction systems have gained a lot of attention because of their enormous potential to largely improve the quality-of-life of epileptic patients. The accuracy of prediction algorithms for detecting seizures in real-world applications is largely limited because brain signals are inherently uncertain and affected by various factors, such as environment, age, and drug intake, in addition to the internal artefacts that occur during the recording of brain signals. To deal with such ambiguity, researchers traditionally use active learning, which selects ambiguous data to be annotated by an expert and updates the classification model dynamically. However, selecting the particular data from a pool of large ambiguous datasets to be labelled by an expert is still a challenging problem. In this paper, we propose an active learning-based prediction framework that aims to improve the accuracy of the prediction with a minimum number of labelled data. The core technique of our framework is employing the Bernoulli-Gaussian Mixture model (BGMM) to determine the feature samples that have the most ambiguity to be annotated by an expert. By doing so, our approach facilitates expert intervention as well as increasing medical reliability. We evaluate seven different classifiers in terms of the classification time and memory required. An active learning framework built on top of the best performing classifier is evaluated in terms of the annotation effort required to achieve a high level of prediction accuracy. The results show that our approach can achieve the same accuracy as a Support Vector Machine (SVM) classifier using only 20% of the labelled data and also improves the prediction accuracy even under noisy conditions.
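
    The paper's BGMM selection heuristic is specific to its framework; the sketch below substitutes a plain Gaussian mixture and uses the entropy of the posterior responsibilities as the ambiguity score, which captures the general pattern of picking the most ambiguous samples for expert annotation.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        X_pool = rng.normal(size=(1000, 8))            # unlabelled feature samples (placeholder)

        gmm = GaussianMixture(n_components=2, random_state=0).fit(X_pool)
        resp = gmm.predict_proba(X_pool)               # posterior responsibilities per component
        ambiguity = -np.sum(resp * np.log(resp + 1e-12), axis=1)   # entropy: high = ambiguous

        query_idx = np.argsort(ambiguity)[-50:]        # 50 most ambiguous samples for the expert
        print("indices to annotate:", query_idx[:10])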

  13. Comparing ordinary kriging and inverse distance weighting for soil As pollution in Beijing.

    PubMed

    Qiao, Pengwei; Lei, Mei; Yang, Sucai; Yang, Jun; Guo, Guanghui; Zhou, Xiaoyong

    2018-06-01

    Spatial interpolation is the basis of soil heavy metal pollution assessment and remediation, but existing evaluation indices for interpolation accuracy are not tied to the actual situation on the ground; the choice of interpolation method needs to be based on the specific research purpose and the characteristics of the research object. In this paper, As (arsenic) pollution in soils of Beijing was taken as an example. The prediction accuracy of ordinary kriging (OK) and inverse distance weighting (IDW) was evaluated based on cross-validation results and the spatial distribution characteristics of influencing factors. The results showed that, under the condition of specific spatial correlation, the cross-validation results of OK and IDW for every soil point and the prediction accuracy of the spatial distribution trend are similar. However, the prediction accuracy of OK for the maxima and minima is lower than that of IDW, and the number of high-pollution areas identified by OK is smaller than that identified by IDW. OK has difficulty fully identifying high-pollution areas, which shows that its smoothing effect is pronounced. In addition, as the spatial correlation of the As concentration increases, the cross-validation errors of OK and IDW decrease, and the high-pollution areas identified by OK approach the IDW result, identifying high-pollution areas more comprehensively. However, because semivariogram construction in OK is more subjective and requires a larger number of soil samples, IDW is more suitable for spatial prediction of heavy metal pollution in soils.
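
    For reference, IDW itself is only a few lines; a minimal sketch (the power parameter and the toy coordinates and concentrations below are illustrative):

        import numpy as np

        def idw(xy_known, z_known, xy_query, power=2.0, eps=1e-10):
            """Inverse-distance-weighted prediction at query points."""
            d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
            w = 1.0 / (d + eps) ** power              # weights fall off with distance
            return (w * z_known).sum(axis=1) / w.sum(axis=1)

        xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
        z  = np.array([10.0, 20.0, 30.0, 60.0])       # e.g., soil As concentrations (mg/kg)
        print(idw(xy, z, np.array([[0.5, 0.5]])))     # prediction at the grid centre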

  14. Genomic estimation of additive and dominance effects and impact of accounting for dominance on accuracy of genomic evaluation in sheep populations.

    PubMed

    Moghaddar, N; van der Werf, J H J

    2017-12-01

    The objectives of this study were to estimate the additive and dominance variance components of several weight and ultrasound-scanned body composition traits in purebred and combined cross-bred sheep populations based on single nucleotide polymorphism (SNP) marker genotypes, and then to investigate the effect of fitting additive and dominance effects on the accuracy of genomic evaluation. Additive and dominance variance components were estimated in a mixed model equation based on average-information restricted maximum likelihood, using additive and dominance (co)variances between animals calculated from 48,599 SNP marker genotypes. Genomic prediction was based on genomic best linear unbiased prediction (GBLUP), and the accuracy of prediction was assessed with a random 10-fold cross-validation. Across the different weight and scanned body composition traits, dominance variance ranged from 0.0% to 7.3% of the phenotypic variance in the purebred population and from 7.1% to 19.2% in the combined cross-bred population. In the combined cross-bred population, the range of dominance variance decreased to 3.1% to 9.9% after accounting for heterosis effects. Accounting for dominance effects significantly improved the likelihood of the fitted model in the combined cross-bred population. This study showed substantial dominance genetic variance for weight and ultrasound-scanned body composition traits, particularly in the cross-bred population; however, the improvement in the accuracy of genomic breeding values was small and statistically not significant. Dominance variance estimates in the combined cross-bred population could be overestimated if heterosis is not fitted in the model. © 2017 Blackwell Verlag GmbH.
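
    A sketch of the genomic relationship matrices that underlie such a GBLUP analysis with dominance, assuming the VanRaden (2008) additive parameterization and one common dominance parameterization; the study's exact coding may differ.

        import numpy as np

        rng = np.random.default_rng(0)
        M = rng.integers(0, 3, size=(200, 1000)).astype(float)   # genotypes coded 0/1/2
        p = M.mean(axis=0) / 2.0                                  # allele frequencies

        # Additive genomic relationship matrix (VanRaden 2008)
        Z = M - 2 * p
        G_add = Z @ Z.T / (2 * np.sum(p * (1 - p)))

        # Dominance relationship matrix (one common parameterization: centred
        # heterozygote indicator; other codings exist in the literature)
        H = (M == 1).astype(float) - 2 * p * (1 - p)
        G_dom = H @ H.T / np.sum((2 * p * (1 - p)) ** 2)
        print(G_add.shape, G_dom.shape)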

  15. Improving prediction accuracy of cooling load using EMD, PSR and RBFNN

    NASA Astrophysics Data System (ADS)

    Shen, Limin; Wen, Yuanmei; Li, Xiaohong

    2017-08-01

    To increase the accuracy of cooling load demand prediction, this work presents an EMD (empirical mode decomposition)-PSR (phase space reconstruction) based RBFNN (radial basis function neural network) method. First, the chaotic nature of real cooling load demand is analyzed, and the non-stationary historical cooling load data are transformed into several stationary intrinsic mode functions (IMFs) using EMD. Second, the RBFNN prediction accuracies of the individual IMFs are compared, and an IMF combining scheme is proposed: the lower-frequency components are combined (IMF4-IMF6), while the higher-frequency components (IMF1, IMF2, IMF3) and the residual are kept unchanged. Third, phase space is reconstructed for each combined component separately; the highest-frequency component (IMF1) is processed by a differencing method and predicted with RBFNN in the reconstructed phase spaces. Real cooling load data from a centralized ice-storage cooling system in Guangzhou are used for simulation. The results show that the proposed hybrid method outperforms the traditional methods.
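
    A sketch of the decomposition-and-recombination step, assuming the PyEMD package (distributed as EMD-signal on PyPI) as the EMD implementation; the signal is synthetic, and the number of IMFs extracted depends on the data.

        import numpy as np
        from PyEMD import EMD   # pip install EMD-signal (assumed toolkit)

        t = np.linspace(0, 10, 2000)
        load = (50 + 10 * np.sin(2 * np.pi * t) + 3 * np.sin(20 * np.pi * t)
                + np.random.default_rng(0).normal(0, 1, t.size))   # synthetic cooling load

        imfs = EMD().emd(load)                 # rows: IMF1..IMFn; last row carries the trend/residue
        high = imfs[:3]                        # keep the higher-frequency IMFs separate
        low_combined = imfs[3:-1].sum(axis=0)  # combine the lower-frequency IMFs (if present)
        residue = imfs[-1]
        print(len(imfs), low_combined.shape)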

  16. Predictive accuracy of combined genetic and environmental risk scores.

    PubMed

    Dudbridge, Frank; Pashayan, Nora; Yang, Jian

    2018-02-01

    The substantial heritability of most complex diseases suggests that genetic data could provide useful risk prediction. To date the performance of genetic risk scores has fallen short of the potential implied by heritability, but this can be explained by insufficient sample sizes for estimating highly polygenic models. When risk predictors already exist based on environment or lifestyle, two key questions are to what extent can they be improved by adding genetic information, and what is the ultimate potential of combined genetic and environmental risk scores? Here, we extend previous work on the predictive accuracy of polygenic scores to allow for an environmental score that may be correlated with the polygenic score, for example when the environmental factors mediate the genetic risk. We derive common measures of predictive accuracy and improvement as functions of the training sample size, chip heritabilities of disease and environmental score, and genetic correlation between disease and environmental risk factors. We consider simple addition of the two scores and a weighted sum that accounts for their correlation. Using examples from studies of cardiovascular disease and breast cancer, we show that improvements in discrimination are generally small but reasonable degrees of reclassification could be obtained with current sample sizes. Correlation between genetic and environmental scores has only minor effects on numerical results in realistic scenarios. In the longer term, as the accuracy of polygenic scores improves they will come to dominate the predictive accuracy compared to environmental scores. © 2017 WILEY PERIODICALS, INC.
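
    The paper derives closed-form accuracy measures; the numeric sketch below only illustrates the contrast between simple addition and a correlation-aware weighted sum of standardized scores, under an assumed liability model with 10% prevalence.

        import numpy as np
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        n = 20000
        g = rng.normal(size=n)                                 # standardized polygenic score
        e = 0.3 * g + rng.normal(0, np.sqrt(1 - 0.3**2), n)    # environmental score, corr 0.3
        liability = 0.4 * g + 0.3 * e + rng.normal(0, 1, n)    # assumed liability model
        case = liability > np.quantile(liability, 0.9)         # 10% disease prevalence

        simple = g + e                                         # simple addition of the two scores
        # weighted sum accounting for the g-e correlation (inverse-covariance weighting;
        # the vector below is cov(score, liability) under this synthetic model)
        S = np.cov(np.vstack([g, e]))
        w = np.linalg.solve(S, np.array([0.4 + 0.3 * 0.3, 0.3 + 0.4 * 0.3]))
        weighted = w[0] * g + w[1] * e

        print(roc_auc_score(case, simple), roc_auc_score(case, weighted))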

  17. Predictive accuracy of combined genetic and environmental risk scores

    PubMed Central

    Pashayan, Nora; Yang, Jian

    2017-01-01

    ABSTRACT The substantial heritability of most complex diseases suggests that genetic data could provide useful risk prediction. To date the performance of genetic risk scores has fallen short of the potential implied by heritability, but this can be explained by insufficient sample sizes for estimating highly polygenic models. When risk predictors already exist based on environment or lifestyle, two key questions are to what extent can they be improved by adding genetic information, and what is the ultimate potential of combined genetic and environmental risk scores? Here, we extend previous work on the predictive accuracy of polygenic scores to allow for an environmental score that may be correlated with the polygenic score, for example when the environmental factors mediate the genetic risk. We derive common measures of predictive accuracy and improvement as functions of the training sample size, chip heritabilities of disease and environmental score, and genetic correlation between disease and environmental risk factors. We consider simple addition of the two scores and a weighted sum that accounts for their correlation. Using examples from studies of cardiovascular disease and breast cancer, we show that improvements in discrimination are generally small but reasonable degrees of reclassification could be obtained with current sample sizes. Correlation between genetic and environmental scores has only minor effects on numerical results in realistic scenarios. In the longer term, as the accuracy of polygenic scores improves they will come to dominate the predictive accuracy compared to environmental scores. PMID:29178508

  18. Does Maternal Body Mass Index Have an Effect on the Accuracy of Ultrasound-Derived Estimated Birth Weight?: A Retrospective Study.

    PubMed

    Gonzalez, Maritza G; Reed, Kathryn L; Center, Katherine E; Hill, Meghan G

    2017-05-01

    The purpose of this study was to investigate the relationship between the maternal body mass index (BMI) and the accuracy of the ultrasound-derived birth weight. A retrospective chart review was performed on women who had an ultrasound examination between 36 and 43 weeks' gestation and had complete delivery data available through electronic medical records. The ultrasound-derived fetal weight was adjusted by 30 g per day of gestation that elapsed between the ultrasound examination and delivery to arrive at the predicted birth weight. A total of 403 pregnant women met the inclusion criteria. Age ranged from 13 to 44 years (mean ± SD, 28.38 ± 5.97 years). The mean BMI was 32.62 ± 8.59 kg/m². Most of the women did not have diabetes (n = 300 [74.0%]). The sample was primarily white (n = 165 [40.9%]) and Hispanic (n = 147 [36.5%]). The predicted weight of neonates at delivery (3677.07 ± 540.51 g) was higher than the actual birth weight (3335.92 ± 585.46 g). Based on regression analyses, as the BMI increased, so did the predicted weight (P < .01) and the weight at delivery (P < .01). The accuracy of the estimated ultrasound-derived birth weight was not predicted by the maternal BMI (P = .22). Maternal race and diabetes status were not associated with the accuracy of ultrasound in predicting birth weight. Both predicted and actual birth weight increased as the BMI increased. However, the BMI did not affect the accuracy of the estimated ultrasound-derived birth weight. Maternal race and diabetes status did not influence the accuracy of the ultrasound-derived predicted birth weight. © 2017 by the American Institute of Ultrasound in Medicine.

  19. Diagnostic Classification of Schizophrenia Patients on the Basis of Regional Reward-Related fMRI Signal Patterns

    PubMed Central

    Koch, Stefan P.; Hägele, Claudia; Haynes, John-Dylan; Heinz, Andreas; Schlagenhauf, Florian; Sterzer, Philipp

    2015-01-01

    Functional neuroimaging has provided evidence for altered function of mesolimbic circuits implicated in reward processing, first and foremost the ventral striatum, in patients with schizophrenia. While such findings based on significant group differences in brain activations can provide important insights into the pathomechanisms of mental disorders, the use of neuroimaging results from standard univariate statistical analysis for individual diagnosis has proven difficult. In this proof of concept study, we tested whether the predictive accuracy for the diagnostic classification of schizophrenia patients vs. healthy controls could be improved using multivariate pattern analysis (MVPA) of regional functional magnetic resonance imaging (fMRI) activation patterns for the anticipation of monetary reward. With a searchlight MVPA approach using support vector machine classification, we found that the diagnostic category could be predicted from local activation patterns in frontal, temporal, occipital and midbrain regions, with a maximal cluster peak classification accuracy of 93% for the right pallidum. Region-of-interest based MVPA for the ventral striatum achieved a maximal cluster peak accuracy of 88%, whereas the classification accuracy on the basis of standard univariate analysis reached only 75%. Moreover, using support vector regression we could additionally predict the severity of negative symptoms from ventral striatal activation patterns. These results show that MVPA can be used to substantially increase the accuracy of diagnostic classification on the basis of task-related fMRI signal patterns in a regionally specific way. PMID:25799236
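
    The generic shape of such an ROI-based MVPA analysis, sketched with a linear SVM and cross-validation on placeholder voxel patterns; the searchlight machinery and all fMRI preprocessing are omitted.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = rng.normal(size=(44, 150))       # ROI voxel patterns: 44 subjects x 150 voxels
        y = np.repeat([0, 1], 22)            # 0 = healthy control, 1 = patient
        X[y == 1] += 0.25                    # injected group effect (synthetic)

        clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
        acc = cross_val_score(clf, X, y, cv=11)   # 11-fold cross-validation
        print(f"mean classification accuracy: {acc.mean():.2%}")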

  20. Prediction of high-energy radiation belt electron fluxes using a combined VERB-NARMAX model

    NASA Astrophysics Data System (ADS)

    Pakhotin, I. P.; Balikhin, M. A.; Shprits, Y.; Subbotin, D.; Boynton, R.

    2013-12-01

    This study is concerned with the modelling and forecasting of energetic electron fluxes that endanger satellites in space. By combining data-driven predictions from the NARMAX methodology with the physics-based VERB code, it becomes possible to predict electron fluxes with a high level of accuracy across radial distances from inside the local acceleration region to beyond geosynchronous orbit. The model coupling also makes it possible to avoid accounting for seed electron variations at the outer boundary. Conversely, combining a convection code with the VERB and NARMAX models has the potential to provide even greater forecasting accuracy that is not limited to geostationary orbit but makes predictions across the entire outer radiation belt region.

  1. Predicting treatment response to cognitive behavioral therapy in panic disorder with agoraphobia by integrating local neural information.

    PubMed

    Hahn, Tim; Kircher, Tilo; Straube, Benjamin; Wittchen, Hans-Ulrich; Konrad, Carsten; Ströhle, Andreas; Wittmann, André; Pfleiderer, Bettina; Reif, Andreas; Arolt, Volker; Lueken, Ulrike

    2015-01-01

    Although neuroimaging research has made substantial progress in identifying the large-scale neural substrate of anxiety disorders, its value for clinical application lags behind expectations. Machine-learning approaches have predictive potential for individual-patient prognostic purposes and might thus aid translational efforts in psychiatric research. Our objective was to predict treatment response to cognitive behavioral therapy (CBT) on an individual-patient level, based on functional magnetic resonance imaging data, in patients with panic disorder with agoraphobia (PD/AG). We included 49 patients free of medication for at least 4 weeks and with a primary diagnosis of PD/AG in a longitudinal study performed at 8 clinical research institutes and outpatient centers across Germany. The functional magnetic resonance imaging study was conducted between July 2007 and March 2010. The intervention comprised twelve CBT sessions, conducted twice a week, focusing on behavioral exposure. Treatment response was defined as exceeding a 50% reduction in Hamilton Anxiety Rating Scale scores. The blood oxygenation level-dependent signal was measured during a differential fear-conditioning task. Regional and whole-brain Gaussian process classifiers using a nested leave-one-out cross-validation were used to predict the treatment response from data acquired before CBT. Although no single brain region was predictive of treatment response, integrating regional classifiers based on data from the acquisition and extinction phases of the fear-conditioning task for the whole brain yielded good predictive performance (accuracy, 82%; sensitivity, 92%; specificity, 72%; P < .001). Data from the acquisition phase enabled 73% correct individual-patient classifications (sensitivity, 80%; specificity, 67%; P < .001), whereas data from the extinction phase led to an accuracy of 74% (sensitivity, 64%; specificity, 83%; P < .001). Conservative reanalyses under consideration of potential confounders yielded nominally lower but comparable accuracy rates (acquisition phase, 70%; extinction phase, 71%; combined, 79%). Predicting treatment response to CBT based on functional neuroimaging data in PD/AG is possible with high accuracy on an individual-patient level. This novel machine-learning approach brings personalized medicine within reach, directly supporting clinical decisions for the selection of treatment options and thus helping to improve response rates.

  2. The Co-Development of Skill at and Preference for Use of Retrieval-Based Processes for Solving Addition Problems: Individual and Sex Differences from First to Sixth Grade

    PubMed Central

    Bailey, Drew H.; Littlefield, Andrew; Geary, David C.

    2012-01-01

    The ability to retrieve basic arithmetic facts from long-term memory contributes to individual and perhaps sex differences in mathematics achievement. The current study tracked the co-development of preference for using retrieval over other strategies to solve single-digit addition problems, independent of accuracy, and skilled use of retrieval (i.e., accuracy and RT) from first to sixth grade, inclusive (n = 311). Accurate retrieval in first grade was related to working memory capacity and intelligence and predicted a preference for retrieval in second grade. In later grades, the relation between skill and preference changed such that preference in one grade predicted accuracy and RT in the next, as RT and accuracy continued to predict future gains in preference. In comparison to girls, boys had a consistent preference for retrieval over other strategies and had faster retrieval speeds, but the sex difference in retrieval accuracy varied across grades. The results indicate that ability influences early skilled retrieval but that practice and skill later influence each other in a feedback loop, and they provide insights into the source of the sex difference in problem-solving approaches. PMID:22704036

  3. Prediction of successful memory encoding based on single-trial rhinal and hippocampal phase information.

    PubMed

    Höhne, Marlene; Jahanbekam, Amirhossein; Bauckhage, Christian; Axmacher, Nikolai; Fell, Juergen

    2016-10-01

    Mediotemporal EEG characteristics are closely related to long-term memory formation. It has been reported that rhinal and hippocampal EEG measures reflecting the stability of phases across trials are better suited to distinguish subsequently remembered from forgotten trials than event-related potentials or amplitude-based measures. Theoretical models suggest that the phase of EEG oscillations reflects neural excitability and influences cellular plasticity. However, while previous studies have shown that the stability of phase values across trials is indeed a relevant predictor of subsequent memory performance, the effect of absolute single-trial phase values has been little explored. Here, we reanalyzed intracranial EEG recordings from the mediotemporal lobe of 27 epilepsy patients performing a continuous word recognition paradigm. Two-class classification using a support vector machine was performed to predict subsequently remembered vs. forgotten trials based on individually selected frequencies and time points. We demonstrate that it is possible to successfully predict single-trial memory formation in the majority of patients (23 out of 27) based on only three single-trial phase values given by a rhinal phase, a hippocampal phase, and a rhinal-hippocampal phase difference. Overall classification accuracy across all subjects was 69.2% choosing frequencies from the range between 0.5 and 50 Hz and time points from the interval between −0.5 s and 2 s. For 19 patients, above chance prediction of subsequent memory was possible even when choosing only time points from the prestimulus interval (overall accuracy: 65.2%). Furthermore, prediction accuracies based on single-trial phase surpassed those based on single-trial power. Our results confirm the functional relevance of mediotemporal EEG phase for long-term memory operations and suggest that phase information may be utilized for memory enhancement applications based on deep brain stimulation. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. Protein subcellular localization prediction using artificial intelligence technology.

    PubMed

    Nair, Rajesh; Rost, Burkhard

    2008-01-01

    Proteins perform many important tasks in living organisms, such as catalysis of biochemical reactions, transport of nutrients, and recognition and transmission of signals. The plethora of aspects of the role of any particular protein is referred to as its "function." One aspect of protein function that has been the target of intensive research by computational biologists is its subcellular localization. Proteins must be localized in the same subcellular compartment to cooperate toward a common physiological function. Aberrant subcellular localization of proteins can result in several diseases, including kidney stones, cancer, and Alzheimer's disease. To date, sequence homology remains the most widely used method for inferring the function of a protein. However, the application of advanced artificial intelligence (AI)-based techniques in recent years has resulted in significant improvements in our ability to predict the subcellular localization of a protein. The prediction accuracy has risen steadily over the years, in large part due to the application of AI-based methods such as hidden Markov models (HMMs), neural networks (NNs), and support vector machines (SVMs), although the availability of larger experimental datasets has also played a role. Automatic methods that mine textual information from the biological literature and molecular biology databases have considerably sped up the process of annotation for proteins for which some information regarding function is available in the literature. State-of-the-art methods based on NNs and HMMs can predict the presence of N-terminal sorting signals extremely accurately. Ab initio methods that predict subcellular localization for any protein sequence using only the native amino acid sequence and features predicted from the native sequence have shown the most remarkable improvements. The prediction accuracy of these methods has increased by over 30% in the past decade. The accuracy of these methods is now on par with high-throughput methods for predicting localization, and they are beginning to play an important role in directing experimental research. In this chapter, we review some of the most important methods for the prediction of subcellular localization.

  5. Minimalist ensemble algorithms for genome-wide protein localization prediction.

    PubMed

    Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun

    2012-07-03

    Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature-selection-based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of the individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of the available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from an AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted-voting-based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from the inclusion of too many individual predictors. We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of the individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
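
    A minimal sketch of the classifier-based ensemble idea: treat the outputs of individual predictors as features, filter them with a feature-selection step, and stack a logistic regression on top. The data and the choice k=4 are illustrative, not the paper's configuration.

        import numpy as np
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(0)
        n_prot, n_pred = 600, 10
        y = rng.integers(0, 2, n_prot)                     # true localization (binary here)
        # columns = scores from 10 individual localization predictors (synthetic, redundant)
        P = 0.6 * y[:, None] + rng.normal(0, 0.5, (n_prot, n_pred))

        # minimalist ensemble: keep the k most informative predictors, then logistic regression
        ensemble = make_pipeline(SelectKBest(f_classif, k=4), LogisticRegression())
        print(cross_val_score(ensemble, P, y, scoring="roc_auc", cv=5).mean())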

  6. Minimalist ensemble algorithms for genome-wide protein localization prediction

    PubMed Central

    2012-01-01

    Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature-selection-based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of the individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of the available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from an AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted-voting-based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from the inclusion of too many individual predictors. Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of the individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391

  7. Prediction of the spectral reflectance of laser-generated color prints by combination of an optical model and learning methods.

    PubMed

    Nébouy, David; Hébert, Mathieu; Fournel, Thierry; Larina, Nina; Lesur, Jean-Luc

    2015-09-01

    Recent color printing technologies based on the principle of revealing colors on pre-functionalized achromatic supports by laser irradiation offer advanced functionalities, especially for security applications. However, for such technologies, color prediction is challenging compared to classic ink-transfer printing systems. The spectral properties of the coloring materials modified by the lasers are not precisely known and may vary strongly, depending on the laser settings, in a nonlinear manner. We show in this study, through the example of the color laser marking (CLM) technology, based on laser bleaching of a mixture of pigments, that the combination of an adapted optical reflectance model and learning methods to obtain the model's parameters enables prediction of the spectral reflectance of any printable color with rather good accuracy. Even though the pigment mixture is formulated from three colored pigments, an analysis of the dimensionality of the spectral space generated by CLM printing, using a principal component analysis decomposition, shows that at least four spectral primaries are needed for accurate spectral reflectance predictions. A polynomial interpolation is then used to relate RGB laser intensities to virtual coordinates of the new basis vectors. By studying the influence of the number of calibration patches on the prediction accuracy, we conclude that a reasonable number of 130 patches is enough to achieve good accuracy in this application.
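
    The dimensionality argument can be reproduced in outline with a PCA on measured reflectance spectra; everything below is synthetic stand-in data, with the 130-patch and four-primary numbers borrowed from the abstract for illustration only.

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        # synthetic stand-in for measured reflectance spectra: 130 patches x 36 wavelengths
        basis = rng.uniform(0, 1, (4, 36))                 # four latent spectral primaries
        weights = rng.dirichlet(np.ones(4), size=130)
        spectra = weights @ basis + rng.normal(0, 0.002, (130, 36))

        pca = PCA().fit(spectra)
        explained = np.cumsum(pca.explained_variance_ratio_)
        print("cumulative variance of first 6 components:", np.round(explained[:6], 4))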

  8. BiPPred: Combined sequence- and structure-based prediction of peptide binding to the Hsp70 chaperone BiP.

    PubMed

    Schneider, Markus; Rosam, Mathias; Glaser, Manuel; Patronov, Atanas; Shah, Harpreet; Back, Katrin Christiane; Daake, Marina Angelika; Buchner, Johannes; Antes, Iris

    2016-10-01

    Substrate binding to Hsp70 chaperones is involved in many biological processes, and the identification of potential substrates is important for a comprehensive understanding of these events. We present a multi-scale pipeline for an accurate, yet efficient prediction of peptides binding to the Hsp70 chaperone BiP by combining sequence-based prediction with molecular docking and MMPBSA calculations. First, we measured the binding of 15mer peptides from known substrate proteins of BiP by peptide array (PA) experiments and performed an accuracy assessment of the PA data by fluorescence anisotropy studies. Several sequence-based prediction models were fitted using this and other peptide binding data. A structure-based position-specific scoring matrix (SB-PSSM) derived solely from structural modeling data forms the core of all models. The matrix elements are based on a combination of binding energy estimations, molecular dynamics simulations, and analysis of the BiP binding site, which led to new insights into the peptide binding specificities of the chaperone. Using this SB-PSSM, peptide binders could be predicted with high selectivity even without training of the model on experimental data. Additional training further increased the prediction accuracies. Subsequent molecular docking (DynaDock) and MMGBSA/MMPBSA-based binding affinity estimations for predicted binders allowed the identification of the correct binding mode of the peptides as well as the calculation of nearly quantitative binding affinities. The general concept behind the developed multi-scale pipeline can readily be applied to other protein-peptide complexes with linearly bound peptides, for which sufficient experimental binding data for the training of classical sequence-based prediction models is not available. Proteins 2016; 84:1390-1407. © 2016 Wiley Periodicals, Inc.

  9. Comparison of univariate and multivariate models for prediction of major and minor elements from laser-induced breakdown spectra with and without masking

    NASA Astrophysics Data System (ADS)

    Dyar, M. Darby; Fassett, Caleb I.; Giguere, Stephen; Lepore, Kate; Byrne, Sarah; Boucher, Thomas; Carey, CJ; Mahadevan, Sridhar

    2016-09-01

    This study uses 1356 spectra from 452 geologically diverse samples, the largest suite of LIBS rock spectra ever assembled, to compare the accuracy of elemental predictions in models that use only spectral regions thought to contain peaks arising from the element of interest versus those that use information in the entire spectrum. Results show that for the elements Si, Al, Ti, Fe, Mg, Ca, Na, K, Ni, Mn, Cr, Co, and Zn, univariate predictions based on single emission lines are by far the least accurate, no matter how carefully the region of channels/wavelengths is chosen and despite the prominence of the selected emission lines. An automated iterative algorithm was developed to sweep through all 5485 channels of data and select the single region that produces the optimal prediction accuracy for each element using univariate analysis. For the eight major elements, use of this technique results in a 35% improvement in prediction accuracy; for minor elements, the improvement is 13%. The best wavelength-region choice for any given univariate analysis is likely to be an inherent property of the specific training set that cannot be generalized. In comparison, multivariate analysis using partial least-squares (PLS) almost universally outperforms univariate analysis. PLS using all the same wavelength regions from the univariate analysis produces results that improve in accuracy by 63% for major elements and 3% for minor elements. This difference likely reflects signal-to-noise ratios, which are far better for major elements than for minor elements and likely limit the prediction accuracy of minor elements by any technique. We also compare predictions using specific wavelength ranges for each element against those employing all channels. Masking out channels to focus on emission lines from a specific element decreases prediction accuracy for major elements but is useful for minor elements with low signals and proportionally much higher noise; use of PLS rather than univariate analysis is still recommended. Finally, we tested the generalizability of our results by analyzing a second data set from a different instrument. Overall prediction accuracies for the mixed data sets are higher than for either set alone for all major and minor elements except Ni, Cr, and Co, where results are roughly comparable.
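
    A sketch of the univariate-versus-PLS comparison on synthetic spectra; the channel indices, peak positions, and noise levels are invented, and only the methodological contrast mirrors the study.

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n, ch = 452, 5485
        conc = rng.uniform(0, 30, n)                      # wt% of the element (synthetic)
        spectra = rng.normal(0, 0.02, (n, ch))
        spectra[:, 1200] += 0.01 * conc                   # a "peak" channel for the element
        spectra[:, 3000] += 0.004 * conc                  # a weaker secondary line

        Xtr, Xte, ytr, yte = train_test_split(spectra, conc, random_state=0)
        uni = LinearRegression().fit(Xtr[:, [1200]], ytr)   # univariate: one emission line
        pls = PLSRegression(n_components=8).fit(Xtr, ytr)   # multivariate: all channels

        rmse = lambda y, p: np.sqrt(np.mean((y - np.ravel(p)) ** 2))
        print(rmse(yte, uni.predict(Xte[:, [1200]])), rmse(yte, pls.predict(Xte)))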

  10. Mitigating Errors in External Respiratory Surrogate-Based Models of Tumor Position

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Malinowski, Kathleen T.; Fischell Department of Bioengineering, University of Maryland, College Park, MD; McAvoy, Thomas J.

    2012-04-01

    Purpose: To investigate the effect of tumor site, measurement precision, tumor-surrogate correlation, training data selection, model design, and interpatient and interfraction variations on the accuracy of external marker-based models of tumor position. Methods and Materials: Cyberknife Synchrony system log files comprising synchronously acquired positions of external markers and the tumor from 167 treatment fractions were analyzed. The accuracy of Synchrony, ordinary-least-squares regression, and partial-least-squares regression models for predicting the tumor position from the external markers was evaluated. The quantity and timing of the data used to build the predictive model were varied. The effects of tumor-surrogate correlation and the precision in both the tumor and the external surrogate position measurements were explored by adding noise to the data. Results: The tumor position prediction errors increased during the duration of a fraction. Increasing the training data quantities did not always lead to more accurate models. Adding uncorrelated noise to the external marker-based inputs degraded the tumor-surrogate correlation models by 16% for partial-least-squares and 57% for ordinary-least-squares. External marker and tumor position measurement errors led to tumor position prediction changes 0.3-3.6 times the magnitude of the measurement errors, varying widely with model algorithm. The tumor position prediction errors were significantly associated with the patient index but not with the fraction index or tumor site. Partial-least-squares was as accurate as Synchrony and more accurate than ordinary-least-squares. Conclusions: The accuracy of surrogate-based inferential models of tumor position was affected by all the investigated factors, except for the tumor site and fraction index.

  11. Small angle X-ray scattering and cross-linking for data assisted protein structure prediction in CASP 12 with prospects for improved accuracy.

    PubMed

    Ogorzalek, Tadeusz L; Hura, Greg L; Belsom, Adam; Burnett, Kathryn H; Kryshtafovych, Andriy; Tainer, John A; Rappsilber, Juri; Tsutakawa, Susan E; Fidelis, Krzysztof

    2018-03-01

    Experimental data offers empowering constraints for structure prediction. These constraints can be used to filter equivalently scored models or more powerfully within optimization functions toward prediction. In CASP12, Small Angle X-ray Scattering (SAXS) and Cross-Linking Mass Spectrometry (CLMS) data, measured on an exemplary set of novel fold targets, were provided to the CASP community of protein structure predictors. As solution-based techniques, SAXS and CLMS can efficiently measure states of the full-length sequence in its native solution conformation and assembly. However, this experimental data did not substantially improve prediction accuracy judged by fits to crystallographic models. One issue, beyond intrinsic limitations of the algorithms, was a disconnect between crystal structures and solution-based measurements. Our analyses show that many targets had substantial percentages of disordered regions (up to 40%) or were multimeric or both. Thus, solution measurements of flexibility and assembly support variations that may confound prediction algorithms trained on crystallographic data and expecting globular fully-folded monomeric proteins. Here, we consider the CLMS and SAXS data collected, the information in these solution measurements, and the challenges in incorporating them into computational prediction. As improvement opportunities were only partly realized in CASP12, we provide guidance on how data from the full-length biological unit and the solution state can better aid prediction of the folded monomer or subunit. We furthermore describe strategic integrations of solution measurements with computational prediction programs with the aim of substantially improving foundational knowledge and the accuracy of computational algorithms for biologically-relevant structure predictions for proteins in solution. © 2018 Wiley Periodicals, Inc.

  12. Diagnostic accuracy of APRI and FIB-4 for predicting hepatitis B virus-related liver fibrosis accompanied with hepatocellular carcinoma.

    PubMed

    Xiao, Guangqin; Zhu, Feng; Wang, Min; Zhang, Hang; Ye, Dawei; Yang, Jiayin; Jiang, Li; Liu, Chang; Yan, Lunan; Qin, Renyi

    2016-10-01

    Aspartate aminotransferase-to-platelet ratio index (APRI) and the fibrosis index based on four factors (FIB-4) are the two non-invasive models that have received the most attention for assessing liver fibrosis. We aimed to examine the validity of these two models for predicting hepatitis B virus (HBV)-related liver fibrosis accompanied by hepatocellular carcinoma (HCC). We enrolled HBV-infected patients with liver cancer who had undergone hepatectomy. The accuracy of APRI and FIB-4 for diagnosing liver fibrosis was assessed based on their sensitivity, specificity, diagnostic efficiency, positive predictive value (PPV), negative predictive value (NPV), kappa (κ) value, and area under the receiver-operating characteristic curve (AUC). In total, 2176 patients were included, 1682 retrospective subjects and 494 prospective subjects. APRI (rs = 0.310) and FIB-4 (rs = 0.278) were positively correlated with liver fibrosis, and χ² analysis demonstrated that APRI and FIB-4 values correlated with different levels of liver fibrosis, with all P values less than 0.01. The AUC values for APRI and FIB-4 were 0.685 and 0.626 (P = 0.73) for predicting significant fibrosis, 0.681 and 0.648 (P = 0.81) for differentiating advanced fibrosis, and 0.676 and 0.652 (P = 0.77) for diagnosing cirrhosis. APRI and FIB-4 correlate with liver fibrosis; however, these two models have low accuracy for predicting HBV-related liver fibrosis in HCC patients. Copyright © 2016. Published by Elsevier Ltd.
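
    Both indices have standard closed-form definitions, sketched below as commonly published; the unit conventions are assumptions that should be verified against local laboratory practice, and the input values are illustrative only.

        def apri(ast_iu_l, ast_uln_iu_l, platelets_10e9_l):
            """APRI = (AST / upper limit of normal) / platelet count (10^9/L) x 100."""
            return (ast_iu_l / ast_uln_iu_l) / platelets_10e9_l * 100.0

        def fib4(age_years, ast_iu_l, alt_iu_l, platelets_10e9_l):
            """FIB-4 = age x AST / (platelets x sqrt(ALT))."""
            return age_years * ast_iu_l / (platelets_10e9_l * alt_iu_l ** 0.5)

        # illustrative values only
        print(apri(ast_iu_l=80, ast_uln_iu_l=40, platelets_10e9_l=150))      # ~1.33
        print(fib4(age_years=55, ast_iu_l=80, alt_iu_l=60, platelets_10e9_l=150))  # ~3.79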

  13. Prediction of brittleness based on anisotropic rock physics model for kerogen-rich shale

    NASA Astrophysics Data System (ADS)

    Qian, Ke-Ran; He, Zhi-Liang; Chen, Ye-Quan; Liu, Xi-Wu; Li, Xiang-Yang

    2017-12-01

    The construction of a shale rock physics model and the selection of an appropriate brittleness index (BI) are two significant steps that influence the accuracy of brittleness prediction. On one hand, the existing models for kerogen-rich shale are controversial, so a reasonable rock physics model needs to be built. On the other hand, several types of equations already exist for predicting the BI, whose feasibility needs to be carefully considered. This study constructed a kerogen-rich rock physics model by applying the self-consistent approximation and differential effective medium theory to model intercoupled clay and kerogen mixtures. The feasibility of our model was confirmed by comparison with classical models, showing better accuracy. Templates were constructed based on our model to link physical properties and the BI. Different equations for the BI have different sensitivities, making them suitable for different types of formations. Equations based on Young's modulus are sensitive to variations in lithology, while those using Lamé's coefficients are sensitive to porosity and pore fluids. Physical information must be considered to improve brittleness prediction.
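
    One widely used elastic brittleness index, due to Rickman et al. (2008), averages a normalized Young's modulus term and a normalized Poisson's ratio term; a sketch with assumed normalization bounds (the bounds are formation-specific and must be calibrated):

        def brittleness_index(E, nu, E_min=1.0, E_max=80.0, nu_min=0.1, nu_max=0.4):
            """Normalized elastic brittleness index (after Rickman et al., 2008).
            E in GPa, nu dimensionless; the normalization bounds are assumptions."""
            e_term  = (E - E_min) / (E_max - E_min)
            nu_term = (nu_max - nu) / (nu_max - nu_min)
            return 50.0 * (e_term + nu_term)        # percent: average of the two terms x 100

        print(brittleness_index(E=45.0, nu=0.2))    # stiffer, lower-Poisson rock -> more brittle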

  14. A time series based sequence prediction algorithm to detect activities of daily living in smart home.

    PubMed

    Marufuzzaman, M; Reaz, M B I; Ali, M A M; Rahman, L F

    2015-01-01

    The goal of smart homes is to create an intelligent environment that adapts to the inhabitants' needs and assists persons who need special care and safety in their daily life. This can be achieved by collecting ADL (activities of daily living) data and analyzing them within existing computing elements. In this research, a very recent algorithm named sequence prediction via enhanced episode discovery (SPEED) is modified, and a time component is included to improve accuracy. The modified SPEED, or M-SPEED, is a sequence prediction algorithm that modifies the previous SPEED algorithm by using the time duration of appliances' ON-OFF states to decide the next state. M-SPEED discovers periodic episodes of inhabitant behavior, is trained with the learned episodes, and makes decisions based on the obtained knowledge. The results showed that M-SPEED achieves 96.8% prediction accuracy, which is better than other time prediction algorithms such as PUBS, ALZ with temporal rules, and the previous SPEED. Since human behavior shows natural temporal patterns, duration times can be used to predict future events more accurately. This inhabitant activity prediction system will certainly improve smart homes by ensuring safety and better care for elderly and handicapped people.

  15. Accuracy evaluation of Fourier series analysis and singular spectrum analysis for predicting the volume of motorcycle sales in Indonesia

    NASA Astrophysics Data System (ADS)

    Sasmita, Yoga; Darmawan, Gumgum

    2017-08-01

    This research aims to evaluate the forecasting performance of Fourier Series Analysis (FSA) and Singular Spectrum Analysis (SSA), which are more exploratory methods and do not require parametric assumptions. These methods are applied to predicting the volume of monthly motorcycle sales in Indonesia from January 2005 to December 2016. Both models are suitable for data with seasonal and trend components. Technically, FSA represents the time series as the sum of trend and seasonal components at different frequencies, which are difficult to identify in time-domain analysis. With a hidden period of 2.918 ≈ 3 and a significant model order of 3, the FSA model is used to predict the testing data. Meanwhile, SSA has two main stages, decomposition and reconstruction. SSA decomposes the time series data into different components. The reconstruction process starts with grouping the decomposition results based on the similarity of each component's period in the trajectory matrix. With the optimal window length (L = 53) and grouping effect (r = 4), SSA predicts the testing data. Forecasting accuracy is evaluated based on the Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). The results show that for the next 12 months, SSA has MAPE = 13.54 percent, MAE = 61,168.43, and RMSE = 75,244.92, while FSA has MAPE = 28.19 percent, MAE = 119,718.43, and RMSE = 142,511.17. Therefore, predicting the volume of motorcycle sales in the next period should use the SSA method, which performs better in terms of accuracy.

  16. Paroxysmal atrial fibrillation prediction method with shorter HRV sequences.

    PubMed

    Boon, K H; Khalil-Hani, M; Malarvili, M B; Sia, C W

    2016-10-01

    This paper proposes a method that predicts the onset of paroxysmal atrial fibrillation (PAF) using heart rate variability (HRV) segments that are shorter than those applied in existing methods, while maintaining good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of PAF onset is clinically important because it increases the possibility of electrically stabilizing and preventing the onset of atrial arrhythmias with different pacing techniques. We investigate the effect of HRV features extracted from different lengths of HRV segments prior to PAF onset with the proposed PAF prediction method. The pre-processing stage of the predictor includes QRS detection, HRV quantification, and ectopic beat correction. Time-domain, frequency-domain, non-linear, and bispectrum features are then extracted from the quantified HRV. In the feature selection, the HRV feature set and classifier parameters are optimized simultaneously using an optimization procedure based on a genetic algorithm (GA). Both the full feature set and a statistically significant feature subset are optimized by the GA; for the statistically significant subset, the Mann-Whitney U test is used to filter out features that do not pass the statistical test at the 20% significance level. The final stage of our predictor is a classifier based on a support vector machine (SVM). A 10-fold cross-validation is applied in the performance evaluation, and the proposed method achieves 79.3% prediction accuracy using 15-minute HRV segments. This accuracy is comparable to that achieved by existing methods that use 30-minute HRV segments, most of which achieve accuracies of around 80%. More importantly, our method significantly outperforms those that apply segments shorter than 30 minutes. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  17. BDDCS Class Prediction for New Molecular Entities

    PubMed Central

    Broccatelli, Fabio; Cruciani, Gabriele; Benet, Leslie Z.; Oprea, Tudor I.

    2012-01-01

    The Biopharmaceutics Drug Disposition Classification System (BDDCS) was successfully employed for predicting drug-drug interactions (DDIs) with respect to drug metabolizing enzymes (DMEs), drug transporters and their interplay. The major assumption of BDDCS is that the extent of metabolism (EoM) predicts high versus low intestinal permeability rate, and vice versa, at least when uptake transporters or paracellular transport are not involved. We recently published a collection of over 900 marketed drugs classified for BDDCS. We suggest that a reliable model for predicting BDDCS class, integrated with in vitro assays, could anticipate disposition and potential DDIs of new molecular entities (NMEs). Here we describe a computational procedure for predicting BDDCS class from molecular structures. The model was trained on a set of 300 oral drugs and validated on an external set of 379 oral drugs, using 17 descriptors calculated or derived from the VolSurf+ software. For each molecule, a probability of BDDCS class membership was given, based on predicted EoM, FDA solubility (FDAS) and their confidence scores. The accuracy in predicting FDAS was 78% in training and 77% in validation, while for EoM prediction the accuracy was 82% in training and 79% in external validation. The actual BDDCS class corresponded to the highest ranked calculated class for 55% of the validation molecules, and it was within the top two ranked classes more than 92% of the time. The unbalanced stratification of the dataset did not affect the prediction, which showed the highest accuracy in predicting classes 2 and 3 relative to the most populated class 1. For class 4 drugs a general lack of predictability was observed. A linear discriminant analysis (LDA) confirmed that the degree of accuracy for the prediction of the different BDDCS classes is tied to the structure of the dataset. This model could routinely be used in early drug discovery to prioritize in vitro tests for NMEs (e.g., affinity to transporters, intestinal metabolism, intestinal absorption and plasma protein binding). We further applied the BDDCS prediction model to a large set of medicinal chemistry compounds (over 30,000 chemicals). Based on this application, we suggest that solubility, and not permeability, is the major difference between NMEs and drugs. We anticipate that the forecast of BDDCS categories in early drug discovery may lead to a significant R&D cost reduction. PMID:22224483

  18. Accuracy of Psychology Interns' Clinical Predictions of Re-Incarceration of Delinquents: A Preliminary Study

    ERIC Educational Resources Information Center

    Hagan, Michael P.; Dent, Tyffani M. Monford; Coady, Jeff; Stewart, Shannon

    2006-01-01

    This study involved the assessment of three psychology interns' ability to predict re-incarceration based on the use of clinical judgement. Three psychology interns in an APA-accredited internship were given training on how to use clinical judgement in predicting future incarceration on the part of youth incarcerated in a juvenile correctional…

  19. Failure prediction using machine learning and time series in optical network.

    PubMed

    Wang, Zhilong; Zhang, Min; Wang, Danshi; Song, Chuang; Liu, Min; Li, Jin; Lou, Liqi; Liu, Zhuo

    2017-08-07

    In this paper, we propose a performance monitoring and failure prediction method in optical networks based on machine learning. The primary algorithms of this method are the support vector machine (SVM) and double exponential smoothing (DES). With a focus on risk-aware models in optical networks, the proposed protection plan primarily investigates how to predict the risk of an equipment failure. To the best of our knowledge, this important problem has not yet been fully considered. Experimental results showed that the average prediction accuracy of our method was 95% when predicting the optical equipment failure state. This finding means that our method can forecast an equipment failure risk with high accuracy. Therefore, our proposed DES-SVM method can effectively improve traditional risk-aware models to protect services from possible failures and enhance the optical network stability.
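
    Double exponential smoothing (Holt's linear method), one of the two building blocks named above, extrapolates a level and a trend term from the monitored series. A minimal sketch with illustrative smoothing constants and an invented monitored parameter:

        def double_exponential_smoothing(series, alpha=0.5, beta=0.3, horizon=3):
            """Holt's linear method: returns `horizon` forecasts beyond the series."""
            level, trend = series[0], series[1] - series[0]
            for x in series[1:]:
                prev_level = level
                level = alpha * x + (1 - alpha) * (level + trend)
                trend = beta * (level - prev_level) + (1 - beta) * trend
            return [level + (h + 1) * trend for h in range(horizon)]

        # Hypothetical drift in a monitored equipment parameter (arbitrary units).
        history = [1.0, 1.1, 1.3, 1.35, 1.5, 1.62, 1.71]
        print(double_exponential_smoothing(history))  # next 3 predicted values

    In a DES-SVM scheme of this kind, the extrapolated values would then be fed to the classifier that flags the predicted equipment state as normal or at risk.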

  20. Prediction of hot regions in protein-protein interaction by combining density-based incremental clustering with feature-based classification.

    PubMed

    Hu, Jing; Zhang, Xiaolong; Liu, Xiaoming; Tang, Jinshan

    2015-06-01

    Discovering hot regions in protein-protein interactions is important for drug and protein design. Experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information and others on a protein interaction network; however, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot-spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions. Copyright © 2015 Elsevier Ltd. All rights reserved.
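
    The clustering-then-filtering pipeline can be illustrated with an off-the-shelf density-based clusterer standing in for the paper's incremental variant; DBSCAN is a substitute here, and the coordinates and the per-residue filter score are placeholders:

        import numpy as np
        from sklearn.cluster import DBSCAN

        rng = np.random.default_rng(1)
        # Placeholder C-alpha coordinates of interface residues (angstroms).
        coords = np.vstack([rng.normal(0, 1.5, (20, 3)),
                            rng.normal(12, 1.5, (15, 3))])

        # Step 1: density-based clustering yields rough candidate hot regions.
        labels = DBSCAN(eps=3.0, min_samples=4).fit_predict(coords)

        # Step 2: a feature-based classifier would prune non-hot-spot residues;
        # a dummy score stands in for its per-residue decision.
        hot_spot_score = rng.uniform(size=len(coords))
        for region in set(labels) - {-1}:
            members = np.where(labels == region)[0]
            kept = [i for i in members if hot_spot_score[i] > 0.4]
            print(f"region {region}: {len(members)} residues, {len(kept)} kept")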

  1. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  2. Feature Extraction of Electronic Nose Signals Using QPSO-Based Multiple KFDA Signal Processing

    PubMed Central

    Wen, Tailai; Huang, Daoyu; Lu, Kun; Deng, Changjian; Zeng, Tanyue; Yu, Song; He, Zhiyi

    2018-01-01

    The aim of this research was to enhance the classification accuracy of an electronic nose (E-nose) in different detecting applications. During the learning process of the E-nose to predict the types of different odors, the prediction accuracy was not satisfactory because the raw features extracted from the sensors’ responses were fed to the classifier without any feature extraction processing. Therefore, in order to obtain more useful information and improve the E-nose’s classification accuracy, in this paper a Weighted Kernels Fisher Discriminant Analysis (WKFDA) combined with Quantum-behaved Particle Swarm Optimization (QPSO), i.e., QWKFDA, was presented to reprocess the original feature matrix. In addition, we compared the proposed method with several existing ones, including Principal Component Analysis (PCA), Locality Preserving Projections (LPP), Fisher Discriminant Analysis (FDA) and Kernels Fisher Discriminant Analysis (KFDA). Experimental results showed that QWKFDA is an effective feature extraction method for the E-nose in predicting the types of wound infection and inflammable gases, achieving much higher classification accuracy than the comparison methods. PMID:29382146

  3. Feature Extraction of Electronic Nose Signals Using QPSO-Based Multiple KFDA Signal Processing.

    PubMed

    Wen, Tailai; Yan, Jia; Huang, Daoyu; Lu, Kun; Deng, Changjian; Zeng, Tanyue; Yu, Song; He, Zhiyi

    2018-01-29

    The aim of this research was to enhance the classification accuracy of an electronic nose (E-nose) in different detecting applications. During the learning process of the E-nose to predict the types of different odors, the prediction accuracy was not satisfactory because the raw features extracted from the sensors' responses were fed to the classifier without any feature extraction processing. Therefore, in order to obtain more useful information and improve the E-nose's classification accuracy, in this paper a Weighted Kernels Fisher Discriminant Analysis (WKFDA) combined with Quantum-behaved Particle Swarm Optimization (QPSO), i.e., QWKFDA, was presented to reprocess the original feature matrix. In addition, we compared the proposed method with several existing ones, including Principal Component Analysis (PCA), Locality Preserving Projections (LPP), Fisher Discriminant Analysis (FDA) and Kernels Fisher Discriminant Analysis (KFDA). Experimental results showed that QWKFDA is an effective feature extraction method for the E-nose in predicting the types of wound infection and inflammable gases, achieving much higher classification accuracy than the comparison methods.

  4. Genomic prediction based on data from three layer lines using non-linear regression models.

    PubMed

    Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L

    2014-11-06

    Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.
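
    As a sketch of the non-linear kernel approach compared above, RBF kernel regression on SNP genotypes might look like the following; the genotypes and phenotypes are simulated, and the hyperparameters are illustrative rather than the study's exact model:

        import numpy as np
        from sklearn.kernel_ridge import KernelRidge
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(2)
        n_animals, n_snps = 1000, 500
        X = rng.integers(0, 3, size=(n_animals, n_snps)).astype(float)  # 0/1/2 genotypes
        true_effects = rng.normal(0, 0.05, n_snps)
        y = X @ true_effects + rng.normal(0, 1.0, n_animals)            # phenotype

        X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)
        model = KernelRidge(kernel="rbf", alpha=1.0, gamma=1.0 / n_snps).fit(X_tr, y_tr)

        # Prediction accuracy as in the study: correlation between observed
        # phenotypes and predicted values in the validation set.
        acc = np.corrcoef(y_va, model.predict(X_va))[0, 1]
        print(f"validation accuracy (correlation): {acc:.2f}")

    Switching kernel="rbf" to kernel="linear" gives a rough stand-in for the linear (GBLUP-like) models that the non-linear models were compared against.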

  5. ZY3-02 Laser Altimeter Footprint Geolocation Prediction

    PubMed Central

    Xie, Junfeng; Tang, Xinming; Mo, Fan; Li, Guoyuan; Zhu, Guangbin; Wang, Zhenming; Fu, Xingke; Gao, Xiaoming; Dou, Xianhui

    2017-01-01

    Successfully launched on 30 May 2016, ZY3-02 is the first Chinese surveying and mapping satellite equipped with a lightweight laser altimeter. Calibration is necessary before the laser altimeter becomes operational. Laser footprint location prediction is the first step in calibration based on ground infrared detectors, and it is difficult because the sample frequency of the ZY3-02 laser altimeter is 2 Hz and the distance between two adjacent laser footprints is about 3.5 km. In this paper, we build an on-orbit rigorous geometric prediction model referenced to the rigorous geometric model of optical remote sensing satellites. The model includes three kinds of data that must be predicted: pointing angle, orbit parameters, and attitude angles. The proposed method is verified by a ZY3-02 laser altimeter on-orbit geometric calibration test. Five laser footprint prediction experiments were conducted based on the model, and the laser footprint prediction accuracy is better than 150 m on the ground. The effectiveness and accuracy of the on-orbit rigorous geometric prediction model are confirmed by the test results. The geolocation is predicted precisely by the proposed method, which provides a reference for the geolocation prediction of future land laser detectors in other laser altimeter calibration tests. PMID:28934160

  6. ZY3-02 Laser Altimeter Footprint Geolocation Prediction.

    PubMed

    Xie, Junfeng; Tang, Xinming; Mo, Fan; Li, Guoyuan; Zhu, Guangbin; Wang, Zhenming; Fu, Xingke; Gao, Xiaoming; Dou, Xianhui

    2017-09-21

    Successfully launched on 30 May 2016, ZY3-02 is the first Chinese surveying and mapping satellite equipped with a lightweight laser altimeter. Calibration is necessary before the laser altimeter becomes operational. Laser footprint location prediction is the first step in calibration based on ground infrared detectors, and it is difficult because the sample frequency of the ZY3-02 laser altimeter is 2 Hz and the distance between two adjacent laser footprints is about 3.5 km. In this paper, we build an on-orbit rigorous geometric prediction model referenced to the rigorous geometric model of optical remote sensing satellites. The model includes three kinds of data that must be predicted: pointing angle, orbit parameters, and attitude angles. The proposed method is verified by a ZY3-02 laser altimeter on-orbit geometric calibration test. Five laser footprint prediction experiments were conducted based on the model, and the laser footprint prediction accuracy is better than 150 m on the ground. The effectiveness and accuracy of the on-orbit rigorous geometric prediction model are confirmed by the test results. The geolocation is predicted precisely by the proposed method, which provides a reference for the geolocation prediction of future land laser detectors in other laser altimeter calibration tests.

  7. Determination of heat capacity of ionic liquid based nanofluids using group method of data handling technique

    NASA Astrophysics Data System (ADS)

    Sadi, Maryam

    2018-01-01

    In this study a group method of data handling (GMDH) model has been successfully developed to predict the heat capacity of ionic liquid based nanofluids by considering reduced temperature, acentric factor and molecular weight of the ionic liquids, and nanoparticle concentration as input parameters. To accomplish the modeling, 528 experimental data points extracted from the literature were divided into training and testing subsets. The training set was used to estimate the model coefficients and the testing set was applied for model validation. The ability and accuracy of the developed model have been evaluated by comparing model predictions with experimental values using different statistical parameters such as the coefficient of determination, mean square error and mean absolute percentage error. The mean absolute percentage errors of the developed model for the training and testing sets are 1.38% and 1.66%, respectively, which indicate excellent agreement between model predictions and experimental data. The results estimated by the developed GMDH model also exhibit higher accuracy than the available theoretical correlations.
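
    A GMDH network is built from "partial descriptions": quadratic polynomials of pairs of inputs, fitted by least squares and stacked layer by layer. A hedged numpy sketch of a single such neuron, with synthetic inputs in place of the study's data:

        import numpy as np

        def gmdh_neuron(x1, x2, y):
            """Fit y ~ a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1^2 + a5*x2^2."""
            A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            return coef, A @ coef

        rng = np.random.default_rng(3)
        n = 200
        t_r = rng.uniform(0.5, 0.9, n)     # reduced temperature (illustrative range)
        conc = rng.uniform(0.0, 0.03, n)   # nanoparticle mass fraction (illustrative)
        cp = 1.6 + 0.8 * t_r - 5.0 * conc + rng.normal(0, 0.02, n)  # synthetic heat capacity

        coef, cp_hat = gmdh_neuron(t_r, conc, cp)
        mape = 100 * np.mean(np.abs((cp - cp_hat) / cp))
        print(f"MAPE of a single GMDH neuron: {mape:.2f}%")

    In a full GMDH, such neurons would be built for every input pair, ranked on the validation subset, and the best ones kept as inputs to the next layer until the validation error stops improving.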

  8. [From the prehypertensive adolescent to the hypertensive adult. Is possible to predict the conversion?].

    PubMed

    Pérez-Fernández, Guillermo Alberto; Grau-Abalo, Ricardo

    2012-01-01

    There are many risk factors for developing hypertension. In the XXI century, smarter ways to investigate are needed, and preventing an adolescent from becoming a hypertensive adult must be a priority. The aim of this paper is to predict the risk of hypertension onset in adulthood from blood pressure and risk stratification in adolescence. A representative sample of 125 adolescents from the project "Pesquisaje Escolar en la Adolescencia de Hipertensión Arterial" (PESESCAD-HTA) was studied. They were diagnosed with prehypertension in 2001 and were followed for eight years (96 months) until January 2009. Two predictive indices were obtained: the first based on total cardiovascular risk, and the second on the multiplication of these risks, with accuracy indices of 61.6% and 70.4%, respectively. The index based on the multiplication of cardiovascular risks can predict, with adequate accuracy, the progression of a prehypertensive adolescent to hypertension in adulthood.

  9. In-class didactic versus self-directed teaching of the probe-based confocal laser endomicroscopy (pCLE) criteria for Barrett's esophagus.

    PubMed

    Rzouq, Fadi; Vennalaganti, Prashanth; Pakseresht, Kavous; Kanakadandi, Vijay; Parasa, Sravanthi; Mathur, Sharad C; Alsop, Benjamin R; Hornung, Benjamin; Gupta, Neil; Sharma, Prateek

    2016-02-01

    Optimal teaching methods for disease recognition using probe-based confocal laser endomicroscopy (pCLE) have not been developed. Our aim was to compare in-class didactic teaching vs. self-directed teaching of Barrett's neoplasia diagnosis using pCLE. This randomized controlled trial was conducted at a tertiary academic center. Study participants with no prior pCLE experience were randomized to in-class didactic (group 1) or self-directed teaching groups (group 2). For group 1, an expert conducted a classroom teaching session using standardized educational material. Participants in group 2 were provided with the same material on an audio PowerPoint. After initial training, all participants graded an initial set of 20 pCLE videos and reviewed correct responses with the expert (group 1) or on audio PowerPoint (group 2). Finally, all participants completed interpretations of a further 40 videos. Eighteen trainees (8 medical students, 10 gastroenterology trainees) participated in the study. Overall diagnostic accuracy for neoplasia prediction by pCLE was 77% (95% confidence interval [CI] 74.0%-79.2%); of predictions made with high confidence (53%), the accuracy was 85% (95% CI 81.8%-87.8%). The overall accuracy and interobserver agreement were significantly higher in group 1 than in group 2 for all predictions (80.4% vs. 73%; P = 0.005) and for high confidence predictions (90% vs. 80%; P < 0.001). Following feedback (after the initial 20 videos), the overall accuracy improved from 73% to 79% (P = 0.04), mainly driven by a significant improvement in group 1 (74% to 84%; P < 0.01). Accuracy of prediction significantly improved with time in endoscopy training (72% students, 77% FY1, 82% FY2, and 85% FY3; P = 0.003). For novice trainees, in-class didactic teaching enables significantly better recognition of the pCLE features of Barrett's esophagus than self-directed teaching. The in-class didactic group had a shorter learning curve and were able to achieve 90% accuracy for their high confidence predictions. © Georg Thieme Verlag KG Stuttgart · New York.

  10. Ontario multidetector computed tomographic coronary angiography study: field evaluation of diagnostic accuracy.

    PubMed

    Chow, Benjamin J W; Freeman, Michael R; Bowen, James M; Levin, Leslie; Hopkins, Robert B; Provost, Yves; Tarride, Jean-Eric; Dennie, Carole; Cohen, Eric A; Marcuzzi, Dan; Iwanochko, Robert; Moody, Alan R; Paul, Narinder; Parker, John D; O'Reilly, Daria J; Xie, Feng; Goeree, Ron

    2011-06-13

    Computed tomographic coronary angiography (CTCA) has gained clinical acceptance for the detection of obstructive coronary artery disease. Although single-center studies have demonstrated excellent accuracy, multicenter studies have yielded variable results. The true diagnostic accuracy of CTCA in the "real world" remains uncertain. We conducted a field evaluation comparing multidetector CTCA with invasive CA (ICA) to understand CTCA's diagnostic accuracy in a real-world setting. A multicenter cohort study of patients awaiting ICA was conducted between September 2006 and June 2009. All patients had either a low or an intermediate pretest probability for coronary artery disease and underwent CTCA and ICA within 10 days. The results of CTCA and ICA were interpreted visually by local expert observers who were blinded to all clinical data and imaging results. Using a patient-based analysis (diameter stenosis ≥50%) of 169 patients, the sensitivity, specificity, positive predictive value, and negative predictive value were 81.3% (95% confidence interval [CI], 71.0%-89.1%), 93.3% (95% CI, 85.9%-97.5%), 91.6% (95% CI, 82.5%-96.8%), and 84.7% (95% CI, 76.0%-91.2%), respectively; the area under receiver operating characteristic curve was 0.873. The diagnostic accuracy varied across centers (P < .001), with a sensitivity, specificity, positive predictive value, and negative predictive value ranging from 50.0% to 93.2%, 92.0% to 100%, 84.6% to 100%, and 42.9% to 94.7%, respectively. Compared with ICA, CTCA appears to have good accuracy; however, there was variability in diagnostic accuracy across centers. Factors affecting institutional variability need to be better understood before CTCA is universally adopted. Additional real-world evaluations are needed to fully understand the impact of CTCA on clinical care. clinicaltrials.gov Identifier: NCT00371891.
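
    Sensitivity, specificity, PPV, and NPV as reported above all derive from the 2x2 table of the index test against the reference standard. A small sketch (the counts below are invented for illustration, not the study's data):

        def diagnostic_metrics(tp, fp, fn, tn):
            """Patient-based 2x2 metrics: index test result vs. reference standard."""
            return {
                "sensitivity": tp / (tp + fn),
                "specificity": tn / (tn + fp),
                "ppv": tp / (tp + fp),
                "npv": tn / (tn + fn),
            }

        # Hypothetical counts for 169 patients (stenosis >= 50% on ICA as truth).
        print(diagnostic_metrics(tp=65, fp=6, fn=15, tn=83))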

  11. Development of a new outcome prediction model for Chinese patients with penile squamous cell carcinoma based on preoperative serum C-reactive protein, body mass index, and standard pathological risk factors: the TNCB score group system.

    PubMed

    Li, Zai-Shang; Chen, Peng; Yao, Kai; Wang, Bin; Li, Jing; Mi, Qi-Wu; Chen, Xiao-Feng; Zhao, Qi; Li, Yong-Hong; Chen, Jie-Ping; Deng, Chuang-Zhong; Ye, Yun-Lin; Zhong, Ming-Zhu; Liu, Zhuo-Wei; Qin, Zi-Ke; Lin, Xiang-Tian; Liang, Wei-Cong; Han, Hui; Zhou, Fang-Jian

    2016-04-12

    To determine the predictive value and feasibility of the new outcome prediction model for Chinese patients with penile squamous cell carcinoma. The 3-year disease-specific survival (DSS) was 92.3% in patients with CRP < 8.70 mg/L and 54.9% in those with elevated CRP (P < 0.001). The 3-year DSS was 86.5% in patients with a BMI < 22.6 kg/m2 and 69.9% in those with a higher BMI (P = 0.025). In a multivariate analysis, pathological T stage (P < 0.001), pathological N stage (P = 0.002), BMI (P = 0.002), and CRP (P = 0.004) were independent predictors of DSS. A new scoring model was developed, consisting of BMI, CRP, and tumor T and N classification. In our study, we found that the addition of the above-mentioned parameters significantly increased the predictive accuracy of the American Joint Committee on Cancer (AJCC) anatomic stage group system. The accuracy of the new prediction category was verified. A total of 172 Chinese patients with penile squamous cell cancer were analyzed retrospectively between November 2005 and November 2014. Statistical data analysis was conducted using non-parametric methods. Survival analysis was performed with the log-rank test and the Cox proportional hazards model. Based on regression estimates of significant parameters in the multivariate analysis, a new BMI-, CRP- and pathologic-factor-based scoring model was developed to predict disease-specific outcomes. The predictive accuracy of the model was evaluated using internal and external validation. The present study demonstrated that the TNCB score group system may be a precise and easy-to-use tool for predicting outcomes in Chinese penile squamous cell carcinoma patients.

  12. Assessment of flat rolling theories for the use in a model-based controller for high-precision rolling applications

    NASA Astrophysics Data System (ADS)

    Stockert, Sven; Wehr, Matthias; Lohmar, Johannes; Abel, Dirk; Hirt, Gerhard

    2017-10-01

    In the electrical and medical industries, the trend towards further miniaturization of devices is accompanied by the demand for smaller manufacturing tolerances. These industries use a plethora of small, narrow cold-rolled metal strips with high thickness accuracy. Conventional rolling mills can hardly achieve further improvement of these tolerances. However, a model-based controller in combination with an additional piezoelectric actuator for highly dynamic roll adjustment is expected to enable the production of the required metal strips with a thickness tolerance of +/-1 µm. The model-based controller has to be based on a rolling theory that describes the rolling process very accurately. Additionally, the required computing time has to be low in order to predict the rolling process in real time. In this work, four rolling theories from the literature with different levels of complexity are tested for their suitability for the predictive controller. The rolling theories of von Kármán, Siebel, Bland & Ford and Alexander are implemented in Matlab and afterwards transferred to the real-time computer used for the controller. The prediction accuracy of these theories is validated using rolling trials with different thickness reductions and a comparison to the calculated results. Furthermore, the required computing time on the real-time computer is measured. Adequate prediction accuracy can be achieved with the rolling theories developed by Bland & Ford and Alexander. A comparison of the computing times of these two theories reveals that Alexander's theory exceeds the 1 kHz sample rate of the real-time computer.

  13. Assessing the prediction accuracy of cure in the Cox proportional hazards cure model: an application to breast cancer data.

    PubMed

    Asano, Junichi; Hirakawa, Akihiro; Hamada, Chikuma

    2014-01-01

    A cure rate model is a survival model incorporating the cure rate with the assumption that the population contains both uncured and cured individuals. It is a powerful statistical tool for prognostic studies, especially in cancer. The cure rate is important for making treatment decisions in clinical practice. The proportional hazards (PH) cure model can predict the cure rate for each patient. This contains a logistic regression component for the cure rate and a Cox regression component to estimate the hazard for uncured patients. A measure for quantifying the predictive accuracy of the cure rate estimated by the Cox PH cure model is required, as there has been a lack of previous research in this area. We used the Cox PH cure model for the breast cancer data; however, the area under the receiver operating characteristic curve (AUC) could not be estimated because many patients were censored. In this study, we used imputation-based AUCs to assess the predictive accuracy of the cure rate from the PH cure model. We examined the precision of these AUCs using simulation studies. The results demonstrated that the imputation-based AUCs were estimable and their biases were negligibly small in many cases, although ordinary AUC could not be estimated. Additionally, we introduced the bias-correction method of imputation-based AUCs and found that the bias-corrected estimate successfully compensated the overestimation in the simulation studies. We also illustrated the estimation of the imputation-based AUCs using breast cancer data. Copyright © 2014 John Wiley & Sons, Ltd.

  14. Long-range prediction of Indian summer monsoon rainfall using data mining and statistical approaches

    NASA Astrophysics Data System (ADS)

    H, Vathsala; Koolagudi, Shashidhar G.

    2017-10-01

    This paper presents a hybrid model to better predict Indian summer monsoon rainfall. The algorithm considers suitable techniques for processing dense datasets. The proposed three-step algorithm comprises closed itemset generation-based association rule mining for feature selection, cluster membership for dimensionality reduction, and a simple logistic function for prediction. An application classifying rainfall into flood, excess, normal, deficit, and drought categories based on 36 predictors consisting of land and ocean variables is presented. Results show good accuracy over the considered study period of 37 years (1969-2005).

  15. [Comparison of three stand-level biomass estimation methods].

    PubMed

    Dong, Li Hu; Li, Feng Ri

    2016-12-01

    At present, forest biomass estimation methods at the regional scale attract much research attention, and developing stand-level biomass models is popular. Based on forestry inventory data for larch (Larix olgensis) plantations in Jilin Province, we used non-linear seemingly unrelated regression (NSUR) to estimate the parameters of two additive systems of stand-level biomass equations: equations including stand variables and equations including the biomass expansion factor (Model system 1 and Model system 2, respectively). We also derived a constant biomass expansion factor for the larch plantations and compared the prediction accuracy of the three stand-level biomass estimation methods. The results indicated that, for the two additive systems of biomass equations, the adjusted coefficient of determination (Ra^2) of the total and stem equations exceeded 0.95, and the root mean squared error (RMSE), mean prediction error (MPE) and mean absolute error (MAE) were small. The branch and foliage biomass equations performed worse than the total and stem biomass equations, with Ra^2 below 0.95. The prediction accuracy of the constant biomass expansion factor was lower than that of Model system 1 and Model system 2. Overall, although the stand-level biomass equation including the biomass expansion factor belongs to the volume-derived biomass estimation methods and differs in essence from the stand biomass equations including stand variables, the two methods achieved similar prediction accuracy. The constant biomass expansion factor had the lowest prediction accuracy and is inappropriate. In addition, to make the model parameter estimation more efficient, the established stand-level biomass equations should ensure additivity across the system of all tree component biomass and total biomass equations.

  16. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.

    2010-01-22

    Subcellular location of a protein is constructive information for determining its function, screening drug candidates, designing vaccines, annotating gene products and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to subcellular localization. In this work, we extracted biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is effectively used together with multiple physiochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support vector machines are designed for four modules and the prediction is made by a weighted voting system. Our system makes predictions with accuracies of 100%, 82.47% and 88.81% for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence gives better accuracy than prediction from the N-terminal sequence alone. Considering the features as a distribution within the entire sequence brings out the underlying property distribution in greater detail and enhances the prediction accuracy.

  17. A resolution measure for three-dimensional microscopy

    PubMed Central

    Chao, Jerry; Ram, Sripad; Abraham, Anish V.; Ward, E. Sally; Ober, Raimund J.

    2009-01-01

    A three-dimensional (3D) resolution measure for the conventional optical microscope is introduced which overcomes the drawbacks of the classical 3D (axial) resolution limit. Formulated within the context of a parameter estimation problem and based on the Cramer-Rao lower bound, this 3D resolution measure indicates the accuracy with which a given distance between two objects in 3D space can be determined from the acquired image. It predicts that, given enough photons from the objects of interest, arbitrarily small distances of separation can be estimated with prespecified accuracy. Using simulated images of point source pairs, we show that the maximum likelihood estimator is capable of attaining the accuracy predicted by the resolution measure. We also demonstrate how different factors, such as extraneous noise sources and the spatial orientation of the imaged object pair, can affect the accuracy with which a given distance of separation can be determined. PMID:20161040

  18. Ligand and structure-based methodologies for the prediction of the activity of G protein-coupled receptor ligands

    NASA Astrophysics Data System (ADS)

    Costanzi, Stefano; Tikhonova, Irina G.; Harden, T. Kendall; Jacobson, Kenneth A.

    2009-11-01

    Accurate in silico models for the quantitative prediction of the activity of G protein-coupled receptor (GPCR) ligands would greatly facilitate the process of drug discovery and development. Several methodologies have been developed based on the properties of the ligands, the direct study of the receptor-ligand interactions, or a combination of both approaches. Ligand-based three-dimensional quantitative structure-activity relationships (3D-QSAR) techniques, not requiring knowledge of the receptor structure, have been historically the first to be applied to the prediction of the activity of GPCR ligands. They are generally endowed with robustness and good ranking ability; however they are highly dependent on training sets. Structure-based techniques generally do not provide the level of accuracy necessary to yield meaningful rankings when applied to GPCR homology models. However, they are essentially independent from training sets and have a sufficient level of accuracy to allow an effective discrimination between binders and nonbinders, thus qualifying as viable lead discovery tools. The combination of ligand and structure-based methodologies in the form of receptor-based 3D-QSAR and ligand and structure-based consensus models results in robust and accurate quantitative predictions. The contribution of the structure-based component to these combined approaches is expected to become more substantial and effective in the future, as more sophisticated scoring functions are developed and more detailed structural information on GPCRs is gathered.

  19. Adaptive Trajectory Prediction Algorithm for Climbing Flights

    NASA Technical Reports Server (NTRS)

    Schultz, Charles Alexander; Thipphavong, David P.; Erzberger, Heinz

    2012-01-01

    Aircraft climb trajectories are difficult to predict, and large errors in these predictions reduce the potential operational benefits of some advanced features for NextGen. The algorithm described in this paper improves climb trajectory prediction accuracy by adjusting trajectory predictions based on observed track data. It utilizes rate-of-climb and airspeed measurements derived from position data to dynamically adjust the aircraft weight modeled for trajectory predictions. In simulations with weight uncertainty, the algorithm is able to adapt to within 3 percent of the actual gross weight within two minutes of the initial adaptation. The root-mean-square of altitude errors for five-minute predictions was reduced by 73 percent. Conflict detection performance also improved, with a 15 percent reduction in missed alerts and a 10 percent reduction in false alerts. In a simulation with climb speed capture intent and weight uncertainty, the algorithm improved climb trajectory prediction accuracy by up to 30 percent and conflict detection performance, reducing missed and false alerts by up to 10 percent.

  20. Advanced turboprop noise prediction: Development of a code at NASA Langley based on recent theoretical results

    NASA Technical Reports Server (NTRS)

    Farassat, F.; Dunn, M. H.; Padula, S. L.

    1986-01-01

    The development of a high speed propeller noise prediction code at Langley Research Center is described. The code utilizes two recent acoustic formulations in the time domain for subsonic and supersonic sources. The structure and capabilities of the code are discussed. A grid-size study of accuracy and execution speed on a computer is also presented. The code is tested against an earlier Langley code. Considerable increases in accuracy and speed of execution are observed. Some examples of noise prediction for a high speed propeller for which acoustic test data are available are given. A brief derivation of the formulations used is given in an appendix.

  1. Flight effects on exhaust noise for turbojet and turbofan engines: Comparison of experimental data with prediction

    NASA Technical Reports Server (NTRS)

    Stone, J. R.

    1976-01-01

    It was demonstrated that static and in-flight jet engine exhaust noise can be predicted with reasonable accuracy when the multiple-source nature of the problem is taken into account. Jet mixing noise was predicted from the interim prediction method. Provisional methods of estimating internally generated noise and shock noise flight effects were used, based partly on existing prediction methods and partly on recently reported engine data.

  2. [Prediction of regional soil quality based on mutual information theory integrated with decision tree algorithm].

    PubMed

    Lin, Fen-Fang; Wang, Ke; Yang, Ning; Yan, Shi-Guang; Zheng, Xin-Yu

    2012-02-01

    In this paper, the main factors that affect soil quality, such as soil type, land use pattern, lithology type, topography, road, and industry type, were used to precisely obtain the spatial distribution characteristics of regional soil quality. Mutual information theory was adopted to select the main environmental factors, and the decision tree algorithm See5.0 was applied to predict the grade of regional soil quality. The main factors affecting regional soil quality were soil type, land use, lithology type, distance to town, distance to water area, altitude, distance to road, and distance to industrial land. The prediction accuracy of the decision tree model with the variables selected by mutual information was markedly higher than that of the model with all variables; for the former model, whether based on decision trees or decision rules, the prediction accuracy exceeded 80%. Based on the continuous and categorical data, the method of mutual information theory integrated with a decision tree can not only reduce the number of input parameters for the decision tree algorithm, but also predict and assess regional soil quality effectively.
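
    The two-stage idea, mutual-information screening followed by a decision tree, can be sketched with scikit-learn standing in for See5.0 (synthetic data; CART rather than C5.0-family trees, and the factor count and threshold are assumptions):

        import numpy as np
        from sklearn.feature_selection import mutual_info_classif
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(4)
        n = 600
        X = rng.normal(size=(n, 10))                     # candidate environmental factors
        y = (X[:, 0] + 0.8 * X[:, 3] + rng.normal(0, 0.5, n) > 0).astype(int)  # soil grade

        # Stage 1: rank factors by mutual information with the soil-quality grade.
        mi = mutual_info_classif(X, y, random_state=0)
        selected = np.argsort(mi)[-4:]                   # keep the top 4 factors

        # Stage 2: decision tree trained on the selected factors only.
        tree = DecisionTreeClassifier(max_depth=5, random_state=0)
        print("selected factors:", selected)
        print("CV accuracy:", cross_val_score(tree, X[:, selected], y, cv=5).mean())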

  3. Short-term prediction of solar energy in Saudi Arabia using automated-design fuzzy logic systems

    PubMed Central

    2017-01-01

    Solar energy is considered as one of the main sources for renewable energy in the near future. However, solar energy and other renewable energy sources have a drawback related to the difficulty in predicting their availability in the near future. This problem affects optimal exploitation of solar energy, especially in connection with other resources. Therefore, reliable solar energy prediction models are essential to solar energy management and economics. This paper presents work aimed at designing reliable models to predict the global horizontal irradiance (GHI) for the next day in 8 stations in Saudi Arabia. The designed models are based on computational intelligence methods of automated-design fuzzy logic systems. The fuzzy logic systems are designed and optimized with two models using fuzzy c-means clustering (FCM) and simulated annealing (SA) algorithms. The first model uses FCM based on the subtractive clustering algorithm to automatically design the predictor fuzzy rules from data. The second model uses FCM followed by a simulated annealing algorithm to enhance the prediction accuracy of the fuzzy logic system. The objective of the predictor is to accurately predict next-day global horizontal irradiance (GHI) using previous-day meteorological and solar radiation observations. The proposed models use observations of 10 variables of measured meteorological and solar radiation data to build the model. The experimentation and results of the prediction are detailed where the root mean square error of the prediction was approximately 88% for the second model tuned by simulated annealing compared to 79.75% accuracy using the first model. These results demonstrate good modeling accuracy for the second model, despite the fact that the training and testing of the proposed models were carried out using spatially and temporally independent data. PMID:28806754

  4. Short-term prediction of solar energy in Saudi Arabia using automated-design fuzzy logic systems.

    PubMed

    Almaraashi, Majid

    2017-01-01

    Solar energy is considered as one of the main sources for renewable energy in the near future. However, solar energy and other renewable energy sources have a drawback related to the difficulty in predicting their availability in the near future. This problem affects optimal exploitation of solar energy, especially in connection with other resources. Therefore, reliable solar energy prediction models are essential to solar energy management and economics. This paper presents work aimed at designing reliable models to predict the global horizontal irradiance (GHI) for the next day in 8 stations in Saudi Arabia. The designed models are based on computational intelligence methods of automated-design fuzzy logic systems. The fuzzy logic systems are designed and optimized with two models using fuzzy c-means clustering (FCM) and simulated annealing (SA) algorithms. The first model uses FCM based on the subtractive clustering algorithm to automatically design the predictor fuzzy rules from data. The second model uses FCM followed by a simulated annealing algorithm to enhance the prediction accuracy of the fuzzy logic system. The objective of the predictor is to accurately predict next-day global horizontal irradiance (GHI) using previous-day meteorological and solar radiation observations. The proposed models use observations of 10 variables of measured meteorological and solar radiation data to build the model. The experimentation and results of the prediction are detailed where the root mean square error of the prediction was approximately 88% for the second model tuned by simulated annealing compared to 79.75% accuracy using the first model. These results demonstrate good modeling accuracy for the second model, despite the fact that the training and testing of the proposed models were carried out using spatially and temporally independent data.
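
    Fuzzy c-means, the clustering step used in both models above, alternates between membership and center updates until convergence. A compact numpy sketch (fuzzifier m = 2 and random placeholder observations; not the paper's implementation):

        import numpy as np

        def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
            """Basic FCM: returns cluster centers and the membership matrix U."""
            rng = np.random.default_rng(seed)
            U = rng.dirichlet(np.ones(c), size=len(X))      # memberships sum to 1 per row
            for _ in range(iters):
                W = U ** m
                centers = (W.T @ X) / W.sum(axis=0)[:, None]
                d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
                inv = d ** (-2.0 / (m - 1))
                U = inv / inv.sum(axis=1, keepdims=True)    # standard FCM membership update
            return centers, U

        X = np.random.default_rng(5).normal(size=(200, 4))  # placeholder observations
        centers, U = fuzzy_c_means(X)
        print(centers.shape, U.sum(axis=1)[:3])             # rows of U sum to 1

    In the automated-design setting, each resulting cluster would seed one fuzzy rule, whose parameters the SA step then tunes.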

  5. K-nearest neighbors based methods for identification of different gear crack levels under different motor speeds and loads: Revisited

    NASA Astrophysics Data System (ADS)

    Wang, Dong

    2016-03-01

    Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure because gear cracks lead to gear tooth breakage. Signal processing based methods mainly require expertise to interpret gear fault signatures, which is usually not easy for ordinary users. In order to automatically identify different gear crack levels, intelligent gear crack identification methods should be developed. Previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance the prediction accuracies of existing K-nearest neighbors based methods and extend identification of 3 different gear crack levels to identification of 5 different gear crack levels, redundant statistical features are constructed by using the Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of the redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction of the redundant statistical features is conducted to obtain new significant statistical features. Finally, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models including linear discriminant analysis, quadratic discriminant analysis, classification and regression trees and the naive Bayes classifier are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
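
    The backbone of this approach, high-dimensional statistical features reduced and fed to a K-nearest neighbors classifier, can be sketched as follows. Synthetic features stand in for the 620 wavelet-packet statistics (the db44 feature construction itself is omitted), and PCA is an assumed stand-in for the paper's dimensionality-reduction step:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(6)
        X = rng.normal(size=(300, 620))    # redundant statistical features (placeholder)
        y = rng.integers(0, 5, size=300)   # 5 gear crack levels

        pipe = make_pipeline(StandardScaler(),
                             PCA(n_components=20),          # dimensionality reduction
                             KNeighborsClassifier(n_neighbors=5))
        print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())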

  6. Graph pyramids for protein function prediction

    PubMed Central

    2015-01-01

    Background Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important task for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using the graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522

  7. Graph pyramids for protein function prediction.

    PubMed

    Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun

    2015-01-01

    Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important task for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using the graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas the baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data.

  8. Assessing participation in community-based physical activity programs in Brazil.

    PubMed

    Reis, Rodrigo S; Yan, Yan; Parra, Diana C; Brownson, Ross C

    2014-01-01

    This study aimed to develop and validate a risk prediction model to examine the characteristics that are associated with participation in community-based physical activity programs in Brazil. We used pooled data from three surveys conducted from 2007 to 2009 in state capitals of Brazil with 6166 adults. A risk prediction model was built considering program participation as an outcome. The predictive accuracy of the model was quantified through discrimination (C statistic) and calibration (Brier score) properties. Bootstrapping methods were used to validate the predictive accuracy of the final model. The final model showed sex (women: odds ratio [OR] = 3.18, 95% confidence interval [CI] = 2.14-4.71), having less than high school degree (OR = 1.71, 95% CI = 1.16-2.53), reporting a good health (OR = 1.58, 95% CI = 1.02-2.24) or very good/excellent health (OR = 1.62, 95% CI = 1.05-2.51), having any comorbidity (OR = 1.74, 95% CI = 1.26-2.39), and perceiving the environment as safe to walk at night (OR = 1.59, 95% CI = 1.18-2.15) as predictors of participation in physical activity programs. Accuracy indices were adequate (C index = 0.778, Brier score = 0.031) and similar to those obtained from bootstrapping (C index = 0.792, Brier score = 0.030). Sociodemographic and health characteristics as well as perceptions of the environment are strong predictors of participation in community-based programs in selected cities of Brazil.
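
    Discrimination (C statistic) and calibration (Brier score), the two accuracy properties reported above, are one-liners given predicted probabilities. A sketch with invented outcomes and predictions, not the study's data:

        import numpy as np
        from sklearn.metrics import roc_auc_score, brier_score_loss

        rng = np.random.default_rng(7)
        y = rng.integers(0, 2, size=500)                       # observed participation (0/1)
        p = np.clip(0.3 * y + rng.uniform(0, 0.7, 500), 0, 1)  # model probabilities

        print("C statistic:", roc_auc_score(y, p))             # discrimination
        print("Brier score:", brier_score_loss(y, p))          # calibration

    Bootstrap validation, as used in the study, would repeat this computation on resampled datasets to estimate optimism-corrected values.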

  9. Protein structure refinement using a quantum mechanics-based chemical shielding predictor

    PubMed Central

    2017-01-01

    The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1–0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included, the ProCS15-predicted chemical shifts have RMSD values relative to experiment that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods is slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improve the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors, we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches or interpreting protein structural dynamics from QM-derived chemical shifts. PMID:28451325

  10. Neural Network Prediction of ICU Length of Stay Following Cardiac Surgery Based on Pre-Incision Variables

    PubMed Central

    Pothula, Venu M.; Yuan, Stanley C.; Maerz, David A.; Montes, Lucresia; Oleszkiewicz, Stephen M.; Yusupov, Albert; Perline, Richard

    2015-01-01

    Background Advanced predictive analytical techniques are being increasingly applied to clinical risk assessment. This study compared a neural network model to several other models in predicting the length of stay (LOS) in the cardiac surgical intensive care unit (ICU) based on pre-incision patient characteristics. Methods Thirty-six variables collected from 185 cardiac surgical patients were analyzed for contribution to ICU LOS. The Automatic Linear Modeling (ALM) module of IBM-SPSS software identified 8 factors with statistically significant associations with ICU LOS; these factors were also analyzed with the Artificial Neural Network (ANN) module of the same software. The weighted contributions of each factor (“trained” data) were then applied to data for a “new” patient to predict ICU LOS for that individual. Results Factors identified in the ALM model were: use of an intra-aortic balloon pump; O2 delivery index; age; use of positive cardiac inotropic agents; hematocrit; serum creatinine ≥ 1.3 mg/deciliter; gender; arterial pCO2. The r2 value for ALM prediction of ICU LOS in the initial (training) model was 0.356, p < 0.0001. Cross validation in prediction of a “new” patient yielded r2 = 0.200, p < 0.0001. The same 8 factors analyzed with ANN yielded a training prediction r2 of 0.535 (p < 0.0001) and a cross validation prediction r2 of 0.410, p < 0.0001. Two additional predictive algorithms were studied, but they had lower prediction accuracies. Our validated neural network model identified the upper quartile of ICU LOS with an odds ratio of 9.8 (p < 0.0001). Conclusions ANN demonstrated a 2-fold greater accuracy than ALM in prediction of observed ICU LOS. This greater accuracy would be presumed to result from the capacity of ANN to capture nonlinear effects and higher order interactions. Predictive modeling may be of value in early anticipation of risks of post-operative morbidity and utilization of ICU facilities. PMID:26710254
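
    A neural-network LOS regressor of the kind compared above can be sketched with scikit-learn; the study used IBM-SPSS's ANN module, so this is only an analogous setup, and the eight predictors and outcomes below are placeholders:

        import numpy as np
        from sklearn.neural_network import MLPRegressor
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(8)
        X = rng.normal(size=(185, 8))            # 8 pre-incision predictors (placeholder values)
        y = np.exp(rng.normal(1.0, 0.6, 185))    # ICU length of stay in days (synthetic)

        ann = make_pipeline(StandardScaler(),
                            MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                                         random_state=0))
        r2 = cross_val_score(ann, X, y, cv=5, scoring="r2").mean()
        print(f"cross-validated r2: {r2:.3f}")

    The hidden layer lets the model capture the nonlinear effects and interactions credited with the ANN's advantage over the linear ALM model.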

  11. NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data.

    PubMed

    Mao, Wusong; Cong, Peisheng; Wang, Zhiheng; Lu, Longjian; Zhu, Zhongliang; Li, Tonghua

    2013-01-01

    A shape string is a structural sequence and an important representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts correlate strongly with local protein structure and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as the classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight-state accuracy) and 87.8% for S3 (the three-state accuracy). This is higher than using only chemical shifts or only sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and that their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.

  12. A new self-report inventory of dyslexia for students: criterion and construct validity.

    PubMed

    Tamboer, Peter; Vorst, Harrie C M

    2015-02-01

    The validity of a Dutch self-report inventory of dyslexia was ascertained in two samples of students. Six biographical questions, 20 general language statements and 56 specific language statements were based on dyslexia as a multi-dimensional deficit. Dyslexia and non-dyslexia were assessed with two criteria: identification with test results (Sample 1) and classification using biographical information (both samples). Using discriminant analyses, these criteria were predicted with various groups of statements. Altogether, 11 discriminant functions were used to estimate the classification accuracy of the inventory. In Sample 1, 15 statements predicted the test criterion with a classification accuracy of 98%, and 18 statements predicted the biographical criterion with a classification accuracy of 97%. In Sample 2, 16 statements predicted the biographical criterion with a classification accuracy of 94%. Estimates of positive and negative predictive value were 89% and 99%. Items of various discriminant functions were factor analysed to find characteristic difficulties of students with dyslexia, resulting in a five-factor structure in Sample 1 and a four-factor structure in Sample 2. Answer bias was investigated with measures of internal consistency reliability. Fewer than 20 self-report items are sufficient to accurately classify students with and without dyslexia. This supports the usefulness of self-assessment of dyslexia as a valid alternative to diagnostic test batteries. Copyright © 2015 John Wiley & Sons, Ltd.

  13. Predicting coronary artery disease using different artificial neural network models.

    PubMed

    Colak, M Cengiz; Colak, Cemil; Kocatürk, Hasan; Sağiroğlu, Seref; Barutçu, Irfan

    2008-08-01

    Eight different learning algorithms used for creating artificial neural network (ANN) models, and the resulting ANN models for the prediction of coronary artery disease (CAD), are introduced. This work was carried out as a retrospective case-control study. Overall, 124 consecutive patients who had been diagnosed with CAD by coronary angiography (at least 1 coronary stenosis > 50% in major epicardial arteries) were enrolled in the work. The 113 people (group 2) with angiographically normal coronary arteries were taken as control subjects. A multi-layered perceptron ANN architecture was applied. The ANN models trained with the different learning algorithms were evaluated on 237 records, divided into training (n=171) and testing (n=66) data sets. The performance of prediction was evaluated by sensitivity, specificity and accuracy values based on standard definitions. The results demonstrate that ANN models trained with the eight different learning algorithms are promising because of high (greater than 71%) sensitivity, specificity and accuracy values in the prediction of CAD. Accuracy, sensitivity and specificity values varied between 83.63%-100%, 86.46%-100% and 74.67%-100% for training, respectively. For testing, the values were more than 71% for sensitivity, 76% for specificity and 81% for accuracy. It may be proposed that the use of learning algorithms other than backpropagation and of larger sample sizes can improve the performance of prediction. The proposed ANN models trained with these learning algorithms could be used as a promising approach for predicting CAD without the need for invasive diagnostic methods and could help in prognostic clinical decision-making.

  14. Evaluating the influence of spatial resolution of Landsat predictors on the accuracy of biomass models for large-area estimation across the eastern USA

    NASA Astrophysics Data System (ADS)

    Deo, Ram K.; Domke, Grant M.; Russell, Matthew B.; Woodall, Christopher W.; Andersen, Hans-Erik

    2018-05-01

    Aboveground biomass (AGB) estimates for regional-scale forest planning have become cost-effective with the free access to satellite data from sensors such as Landsat and MODIS. However, the accuracy of AGB predictions based on passive optical data depends on the spatial resolution and spatial extent of the target area, as fine-resolution (small-pixel) data are associated with smaller coverage and longer repeat cycles compared to coarse-resolution data. This study evaluated various spatial resolutions of Landsat-derived predictors on the accuracy of regional AGB models at three different sites in the eastern USA: Maine, Pennsylvania-New Jersey, and South Carolina. We combined national forest inventory data with Landsat-derived predictors at spatial resolutions ranging from 30 to 1000 m to understand the optimal spatial resolution of optical data for large-area (regional) AGB estimation. Ten generic models were developed using the data collected in 2014, 2015 and 2016, and the predictions were evaluated (i) at the county level against the estimates of the USFS Forest Inventory and Analysis Program, which relied on the EVALIDator tool and national forest inventory data from the 2009–2013 cycle, and (ii) within a large number of strips (~1 km wide) predicted via LiDAR metrics at 30 m spatial resolution. The county-level estimates by the EVALIDator and Landsat models were highly related (R2 > 0.66), although R2 varied significantly across sites and resolutions of predictors. The mean and standard deviation of county-level estimates followed increasing and decreasing trends, respectively, with models of coarser resolution. The Landsat-based total AGB estimates were larger than the LiDAR-based total estimates within the strips; however, the mean of the LiDAR AGB predictions was mostly within one standard deviation of the mean predictions obtained from the Landsat-based model at any of the resolutions. We conclude that satellite data at resolutions up to 1000 m provide acceptable accuracy for continental-scale analysis of AGB.

  15. Surrogate Modeling of High-Fidelity Fracture Simulations for Real-Time Residual Strength Predictions

    NASA Technical Reports Server (NTRS)

    Spear, Ashley D.; Priest, Amanda R.; Veilleux, Michael G.; Ingraffea, Anthony R.; Hochhalter, Jacob D.

    2011-01-01

    A surrogate model methodology is described for predicting in real time the residual strength of flight structures with discrete-source damage. Starting with design of experiment, an artificial neural network is developed that takes as input discrete-source damage parameters and outputs a prediction of the structural residual strength. Target residual strength values used to train the artificial neural network are derived from 3D finite element-based fracture simulations. A residual strength test of a metallic, integrally-stiffened panel is simulated to show that crack growth and residual strength are determined more accurately in discrete-source damage cases by using an elastic-plastic fracture framework rather than a linear-elastic fracture mechanics-based method. Improving accuracy of the residual strength training data would, in turn, improve accuracy of the surrogate model. When combined, the surrogate model methodology and high-fidelity fracture simulation framework provide useful tools for adaptive flight technology.

  16. Feed-Forward Neural Network Soft-Sensor Modeling of Flotation Process Based on Particle Swarm Optimization and Gravitational Search Algorithm

    PubMed Central

    Wang, Jie-Sheng; Han, Shuang

    2015-01-01

    For predicting the key technology indicators (concentrate grade and tailings recovery rate) of a flotation process, a feed-forward neural network (FNN)-based soft-sensor model optimized by a hybrid algorithm combining the particle swarm optimization (PSO) algorithm and the gravitational search algorithm (GSA) is proposed. Although GSA has good optimization capability, it converges slowly and easily falls into local optima. In this paper, the velocity and position vectors of GSA are therefore adjusted by the PSO algorithm in order to improve its convergence speed and prediction accuracy. Finally, the proposed hybrid algorithm is adopted to optimize the parameters of the FNN soft-sensor model. Simulation results show that the model has better generalization and prediction accuracy for the concentrate grade and tailings recovery rate, meeting the online soft-sensor requirements of real-time control in the flotation process. PMID:26583034
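
    The core of such a hybrid is the velocity update: the GSA acceleration term supplies gravity-driven exploration, while a PSO-style social term pulls agents toward the global best, which is what speeds up convergence. A minimal sketch of one update follows, with coefficient values chosen arbitrarily; computing acc_gsa from the agents' masses and pairwise gravitational forces is assumed to happen elsewhere.

      import numpy as np

      def hybrid_velocity_update(pos, vel, acc_gsa, gbest,
                                 w=0.6, c1=0.5, c2=1.5, rng=None):
          # GSA term (acc_gsa) explores; PSO social term (gbest) exploits.
          rng = rng or np.random.default_rng()
          r1 = rng.random(pos.shape)
          r2 = rng.random(pos.shape)
          vel = w * vel + c1 * r1 * acc_gsa + c2 * r2 * (gbest - pos)
          return pos + vel, vel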

  17. Surrogate Modeling of High-Fidelity Fracture Simulations for Real-Time Residual Strength Predictions

    NASA Technical Reports Server (NTRS)

    Spear, Ashley D.; Priest, Amanda R.; Veilleux, Michael G.; Ingraffea, Anthony R.; Hochhalter, Jacob D.

    2011-01-01

    A surrogate model methodology is described for predicting, during flight, the residual strength of aircraft structures that sustain discrete-source damage. Starting with design of experiment, an artificial neural network is developed that takes as input discrete-source damage parameters and outputs a prediction of the structural residual strength. Target residual strength values used to train the artificial neural network are derived from 3D finite element-based fracture simulations. Two ductile fracture simulations are presented to show that crack growth and residual strength are determined more accurately in discrete-source damage cases by using an elastic-plastic fracture framework rather than a linear-elastic fracture mechanics-based method. Improving accuracy of the residual strength training data does, in turn, improve accuracy of the surrogate model. When combined, the surrogate model methodology and high fidelity fracture simulation framework provide useful tools for adaptive flight technology.

  18. Predicting Survival From Large Echocardiography and Electronic Health Record Datasets: Optimization With Machine Learning.

    PubMed

    Samad, Manar D; Ulloa, Alvaro; Wehner, Gregory J; Jing, Linyuan; Hartzel, Dustin; Good, Christopher W; Williams, Brent A; Haggerty, Christopher M; Fornwalt, Brandon K

    2018-06-09

    The goal of this study was to use machine learning to more accurately predict survival after echocardiography. Predicting patient outcomes (e.g., survival) following echocardiography is primarily based on ejection fraction (EF) and comorbidities. However, there may be significant predictive information within additional echocardiography-derived measurements combined with clinical electronic health record data. Mortality was studied in 171,510 unselected patients who underwent 331,317 echocardiograms in a large regional health system. We investigated the predictive performance of nonlinear machine learning models compared with that of linear logistic regression models using 3 different inputs: 1) clinical variables, including 90 cardiovascular-relevant International Classification of Diseases, Tenth Revision, codes, and age, sex, height, weight, heart rate, blood pressures, low-density lipoprotein, high-density lipoprotein, and smoking; 2) clinical variables plus physician-reported EF; and 3) clinical variables and EF, plus 57 additional echocardiographic measurements. Missing data were imputed using the multivariate imputation by chained equations (MICE) algorithm. We compared models versus each other and baseline clinical scoring systems by using a mean area under the curve (AUC) over 10 cross-validation folds and across 10 survival durations (6 to 60 months). Machine learning models achieved significantly higher prediction accuracy (all AUC >0.82) over common clinical risk scores (AUC = 0.61 to 0.79), with the nonlinear random forest models outperforming logistic regression (p < 0.01). The random forest model including all echocardiographic measurements yielded the highest prediction accuracy (p < 0.01 across all models and survival durations). Only 10 variables were needed to achieve 96% of the maximum prediction accuracy, with 6 of these variables being derived from echocardiography. Tricuspid regurgitation velocity was more predictive of survival than LVEF. In a subset of studies with complete data for the top 10 variables, MICE yielded slightly reduced predictive accuracies (difference in AUC of 0.003) compared with the original data. Machine learning can fully utilize large combinations of disparate input variables to predict survival after echocardiography with superior accuracy. Copyright © 2018 American College of Cardiology Foundation. Published by Elsevier Inc. All rights reserved.
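
    The headline comparison, nonlinear random forests versus linear logistic regression scored by mean AUC over 10 cross-validation folds, can be expressed compactly; the sketch below uses synthetic data in place of the protected health-record inputs.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      # Synthetic stand-in for clinical variables plus echocardiographic measurements.
      X, y = make_classification(n_samples=2000, n_features=57, n_informative=10,
                                 random_state=0)

      for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                          ("random forest", RandomForestClassifier(random_state=0))]:
          auc = cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean()
          print(f"{name}: mean AUC = {auc:.3f}")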

  19. Predicting the thermal conductivity of aluminium alloys in the cryogenic to room temperature range

    NASA Astrophysics Data System (ADS)

    Woodcraft, Adam L.

    2005-06-01

    Aluminium alloys are being used increasingly in cryogenic systems. However, cryogenic thermal conductivity measurements have been made on only a few of the many types in general use. This paper describes a method of predicting the thermal conductivity of any aluminium alloy between the superconducting transition temperature (approximately 1 K) and room temperature, based on a measurement of the thermal conductivity or electrical resistivity at a single temperature. Where predictions are based on low temperature measurements (approximately 4 K and below), the accuracy is generally better than 10%. Useful predictions can also be made from room temperature measurements for most alloys, but with reduced accuracy. This method permits aluminium alloys to be used in situations where the thermal conductivity is important without having to make (or find) direct measurements over the entire temperature range of interest. There is therefore greater scope to choose alloys based on mechanical properties and availability, rather than on whether cryogenic thermal conductivity measurements have been made. Recommended thermal conductivity values are presented for aluminium 6082 (based on a new measurement), and for the 1000 series and types 2014, 2024, 2219, 3003, 5052, 5083, 5086, 5154, 6061, 6063, 6082, 7039 and 7075 (based on low temperature measurements in the literature).
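
    At liquid-helium temperatures the electronic contribution dominates and follows the Wiedemann-Franz law, which is why a single low-temperature measurement can anchor the prediction. Below is a minimal sketch of that low-temperature limb only; the paper's full parametrization up to room temperature, including the lattice contribution, is not reproduced here, and the example numbers are invented.

      L0 = 2.44e-8  # Lorenz number, W·Ω/K²

      def k_electronic(T, rho0):
          """Electronic thermal conductivity (W/m/K) in the defect-scattering
          regime (roughly T < 20 K for aluminium alloys), via Wiedemann-Franz."""
          return L0 * T / rho0

      # Calibrate the residual resistivity from one 4.2 K measurement, then predict.
      k_meas, T_ref = 4.0, 4.2              # example: 4 W/(m·K) measured at 4.2 K
      rho0 = L0 * T_ref / k_meas
      print([round(k_electronic(T, rho0), 2) for T in (1.0, 2.0, 4.2, 10.0)])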

  20. On the distance of genetic relationships and the accuracy of genomic prediction in pig breeding.

    PubMed

    Meuwissen, Theo H E; Odegard, Jorgen; Andersen-Ranberg, Ina; Grindflek, Eli

    2014-08-01

    With the advent of genomic selection, alternative relationship matrices are used in animal breeding, which vary in their coverage of distant relationships due to old common ancestors. Relationships based on pedigree (A) and linkage analysis (GLA) cover only recent relationships because of the limited depth of the known pedigree. Relationships based on identity-by-state (G) include relationships up to the age of the SNP (single nucleotide polymorphism) mutations. We hypothesised that the latter relationships were too old, since QTL (quantitative trait locus) mutations for traits under selection were probably more recent than the SNPs on a chip, which are typically selected for high minor allele frequency. In addition, A and GLA relationships are too recent to cover genetic differences accurately. Thus, we devised a relationship matrix that considers intermediate-aged relationships and compared all these relationship matrices for their accuracy of genomic prediction in a pig breeding situation. Haplotypes were constructed and used to build a haplotype-based relationship matrix (GH), which considers more intermediate-aged relationships, since haplotypes recombine more quickly than SNPs mutate. Dense genotypes (38 453 SNPs) on 3250 elite breeding pigs were combined with phenotypes for growth rate (2668 records), lean meat percentage (2618), weight at three weeks of age (7387) and number of teats (5851) to estimate breeding values for all animals in the pedigree (8187 animals) using the aforementioned relationship matrices. Phenotypes on the youngest 424 to 486 animals were masked and predicted in order to assess the accuracy of the alternative genomic predictions. Correlations between the relationships and regressions of older on younger relationships revealed that the age of the relationships increased in the order A, GLA, GH and G. Use of genomic relationship matrices yielded significantly higher prediction accuracies than A. GH and G did not differ significantly, but both were significantly more accurate than GLA. Our hypothesis that intermediate-aged relationships yield more accurate genomic predictions than G was confirmed for two of four traits, but these results were not statistically significant. Use of estimated genotype probabilities for ungenotyped animals proved to be an efficient method to include the phenotypes of ungenotyped animals.
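
    As a concrete anchor for the identity-by-state matrix G discussed above, the sketch below computes VanRaden's first method from a 0/1/2 genotype matrix; the pedigree (A), linkage-analysis (GLA) and haplotype (GH) matrices require pedigree and phasing machinery not shown here, and the toy genotypes are random.

      import numpy as np

      def vanraden_G(M):
          """Identity-by-state genomic relationship matrix (VanRaden method 1).
          M: individuals x SNPs, genotypes coded as 0/1/2 counts of one allele."""
          p = M.mean(axis=0) / 2.0                 # observed allele frequencies
          Z = M - 2.0 * p                          # centre by expected genotype
          return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

      M = np.random.default_rng(0).integers(0, 3, size=(10, 500))  # toy genotypes
      G = vanraden_G(M)
      print(G.shape)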

  1. Efficient depth intraprediction method for H.264/AVC-based three-dimensional video coding

    NASA Astrophysics Data System (ADS)

    Oh, Kwan-Jung; Oh, Byung Tae

    2015-04-01

    We present an intracoding method that is applicable to depth map coding in multiview plus depth systems. Our approach combines skip prediction and plane segmentation-based prediction. The proposed depth intraskip prediction uses the estimated direction at both the encoder and decoder, and does not need to encode residual data. Our plane segmentation-based intraprediction divides the current block into biregions, and applies a different prediction scheme for each segmented region. This method avoids incorrect estimations across different regions, resulting in higher prediction accuracy. Simulation results demonstrate that the proposed scheme is superior to H.264/advanced video coding intraprediction and has the ability to improve the subjective rendering quality.

  2. A universal deep learning approach for modeling the flow of patients under different severities.

    PubMed

    Jiang, Shancheng; Chin, Kwai-Sang; Tsui, Kwok L

    2018-02-01

    The Accident and Emergency Department (A&ED) is the frontline for providing emergency care in hospitals. Unfortunately, relative A&ED resources have failed to keep up with continuously increasing demand in recent years, which leads to overcrowding in A&ED. Knowing the fluctuation of patient arrival volume in advance is a prerequisite for relieving this pressure. Based on this motivation, the objective of this study is to explore an integrated framework with high accuracy for predicting A&ED patient flow under different triage levels, by combining a novel feature selection process with deep neural networks. Administrative data are collected from an actual A&ED and categorized into five groups based on different triage levels. A genetic algorithm (GA)-based feature selection algorithm is improved and implemented as a pre-processing step for this time-series prediction problem, in order to explore key features affecting patient flow. In our improved GA, a fitness-based crossover is proposed to maintain the joint information of multiple features during the iterative process, instead of the traditional point-based crossover. Deep neural networks (DNNs) are employed as the prediction model for their universal adaptability and high flexibility. In the model-training process, the learning algorithm is configured around a parallel stochastic gradient descent algorithm. Two effective regularization strategies are integrated in one DNN framework to avoid overfitting. All introduced hyper-parameters are optimized efficiently by grid search in one pass. As for feature selection, our improved GA-based feature selection algorithm outperformed a typical GA and four state-of-the-art feature selection algorithms (mRMR, SAFS, VIFR, and CFR). As for the prediction accuracy of the proposed integrated framework, compared with other frequently used statistical models (GLM, seasonal-ARIMA, ARIMAX, and ANN) and modern machine learning models (SVM-RBF, SVM-linear, RF, and R-LASSO), the proposed integrated "DNN-I-GA" framework achieves higher prediction accuracy on both MAPE and RMSE metrics in pairwise comparisons. The contribution of our study is two-fold. Theoretically, the traditional GA-based feature selection process is improved to have fewer hyper-parameters and higher efficiency, and the joint information of multiple features is maintained by the fitness-based crossover operator. The universal property of DNNs is further enhanced by merging different regularization strategies. Practically, features selected by our improved GA can be used to uncover an underlying relationship between patient flows and input features. Predicted values are significant indicators of patients' demand and can be used by A&ED managers for resource planning and allocation. The high accuracy achieved by the present framework in different cases enhances the reliability of downstream decision making. Copyright © 2017 Elsevier B.V. All rights reserved.
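
    The fitness-based crossover is the pivotal modification. One plausible reading, sketched below under the assumption that feature masks are binary vectors, is to let the child inherit each bit from the fitter parent with probability proportional to fitness, so that jointly useful feature groups from the stronger mask tend to survive intact; the paper's exact operator may differ in detail.

      import numpy as np

      def fitness_based_crossover(mask1, fit1, mask2, fit2, rng):
          # Bias each bit toward the fitter parent instead of cutting at one point,
          # preserving joint information carried by groups of co-selected features.
          w = fit1 / (fit1 + fit2)
          take_first = rng.random(mask1.shape) < w
          return np.where(take_first, mask1, mask2)

      rng = np.random.default_rng(0)
      a = rng.integers(0, 2, size=20)              # parent feature masks
      b = rng.integers(0, 2, size=20)
      child = fitness_based_crossover(a, 0.9, b, 0.6, rng)
      print(child)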

  3. Using Time Series Analysis to Predict Cardiac Arrest in a PICU.

    PubMed

    Kennedy, Curtis E; Aoki, Noriaki; Mariscalco, Michele; Turley, James P

    2015-11-01

    To build and test cardiac arrest prediction models in a PICU, using time series analysis as input, and to measure changes in prediction accuracy attributable to different classes of time series data. Retrospective cohort study. Thirty-one bed academic PICU that provides care for medical and general surgical (not congenital heart surgery) patients. Patients experiencing a cardiac arrest in the PICU and requiring external cardiac massage for at least 2 minutes. None. One hundred three cases of cardiac arrest and 109 control cases were used to prepare a baseline dataset that consisted of 1,025 variables in four data classes: multivariate, raw time series, clinical calculations, and time series trend analysis. We trained 20 arrest prediction models using a matrix of five feature sets (combinations of data classes) with four modeling algorithms: linear regression, decision tree, neural network, and support vector machine. The reference model (multivariate data with regression algorithm) had an accuracy of 78% and 87% area under the receiver operating characteristic curve. The best model (multivariate + trend analysis data with support vector machine algorithm) had an accuracy of 94% and 98% area under the receiver operating characteristic curve. Cardiac arrest predictions based on a traditional model built with multivariate data and a regression algorithm misclassified cases 3.7 times more frequently than predictions that included time series trend analysis and built with a support vector machine algorithm. Although the final model lacks the specificity necessary for clinical application, we have demonstrated how information from time series data can be used to increase the accuracy of clinical prediction models.
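
    The decisive ingredient was the time series trend class. A simple way to realize such features, illustrated below on synthetic vital-sign traces rather than the study's data, is a least-squares slope over a sliding window, appended to the snapshot variables and fed to a support vector machine.

      import numpy as np
      from sklearn.svm import SVC

      def trend_features(series, window=12):
          # Least-squares slope over each sliding window of the series.
          t = np.arange(window)
          return np.array([np.polyfit(t, series[i:i + window], 1)[0]
                           for i in range(len(series) - window + 1)])

      rng = np.random.default_rng(0)
      labels = rng.integers(0, 2, size=200)        # 1 = pre-arrest (synthetic)
      X = np.array([trend_features(rng.normal(size=60) - y * 0.05 * np.arange(60))
                    for y in labels])              # cases drift downward over time
      clf = SVC().fit(X, labels)
      print(f"training accuracy: {clf.score(X, labels):.2f}")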

  4. Prediction accuracies for growth and wood attributes of interior spruce in space using genotyping-by-sequencing.

    PubMed

    Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Chen, Charles; Porth, Ilga; El-Kassaby, Yousry A

    2015-05-09

    Genomic selection (GS) in forestry can substantially reduce the length of the breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies have made it possible to genotype large numbers of trees at a reasonable cost. Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared (mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor (kNN-Fam)). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and the Generalized Ridge Regression (GRR) to test different assumptions about trait architecture. Finally, using PCA, multi-trait GS prediction models were developed. The EM and kNN-Fam imputation methods were superior for 30 and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than the GRR, indicating that the genetic architecture for these traits is complex. GS prediction accuracies for multi-site models were high and better than those of single sites, while cross-site predictions produced the lowest accuracies, reflecting type-b genetic correlations, and were deemed unreliable. The incorporation of genomic information in quantitative genetics analyses produced more realistic heritability estimates, as the half-sib pedigree tended to inflate the additive genetic variance and subsequently both heritability and gain estimates. Principal component scores as representatives of multi-trait GS prediction models produced surprising results where negatively correlated traits could be concurrently selected for using PCA2 and PCA3. The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation methods, was proven to be effective. Prediction accuracies obtained for all traits greatly support the integration of GS in tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that single-site GS models' ability to predict other sites is unreliable, supporting the utilization of the multi-site approach. Principal component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima.
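
    For orientation, RR-BLUP amounts to ridge regression of phenotypes on centred marker genotypes, with the shrinkage tied to an assumed heritability. The sketch below, on simulated genotypes, measures accuracy as the correlation between true and predicted genetic values in unphenotyped individuals; the heuristic for lambda is one common choice, not necessarily the study's.

      import numpy as np

      def rr_blup(Z, y, h2=0.3):
          # Solve (Z'Z + lambda * I) u = Z'(y - mean): ridge-shrunken marker effects.
          n, m = Z.shape
          lam = m * (1.0 - h2) / h2
          return np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ (y - y.mean()))

      rng = np.random.default_rng(1)
      Z = rng.integers(0, 3, size=(300, 1000)) - 1.0     # centred training genotypes
      u = np.zeros(1000)
      u[:20] = rng.normal(size=20)                       # 20 causal markers
      y = Z @ u + rng.normal(size=300)                   # phenotypes
      u_hat = rr_blup(Z, y)

      Z_new = rng.integers(0, 3, size=(100, 1000)) - 1.0 # unphenotyped candidates
      acc = np.corrcoef(Z_new @ u, Z_new @ u_hat)[0, 1]  # prediction accuracy
      print(round(float(acc), 2))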

  5. A data-driven SVR model for long-term runoff prediction and uncertainty analysis based on the Bayesian framework

    NASA Astrophysics Data System (ADS)

    Liang, Zhongmin; Li, Yujie; Hu, Yiming; Li, Binquan; Wang, Jun

    2017-06-01

    Accurate and reliable long-term forecasting plays an important role in water resources management and utilization. In this paper, a hybrid model called SVR-HUP is presented to predict long-term runoff and quantify the prediction uncertainty. The model is created in three steps. First, appropriate predictors are selected according to the correlations between meteorological factors and runoff. Second, a support vector regression (SVR) model is structured and optimized based on the LibSVM toolbox and a genetic algorithm. Finally, using forecasted and observed runoff, a hydrologic uncertainty processor (HUP) based on a Bayesian framework is used to estimate the posterior probability distribution of the simulated values, and the associated prediction uncertainty is quantitatively analyzed. Six precision evaluation indexes, including the correlation coefficient (CC), relative root mean square error (RRMSE), relative error (RE), mean absolute percentage error (MAPE), Nash-Sutcliffe efficiency (NSE), and qualification rate (QR), are used to measure the prediction accuracy. As a case study, the proposed approach is applied in the Han River basin, South Central China. Three types of SVR models are established to forecast the monthly, flood season and annual runoff volumes. The results indicate that SVR yields satisfactory accuracy and reliability at all three scales. In addition, the results suggest that the HUP can not only quantify the prediction uncertainty via a confidence interval but also provide a more accurate single-value prediction than the initial SVR forecast. Thus, the SVR-HUP model provides an alternative method for long-term runoff forecasting.

  6. Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Baichuan; Choudhury, Sutanay; Al-Hasan, Mohammad

    2016-02-01

    Estimating the confidence for a link is a critical task for Knowledge Graph construction. Link prediction, or predicting the likelihood of a link in a knowledge graph based on prior state, is a key research direction within this area. We propose a Latent Feature Embedding based link recommendation model for the prediction task and utilize a Bayesian Personalized Ranking based optimization technique for learning models for each predicate. Experimental results on large-scale knowledge bases such as YAGO2 show that our approach achieves substantially higher performance than several state-of-the-art approaches. Furthermore, we also study the performance of the link prediction algorithm in terms of topological properties of the Knowledge Graph and present a linear regression model to reason about its expected level of accuracy.
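
    In outline, Bayesian Personalized Ranking optimizes a pairwise objective: the score of an observed triple should exceed that of a corrupted one. The sketch below shows one stochastic update for a bilinear latent-feature scorer, score(s, p, o) = e_s · (w_p ∘ e_o); the embedding form and hyperparameters are illustrative assumptions, not the paper's exact model.

      import numpy as np

      def bpr_step(e_s, e_pos, e_neg, w_p, lr=0.05, reg=0.01):
          """One BPR update: raise score(s, p, o_pos) above score(s, p, o_neg)."""
          diff = np.dot(e_s, w_p * (e_pos - e_neg))
          g = 1.0 / (1.0 + np.exp(diff))            # gradient of -log(sigmoid(diff))
          grads = ((e_s, g * w_p * (e_pos - e_neg)),
                   (e_pos, g * w_p * e_s),
                   (e_neg, -g * w_p * e_s),
                   (w_p, g * e_s * (e_pos - e_neg)))
          for vec, grad in grads:                   # ascend the BPR objective
              vec += lr * (grad - reg * vec)

      rng = np.random.default_rng(0)
      e_s, e_pos, e_neg, w_p = (rng.normal(size=16) for _ in range(4))
      bpr_step(e_s, e_pos, e_neg, w_p)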

  7. Enhancing Predictive Accuracy of Cardiac Autonomic Neuropathy Using Blood Biochemistry Features and Iterative Multitier Ensembles.

    PubMed

    Abawajy, Jemal; Kelarev, Andrei; Chowdhury, Morshed U; Jelinek, Herbert F

    2016-01-01

    Blood biochemistry attributes form an important class of tests, routinely collected several times per year for many patients with diabetes. The objective of this study is to investigate the role of blood biochemistry in improving the predictive accuracy of the diagnosis of cardiac autonomic neuropathy (CAN) progression. Blood biochemistry contributes to CAN, and so it is a causative factor that can provide additional power for the diagnosis of CAN, especially in the absence of a complete set of Ewing tests. We introduce automated iterative multitier ensembles (AIME) and investigate their performance in comparison to base classifiers and standard ensemble classifiers for blood biochemistry attributes. AIME incorporate diverse ensembles into several tiers simultaneously and combine them into one automatically generated integrated system, so that one ensemble acts as an integral part of another ensemble. We carried out extensive experimental analysis using large datasets from the diabetes screening research initiative (DiScRi) project. The results of our experiments show that several blood biochemistry attributes can be used to supplement the Ewing battery for the detection of CAN in situations where one or more of the Ewing tests cannot be completed because of the individual difficulties faced by each patient in performing the tests. The results show that AIME provide higher accuracy as a multitier CAN classification paradigm. The best predictive accuracy, 99.57%, was obtained by an AIME combining Decorate on the top tier with bagging on the middle tier, based on random forest. Practitioners can use these findings to increase the accuracy of CAN diagnosis.

  8. Compound activity prediction using models of binding pockets or ligand properties in 3D

    PubMed Central

    Kufareva, Irina; Chen, Yu-Chen; Ilatovskiy, Andrey V.; Abagyan, Ruben

    2014-01-01

    Transient interactions of endogenous and exogenous small molecules with flexible binding sites in proteins or macromolecular assemblies play a critical role in all biological processes. Current advances in high-resolution protein structure determination, database development, and docking methodology make it possible to design three-dimensional models for the prediction of such interactions with increasing accuracy and specificity. Using the data collected in the Pocketome encyclopedia, we here provide an overview of two types of three-dimensional ligand activity models, pocket-based and ligand property-based, for two important classes of proteins, nuclear and G-protein coupled receptors. For half the targets, the pocket models discriminate actives from property-matched decoys with acceptable accuracy (the area under the ROC curve, AUC, exceeding 84%), and for about one fifth of the targets with high accuracy (AUC > 95%). The 3D ligand property field models achieved AUC above 95% in half of the cases. The high-performance models can already become a basis for activity predictions for new chemicals. Family-wide benchmarking of the models highlights the strengths of both approaches and helps identify their inherent bottlenecks and challenges. PMID:23116466

  9. Kalman/Map filtering-aided fast normalized cross correlation-based Wi-Fi fingerprinting location sensing.

    PubMed

    Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin

    2013-11-13

    A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results.

  10. Kalman/Map Filtering-Aided Fast Normalized Cross Correlation-Based Wi-Fi Fingerprinting Location Sensing

    PubMed Central

    Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin

    2013-01-01

    A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results. PMID:24233027

  11. Predicting body fat percentage based on gender, age and BMI by using artificial neural networks.

    PubMed

    Kupusinac, Aleksandar; Stokić, Edita; Doroslovački, Rade

    2014-02-01

    In the human body, the relation between fat and fat-free mass (muscles, bones etc.) is necessary for the diagnosis of obesity and the prediction of its comorbidities. Numerous formulas, such as those of Deurenberg et al., Gallagher et al., Jackson and Pollock, and Jackson et al., are available to predict body fat percentage (BF%) from gender (GEN), age (AGE) and body mass index (BMI). These formulas are all fairly similar and widely applicable, since they provide an easy, low-cost and non-invasive prediction of BF%. This paper presents a program solution for predicting BF% based on an artificial neural network (ANN). ANN training, validation and testing are performed on a randomly divided dataset that includes 2755 subjects: 1332 women (GEN = 0) and 1423 men (GEN = 1), with AGE from 18 to 88 y and BMI from 16.60 to 64.60 kg/m2. BF% was estimated by using Tanita bioelectrical impedance measurements (Tanita Corporation, Tokyo, Japan). The ANN inputs are GEN, AGE and BMI, and the output is BF%. The predictive accuracy of our solution is 80.43%. The main goal of this paper is to promote a new approach to predicting BF% that has the same complexity and cost but higher predictive accuracy than the above-mentioned formulas. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
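
    For comparison with the ANN, the Deurenberg et al. formula cited above is a one-line function of the same three inputs. The sketch below trains a small network on simulated subjects whose targets are that formula plus noise, standing in for the bioimpedance measurements used in the study.

      import numpy as np
      from sklearn.neural_network import MLPRegressor
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      def deurenberg_bf(bmi, age, male):
          # Deurenberg et al. adult formula (male = 1, female = 0).
          return 1.2 * bmi + 0.23 * age - 10.8 * male - 5.4

      rng = np.random.default_rng(0)
      X = np.column_stack([rng.integers(0, 2, 2755),        # GEN
                           rng.uniform(18, 88, 2755),       # AGE, years
                           rng.uniform(16.6, 64.6, 2755)])  # BMI, kg/m2
      y = deurenberg_bf(X[:, 2], X[:, 1], X[:, 0]) + rng.normal(0, 3, 2755)

      ann = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(10,), max_iter=3000,
                                       random_state=0)).fit(X, y)
      print(f"R^2 on training data: {ann.score(X, y):.2f}")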

  12. STRUM: structure-based prediction of protein stability changes upon single-point mutation.

    PubMed

    Quan, Lijun; Lv, Qiang; Zhang, Yang

    2016-10-01

    Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. Availability: http://zhanglab.ccmb.med.umich.edu/STRUM/. Contact: qiang@suda.edu.cn and zhng@umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. STRUM: structure-based prediction of protein stability changes upon single-point mutation

    PubMed Central

    Quan, Lijun; Lv, Qiang; Zhang, Yang

    2016-01-01

    Motivation: Mutations in human genome are mainly through single nucleotide polymorphism, some of which can affect stability and function of proteins, causing human diseases. Several methods have been proposed to predict the effect of mutations on protein stability; but most require features from experimental structure. Given the fast progress in protein structure prediction, this work explores the possibility to improve the mutation-induced stability change prediction using low-resolution structure modeling. Results: We developed a new method (STRUM) for predicting stability change caused by single-point mutations. Starting from wild-type sequences, 3D models are constructed by the iterative threading assembly refinement (I-TASSER) simulations, where physics- and knowledge-based energy functions are derived on the I-TASSER models and used to train STRUM models through gradient boosting regression. STRUM was assessed by 5-fold cross validation on 3421 experimentally determined mutations from 150 proteins. The Pearson correlation coefficient (PCC) between predicted and measured changes of Gibbs free-energy gap, ΔΔG, upon mutation reaches 0.79 with a root-mean-square error 1.2 kcal/mol in the mutation-based cross-validations. The PCC reduces if separating training and test mutations from non-homologous proteins, which reflects inherent correlations in the current mutation sample. Nevertheless, the results significantly outperform other state-of-the-art methods, including those built on experimental protein structures. Detailed analyses show that the most sensitive features in STRUM are the physics-based energy terms on I-TASSER models and the conservation scores from multiple-threading template alignments. However, the ΔΔG prediction accuracy has only a marginal dependence on the accuracy of protein structure models as long as the global fold is correct. These data demonstrate the feasibility to use low-resolution structure modeling for high-accuracy stability change prediction upon point mutations. Availability and Implementation: http://zhanglab.ccmb.med.umich.edu/STRUM/ Contact: qiang@suda.edu.cn and zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27318206

  14. Individual tree crown approach for predicting site index in boreal forests using airborne laser scanning and hyperspectral data

    NASA Astrophysics Data System (ADS)

    Kandare, Kaja; Ørka, Hans Ole; Dalponte, Michele; Næsset, Erik; Gobakken, Terje

    2017-08-01

    Site productivity is essential information for sustainable forest management and site index (SI) is the most common quantitative measure of it. The SI is usually determined for individual tree species based on tree height and the age of the 100 largest trees per hectare according to stem diameter. The present study aimed to demonstrate and validate a methodology for the determination of SI using remotely sensed data, in particular fused airborne laser scanning (ALS) and airborne hyperspectral data in a forest site in Norway. The applied approach was based on individual tree crown (ITC) delineation: tree species, tree height, diameter at breast height (DBH), and age were modelled and predicted at ITC level using 10-fold cross validation. Four dominant ITCs per 400 m2 plot were selected as input to predict SI at plot level for Norway spruce (Picea abies (L.) Karst.) and Scots pine (Pinus sylvestris L.). We applied an experimental setup with different subsets of dominant ITCs with different combinations of attributes (predicted or field-derived) for SI predictions. The results revealed that the selection of the dominant ITCs based on the largest DBH independent of tree species, predicted the SI with similar accuracy as ITCs matched with field-derived dominant trees (RMSE: 27.6% vs 23.3%). The SI accuracies were at the same level when dominant species were determined from the remotely sensed or field data (RMSE: 27.6% vs 27.8%). However, when the predicted tree age was used the SI accuracy decreased compared to field-derived age (RMSE: 27.6% vs 7.6%). In general, SI was overpredicted for both tree species in the mature forest, while there was an underprediction in the young forest. In conclusion, the proposed approach for SI determination based on ITC delineation and a combination of ALS and hyperspectral data is an efficient and stable procedure, which has the potential to predict SI in forest areas at various spatial scales and additionally to improve existing SI maps in Norway.

  15. Feature Selection Methods for Zero-Shot Learning of Neural Activity.

    PubMed

    Caceres, Carlos A; Roos, Matthew J; Rupp, Kyle M; Milsap, Griffin; Crone, Nathan E; Wolmetz, Michael E; Ratto, Christopher R

    2017-01-01

    Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy.

  16. Enhanced NMR Discrimination of Pharmaceutically Relevant Molecular Crystal Forms through Fragment-Based Ab Initio Chemical Shift Predictions.

    PubMed

    Hartman, Joshua D; Day, Graeme M; Beran, Gregory J O

    2016-11-02

    Chemical shift prediction plays an important role in the determination or validation of crystal structures with solid-state nuclear magnetic resonance (NMR) spectroscopy. One of the fundamental theoretical challenges lies in discriminating variations in chemical shifts resulting from different crystallographic environments. Fragment-based electronic structure methods provide an alternative to the widely used plane wave gauge-including projector augmented wave (GIPAW) density functional technique for chemical shift prediction. Fragment methods allow hybrid density functionals to be employed routinely in chemical shift prediction, and we have recently demonstrated appreciable improvements in the accuracy of the predicted shifts when using the hybrid PBE0 functional instead of generalized gradient approximation (GGA) functionals like PBE. Here, we investigate the solid-state 13C and 15N NMR spectra for multiple crystal forms of acetaminophen, phenobarbital, and testosterone. We demonstrate that the use of the hybrid density functional instead of a GGA provides both higher accuracy in the chemical shifts and increased discrimination among the different crystallographic environments. Finally, these results also provide compelling evidence for the transferability of the linear regression parameters mapping predicted chemical shieldings to chemical shifts that were derived in an earlier study.

  17. Enhanced NMR Discrimination of Pharmaceutically Relevant Molecular Crystal Forms through Fragment-Based Ab Initio Chemical Shift Predictions

    PubMed Central

    2016-01-01

    Chemical shift prediction plays an important role in the determination or validation of crystal structures with solid-state nuclear magnetic resonance (NMR) spectroscopy. One of the fundamental theoretical challenges lies in discriminating variations in chemical shifts resulting from different crystallographic environments. Fragment-based electronic structure methods provide an alternative to the widely used plane wave gauge-including projector augmented wave (GIPAW) density functional technique for chemical shift prediction. Fragment methods allow hybrid density functionals to be employed routinely in chemical shift prediction, and we have recently demonstrated appreciable improvements in the accuracy of the predicted shifts when using the hybrid PBE0 functional instead of generalized gradient approximation (GGA) functionals like PBE. Here, we investigate the solid-state 13C and 15N NMR spectra for multiple crystal forms of acetaminophen, phenobarbital, and testosterone. We demonstrate that the use of the hybrid density functional instead of a GGA provides both higher accuracy in the chemical shifts and increased discrimination among the different crystallographic environments. Finally, these results also provide compelling evidence for the transferability of the linear regression parameters mapping predicted chemical shieldings to chemical shifts that were derived in an earlier study. PMID:27829821

  18. Prediction of Dementia in Primary Care Patients

    PubMed Central

    Jessen, Frank; Wiese, Birgitt; Bickel, Horst; Eiffländer-Gorfer, Sandra; Fuchs, Angela; Kaduszkiewicz, Hanna; Köhler, Mirjam; Luck, Tobias; Mösch, Edelgard; Pentzek, Michael; Riedel-Heller, Steffi G.; Wagner, Michael; Weyerer, Siegfried; Maier, Wolfgang; van den Bussche, Hendrik

    2011-01-01

    Background Current approaches for AD prediction are based on biomarkers, which are however of restricted availability in primary care. AD prediction tools for primary care are therefore needed. We present a prediction score based on information that can be obtained in the primary care setting. Methodology/Principal Findings We performed a longitudinal cohort study in 3,055 non-demented individuals above 75 years recruited via primary care chart registries (Study on Aging, Cognition and Dementia, AgeCoDe). After the baseline investigation we performed three follow-up investigations at 18-month intervals with incident dementia as the primary outcome. The best set of predictors was extracted from the baseline variables in one randomly selected half of the sample. This set included age, subjective memory impairment, performance on delayed verbal recall and verbal fluency, on the Mini-Mental-State-Examination, and on an instrumental activities of daily living scale. These variables were aggregated to a prediction score, which achieved a prediction accuracy of 0.84 for AD. The score was applied to the second half of the sample (test cohort). Here, the prediction accuracy was 0.79. With a cut-off of at least 80% sensitivity in the first cohort, 79.6% sensitivity, 66.4% specificity, 14.7% positive predictive value (PPV) and 97.8% negative predictive value (NPV) for AD were achieved in the test cohort. At a cut-off for a high-risk population (5% of individuals with the highest risk score in the first cohort) the PPV for AD was 39.1% (52% for any dementia) in the test cohort. Conclusions The prediction score has useful prediction accuracy. It can define individuals (1) sensitively for low-cost, low-risk interventions, or (2) more specifically, with increased PPV, for measures of prevention with greater costs or risks. As it is independent of technical aids, it may be used within large-scale prevention programs. PMID:21364746

  19. Prediction of dementia in primary care patients.

    PubMed

    Jessen, Frank; Wiese, Birgitt; Bickel, Horst; Eiffländer-Gorfer, Sandra; Fuchs, Angela; Kaduszkiewicz, Hanna; Köhler, Mirjam; Luck, Tobias; Mösch, Edelgard; Pentzek, Michael; Riedel-Heller, Steffi G; Wagner, Michael; Weyerer, Siegfried; Maier, Wolfgang; van den Bussche, Hendrik

    2011-02-18

    Current approaches for AD prediction are based on biomarkers, which are however of restricted availability in primary care. AD prediction tools for primary care are therefore needed. We present a prediction score based on information that can be obtained in the primary care setting. We performed a longitudinal cohort study in 3,055 non-demented individuals above 75 years recruited via primary care chart registries (Study on Aging, Cognition and Dementia, AgeCoDe). After the baseline investigation we performed three follow-up investigations at 18-month intervals with incident dementia as the primary outcome. The best set of predictors was extracted from the baseline variables in one randomly selected half of the sample. This set included age, subjective memory impairment, performance on delayed verbal recall and verbal fluency, on the Mini-Mental-State-Examination, and on an instrumental activities of daily living scale. These variables were aggregated to a prediction score, which achieved a prediction accuracy of 0.84 for AD. The score was applied to the second half of the sample (test cohort). Here, the prediction accuracy was 0.79. With a cut-off of at least 80% sensitivity in the first cohort, 79.6% sensitivity, 66.4% specificity, 14.7% positive predictive value (PPV) and 97.8% negative predictive value (NPV) for AD were achieved in the test cohort. At a cut-off for a high-risk population (5% of individuals with the highest risk score in the first cohort) the PPV for AD was 39.1% (52% for any dementia) in the test cohort. The prediction score has useful prediction accuracy. It can define individuals (1) sensitively for low-cost, low-risk interventions, or (2) more specifically, with increased PPV, for measures of prevention with greater costs or risks. As it is independent of technical aids, it may be used within large-scale prevention programs.

  20. Development and evaluation of a regression-based model to predict cesium concentration ratios for freshwater fish.

    PubMed

    Pinder, John E; Rowan, David J; Rasmussen, Joseph B; Smith, Jim T; Hinton, Thomas G; Whicker, F W

    2014-08-01

    Data from published studies and World Wide Web sources were combined to produce and test a regression model to predict Cs concentration ratios for freshwater fish species. The accuracies of predicted concentration ratios, which were computed using 1) species trophic levels obtained from random resampling of known food items and 2) K concentrations in the water, for 207 fish from 44 species and 43 locations, were tested against independent observations of ratios for 57 fish from 17 species from 25 locations. Accuracy was assessed as the percentage of observed-to-predicted ratios within factors of 2 or 3. Conservatism, expressed as the lack of underprediction, was assessed as the percentage of observed-to-predicted ratios that were less than 2 or less than 3. The model's median observed-to-predicted ratio was 1.26, which was not significantly different from 1, and 50% of the ratios were between 0.73 and 1.85. The percentages of ratios within factors of 2 or 3 were 67 and 82%, respectively. The percentages of ratios that were <2 or <3 were 79 and 88%, respectively. An example for Perca fluviatilis demonstrated that increased prediction accuracy could be obtained when more detailed knowledge of diet was available to estimate trophic level. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. ir-HSP: Improved Recognition of Heat Shock Proteins, Their Families and Sub-types Based On g-Spaced Di-peptide Features and Support Vector Machine

    PubMed Central

    Meher, Prabina K.; Sahu, Tanmaya K.; Gahoi, Shachi; Rao, Atmakuri R.

    2018-01-01

    Heat shock proteins (HSPs) play a pivotal role in cell growth and viability. Since conventional approaches are expensive and voluminous protein sequence information is available in the post-genomic era, development of an automated and accurate computational tool is highly desirable for prediction of HSPs, their families and sub-types. Thus, we propose a computational approach for reliable prediction of all these components in a single framework and with higher accuracy as well. The proposed approach achieved an overall accuracy of ~84% in predicting HSPs, ~97% in predicting six different families of HSPs, and ~94% in predicting four types of DnaJ proteins, on benchmark datasets. The developed approach also achieved higher accuracy than most of the existing approaches. For easy prediction of HSPs by experimental scientists, a user-friendly web server ir-HSP is made freely accessible at http://cabgrid.res.in:8080/ir-hsp. The ir-HSP was further evaluated for proteome-wide identification of HSPs by using proteome datasets of eight different species, and ~50% of the predicted HSPs in each species were found to be annotated with InterPro HSP families/domains. Thus, the developed computational method is expected to supplement the currently available approaches for prediction of HSPs, to the extent of their families and sub-types. PMID:29379521
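
    The g-spaced di-peptide features count residue pairs separated by g intervening positions; a minimal sketch of that featurization feeding an SVM (the sequences and labels below are dummies, and the real pipeline's g values and SVM settings are not reproduced here):

      from itertools import product
      import numpy as np
      from sklearn.svm import SVC

      AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
      PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

      def g_spaced_dipeptide_composition(sequence, g=1):
          """400-dim frequency vector of residue pairs separated by g positions."""
          counts = dict.fromkeys(PAIRS, 0)
          n_pairs = len(sequence) - g - 1
          for i in range(n_pairs):
              pair = sequence[i] + sequence[i + g + 1]
              if pair in counts:
                  counts[pair] += 1
          return np.array([counts[p] / max(n_pairs, 1) for p in PAIRS])

      # Dummy usage; real training would use curated HSP and non-HSP sequences
      X = np.vstack([g_spaced_dipeptide_composition("MKTAYIAKQRQISFVKSHFSRQ"),
                     g_spaced_dipeptide_composition("MSDNGPQNQRNAPRITFGGPSD")])
      y = np.array([1, 0])
      clf = SVC(kernel="rbf").fit(X, y)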

  2. A Coarse-Grained Elastic Network Atom Contact Model and Its Use in the Simulation of Protein Dynamics and the Prediction of the Effect of Mutations

    PubMed Central

    Frappier, Vincent; Najmanovich, Rafael J.

    2014-01-01

    Normal mode analysis (NMA) methods are widely used to study dynamic aspects of protein structures. Two critical components of NMA methods are the level of coarse-graining used to represent protein structures and the choice of potential energy functional form. Different choices trade off speed against accuracy. At one extreme one finds accurate but slow molecular-dynamics-based methods with all-atom representations and detailed atomic potentials. At the other extreme are fast elastic network model (ENM) methods with Cα-only representations and simplified potentials based on geometry alone, and thus oblivious to protein sequence. Here we present ENCoM, an Elastic Network Contact Model that employs a potential energy function including a pairwise atom-type non-bonded interaction term, and thus makes it possible to consider the effect of the specific nature of amino acids on dynamics within the context of NMA. ENCoM is as fast as existing ENM methods and outperforms such methods in the generation of conformational ensembles. Here we introduce a new application for NMA methods with the use of ENCoM in the prediction of the effect of mutations on protein stability. While existing methods are based on machine learning or enthalpic considerations, the use of ENCoM, based on vibrational normal modes, rests on entropic considerations. This represents a novel area of application for NMA methods and a novel approach for the prediction of the effect of mutations. We compare ENCoM to a large number of methods in terms of accuracy and self-consistency. We show that the accuracy of ENCoM is comparable to that of the best existing methods. We show that existing methods are biased towards the prediction of destabilizing mutations and that ENCoM is less biased at predicting stabilizing mutations. PMID:24762569
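
    For readers unfamiliar with ENMs, a minimal anisotropic-network sketch shows the mechanics that ENCoM builds on; this is the conventional uniform-spring model, not the ENCoM potential (which adds the pairwise atom-type term):

      import numpy as np

      def anm_modes(coords, cutoff=15.0, gamma=1.0):
          """Normal modes of an anisotropic network model on C-alpha coordinates.

          Springs of uniform stiffness gamma connect all C-alpha pairs within
          the cutoff; modes are eigenvectors of the resulting Hessian.
          """
          n = len(coords)
          hessian = np.zeros((3 * n, 3 * n))
          for i in range(n):
              for j in range(i + 1, n):
                  d = coords[j] - coords[i]
                  r2 = d @ d
                  if r2 > cutoff ** 2:
                      continue
                  block = -gamma * np.outer(d, d) / r2
                  hessian[3*i:3*i+3, 3*j:3*j+3] = block
                  hessian[3*j:3*j+3, 3*i:3*i+3] = block
                  hessian[3*i:3*i+3, 3*i:3*i+3] -= block
                  hessian[3*j:3*j+3, 3*j:3*j+3] -= block
          eigvals, eigvecs = np.linalg.eigh(hessian)
          return eigvals[6:], eigvecs[:, 6:]  # drop six rigid-body modes

      # Toy coordinates; real use would read C-alpha positions from a PDB file
      rng = np.random.default_rng(0)
      vals, vecs = anm_modes(rng.uniform(0, 30, size=(20, 3)))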

  3. Selective testing strategies for diagnosing group A streptococcal infection in children with pharyngitis: a systematic review and prospective multicentre external validation study

    PubMed Central

    Cohen, Jérémie F.; Cohen, Robert; Levy, Corinne; Thollot, Franck; Benani, Mohamed; Bidet, Philippe; Chalumeau, Martin

    2015-01-01

    Background: Several clinical prediction rules for diagnosing group A streptococcal infection in children with pharyngitis are available. We aimed to compare the diagnostic accuracy of rules-based selective testing strategies in a prospective cohort of children with pharyngitis. Methods: We identified clinical prediction rules through a systematic search of MEDLINE and Embase (1975–2014), which we then validated in a prospective cohort involving French children who presented with pharyngitis during a 1-year period (2010–2011). We diagnosed infection with group A streptococcus using two throat swabs: one obtained for a rapid antigen detection test (StreptAtest, Dectrapharm) and one obtained for culture (reference standard). We validated rules-based selective testing strategies as follows: low risk of group A streptococcal infection, no further testing or antibiotic therapy needed; intermediate risk of infection, rapid antigen detection for all patients and antibiotic therapy for those with a positive test result; and high risk of infection, empiric antibiotic treatment. Results: We identified 8 clinical prediction rules, 6 of which could be prospectively validated. Sensitivity and specificity of rules-based selective testing strategies ranged from 66% (95% confidence interval [CI] 61–72) to 94% (95% CI 92–97) and from 40% (95% CI 35–45) to 88% (95% CI 85–91), respectively. Use of rapid antigen detection testing following the clinical prediction rule ranged from 24% (95% CI 21–27) to 86% (95% CI 84–89). None of the rules-based selective testing strategies achieved our diagnostic accuracy target (sensitivity and specificity > 85%). Interpretation: Rules-based selective testing strategies did not show sufficient diagnostic accuracy in this study population. The relevance of clinical prediction rules for determining which children with pharyngitis should undergo a rapid antigen detection test remains questionable. PMID:25487666

  4. Predicting one repetition maximum equations accuracy in paralympic rowers with motor disabilities.

    PubMed

    Schwingel, Paulo A; Porto, Yuri C; Dias, Marcelo C M; Moreira, Mônica M; Zoppi, Cláudio C

    2009-05-01

    Resistance training intensity is prescribed as a percentage of maximal strength, defined as the maximum tension generated by a muscle or muscle group. This value is found through the application of the one-repetition maximum (1RM) test. The 1RM test demands time and is not appropriate for some populations because of the risk it poses. In recent years, prediction of maximal strength through predictive equations has been used to avoid the inconveniences of the 1RM test. The purpose of this study was to verify the accuracy of 12 1RM prediction equations for rowers with disabilities. Nine male paralympic rowers (7 rowers with single-leg amputation and 2 rowers with cerebral palsy; age, 30 +/- 7.9 years; height, 175.1 +/- 5.9 cm; weight, 69 +/- 13.6 kg) performed the 1RM test for the lying T-bar row and flat barbell bench press exercises to determine upper-body strength and the leg press exercise to determine lower-body strength. Based on submaximal repetition loads, several linear and exponential equation models were tested with regard to their accuracy. We did not find statistical differences between measured and predicted 1RM values for the lying T-bar row and bench press exercises (p = 0.84 and 0.23, respectively); however, the leg press exercise showed a highly significant difference between measured and predicted values (p < 0.01). In conclusion, rowers with motor disabilities tolerate 1RM testing procedures, and 1RM prediction equations are accurate for the bench press and lying T-bar row, but not for the leg press, in these athletes.
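
    The 12 equations evaluated are not listed in the abstract; two widely used examples from the strength-training literature, the Epley and Brzycki equations, illustrate the form such predictions take:

      def epley_1rm(load_kg, reps):
          """Epley equation: 1RM = w * (1 + r / 30)."""
          return load_kg * (1 + reps / 30)

      def brzycki_1rm(load_kg, reps):
          """Brzycki equation: 1RM = w * 36 / (37 - r)."""
          return load_kg * 36 / (37 - reps)

      # Example: 8 submaximal repetitions at 80 kg on the bench press
      print(epley_1rm(80, 8))    # ~101.3 kg
      print(brzycki_1rm(80, 8))  # ~99.3 kg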

  5. A generic approach for the development of short-term predictions of Escherichia coli and biotoxins in shellfish

    PubMed Central

    Schmidt, Wiebke; Evers-King, Hayley L.; Campos, Carlos J. A.; Jones, Darren B.; Miller, Peter I.; Davidson, Keith; Shutler, Jamie D.

    2018-01-01

    Microbiological contamination or elevated marine biotoxin concentrations within shellfish can result in temporary closure of shellfish aquaculture harvesting, leading to financial loss for the aquaculture business and a potential reduction in consumer confidence in shellfish products. We present a method for predicting short-term variations in shellfish concentrations of Escherichia coli and biotoxin (okadaic acid and its derivatives dinophysistoxins and pectenotoxins). The approach was evaluated for 2 contrasting shellfish harvesting areas. Through a meta-data analysis and using environmental data (in situ, satellite observations and meteorological nowcasts and forecasts), key environmental drivers were identified and used to develop models to predict E. coli and biotoxin concentrations within shellfish. Models were trained and evaluated using independent datasets, and the best models were identified as those exhibiting the lowest root mean square error. The best biotoxin model was able to provide 1 wk forecasts with an accuracy of 86%, a 0% false positive rate and a 0% false discovery rate (n = 78 observations) when used to predict the closure of shellfish beds due to biotoxin. The best E. coli models were used to predict the European hygiene classification of the shellfish beds to an accuracy of 99% (n = 107 observations) and 98% (n = 63 observations) for a bay (St Austell Bay) and an estuary (Turnaware Bar), respectively. This generic approach enables high-accuracy, short-term, farm-specific forecasts based on readily accessible environmental data and observations. PMID:29805719

  6. Transmembrane protein topology prediction using support vector machines.

    PubMed

    Nugent, Timothy; Jones, David T

    2009-05-26

    Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high-quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated. We present a support vector machine (SVM) based TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves a topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy, respectively. An additional SVM trained to discriminate between globular and TM proteins detected zero false positives, with a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes. Source code, data sets and a web server are freely available from http://bioinf.cs.ucl.ac.uk/psipred/. The high accuracy of TM topology prediction, which includes detection of both signal peptides and re-entrant helices, combined with the ability to effectively discriminate between TM and globular proteins, makes this method ideally suited to whole-genome annotation of alpha-helical transmembrane proteins.

  7. Assessment of MRI-Based Automated Fetal Cerebral Cortical Folding Measures in Prediction of Gestational Age in the Third Trimester.

    PubMed

    Wu, J; Awate, S P; Licht, D J; Clouchoux, C; du Plessis, A J; Avants, B B; Vossough, A; Gee, J C; Limperopoulos, C

    2015-07-01

    Traditional methods of dating a pregnancy based on history or sonographic assessment have large variation in the third trimester. We aimed to assess the ability of various quantitative measures of brain cortical folding on MR imaging to determine fetal gestational age in the third trimester. We evaluated 8 different quantitative cortical folding measures to predict gestational age in 33 healthy fetuses by using T2-weighted fetal MR imaging. We compared the accuracy of gestational age prediction by these cortical folding measures with the accuracy of prediction by brain volume measurement and by a previously reported semiquantitative visual scale of brain maturity. Regression models were constructed, and measurement biases and variances were determined via a cross-validation procedure. The cortical folding measures are accurate in the estimation and prediction of gestational age (mean of the absolute error, 0.43 ± 0.45 weeks) and perform better (P = .024) than brain volume (mean of the absolute error, 0.72 ± 0.61 weeks) or sonography measures (SDs approximately 1.5 weeks, as reported in the literature). Prediction accuracy is comparable with that of the semiquantitative visual assessment score (mean, 0.57 ± 0.41 weeks). Quantitative cortical folding measures such as global average curvedness can be accurate and reliable estimators of gestational age and brain maturity for healthy fetuses in the third trimester and have the potential to be indicators of brain-growth delays for at-risk fetuses and preterm neonates. © 2015 by American Journal of Neuroradiology.

  8. An ensemble approach to protein fold classification by integration of template-based assignment and support vector machine classifier.

    PubMed

    Xia, Jiaqi; Peng, Zhenling; Qi, Dawei; Mu, Hongbo; Yang, Jianyi

    2017-03-15

    Protein fold classification is a critical step in protein structure prediction. There are two possible ways to classify protein folds. One is through template-based fold assignment and the other is ab-initio prediction using machine learning algorithms. Combining both solutions to improve prediction accuracy had not been explored before. We developed two algorithms, HH-fold and SVM-fold, for protein fold classification. HH-fold is a template-based fold assignment algorithm using the HHsearch program. SVM-fold is a support vector machine-based ab-initio classification algorithm, in which a comprehensive set of features is extracted from three complementary sequence profiles. These two algorithms are then combined, resulting in the ensemble approach TA-fold. We performed a comprehensive assessment of the proposed methods by comparing with ab-initio methods and template-based threading methods on six benchmark datasets. An accuracy of 0.799 was achieved by TA-fold on the DD dataset, which consists of proteins from 27 folds. This represents an improvement of 5.4-11.7% over ab-initio methods. After updating this dataset to include more proteins in the same folds, the accuracy increased to 0.971. In addition, TA-fold achieved >0.9 accuracy on a large dataset consisting of 6451 proteins from 184 folds. Experiments on the LE dataset show that TA-fold consistently outperforms other threading methods at the family, superfamily and fold levels. The success of TA-fold is attributed to the combination of template-based fold assignment and ab-initio classification using features from complementary sequence profiles that contain rich evolutionary information. http://yanglab.nankai.edu.cn/TA-fold/. yangjy@nankai.edu.cn or mhb-506@163.com. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
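
    The template-first logic of the combination can be sketched as follows; the confidence threshold and score normalization are placeholders, not the published TA-fold parameters:

      def ensemble_fold_assignment(hhsearch_hit, svm_classifier, features,
                                   confidence_threshold=0.5):
          """Template-first ensemble: trust the template hit when its score is
          high, otherwise fall back to the ab-initio SVM classifier.

          hhsearch_hit: (fold_label, normalized_score) from a template search,
          or None when no template is found.
          svm_classifier: fitted scikit-learn classifier over profile features.
          """
          if hhsearch_hit is not None:
              fold_label, score = hhsearch_hit
              if score >= confidence_threshold:
                  return fold_label
          return svm_classifier.predict([features])[0]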

  9. Quantitative AOP-based predictions for two aromatase inhibitors evaluating the influence of bioaccumulation on prediction accuracy

    EPA Science Inventory

    The adverse outcome pathway (AOP) framework can be used to support the use of mechanistic toxicology data as a basis for risk assessment. For certain risk contexts this includes defining quantitative linkages between the molecular initiating event (MIE) and subsequent key events...

  10. Prediction of pesticide acute toxicity using two-dimensional chemical descriptors and target species classification

    EPA Science Inventory

    Previous modelling of the median lethal dose (oral rat LD50) has indicated that local class-based models yield better correlations than global models. We evaluated the hypothesis that dividing the dataset by pesticidal mechanisms would improve prediction accuracy. A linear discri...

  11. Signal Detection Theory as a Tool for Successful Student Selection

    ERIC Educational Resources Information Center

    van Ooijen-van der Linden, Linda; van der Smagt, Maarten J.; Woertman, Liesbeth; te Pas, Susan F.

    2017-01-01

    Prediction accuracy of academic achievement for admission purposes requires adequate "sensitivity" and "specificity" of admission tools, yet the available information on the validity and predictive power of admission tools is largely based on studies using correlational and regression statistics. The goal of this study was to…

  12. Predictive models for Escherichia coli concentrations at inland lake beaches and relationship of model variables to pathogen detection

    EPA Science Inventory

    Methods are needed to improve the timeliness and accuracy of recreational water-quality assessments. Traditional culture methods require 18–24 h to obtain results and may not reflect current conditions. Predictive models, based on environmental and water quality variables, have been...

  13. The Comparative Accuracy of Two Hydrologic Models in Simulating Warm-Season Runoff for Two Small, Hillslope Catchments

    EPA Science Inventory

    Runoff prediction is a cornerstone of water resources planning, and therefore modeling performance is a key issue. This paper investigates the comparative advantages of conceptual versus process-based models in predicting warm season runoff for upland, low-yield micro-catchments...

  14. Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yan, Shiju; Qian, Wei; Guan, Yubao

    2016-06-15

    Purpose: This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLC patients by integrating oversampling, feature selection, and score fusion techniques, and to develop an optimal prediction model. Methods: A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained DFS and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initially computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built based on QI features and CB markers separately. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers and also tested fusion methods to combine QI and CB based prediction results. Results: Using a leave-one-case-out cross-validation method, the computed areas under a receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061 when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, the AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%. Conclusions: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled the RBFN based classifier to yield improved prediction accuracy.
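
    A hedged sketch of the processing chain (SMOTE, then feature selection, then score-level fusion); logistic regression stands in for the RBFN classifiers and scikit-learn's SelectKBest for the BestFirst search, so this mirrors the recipe rather than reproducing the study's implementation:

      import numpy as np
      from imblearn.over_sampling import SMOTE
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(1)
      X_qi = rng.normal(size=(94, 35))   # quantitative image features (toy)
      X_cb = rng.normal(size=(94, 9))    # clinical/biological markers (toy)
      y = np.array([0] * 74 + [1] * 20)  # 74 disease-free, 20 recurrences

      def train_score(X, y):
          """Oversample the minority class, select features, return scores."""
          X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
          selector = SelectKBest(f_classif, k=5).fit(X_res, y_res)
          clf = LogisticRegression(max_iter=1000)
          clf.fit(selector.transform(X_res), y_res)
          return clf.predict_proba(selector.transform(X))[:, 1]

      # Score-level fusion: average the two classifiers' recurrence scores
      fused = 0.5 * (train_score(X_qi, y) + train_score(X_cb, y))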

  15. Accuracy of genomic breeding values for meat tenderness in Polled Nellore cattle.

    PubMed

    Magnabosco, C U; Lopes, F B; Fragoso, R C; Eifert, E C; Valente, B D; Rosa, G J M; Sainz, R D

    2016-07-01

    Zebu (Bos indicus) cattle, mostly of the Nellore breed, comprise more than 80% of the beef cattle in Brazil, given their tolerance of the tropical climate and high resistance to ectoparasites. Despite their advantages for production in tropical environments, zebu cattle tend to produce tougher meat than Bos taurus breeds. Traditional genetic selection to improve meat tenderness is constrained by the difficulty and cost of phenotypic evaluation for meat quality. Therefore, genomic selection may be the best strategy to improve meat quality traits. This study was performed to compare the accuracies of different Bayesian regression models in predicting molecular breeding values for meat tenderness in Polled Nellore cattle. The data set was composed of Warner-Bratzler shear force (WBSF) of longissimus muscle from 205, 141, and 81 animals slaughtered in 2005, 2010, and 2012, respectively, which were selected and mated so as to create extreme segregation for WBSF. The animals were genotyped with either the Illumina BovineHD (HD; 777,000 SNPs; 90 samples) chip or the GeneSeek Genomic Profiler (GGP Indicus HD; 77,000 SNPs; 337 samples). The SNP quality-control criteria were Hardy-Weinberg proportion P-value ≥ 0.1%, minor allele frequency > 1%, and call rate > 90%. The FImpute program was used for imputation from the GGP Indicus HD chip to the HD chip. The effect of each SNP was estimated using ridge regression, least absolute shrinkage and selection operator (LASSO), Bayes A, Bayes B, and Bayes Cπ methods. Different numbers of SNP were used, with 1, 2, 3, 4, 5, 7, 10, 20, 40, 60, 80, or 100% of the markers preselected based on their significance tests (P-values from genome-wide association studies [GWAS]) or randomly sampled. The prediction accuracy was assessed by the correlation between the genomic breeding value and the observed WBSF phenotype, using a leave-one-out cross-validation methodology. The prediction accuracies using all markers were very similar for all models, ranging from 0.22 (Bayes Cπ) to 0.25 (Bayes B). When preselecting SNP based on GWAS results, the highest correlation (0.27) between WBSF and the genomic breeding value was achieved using the Bayesian LASSO model with 15,030 (3%) markers. Although this study used relatively few animals, the design of the segregating population ensured wide genetic variability for meat tenderness, which was important to achieve acceptable accuracy of genomic prediction. Although all models showed similar levels of prediction accuracy, some small advantages were observed with the Bayes B approach when higher numbers of markers were preselected based on their P-values resulting from a GWAS analysis.
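
    Of the methods compared, ridge regression is the simplest to sketch; the snippet below shows the leave-one-out accuracy computation on toy data (the marker matrix, phenotypes, and ridge penalty are all illustrative):

      import numpy as np
      from sklearn.linear_model import Ridge
      from sklearn.model_selection import LeaveOneOut, cross_val_predict

      rng = np.random.default_rng(2)
      genotypes = rng.integers(0, 3, size=(60, 500)).astype(float)  # toy SNPs
      wbsf = rng.normal(4.5, 1.0, size=60)          # toy shear-force phenotypes

      # Ridge regression of phenotype on all markers; accuracy taken as the
      # correlation between leave-one-out predictions and observations, as in
      # the study (the alpha value here is arbitrary)
      pred = cross_val_predict(Ridge(alpha=100.0), genotypes, wbsf,
                               cv=LeaveOneOut())
      accuracy = np.corrcoef(pred, wbsf)[0, 1]
      print(f"LOO prediction accuracy: {accuracy:.2f}")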

  16. CSmetaPred: a consensus method for prediction of catalytic residues.

    PubMed

    Choudhary, Preeti; Kumar, Shailesh; Bachhawat, Anand Kumar; Pandit, Shashi Bhushan

    2017-12-22

    Knowledge of catalytic residues can play an essential role in elucidating mechanistic details of an enzyme. However, experimental identification of catalytic residues is a tedious and time-consuming task, which can be expedited by computational predictions. Despite significant development in active-site prediction methods, one remaining issue is the ranked position of putative catalytic residues among all ranked residues. In order to improve the ranking of catalytic residues and their prediction accuracy, we have developed a meta-approach-based method, CSmetaPred. In this approach, residues are ranked based on the mean of normalized residue scores derived from four well-known catalytic residue predictors. The mean residue score of CSmetaPred is combined with predicted pocket information to improve prediction performance in the meta-predictor CSmetaPred_poc. Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision Recall (PR) curves. The visual and quantitative analysis of ROC and PR curves shows that the meta-predictors outperform their constituent methods and that CSmetaPred_poc is the best of the evaluated methods. For instance, on the CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves the highest Mean Average Specificity (MAS), a scalar measure for the ROC curve, of 0.97 (0.96). Importantly, the median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. Considering residues ranked ≤20 as classified true positives in binary classification, CSmetaPred_poc achieves a prediction accuracy of 0.94 on the CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc predicts all catalytic residues within the top 20 ranks for ~73% of enzymes. Furthermore, benchmarking of prediction on structures built by comparative modelling showed that such models result in better prediction than sequence-only predictions. These analyses suggest that CSmetaPred_poc is able to rank putative catalytic residues at lower (better) ranked positions, which can facilitate and expedite their experimental characterization. The benchmarking studies showed that employing a meta-approach to combine residue-level scores derived from well-known catalytic residue predictors can improve prediction accuracy as well as provide improved ranked positions of known catalytic residues. Hence, such predictions can assist experimentalists in prioritizing residues for mutational studies in their efforts to characterize catalytic residues. Both meta-predictors are available as a webserver at: http://14.139.227.206/csmetapred/ .
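
    The core consensus step (mean of min-max normalized predictor scores) is simple enough to sketch directly; the score matrix below is a toy stand-in for the outputs of the four constituent predictors:

      import numpy as np

      def consensus_rank(score_matrix):
          """Rank residues by the mean of min-max normalized predictor scores.

          score_matrix: shape (n_residues, n_predictors), one column per
          catalytic-residue predictor. Returns residue indices, best first.
          """
          mins = score_matrix.min(axis=0)
          spans = score_matrix.max(axis=0) - mins
          normalized = (score_matrix - mins) / np.where(spans == 0, 1, spans)
          return np.argsort(-normalized.mean(axis=1))

      # Toy scores for 6 residues from 4 predictors
      scores = np.array([[0.9, 0.8, 0.7, 0.95],
                         [0.2, 0.1, 0.3, 0.25],
                         [0.5, 0.6, 0.4, 0.55],
                         [0.8, 0.9, 0.85, 0.7],
                         [0.1, 0.2, 0.15, 0.1],
                         [0.4, 0.3, 0.5, 0.45]])
      print(consensus_rank(scores))  # most likely catalytic residues first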

  17. Genomic Prediction of Gene Bank Wheat Landraces.

    PubMed

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J; Wenzl, Peter; Singh, Sukhwinder

    2016-07-07

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials. Copyright © 2016 Crossa et al.

  18. Predicting volume of distribution with decision tree-based regression methods using predicted tissue:plasma partition coefficients.

    PubMed

    Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat

    2015-01-01

    Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. The decision tree-based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data, and from flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical Abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
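
    A minimal sketch of the Bagging decision-tree setup with the mean-fold-error metric; the descriptor matrix and Vss values are synthetic stand-ins, and the hyperparameters are not the paper's:

      import numpy as np
      from sklearn.ensemble import BaggingRegressor
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(3)
      X = rng.normal(size=(80, 12))            # descriptors + predicted Kt:p
      log_vss = rng.normal(0.0, 0.5, size=80)  # log10 Vss (L/kg), toy values

      model = BaggingRegressor(estimator=DecisionTreeRegressor(),
                               n_estimators=50, random_state=0).fit(X, log_vss)

      # Mean fold error: a geometric accuracy measure on the log scale,
      # commonly used for Vss prediction (2.33 and 2.29 reported in the paper)
      pred = model.predict(X)
      mean_fold_error = 10 ** np.mean(np.abs(pred - log_vss))
      print(f"mean fold error: {mean_fold_error:.2f}")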

  19. Toward accurate prediction of pKa values for internal protein residues: the importance of conformational relaxation and desolvation energy.

    PubMed

    Wallace, Jason A; Wang, Yuhang; Shi, Chuanyin; Pastoor, Kevin J; Nguyen, Bao-Linh; Xia, Kai; Shen, Jana K

    2011-12-01

    Proton uptake or release controls many important biological processes, such as energy transduction, virus replication, and catalysis. Accurate pKa prediction informs about proton pathways, thereby revealing detailed acid-base mechanisms. Physics-based methods in the framework of molecular dynamics simulations not only offer pKa predictions but also inform about the physical origins of pKa shifts and provide details of ionization-induced conformational relaxation and large-scale transitions. One such method is the recently developed continuous constant pH molecular dynamics (CPHMD) method, which has been shown to be an accurate and robust pKa prediction tool for naturally occurring titratable residues. To further examine the accuracy and limitations of CPHMD, we blindly predicted the pKa values for 87 titratable residues introduced in various hydrophobic regions of staphylococcal nuclease and variants. The predictions gave a root-mean-square deviation of 1.69 pK units from experiment, and there were only two pKa values with errors greater than 3.5 pK units. Analysis of the conformational fluctuation of titrating side-chains in the context of the errors of calculated pKa values indicates that explicit treatment of conformational flexibility and the associated dielectric relaxation gives CPHMD a distinct advantage. Analysis of the sources of errors suggests that more accurate pKa predictions can be obtained for the most deeply buried residues by improving the accuracy in calculating desolvation energies. Furthermore, it is found that the generalized Born implicit-solvent model underlying the current CPHMD implementation slightly distorts the local conformational environment such that the inclusion of an explicit-solvent representation may offer improvement of accuracy. Copyright © 2011 Wiley-Liss, Inc.

  20. A general strategy for performing temperature-programming in high performance liquid chromatography--further improvements in the accuracy of retention time predictions of segmented temperature gradients.

    PubMed

    Wiese, Steffen; Teutenberg, Thorsten; Schmidt, Torsten C

    2012-01-27

    In the present work it is shown that the linear elution strength (LES) model, which was adapted from temperature-programming gas chromatography (GC), can also be employed for systematic method development in high-temperature liquid chromatography (HT-HPLC). The ability to predict isothermal retention times based on temperature-gradient as well as isothermal input data was investigated. For a small temperature interval of ΔT=40°C, both approaches result in very similar predictions. Average relative errors of predicted retention times of 2.7% and 1.9% were observed for simulations based on isothermal and temperature-gradient measurements, respectively. Concurrently, it was investigated whether the accuracy of retention time predictions of segmented temperature gradients can be further improved by temperature-dependent calculation of the parameter S(T) of the LES relationship. It was found that the accuracy of retention time predictions of multi-step temperature gradients can be improved to around 1.5% if S(T) is also calculated in a temperature-dependent manner. The adjusted experimental design, making use of four temperature-gradient measurements, was applied to systematic method development for selected food additives by high-temperature liquid chromatography. Method development was performed within a temperature interval from 40°C to 180°C using water as the mobile phase. Two separation methods were established in which the selected food additives were baseline-separated. In addition, good agreement between simulation and experiment was observed, with an average relative error of predicted retention times for complex segmented temperature gradients of less than 5%. Finally, a set of recommendations to assist the practitioner during systematic method development in high-temperature liquid chromatography was established. Copyright © 2011 Elsevier B.V. All rights reserved.
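
    For orientation, the LES framework linearizes retention against temperature. A commonly quoted form of this relationship (stated here as background; the exact parameterization used in the paper may differ) is, in LaTeX notation:

      \log k(T) = \log k_0 - S\,(T - T_0)

    where k is the retention factor, k_0 the retention factor at the reference temperature T_0, and S the solute-specific sensitivity of retention to temperature. The paper's refinement is to treat S = S(T) as itself temperature-dependent when integrating retention over segmented gradients, which is what reduces the average prediction error to around 1.5%.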

  1. Test battery with the human cell line activation test, direct peptide reactivity assay and DEREK based on a 139 chemical data set for predicting skin sensitizing potential and potency of chemicals.

    PubMed

    Takenouchi, Osamu; Fukui, Shiho; Okamoto, Kenji; Kurotani, Satoru; Imai, Noriyasu; Fujishiro, Miyuki; Kyotani, Daiki; Kato, Yoshinao; Kasahara, Toshihiko; Fujita, Masaharu; Toyoda, Akemi; Sekiya, Daisuke; Watanabe, Shinichi; Seto, Hirokazu; Hirota, Morihiko; Ashikaga, Takao; Miyazawa, Masaaki

    2015-11-01

    To develop a testing strategy incorporating the human cell line activation test (h-CLAT), direct peptide reactivity assay (DPRA) and DEREK, we created an expanded data set of 139 chemicals (102 sensitizers and 37 non-sensitizers) by combining the existing data set of 101 chemicals through the collaborative projects of the Japan Cosmetic Industry Association. Of the additional 38 chemicals, 15 chemicals with relatively low water solubility (log Kow > 3.5) were selected to clarify the limitations of testing strategies regarding lipophilic chemicals. The predictivities of the h-CLAT, DPRA and DEREK, and of combinations thereof, were evaluated by comparison to results of the local lymph node assay. When evaluating the 139 chemicals using combinations of the three methods based on the integrated testing strategy (ITS) concept (ITS-based test battery) and a sequential testing strategy (STS) weighing the predictive performance of the h-CLAT and DPRA, overall predictivities were similar to those previously found on the 101-chemical data set. An analysis of false-negative chemicals suggested that a major limitation of our strategies was the testing of poorly water-soluble chemicals. When negative results for chemicals with log Kow > 3.5 were excluded, the sensitivity and accuracy of ITS improved to 97% (91 of 94 chemicals) and 89% (114 of 128). Likewise, the sensitivity and accuracy of STS improved to 98% (92 of 94) and 85% (111 of 129). Moreover, the ITS and STS also showed good correlation with the local lymph node assay on three potency classifications, yielding accuracies of 74% (ITS) and 73% (STS). Thus, the inclusion of log Kow in the analysis could give both strategies a higher predictive performance. Copyright © 2015 John Wiley & Sons, Ltd.

  2. Filter Tuning Using the Chi-Squared Statistic

    NASA Technical Reports Server (NTRS)

    Lilly-Salkowski, Tyler B.

    2017-01-01

    This paper examines the use of the chi-squared statistic as a means of evaluating filter performance. The goal of the process is to characterize filter performance in the metric of covariance realism. The chi-squared statistic is calculated to determine the realism of a covariance based on the prediction accuracy and the covariance values at a given point in time. Once calculated, it is the distribution of this statistic that provides insight into the accuracy of the covariance. The process of tuning an Extended Kalman Filter (EKF) for Aqua and Aura support is described, including examination of the measurement errors of available observation types and methods of dealing with potentially volatile atmospheric drag modeling. Predictive accuracy and the distribution of the chi-squared statistic, calculated from EKF solutions, are assessed.
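
    The covariance-realism test can be sketched as follows: if the filter covariance is realistic, the squared Mahalanobis distance of the prediction error follows a chi-squared distribution with degrees of freedom equal to the state dimension (a standard formulation; the paper's exact statistic may differ in detail):

      import numpy as np
      from scipy import stats

      def chi_squared_statistic(predicted_state, true_state, covariance):
          """Mahalanobis-squared of the prediction error against the covariance."""
          error = predicted_state - true_state
          return error @ np.linalg.solve(covariance, error)

      # Toy check: errors drawn consistently with the covariance should match
      # a chi-squared distribution with 3 degrees of freedom
      rng = np.random.default_rng(4)
      cov = np.diag([1.0, 4.0, 9.0])
      samples = [chi_squared_statistic(rng.multivariate_normal(np.zeros(3), cov),
                                       np.zeros(3), cov) for _ in range(2000)]
      print(stats.kstest(samples, "chi2", args=(3,)))  # should not reject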

  3. An ensemble framework for identifying essential proteins.

    PubMed

    Zhang, Xue; Xiao, Wangxin; Acencio, Marcio Luis; Lemke, Ney; Wang, Xujing

    2016-08-25

    Many centrality measures have been proposed to mine and characterize the correlations between network topological properties and protein essentiality. However, most of them show limited prediction accuracy, and the number of essential proteins commonly predicted by different methods is very small. In this paper, an ensemble framework is proposed which integrates gene expression data and protein-protein interaction networks (PINs). It aims to improve the prediction accuracy of basic centrality measures. The idea behind this ensemble framework is that different protein-protein interactions (PPIs) may contribute differently to protein essentiality. Five standard centrality measures (degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and subgraph centrality) were each integrated into the ensemble framework. We evaluated the performance of the proposed ensemble framework using yeast PINs and gene expression data. The results show that it can considerably improve the prediction accuracy of each of the five centrality measures. It can also remarkably increase the overlap among the essential proteins predicted by the individual centrality measures and enable each centrality measure to find more low-degree essential proteins. This paper demonstrates that it is valuable to differentiate the contributions of different PPIs for identifying essential proteins based on network topological characteristics. The proposed ensemble framework is a successful paradigm to this end.
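
    The five constituent measures are all available in standard graph libraries; a sketch using networkx on a toy graph (the ensemble's gene-expression weighting of individual PPIs is the paper's contribution and is not reproduced here):

      import networkx as nx

      # Toy stand-in for a protein-protein interaction network
      g = nx.karate_club_graph()

      # The five centrality measures integrated into the ensemble framework
      centralities = {
          "degree": nx.degree_centrality(g),
          "betweenness": nx.betweenness_centrality(g),
          "closeness": nx.closeness_centrality(g),
          "eigenvector": nx.eigenvector_centrality(g, max_iter=1000),
          "subgraph": nx.subgraph_centrality(g),
      }

      # Rank candidate essential proteins by each measure (top 5 shown)
      for name, scores in centralities.items():
          top = sorted(scores, key=scores.get, reverse=True)[:5]
          print(name, top)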

  4. Development and evaluation of a regression-based model to predict cesium-137 concentration ratios for saltwater fish.

    PubMed

    Pinder, John E; Rowan, David J; Smith, Jim T

    2016-02-01

    Data from published studies and World Wide Web sources were combined to develop a regression model to predict 137Cs concentration ratios for saltwater fish. Predictions were developed from 1) numeric trophic levels computed primarily from random resampling of known food items and 2) K concentrations in the saltwater for 65 samplings from 41 different species from both the Atlantic and Pacific Oceans. A number of different models were initially developed and evaluated for accuracy, assessed as the ratio of independently measured concentration ratios to those predicted by the model. In contrast to freshwater systems, where K concentrations are highly variable and are an important factor affecting fish concentration ratios, the less variable K concentrations in saltwater were relatively unimportant in affecting concentration ratios. As a result, the simplest model, which used only trophic level as a predictor, had accuracies comparable to those of more complex models that also included K concentrations. A test of model accuracy involving comparisons of 56 published concentration ratios from 51 species of marine fish to those predicted by the model indicated that 52 of the predicted concentration ratios were within a factor of 2 of the observed concentration ratios. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Threshold models for genome-enabled prediction of ordinal categorical traits in plant breeding.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Pérez-Rodríguez, Paulino; de Los Campos, Gustavo; Eskridge, Kent; Crossa, José

    2014-12-23

    Categorical scores for disease susceptibility or resistance often are recorded in plant breeding. The aim of this study was to introduce genomic models for analyzing ordinal characters and to assess the predictive ability of genomic predictions for ordered categorical phenotypes using a threshold model counterpart of the Genomic Best Linear Unbiased Predictor (i.e., TGBLUP). The threshold model was used to relate a hypothetical underlying scale to the outward categorical response. We present an empirical application where a total of nine models, five without interaction and four with genomic × environment interaction (G×E) and genomic additive × additive × environment interaction (G×G×E), were used. We assessed the proposed models using data consisting of 278 maize lines genotyped with 46,347 single-nucleotide polymorphisms and evaluated for disease resistance [with ordinal scores from 1 (no disease) to 5 (complete infection)] in three environments (Colombia, Zimbabwe, and Mexico). Models with G×E captured a sizeable proportion of the total variability, which indicates the importance of introducing interaction to improve prediction accuracy. Relative to models based on main effects only, the models that included G×E achieved 9-14% gains in prediction accuracy; adding additive × additive interactions did not increase prediction accuracy consistently across locations. Copyright © 2015 Montesinos-López et al.
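
    The threshold model underlying TGBLUP links each ordinal score to a latent continuous liability cut by fixed thresholds. In its standard probit form (a textbook formulation sketched here for orientation, not quoted from the paper), the probability that line i falls in category k is, in LaTeX notation:

      P(y_i = k \mid \eta_i) = \Phi(\tau_k - \eta_i) - \Phi(\tau_{k-1} - \eta_i), \qquad k = 1, \dots, 5,

    where \Phi is the standard normal distribution function, \eta_i is the genomic linear predictor (possibly including the G×E and G×G×E terms), and -\infty = \tau_0 < \tau_1 < \dots < \tau_5 = \infty are the thresholds on the liability scale.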

  6. Predicting Sargassum blooms in the Caribbean Sea from MODIS observations

    NASA Astrophysics Data System (ADS)

    Wang, Mengqiu; Hu, Chuanmin

    2017-04-01

    Recurrent and significant Sargassum beaching events in the Caribbean Sea (CS) have caused serious environmental and economic problems, calling for a long-term prediction capacity of Sargassum blooms. Here we present predictions based on a hindcast of 2000-2016 observations from the Moderate Resolution Imaging Spectroradiometer (MODIS), which showed Sargassum abundance in the CS and the Central West Atlantic (CWA), as well as connectivity between the two regions with time lags. This information was used to derive bloom and nonbloom probability matrices for each 1° square in the CS for the months of May-August, predicted from bloom conditions in a hotspot region in the CWA in February. A suite of standard statistical measures was used to gauge the prediction accuracy, among which the user's accuracy and kappa statistics showed high fidelity of the probability maps in predicting both blooms and nonblooms in the eastern CS with several months of lead time, with overall accuracy often exceeding 80%. The bloom probability maps from this hindcast analysis will provide early warnings to better study Sargassum blooms and prepare for beaching events near the study region. This approach may also be extendable to many other regions around the world that face similar challenges and opportunities of macroalgal blooms and beaching events.

  7. High accuracy prediction of beta-turns and their types using propensities and multiple alignments.

    PubMed

    Fuchs, Patrick F J; Alix, Alain J P

    2005-06-01

    We have developed a method that predicts both the presence and the type of beta-turns, using a straightforward approach based on propensities and multiple alignments. The propensities were calculated classically, but the way they are used for prediction is completely new: starting from a tetrapeptide sequence on which one wants to evaluate the presence of a beta-turn, the propensity for a given residue is modified by taking into account all the residues present in the multiple alignment at this position. A score is then evaluated by weighting these propensities with position-specific scoring matrices generated by PSI-BLAST. The introduction of secondary structure information predicted by PSIPRED or SSPRO2, as well as taking into account the flanking residues around the tetrapeptide, greatly improved the accuracy. The latter, evaluated on a database of 426 reference proteins (previously used in other studies) by sevenfold cross-validation, gave very good results with a Matthews Correlation Coefficient (MCC) of 0.42 and an overall prediction accuracy of 74.8%; this places our method among the best ones. A jackknife test was also done, which gave results within the same range. This shows that it is possible to reach neural-network accuracy with considerably less computational cost and complexity. Furthermore, propensities remain excellent descriptors of amino acid tendencies to belong to beta-turns, which can be useful for peptide or protein engineering and design. For beta-turn type prediction, we reached the best accuracy ever published in terms of MCC (except for the irregular type IV), in the range of 0.25-0.30 for types I, II, and I' and 0.13-0.15 for types VIII, II', and IV. To our knowledge, our method is the only one available on the Web that predicts types I' and II'. The accuracy evaluated on two larger databases of 547 and 823 proteins was not improved significantly. All of this was implemented into a Web server called COUDES (French acronym for: Chercher Ou Une Deviation Existe Surement), which is available at the following URL: http://bioserv.rpbs.jussieu.fr/Coudes/index.html within the new bioinformatics platform RPBS.

  8. TH-CD-207A-07: Prediction of High Dimensional State Subject to Respiratory Motion: A Manifold Learning Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, W; Sawant, A; Ruan, D

    Purpose: The development of high dimensional imaging systems (e.g. volumetric MRI, CBCT, photogrammetry systems) in image-guided radiotherapy provides important pathways to the ultimate goal of real-time volumetric/surface motion monitoring. This study aims to develop a prediction method for the high dimensional state subject to respiratory motion. Compared to conventional linear dimension reduction based approaches, our method utilizes manifold learning to construct a descriptive feature submanifold, where more efficient and accurate prediction can be performed. Methods: We developed a prediction framework for the high-dimensional state subject to respiratory motion. The proposed method performs dimension reduction in a nonlinear setting to permit more descriptive features compared to its linear counterparts (e.g., classic PCA). Specifically, a kernel PCA is used to construct a proper low-dimensional feature manifold, where low-dimensional prediction is performed. A fixed-point iterative pre-image estimation method is applied subsequently to recover the predicted value in the original state space. We evaluated and compared the proposed method with a PCA-based method on 200 level-set surfaces reconstructed from surface point clouds captured by the VisionRT system. The prediction accuracy was evaluated with respect to root-mean-squared-error (RMSE) for both 200ms and 600ms lookahead lengths. Results: The proposed method outperformed the PCA-based approach with statistically higher prediction accuracy. In one-dimensional feature subspace, our method achieved mean prediction accuracy of 0.86mm and 0.89mm for 200ms and 600ms lookahead lengths respectively, compared to 0.95mm and 1.04mm from the PCA-based method. Paired t-tests further demonstrated the statistical significance of the superiority of our method, with p-values of 6.33e-3 and 5.78e-5, respectively. Conclusion: The proposed approach benefits from the descriptiveness of a nonlinear manifold and the prediction reliability in such a low dimensional manifold. The fixed-point iterative approach turns out to work well practically for the pre-image recovery. Our approach is particularly suitable for facilitating the management of respiratory motion in image-guided radiotherapy. This work is supported in part by NIH grant R01 CA169102-02.
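
    The pipeline can be approximated with scikit-learn, whose KernelPCA learns a pre-image map when fit_inverse_transform=True (a kernel-ridge pre-image, standing in for the paper's fixed-point iterative estimator); the surface data and the order-2 autoregressive predictor are illustrative assumptions:

      import numpy as np
      from sklearn.decomposition import KernelPCA
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(5)
      surfaces = rng.normal(size=(200, 3000))  # flattened surface states (toy)

      # Nonlinear dimension reduction with pre-image recovery enabled
      kpca = KernelPCA(n_components=1, kernel="rbf", fit_inverse_transform=True)
      z = kpca.fit_transform(surfaces)

      # Predict the next low-dimensional feature from a short history, then
      # map the prediction back to the state space via the pre-image estimate
      X_hist = np.column_stack([z[:-2, 0], z[1:-1, 0]])
      ar = LinearRegression().fit(X_hist, z[2:, 0])
      z_next = ar.predict([[z[-2, 0], z[-1, 0]]])
      surface_next = kpca.inverse_transform(z_next.reshape(1, -1))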

  9. Consistent prediction of GO protein localization.

    PubMed

    Spetale, Flavio E; Arce, Debora; Krsticevic, Flavia; Bulacio, Pilar; Tapia, Elizabeth

    2018-05-17

    The GO-Cellular Component (GO-CC) ontology provides a controlled vocabulary for the consistent description of the subcellular compartments or macromolecular complexes where proteins may act. Current machine learning-based methods used for the automated GO-CC annotation of proteins suffer from the inconsistency of individual GO-CC term predictions. Here, we present FGGA-CC+, a class of hierarchical graph-based classifiers for the consistent GO-CC annotation of protein coding genes at the subcellular compartment or macromolecular complex levels. Aiming to boost the accuracy of GO-CC predictions, we make use of the protein localization knowledge in the GO-Biological Process (GO-BP) annotations. As a result, FGGA-CC+ classifiers are built from annotation data in both the GO-CC and GO-BP ontologies. Due to their graph-based design, FGGA-CC+ classifiers are fully interpretable and their predictions amenable to expert analysis. Promising results on protein annotation data from five model organisms were obtained. Additionally, successful validation results in the annotation of a challenging subset of tandem duplicated genes in the tomato non-model organism were accomplished. Overall, these results suggest that FGGA-CC+ classifiers can indeed be useful for satisfying the huge demand for GO-CC annotation arising from ubiquitous high-throughput sequencing and proteomic projects.

  10. Improving accuracies of genomic predictions for drought tolerance in maize by joint modeling of additive and dominance effects in multi-environment trials.

    PubMed

    Dias, Kaio Olímpio Das Graças; Gezan, Salvador Alejandro; Guimarães, Claudia Teixeira; Nazarian, Alireza; da Costa E Silva, Luciano; Parentoni, Sidney Netto; de Oliveira Guimarães, Paulo Evaristo; de Oliveira Anoni, Carina; Pádua, José Maria Villela; de Oliveira Pinto, Marcos; Noda, Roberto Willians; Ribeiro, Carlos Alexandre Gomes; de Magalhães, Jurandir Vieira; Garcia, Antonio Augusto Franco; de Souza, João Cândido; Guimarães, Lauro José Moreira; Pastina, Maria Marta

    2018-07-01

    Breeding for drought tolerance is a challenging task that requires costly, extensive, and precise phenotyping. Genomic selection (GS) can be used to maximize selection efficiency and the genetic gains in maize (Zea mays L.) breeding programs for drought tolerance. Here, we evaluated the accuracy of GS using additive (A) and additive + dominance (AD) models to predict the performance of untested maize single-cross hybrids for drought tolerance in multi-environment trials. Phenotypic data of five drought tolerance traits were measured in 308 hybrids along eight trials under water-stressed (WS) and well-watered (WW) conditions over two years and two locations in Brazil. Hybrids' genotypes were inferred based on their parents' genotypes (inbred lines) using single-nucleotide polymorphism markers obtained via genotyping-by-sequencing. GS analyses were performed using genomic best linear unbiased prediction by fitting a factor analytic (FA) multiplicative mixed model. Two cross-validation (CV) schemes were tested: CV1 and CV2. The FA framework allowed for investigating the stability of additive and dominance effects across environments, as well as the additive-by-environment and the dominance-by-environment interactions, with interesting applications for parental and hybrid selection. Results showed differences in the predictive accuracy between A and AD models, using both CV1 and CV2, for the five traits in both water conditions. For grain yield (GY) under WS and using CV1, the AD model doubled the predictive accuracy in comparison to the A model. Through CV2, GS models benefit from borrowing information from correlated trials, resulting in an increase of 40% and 9% in the predictive accuracy of GY under WS for A and AD models, respectively. These results highlight the importance of multi-environment trial analyses using GS models that incorporate additive and dominance effects for genomic predictions of GY under drought in maize single-cross hybrids.
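
    In skeleton form, the additive + dominance (AD) genomic model can be written as follows (a standard GBLUP-style formulation given here for orientation; the paper's full model adds factor-analytic structures for the environment-interaction terms). In LaTeX notation:

      \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_a \mathbf{a} + \mathbf{Z}_d \mathbf{d} + \mathbf{e}, \qquad \mathbf{a} \sim N(\mathbf{0}, \mathbf{G}_a \sigma_a^2), \quad \mathbf{d} \sim N(\mathbf{0}, \mathbf{G}_d \sigma_d^2),

    where G_a and G_d are the additive and dominance genomic relationship matrices built from the SNP markers. The A model drops the Z_d d term, which is one way to read the reported accuracy gap for grain yield under water stress.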

  11. CS-AMPPred: An Updated SVM Model for Antimicrobial Activity Prediction in Cysteine-Stabilized Peptides

    PubMed Central

    Porto, William F.; Pires, Állan S.; Franco, Octavio L.

    2012-01-01

    Antimicrobial peptides (AMPs) have been proposed as an alternative to control resistant pathogens. However, due to the multifunctional properties of several AMP classes, there has until now been no way to perform efficient AMP identification except through in vitro and in vivo tests. Nevertheless, an indication of activity can be provided by prediction methods. In order to contribute to the AMP prediction field, CS-AMPPred (Cysteine-Stabilized Antimicrobial Peptides Predictor) is presented here, consisting of an updated version of the Support Vector Machine (SVM) model for antimicrobial activity prediction in cysteine-stabilized peptides. CS-AMPPred is based on five sequence descriptors: indexes of (i) α-helix and (ii) loop formation, and averages of (iii) net charge, (iv) hydrophobicity and (v) flexibility. CS-AMPPred was trained on 310 cysteine-stabilized AMPs and 310 sequences extracted from the PDB. The polynomial kernel achieves the best accuracy on 5-fold cross-validation (85.81%), while the radial and linear kernels achieve 84.19%. Testing on a blind data set, the polynomial and radial kernels achieve an accuracy of 90.00%, while the linear model achieves 89.33%. All three models reach higher accuracies than previously described methods. A standalone version of CS-AMPPred is available for download at and runs on any Linux machine. PMID:23240023

  12. Relationship between the Prediction Accuracy of Tsunami Inundation and Relative Distribution of Tsunami Source and Observation Arrays: A Case Study in Tokyo Bay

    NASA Astrophysics Data System (ADS)

    Takagawa, T.

    2017-12-01

    A rapid and precise tsunami forecast based on offshore monitoring is attracting attention as a way to reduce human losses from devastating tsunami inundation. We developed a forecast method based on the combination of hierarchical Bayesian inversion with a pre-computed database and rapid post-computation of tsunami inundation. The method was applied to Tokyo Bay to evaluate the efficiency of observation arrays against three tsunamigenic earthquakes: a scenario earthquake at the Nankai trough and the historic Genroku (1703) and Enpo (1677) earthquakes. In general, a rich observation array near the tsunami source has an advantage in both the accuracy and the rapidness of tsunami forecasts. To examine the effect of observation time length, we used four types of data with lengths of 5, 10, 20 and 45 minutes after the earthquake occurrences. Prediction accuracy of tsunami inundation was evaluated against the simulated tsunami inundation areas around Tokyo Bay for the target earthquakes. The shortest time length for accurate prediction varied with the target earthquake; here, accurate prediction means that the simulated values fall within the 95% credible intervals of the prediction. In the Enpo case, a 5-minute observation is enough for accurate prediction for Tokyo Bay, but 10 minutes and 45 minutes are needed in the Nankai trough and Genroku cases, respectively. The shortest time length for accurate prediction is strongly related to the relative distance between the tsunami source and the observation arrays. In the Enpo case, offshore tsunami observation points are densely distributed even in the source region, so accurate prediction can be achieved within 5 minutes; this rapid, precise prediction is useful for early warnings. Even in the worst case, Genroku, where fewer observation points are available near the source, accurate prediction can be obtained within 45 minutes. This information can be useful for outlining the hazard at an early stage of the response.

  13. [Application of ARIMA model to predict number of malaria cases in China].

    PubMed

    Hui-Yu, H; Hua-Qin, S; Shun-Xian, Z; Lin, A I; Yan, L U; Yu-Chun, C; Shi-Zhu, L I; Xue-Jiao, T; Chun-Li, Y; Wei, H U; Jia-Xu, C

    2017-08-15

    Objective To study the application of the autoregressive integrated moving average (ARIMA) model to predicting the monthly reported malaria cases in China, so as to provide a reference for the prevention and control of malaria. Methods SPSS 24.0 software was used to construct ARIMA models based on the monthly reported malaria cases for the time series 2006-2015 and 2011-2015, respectively. The malaria case data from January to December 2016 were used as validation data to compare the accuracy of the two ARIMA models. Results The models of the monthly reported malaria cases in China were ARIMA(2,1,1)(1,1,0)12 and ARIMA(1,0,0)(1,1,0)12, respectively. Comparison of the two models' predictions against the actual malaria cases showed that the ARIMA model based on the 2011-2015 data had higher forecasting accuracy than the model based on the 2006-2015 data. Conclusion The establishment and prediction of an ARIMA model is a dynamic process that needs continual adjustment as data accumulate; in addition, major changes in the epidemic characteristics of infectious diseases must be considered.
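
    For readers who want to reproduce the model form, the sketch below fits the second model, ARIMA(1,0,0)(1,1,0)12, with statsmodels; the `cases` series is a synthetic stand-in for the 2011-2015 monthly counts, which are not reproduced here.

```python
# Sketch of the seasonal ARIMA fit described above, using statsmodels.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

idx = pd.date_range("2011-01", periods=60, freq="MS")
rng = np.random.default_rng(1)
cases = pd.Series(100 + 30 * np.sin(2 * np.pi * idx.month / 12)
                  + rng.normal(0, 5, 60), index=idx)   # synthetic monthly counts

# ARIMA(1,0,0)(1,1,0)12: AR(1) plus a seasonally differenced seasonal AR(1).
model = SARIMAX(cases, order=(1, 0, 0), seasonal_order=(1, 1, 0, 12))
fit = model.fit(disp=False)
forecast = fit.forecast(steps=12)        # predict Jan-Dec of the validation year
print(forecast.round(1))
```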

  14. ASME V&V challenge problem: Surrogate-based V&V

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beghini, Lauren L.; Hough, Patricia D.

    2015-12-18

    The process of verification and validation can be resource intensive. From the computational model perspective, the resource demand typically arises from long simulation run times on multiple cores, coupled with the need to characterize and propagate uncertainties. In addition, predictive computations performed for safety and reliability analyses have similar resource requirements. For this reason, there is a tradeoff between the time required to complete the requisite studies and the fidelity or accuracy of the results that can be obtained. At a high level, our approach is cast within a validation hierarchy that provides a framework in which we perform sensitivity analysis, model calibration, model validation, and prediction. The evidence gathered as part of these activities is mapped into the Predictive Capability Maturity Model to assess the credibility of the model used for the reliability predictions. With regard to specific technical aspects of our analysis, we employ surrogate-based methods, primarily based on polynomial chaos expansions and Gaussian processes, for model calibration, sensitivity analysis, and uncertainty quantification in order to reduce the number of simulations that must be done. The goal is to tip the tradeoff balance toward improved accuracy without increasing the computational demands.
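
    The surrogate idea can be illustrated in a few lines: fit a cheap emulator to a handful of expensive simulation runs, then query the emulator (with its uncertainty) instead of the code. In this minimal sketch, `expensive_sim` is a hypothetical placeholder for the real simulator, and the Gaussian process stands in for the polynomial chaos / GP machinery of the report.

```python
# Minimal Gaussian-process surrogate sketch: emulate a costly simulator from a
# few runs, then predict (with uncertainty) at new inputs without rerunning it.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_sim(x):
    return np.sin(3 * x) + 0.5 * x                  # placeholder physics

X_train = np.linspace(0, 2, 8).reshape(-1, 1)       # 8 affordable runs
y_train = expensive_sim(X_train).ravel()

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_train, y_train)

X_query = np.linspace(0, 2, 5).reshape(-1, 1)
mean, std = gp.predict(X_query, return_std=True)    # prediction + uncertainty
print(np.c_[mean, std].round(3))
```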

  15. Contingency Awareness Shapes Acquisition and Extinction of Emotional Responses in a Conditioning Model of Pain-Related Fear

    PubMed Central

    Labrenz, Franziska; Icenhour, Adriane; Benson, Sven; Elsenbruch, Sigrid

    2015-01-01

    As a fundamental learning process, fear conditioning promotes the formation of associations between predictive cues and biologically significant signals. In its application to pain, conditioning may provide important insight into mechanisms underlying pain-related fear, although knowledge especially in interoceptive pain paradigms remains scarce. Furthermore, while the influence of contingency awareness on excitatory learning is the subject of ongoing debate, its role in pain-related acquisition is poorly understood and essentially unknown regarding extinction as inhibitory learning. Therefore, we addressed the impact of contingency awareness on learned emotional responses to pain- and safety-predictive cues in a combined dataset of two pain-related conditioning studies. In total, 75 healthy participants underwent differential fear acquisition, during which rectal distensions as interoceptive unconditioned stimuli (US) were repeatedly paired with a predictive visual cue (conditioned stimulus; CS+) while another cue (CS−) was presented unpaired. During extinction, both CS were presented without US. CS valence, indicating learned emotional responses, and CS-US contingencies were assessed on visual analog scales (VAS). Based on an integrative measure of contingency accuracy, a median-split was performed to compare groups with low vs. high contingency accuracy regarding learned emotional responses. To investigate the predictive value of contingency accuracy, regression analyses were conducted. Highly accurate individuals revealed more pronounced negative emotional responses to CS+ and increased positive responses to CS− when compared to participants with low contingency accuracy. Following extinction, highly accurate individuals had fully extinguished pain-predictive cue properties, while exhibiting persistent positive emotional responses to safety signals. In contrast, individuals with low accuracy revealed equally positive emotional responses to both CS+ and CS−. Contingency accuracy predicted variance in the formation of positive responses to safety cues, while no predictive value was found for danger cues following acquisition or for either cue following extinction. Our findings underscore specific roles of learned danger and safety in pain-related acquisition and extinction. Contingency accuracy appears to distinctly impact learned emotional responses to safety and danger cues, supporting the idea that aversive learning can occur independently of CS-US awareness. The interplay of cognitive and emotional factors in shaping excitatory and inhibitory pain-related learning may contribute to altered pain processing, underscoring its clinical relevance in chronic pain. PMID:26640433

  16. Contingency Awareness Shapes Acquisition and Extinction of Emotional Responses in a Conditioning Model of Pain-Related Fear.

    PubMed

    Labrenz, Franziska; Icenhour, Adriane; Benson, Sven; Elsenbruch, Sigrid

    2015-01-01

    As a fundamental learning process, fear conditioning promotes the formation of associations between predictive cues and biologically significant signals. In its application to pain, conditioning may provide important insight into mechanisms underlying pain-related fear, although knowledge especially in interoceptive pain paradigms remains scarce. Furthermore, while the influence of contingency awareness on excitatory learning is the subject of ongoing debate, its role in pain-related acquisition is poorly understood and essentially unknown regarding extinction as inhibitory learning. Therefore, we addressed the impact of contingency awareness on learned emotional responses to pain- and safety-predictive cues in a combined dataset of two pain-related conditioning studies. In total, 75 healthy participants underwent differential fear acquisition, during which rectal distensions as interoceptive unconditioned stimuli (US) were repeatedly paired with a predictive visual cue (conditioned stimulus; CS(+)) while another cue (CS(-)) was presented unpaired. During extinction, both CS were presented without US. CS valence, indicating learned emotional responses, and CS-US contingencies were assessed on visual analog scales (VAS). Based on an integrative measure of contingency accuracy, a median-split was performed to compare groups with low vs. high contingency accuracy regarding learned emotional responses. To investigate the predictive value of contingency accuracy, regression analyses were conducted. Highly accurate individuals revealed more pronounced negative emotional responses to CS(+) and increased positive responses to CS(-) when compared to participants with low contingency accuracy. Following extinction, highly accurate individuals had fully extinguished pain-predictive cue properties, while exhibiting persistent positive emotional responses to safety signals. In contrast, individuals with low accuracy revealed equally positive emotional responses to both CS(+) and CS(-). Contingency accuracy predicted variance in the formation of positive responses to safety cues, while no predictive value was found for danger cues following acquisition or for either cue following extinction. Our findings underscore specific roles of learned danger and safety in pain-related acquisition and extinction. Contingency accuracy appears to distinctly impact learned emotional responses to safety and danger cues, supporting the idea that aversive learning can occur independently of CS-US awareness. The interplay of cognitive and emotional factors in shaping excitatory and inhibitory pain-related learning may contribute to altered pain processing, underscoring its clinical relevance in chronic pain.

  17. Stock market index prediction using neural networks

    NASA Astrophysics Data System (ADS)

    Komo, Darmadi; Chang, Chein-I.; Ko, Hanseok

    1994-03-01

    A neural network approach to stock market index prediction is presented. Actual data from the Wall Street Journal's Dow Jones Industrial Index were used as a benchmark in our experiments, in which Radial Basis Function based neural networks were designed to model these indices over the period from January 1988 to December 1992. The proposed model achieved notable success, producing over 90% prediction accuracy on monthly Dow Jones Industrial Index predictions, and captured both moderate and heavy index fluctuations. The experiments conducted in this study demonstrate that the Radial Basis Function neural network is an excellent candidate for predicting stock market indices.
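
    A radial basis function network of the kind described can be sketched as k-means centers, Gaussian activations, and a linear least-squares readout. The sine series below is a synthetic stand-in for the Dow Jones index; the architecture, not the data, is the point.

```python
# RBF network sketch: k-means centers, Gaussian hidden layer, linear readout.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
t = np.arange(60, dtype=float)
series = np.sin(t / 6.0) + 0.05 * rng.normal(size=60)    # stand-in for the index

X = series[:-1].reshape(-1, 1)                # predict next value from current
y = series[1:]

centers = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X).cluster_centers_
width = 0.5
Phi = np.exp(-((X - centers.T) ** 2) / (2 * width ** 2))  # Gaussian activations
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)               # linear readout

pred = Phi @ w
print("in-sample MSE:", float(np.mean((pred - y) ** 2)))
```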

  18. Simultaneous Co-Clustering and Classification in Customers Insight

    NASA Astrophysics Data System (ADS)

    Anggistia, M.; Saefuddin, A.; Sartono, B.

    2017-04-01

    Building a predictive model on a heterogeneous dataset can cause many problems, such as imprecise parameter estimates and poor prediction accuracy. Such problems can be solved by segmenting the data into relatively homogeneous groups and then building a predictive model for each cluster. This strategy usually yields simpler, more interpretable, and more actionable models without any loss of accuracy or reliability. This work concerns a marketing data set that records customer behaviour across products, with several variables describing customer and product attributes. The basic idea of the approach is to combine co-clustering and classification simultaneously. The objective of this research is to analyse customer characteristics across products so that the marketing strategy can be implemented precisely.
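
    The segment-then-model strategy can be sketched as below: cluster first, then fit one classifier per cluster. The data are synthetic stand-ins, and the sequential two-step shown here only approximates the simultaneous co-clustering-and-classification of the paper.

```python
# Cluster-then-classify sketch: KMeans segments, one logistic model per segment.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4))                     # customer/product attributes
y = (X[:, 0] * np.sign(X[:, 1]) > 0).astype(int)  # heterogeneous response

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for c in np.unique(labels):
    m = labels == c
    model = LogisticRegression().fit(X[m], y[m])  # per-cluster classifier
    print(f"cluster {c}: n={m.sum()}, train accuracy={model.score(X[m], y[m]):.3f}")
```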

  19. Accuracy of binding mode prediction with a cascadic stochastic tunneling method.

    PubMed

    Fischer, Bernhard; Basili, Serena; Merlitz, Holger; Wenzel, Wolfgang

    2007-07-01

    We investigate the accuracy of the binding modes predicted for 83 complexes of the high-resolution subset of the ASTEX/CCDC receptor-ligand database using the atomistic FlexScreen approach with a simple forcefield-based scoring function. The median RMS deviation between experimental and predicted binding modes was just 0.83 Å. Over 80% of the ligands dock within 2 Å of the experimental binding mode, and for 60 complexes the docking protocol locates the correct binding mode in all ten independent simulations. Most docking failures arise because (a) the experimental structure clashes in our forcefield and is thus unattainable in the docking process, or (b) the ligand is stabilized by crystal water. 2007 Wiley-Liss, Inc.
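
    The success criterion above is the RMS deviation between predicted and experimental poses; a minimal sketch with synthetic coordinates (assuming matched atom ordering in a common receptor frame, so no superposition is needed):

```python
# RMSD sketch: root-mean-square deviation between two ligand poses.
import numpy as np

def rmsd(a, b):
    """RMS deviation between two (n_atoms, 3) coordinate arrays, same order."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

rng = np.random.default_rng(4)
experimental = rng.normal(size=(30, 3))                    # 30 heavy atoms
predicted = experimental + rng.normal(0, 0.4, size=(30, 3))

d = rmsd(experimental, predicted)
print(f"RMSD = {d:.2f} Å -> {'success' if d < 2.0 else 'failure'} at the 2 Å criterion")
```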

  20. A comparison of fatigue life prediction methodologies for rotorcraft

    NASA Technical Reports Server (NTRS)

    Everett, R. A., Jr.

    1990-01-01

    Because of the current U.S. Army requirement that all new rotorcraft be designed to a 'six nines' reliability on fatigue life, this study was undertaken to assess the accuracy of the current safe life philosophy, which uses the nominal-stress Palmgren-Miner linear cumulative damage rule to predict the fatigue life of rotorcraft dynamic components. It has been shown that this methodology can predict fatigue lives that differ from test lives by more than two orders of magnitude. A further objective of this work was to compare the accuracy of this methodology to another safe life method, the local strain approach, as well as to a method which predicts fatigue life based solely on crack growth data. Spectrum fatigue tests were run on notched (k_t = 3.2) specimens made of 4340 steel using the Felix/28 loading spectrum. The local strain approach predicted the test lives fairly well, being slightly on the unconservative side of the test data. The crack growth method, which is based on 'small crack' crack growth data and a crack-closure model, also predicted the fatigue lives very well, with the predicted lives being slightly longer than the mean test lives but within the experimental scatter band. The crack growth model was also able to predict the change in test lives produced by the rainflow-reconstructed spectra.
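
    The Palmgren-Miner rule itself is one line: damage is the sum of applied cycles over allowable cycles at each stress level, with failure predicted when the sum reaches 1. The spectrum numbers below are illustrative, not values from the study.

```python
# Palmgren-Miner linear cumulative damage rule.
def miner_damage(spectrum):
    """spectrum: iterable of (applied_cycles, cycles_to_failure) pairs."""
    return sum(n / N for n, N in spectrum)

# hypothetical load-spectrum blocks: (cycles applied, S-N life at that level)
spectrum = [(5_000, 2_000_000), (1_000, 150_000), (50, 10_000)]
D = miner_damage(spectrum)
print(f"cumulative damage D = {D:.4f} -> "
      f"{'failure predicted' if D >= 1.0 else f'predicted life ~ {1/D:.1f} spectrum repeats'}")
```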

  1. Comparison of self-report-based and physical performance-based frailty definitions among patients receiving maintenance hemodialysis.

    PubMed

    Johansen, Kirsten L; Dalrymple, Lorien S; Delgado, Cynthia; Kaysen, George A; Kornak, John; Grimes, Barbara; Chertow, Glenn M

    2014-10-01

    A well-accepted definition of frailty includes measurements of physical performance, which may limit its clinical utility. In a cross-sectional study, we compared prevalence and patient characteristics based on a frailty definition that uses self-reported function to the classic performance-based definition, and developed a modified self-report-based definition. Prevalent adult patients receiving hemodialysis in 14 centers around San Francisco and Atlanta in 2009-2011. Self-report-based frailty definition in which a score lower than 75 on the Physical Function scale of the 36-Item Short Form Health Survey (SF-36) was substituted for gait speed and grip strength in the classic definition; modified self-report definition with optimized Physical Function score cutoff points derived in a development (one-half) cohort and validated in the other half. Performance-based frailty defined as at least 3 of the following: weight loss, weakness, exhaustion, low physical activity, and slow gait speed. 387 patients (53%) were frail based on self-reported function, of whom 209 (29% of the cohort) met the performance-based definition. Only 23 patients (3%) met the performance-based definition alone. The self-report definition had 90% sensitivity, 64% specificity, 54% positive predictive value, 93% negative predictive value, and 72.5% overall accuracy. Intracellular water per kilogram of body weight and serum albumin, prealbumin, and creatinine levels were highest among nonfrail individuals, intermediate among those who were frail by self-report, and lowest among those who also were frail by performance. Age, percentage of body fat, and C-reactive protein level followed an opposite pattern. The modified self-report definition had better accuracy (84%; 95% CI, 79%-89%) and superior specificity (88%) and positive predictive value (67%). Our study did not address prediction of outcomes. Patients who meet the self-report-based but not the performance-based definition of frailty may represent an intermediate phenotype. A modified self-report definition can improve the accuracy of a questionnaire-based method of defining frailty. Published by Elsevier Inc.
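
    The quoted test characteristics follow from the reported cross-tabulation: 387 self-report-frail patients (53% of a cohort of roughly 730), 209 of whom also met the performance-based definition, plus 23 patients frail by performance only. A short sketch reproduces them:

```python
# Reproduce the reported test characteristics from the published counts.
n_total = 730              # inferred: 387 is 53% of the cohort
tp, fp = 209, 387 - 209    # self-report positives, vs performance as reference
fn = 23                    # performance-frail missed by self-report
tn = n_total - tp - fp - fn

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
accuracy = (tp + tn) / n_total
print(f"sens={sensitivity:.0%} spec={specificity:.0%} "
      f"PPV={ppv:.0%} NPV={npv:.0%} accuracy={accuracy:.1%}")
```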

  2. Prediction of clinical behaviour and treatment for cancers.

    PubMed

    Futschik, Matthias E; Sullivan, Mike; Reeve, Anthony; Kasabov, Nikola

    2003-01-01

    Prediction of clinical behaviour and treatment for cancers is based on the integration of clinical and pathological parameters. Recent reports have demonstrated that gene expression profiling provides a powerful new approach for determining disease outcome. If clinical and microarray data each contain independent information then it should be possible to combine these datasets to gain more accurate prognostic information. Here, we have used existing clinical information and microarray data to generate a combined prognostic model for outcome prediction for diffuse large B-cell lymphoma (DLBCL). A prediction accuracy of 87.5% was achieved. This constitutes a significant improvement compared to the previously most accurate prognostic model with an accuracy of 77.6%. The model introduced here may be generally applicable to the combination of various types of molecular and clinical data for improving medical decision support systems and individualising patient care.

  3. Genomic Prediction of Gene Bank Wheat Landraces

    PubMed Central

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J.; Wenzl, Peter; Singh, Sukhwinder

    2016-01-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials. PMID:27172218

  4. Accurate prediction of energy expenditure using a shoe-based activity monitor.

    PubMed

    Sazonova, Nadezhda; Browning, Raymond C; Sazonov, Edward

    2011-07-01

    The aim of this study was to develop and validate a method for predicting energy expenditure (EE) using a footwear-based system with integrated accelerometer and pressure sensors. We developed a footwear-based device with an embedded accelerometer and insole pressure sensors for the prediction of EE. The data from the device can be used to perform accurate recognition of major postures and activities and to estimate EE using the acceleration, pressure, and posture/activity classification information in a branched algorithm without the need for individual calibration. We measured EE via indirect calorimetry as 16 adults (body mass index = 19-39 kg·m⁻²) performed various low- to moderate-intensity activities and compared measured versus predicted EE using several models based on the acceleration and pressure signals. Inclusion of pressure data resulted in better accuracy of EE prediction during static postures such as sitting and standing. The activity-based branched model that included predictors from accelerometer and pressure sensors (BACC-PS) achieved the lowest error (e.g., root mean squared error (RMSE)=0.69 METs) compared with the accelerometer-only-based branched model BACC (RMSE=0.77 METs) and nonbranched model (RMSE=0.94-0.99 METs). Comparison of EE prediction models using data from both legs versus models using data from a single leg indicates that only one shoe needs to be equipped with sensors. These results suggest that foot acceleration combined with insole pressure measurement, when used in an activity-specific branched model, can accurately estimate the EE associated with common daily postures and activities. The accuracy and unobtrusiveness of a footwear-based device may make it an effective physical activity monitoring tool.

  5. The SIST-M: Predictive validity of a brief structured Clinical Dementia Rating interview

    PubMed Central

    Okereke, Olivia I.; Pantoja-Galicia, Norberto; Copeland, Maura; Hyman, Bradley T.; Wanggaard, Taylor; Albert, Marilyn S.; Betensky, Rebecca A.; Blacker, Deborah

    2011-01-01

    Background We previously established reliability and cross-sectional validity of the SIST-M (Structured Interview and Scoring Tool–Massachusetts Alzheimer's Disease Research Center), a shortened version of an instrument shown to predict progression to Alzheimer disease (AD), even among persons with very mild cognitive impairment (vMCI). Objective To test predictive validity of the SIST-M. Methods Participants were 342 community-dwelling, non-demented older adults in a longitudinal study. Baseline Clinical Dementia Rating (CDR) ratings were determined by either: 1) clinician interviews or 2) a previously developed computer algorithm based on 60 questions (of a possible 131) extracted from clinician interviews. We developed age+gender+education-adjusted Cox proportional hazards models using CDR-sum-of-boxes (CDR-SB) as the predictor, where CDR-SB was determined by either clinician interview or algorithm; models were run for the full sample (n=342) and among those jointly classified as vMCI using clinician- and algorithm-based CDR ratings (n=156). We directly compared predictive accuracy using time-dependent Receiver Operating Characteristic (ROC) curves. Results AD hazard ratios (HRs) were similar for clinician-based and algorithm-based CDR-SB: for a 1-point increment in CDR-SB, respective HRs (95% CI)=3.1 (2.5,3.9) and 2.8 (2.2,3.5); among those with vMCI, respective HRs (95% CI) were 2.2 (1.6,3.2) and 2.1 (1.5,3.0). Similarly high predictive accuracy was achieved: the concordance probability (weighted average of the area-under-the-ROC curves) over follow-up was 0.78 vs. 0.76 using clinician-based vs. algorithm-based CDR-SB. Conclusion CDR scores based on items from this shortened interview had high predictive ability for AD – comparable to that using a lengthy clinical interview. PMID:21986342
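
    A model of the same form can be sketched with the lifelines package; the DataFrame below is entirely synthetic, standing in for the 342-person cohort, so the hazard ratios will not match the study's.

```python
# Sketch of an age/gender/education-adjusted Cox model with CDR-SB as the
# predictor, using lifelines on simulated data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
n = 342
df = pd.DataFrame({
    "cdr_sb": rng.uniform(0, 4, n),                # CDR sum-of-boxes
    "age": rng.normal(75, 6, n),
    "female": rng.integers(0, 2, n),
    "education": rng.normal(14, 3, n),
})
risk = 0.05 * np.exp(1.1 * df["cdr_sb"])           # hazard rising with CDR-SB
df["time"] = rng.exponential(1 / risk)
df["progressed_to_AD"] = (df["time"] < 10).astype(int)
df["time"] = df["time"].clip(upper=10)             # administrative censoring

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="progressed_to_AD")
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```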

  6. Integrated detection of fractures and caves in carbonate fractured-vuggy reservoirs based on seismic data and well data

    NASA Astrophysics Data System (ADS)

    Cao, Zhanning; Li, Xiangyang; Sun, Shaohan; Liu, Qun; Deng, Guangxiao

    2018-04-01

    Aiming at the prediction of carbonate fractured-vuggy reservoirs, we put forward an integrated approach based on seismic and well data. We divide a carbonate fracture-cave system into four scales for study: micro-scale fractures, meso-scale fractures, macro-scale fractures, and caves. Firstly, we analyze anisotropic attributes of prestack azimuth gathers based on multi-scale rock physics forward modeling; we select the frequency attenuation gradient attribute to calculate azimuthal anisotropy intensity, and constrain the result with Formation MicroScanner image data and trial production data to predict the distribution of both micro-scale and meso-scale fracture sets. Then, poststack seismic attributes, variance, curvature, and ant-tracking algorithms are used to predict the distribution of macro-scale fractures, again constrained by trial production data for accuracy. Next, the distribution of caves is predicted from the amplitude corresponding to the instantaneous peak frequency of the seismic imaging data. Finally, the meso-scale fracture sets, macro-scale fractures, and caves are combined to obtain an integrated result. This integrated approach is applied to a real field in the Tarim Basin in western China for the prediction of fracture-cave reservoirs. The results indicate that the approach explains the spatial distribution of carbonate reservoirs well; it can alleviate the non-uniqueness problem and improve fracture prediction accuracy.

  7. A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features.

    PubMed

    Li, Liqi; Luo, Qifa; Xiao, Weidong; Li, Jinhui; Zhou, Shiwen; Li, Yongsheng; Zheng, Xiaoqi; Yang, Hua

    2017-02-01

    Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to protein transport, organelle localization, and function, and it therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interactions, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is urgently needed. In this study, a support vector machine (SVM)-based method was proposed that integrates PSI-BLAST profiles, physicochemical properties, k-mer amino acid compositions (AACs), and k-mer pseudo-AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and a Matthews correlation coefficient of 0.9773 on a benchmark dataset. The result indicates the efficiency and accuracy of our method in the prediction of palmitoylation sites based on protein sequences.
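
    The pipeline can be sketched with scikit-learn: assemble a composite feature matrix, recursively eliminate weak features with a linear SVM, then train the final SVM. The features below are random stand-ins for the PSI-BLAST/physicochemical/AAC descriptors, so the printed accuracy will be near chance.

```python
# Feature selection + SVM sketch: recursive feature elimination, then training.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 120))       # composite sequence-derived features
y = rng.integers(0, 2, 400)           # 1 = palmitoylation site, 0 = not

selector = RFE(SVC(kernel="linear"), n_features_to_select=30, step=10)
X_sel = selector.fit_transform(X, y)  # keep the most discriminative features

acc = cross_val_score(SVC(kernel="rbf"), X_sel, y, cv=5).mean()
print(f"{selector.n_features_} features kept, CV accuracy = {acc:.3f}")
```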

  8. A Hybrid FPGA-Based System for EEG- and EMG-Based Online Movement Prediction.

    PubMed

    Wöhrle, Hendrik; Tabie, Marc; Kim, Su Kyoung; Kirchner, Frank; Kirchner, Elsa Andrea

    2017-07-03

    A current trend in the development of assistive devices for rehabilitation, for example exoskeletons or active orthoses, is to utilize physiological data to enhance their functionality and usability, for example by predicting the patient's upcoming movements using electroencephalography (EEG) or electromyography (EMG). However, these modalities have different temporal properties and classification accuracies, which results in specific advantages and disadvantages. To use physiological data analysis in rehabilitation devices, the processing should be performed in real-time, guarantee close to natural movement onset support, provide high mobility, and be performed by miniaturized systems that can be embedded into the rehabilitation device. We present a novel Field Programmable Gate Array (FPGA)-based system for real-time movement prediction using physiological data. Its parallel processing capabilities allow movement predictions based on EEG and EMG to be combined with the detection of P300 responses, which are likely evoked by the instructions of the therapist. The system is evaluated in an offline and an online study with twelve healthy subjects in total. We show that it provides high computational performance and significantly lower power consumption in comparison to a standard PC. Furthermore, despite the use of fixed-point computations, the proposed system achieves a classification accuracy similar to that of systems using double-precision floating-point arithmetic.

  9. A Hybrid FPGA-Based System for EEG- and EMG-Based Online Movement Prediction

    PubMed Central

    Wöhrle, Hendrik; Tabie, Marc; Kim, Su Kyoung; Kirchner, Frank; Kirchner, Elsa Andrea

    2017-01-01

    A current trend in the development of assistive devices for rehabilitation, for example exoskeletons or active orthoses, is to utilize physiological data to enhance their functionality and usability, for example by predicting the patient's upcoming movements using electroencephalography (EEG) or electromyography (EMG). However, these modalities have different temporal properties and classification accuracies, which results in specific advantages and disadvantages. To use physiological data analysis in rehabilitation devices, the processing should be performed in real-time, guarantee close to natural movement onset support, provide high mobility, and be performed by miniaturized systems that can be embedded into the rehabilitation device. We present a novel Field Programmable Gate Array (FPGA)-based system for real-time movement prediction using physiological data. Its parallel processing capabilities allow movement predictions based on EEG and EMG to be combined with the detection of P300 responses, which are likely evoked by the instructions of the therapist. The system is evaluated in an offline and an online study with twelve healthy subjects in total. We show that it provides high computational performance and significantly lower power consumption in comparison to a standard PC. Furthermore, despite the use of fixed-point computations, the proposed system achieves a classification accuracy similar to that of systems using double-precision floating-point arithmetic. PMID:28671632

  10. The accuracy of Genomic Selection in Norwegian Red cattle assessed by cross-validation.

    PubMed

    Luan, Tu; Woolliams, John A; Lien, Sigbjørn; Kent, Matthew; Svendsen, Morten; Meuwissen, Theo H E

    2009-11-01

    Genomic Selection (GS) is a newly developed tool for the estimation of breeding values for quantitative traits through the use of dense markers covering the whole genome. For a successful application of GS, the accuracy of the prediction of genome-wide breeding values (GW-EBV) is a key issue to consider. Here we investigated the accuracy and possible bias of GW-EBV prediction, using real bovine SNP genotyping (18,991 SNPs) and phenotypic data of 500 Norwegian Red bulls. The study was performed on milk yield, fat yield, protein yield, first-lactation mastitis traits, and calving ease. Three methods, best linear unbiased prediction (G-BLUP), Bayesian statistics (BayesB), and a mixture model approach (MIXTURE), were used to estimate marker effects, and their accuracy and bias were estimated by cross-validation. The accuracies of the GW-EBV prediction were found to vary widely between 0.12 and 0.62. G-BLUP gave overall the highest accuracy. We observed a strong relationship between the accuracy of the prediction and the heritability of the trait. GW-EBV prediction for production traits with high heritability achieved higher accuracy and lower bias than for health traits with low heritability. To achieve a similar accuracy for the health traits, more records will probably be needed.
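
    As an illustration of cross-validated genome-wide prediction, the sketch below uses ridge regression on SNP genotypes (SNP-BLUP, which is equivalent to G-BLUP under standard assumptions) and reports accuracy as the correlation between predicted and observed values. The genotypes and phenotypes are simulated, not the Norwegian Red data.

```python
# Cross-validated SNP-BLUP sketch via ridge regression on 0/1/2 genotypes.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n_bulls, n_snps = 500, 2000
M = rng.integers(0, 3, size=(n_bulls, n_snps)).astype(float)   # genotype matrix
effects = np.zeros(n_snps)
effects[rng.choice(n_snps, 50, replace=False)] = rng.normal(0, 0.3, 50)
y = M @ effects + rng.normal(0, 3.0, n_bulls)                  # phenotype

gebv = cross_val_predict(Ridge(alpha=100.0), M, y, cv=5)       # out-of-fold GEBV
print("accuracy (corr of predicted and observed):",
      round(float(np.corrcoef(gebv, y)[0, 1]), 3))
```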

  11. MicroRNA based Pan-Cancer Diagnosis and Treatment Recommendation.

    PubMed

    Cheerla, Nikhil; Gevaert, Olivier

    2017-01-13

    The current state-of-the-art in cancer diagnosis and treatment is not ideal; diagnostic tests are accurate but invasive, and treatments are "one-size-fits-all" instead of being personalized. Recently, miRNAs have garnered significant attention as cancer biomarkers, owing to their ease of access (circulating miRNA in the blood) and stability. There have been many studies showing the effectiveness of miRNA data in diagnosing specific cancer types, but few studies explore the role of miRNA in predicting treatment outcome. Here we go a step further, using tissue miRNA and clinical data across 21 cancers from The Cancer Genome Atlas (TCGA) database. We use machine learning techniques to create an accurate pan-cancer diagnosis system and a prediction model for treatment outcomes. Finally, using these models, we create a web-based tool that diagnoses cancer and recommends the best treatment options. We achieved 97.2% accuracy for classification using a support vector machine classifier with a radial basis function kernel, and the accuracies improved to 99.9-100% when climbing up the embryonic tree and classifying cancers at different stages. We define accuracy as the ratio of the number of correctly classified instances to the total number of instances. The classifier also performed well, achieving greater than 80% sensitivity for many cancer types on independent validation datasets. Many miRNAs selected by our feature selection algorithm had strong previous associations with various cancers and tumor progression. Then, using miRNA, clinical, and treatment data encoded in a machine-learning-readable format, we built a prognosis prediction model that predicts the outcome of treatment with 85% accuracy, and used this model to create a tool that recommends personalized treatment regimens. Both the diagnosis and prognosis models, which incorporate semi-supervised learning techniques to improve their accuracy with repeated use, were uploaded online for easy access. Our research is a step toward the final goal of diagnosing cancer and predicting treatment recommendations using non-invasive blood tests.

  12. Diagnostic accuracy of FEV1/forced vital capacity ratio z scores in asthmatic patients.

    PubMed

    Lambert, Allison; Drummond, M Bradley; Wei, Christine; Irvin, Charles; Kaminsky, David; McCormack, Meredith; Wise, Robert

    2015-09-01

    The FEV1/forced vital capacity (FVC) ratio is used as a criterion for airflow obstruction; however, the test characteristics of spirometry in the diagnosis of asthma are not well established, and the accuracy of a test depends on the pretest probability of disease. We wanted to estimate the FEV1/FVC ratio z score threshold with optimal accuracy for the diagnosis of asthma at different pretest probabilities. Asthmatic patients enrolled in 4 trials from the Asthma Clinical Research Centers were included in this analysis. Measured and predicted FEV1/FVC ratios were obtained, with calculation of z scores for each participant. Across a range of asthma prevalences and z score thresholds, the overall diagnostic accuracy was calculated. One thousand six hundred eight participants were included (mean age, 39 years; 71% female; 61% white). The mean FEV1 percent predicted value was 83% (SD, 15%). In a symptomatic population with a 50% pretest probability of asthma, optimal accuracy (68%) is achieved with a z score threshold of -1.0 (16th percentile), corresponding to a 6 percentage point reduction from the predicted ratio. However, in a screening population with a 5% pretest probability of asthma, the optimal z score is -2.0 (second percentile), corresponding to a 12 percentage point reduction from the predicted ratio. These findings were not altered by markers of disease control. Reduction of the FEV1/FVC ratio can support the diagnosis of asthma; however, the ratio is neither sensitive nor specific enough on its own for diagnostic accuracy. When interpreting spirometric results, the pretest probability is an important consideration in the diagnosis of asthma based on airflow limitation. Copyright © 2015 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
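
    The dependence of the optimal cutoff on pretest probability is just a weighting argument: overall accuracy is sensitivity weighted by prevalence plus specificity weighted by (1 - prevalence). The operating points below are illustrative assumptions, not values from the study.

```python
# Accuracy of a diagnostic cutoff as a prevalence-weighted average.
def accuracy(sens, spec, prevalence):
    return sens * prevalence + spec * (1 - prevalence)

# hypothetical operating points: z-score cutoff -> (sensitivity, specificity)
cutoffs = {-1.0: (0.80, 0.60), -2.0: (0.45, 0.93)}

for prev in (0.50, 0.05):
    best = max(cutoffs, key=lambda z: accuracy(*cutoffs[z], prev))
    print(f"pretest probability {prev:.0%}: best cutoff z = {best}")
# A lenient cutoff wins at high prevalence; a strict one wins for screening.
```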

  13. Evaluation of Gravitational Field Models Based on the Laser Range Observation of Low Earth Orbit Satellites

    NASA Astrophysics Data System (ADS)

    Hong-bo, Wang; Chang-yin, Zhao; Wei, Zhang; Jin-wei, Zhan; Sheng-xian, Yu

    2016-07-01

    The Earth gravitational field model is one of the most important dynamic models in satellite orbit computation. Several space gravity missions achieved great success in recent years, prompting the publication of several gravitational field models. In this paper, two classical models (JGM3, EGM96) and four recent ones (EIGEN-CHAMP05S, GGM03S, GOCE02S, EGM2008) are evaluated by employing them in precision orbit determination (POD) and prediction. The calculations are based on laser ranging observations of four Low Earth Orbit (LEO) satellites: CHAMP, GFZ-1, GRACE-A, and SWARM-A. The residual error of the observations in POD is adopted to describe the accuracy of the six gravitational field models. The main results are as follows. (1) For the POD of LEOs, the accuracies of the four recent models are at the same level and better than those of the two classical models. (2) Taking JGM3 as the reference, the EGM96 model's accuracy is better in most situations, and the accuracies of the four recent models are improved by 12%-47% in POD and 63% in prediction. We also confirm that a model's POD accuracy improves with increasing degree and order up to 70; beyond 70 the accuracy stays constant, implying that truncating the model's degree and order to 70 is sufficient to meet the centimeter-precision requirement of LEO computation.

  14. Evaluating risk-prediction models using data from electronic health records.

    PubMed

    Wang, L E; Shaw, Pamela A; Mathelier, Hansie M; Kimmel, Stephen E; French, Benjamin

    2016-03-01

    The availability of data from electronic health records facilitates the development and evaluation of risk-prediction models, but estimation of prediction accuracy could be limited by outcome misclassification, which can arise if events are not captured. We evaluate the robustness of prediction accuracy summaries, obtained from receiver operating characteristic curves and risk-reclassification methods, if events are not captured (i.e., "false negatives"). We derive estimators for sensitivity and specificity if misclassification is independent of marker values. In simulation studies, we quantify the potential for bias in prediction accuracy summaries if misclassification depends on marker values. We compare the accuracy of alternative prognostic models for 30-day all-cause hospital readmission among 4548 patients discharged from the University of Pennsylvania Health System with a primary diagnosis of heart failure. Simulation studies indicate that if misclassification depends on marker values, then the estimated accuracy improvement is also biased, but the direction of the bias depends on the direction of the association between markers and the probability of misclassification. In our application, 29% of the 1143 readmitted patients were readmitted to a hospital elsewhere in Pennsylvania, which reduced prediction accuracy. Outcome misclassification can result in erroneous conclusions regarding the accuracy of risk-prediction models.

  15. Homology to peptide pattern for annotation of carbohydrate-active enzymes and prediction of function.

    PubMed

    Busk, P K; Pilgaard, B; Lezyk, M J; Meyer, A S; Lange, L

    2017-04-12

    Carbohydrate-active enzymes are found in all organisms and participate in key biological processes. These enzymes are classified into 274 families in the CAZy database, but the sequence diversity within each family makes it a major task to identify new family members and to provide a basis for prediction of enzyme function. A fast and reliable method for de novo annotation of genes encoding carbohydrate-active enzymes is to identify conserved peptides in the curated enzyme families and then match the conserved peptides to the sequence of interest, as demonstrated for the glycosyl hydrolase and lytic polysaccharide monooxygenase families. This approach not only assigns the enzymes to families but also provides functional prediction of the enzymes with high accuracy. We identified conserved peptides for all enzyme families in the CAZy database with Peptide Pattern Recognition. The conserved peptides were matched to protein sequences for de novo annotation and functional prediction of carbohydrate-active enzymes with the Hotpep method. Annotation of protein sequences from 12 bacterial and 16 fungal genomes to families with Hotpep had an accuracy of 0.84 (measured as F1-score) compared to semiautomatic annotation by the CAZy database, whereas the dbCAN HMM-based method had an accuracy of 0.77 with optimized parameters. Furthermore, Hotpep provided a functional prediction with 86% accuracy for the annotated genes. Hotpep is available as a stand-alone application for MS Windows. Hotpep is a state-of-the-art method for automatic annotation and functional prediction of carbohydrate-active enzymes.

  16. Mixed Model Methods for Genomic Prediction and Variance Component Estimation of Additive and Dominance Effects Using SNP Markers

    PubMed Central

    Da, Yang; Wang, Chunkao; Wang, Shengwen; Hu, Guo

    2014-01-01

    We established a genomic model of quantitative trait with genomic additive and dominance relationships that parallels the traditional quantitative genetics model, which partitions a genotypic value as breeding value plus dominance deviation and calculates additive and dominance relationships using pedigree information. Based on this genomic model, two sets of computationally complementary but mathematically identical mixed model methods were developed for genomic best linear unbiased prediction (GBLUP) and genomic restricted maximum likelihood estimation (GREML) of additive and dominance effects using SNP markers. These two sets are referred to as the CE and QM sets, where the CE set was designed for large numbers of markers and the QM set was designed for large numbers of individuals. GBLUP and associated accuracy formulations for individuals in training and validation data sets were derived for breeding values, dominance deviations and genotypic values. Simulation study showed that GREML and GBLUP generally were able to capture small additive and dominance effects that each accounted for 0.00005–0.0003 of the phenotypic variance and GREML was able to differentiate true additive and dominance heritability levels. GBLUP of the total genetic value as the summation of additive and dominance effects had higher prediction accuracy than either additive or dominance GBLUP, causal variants had the highest accuracy of GREML and GBLUP, and predicted accuracies were in agreement with observed accuracies. Genomic additive and dominance relationship matrices using SNP markers were consistent with theoretical expectations. The GREML and GBLUP methods can be an effective tool for assessing the type and magnitude of genetic effects affecting a phenotype and for predicting the total genetic value at the whole genome level. PMID:24498162

  17. Mixed model methods for genomic prediction and variance component estimation of additive and dominance effects using SNP markers.

    PubMed

    Da, Yang; Wang, Chunkao; Wang, Shengwen; Hu, Guo

    2014-01-01

    We established a genomic model of quantitative trait with genomic additive and dominance relationships that parallels the traditional quantitative genetics model, which partitions a genotypic value as breeding value plus dominance deviation and calculates additive and dominance relationships using pedigree information. Based on this genomic model, two sets of computationally complementary but mathematically identical mixed model methods were developed for genomic best linear unbiased prediction (GBLUP) and genomic restricted maximum likelihood estimation (GREML) of additive and dominance effects using SNP markers. These two sets are referred to as the CE and QM sets, where the CE set was designed for large numbers of markers and the QM set was designed for large numbers of individuals. GBLUP and associated accuracy formulations for individuals in training and validation data sets were derived for breeding values, dominance deviations and genotypic values. Simulation study showed that GREML and GBLUP generally were able to capture small additive and dominance effects that each accounted for 0.00005-0.0003 of the phenotypic variance and GREML was able to differentiate true additive and dominance heritability levels. GBLUP of the total genetic value as the summation of additive and dominance effects had higher prediction accuracy than either additive or dominance GBLUP, causal variants had the highest accuracy of GREML and GBLUP, and predicted accuracies were in agreement with observed accuracies. Genomic additive and dominance relationship matrices using SNP markers were consistent with theoretical expectations. The GREML and GBLUP methods can be an effective tool for assessing the type and magnitude of genetic effects affecting a phenotype and for predicting the total genetic value at the whole genome level.

  18. Modeling a Spatio-Temporal Individual Travel Behavior Using Geotagged Social Network Data: a Case Study of Greater Cincinnati

    NASA Astrophysics Data System (ADS)

    Saeedimoghaddam, M.; Kim, C.

    2017-10-01

    Understanding individual travel behavior is vital for travel demand management as well as for urban and transportation planning. New data sources, including mobile phone data and location-based social media (LBSM) data, allow us to understand mobility behavior at an unprecedented level of detail. Recent studies of trip purpose prediction tend to use machine learning (ML) methods, since they generally produce high levels of predictive accuracy, but few studies have used LBSM as a large data source for predicting individual travel destinations with ML techniques. In the presented research, we created a spatio-temporal probabilistic model based on the ensemble ML framework "Random Forests", using travels extracted from geotagged Tweets in 419 census tracts of the Greater Cincinnati area, to predict the tract ID of an individual's travel destination at any time from information about its origin. We evaluated the model accuracy using the travels extracted from the Tweets themselves as well as travels from a household travel survey. Tweet-based and survey-based travels that start from the same tract in the southwestern part of the study area are more likely to select the same destination than those in other parts, and both were affected by attraction points in downtown Cincinnati and in tracts in the northeastern part of the area. Both evaluations show that the model predictions are acceptable, but the model cannot predict destinations using inputs from other data sources as precisely as with the Tweet-based data.
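
    The modeling step can be sketched with scikit-learn's Random Forest classifier: predict the destination tract of a trip from its origin tract and time features. The data below are synthetic stand-ins for the geotagged-Tweet trips, with a planted origin/time rule so the forest has something to learn.

```python
# Random Forest sketch for origin -> destination tract prediction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
n_trips, n_tracts = 5000, 419
origin = rng.integers(0, n_tracts, n_trips)
hour = rng.integers(0, 24, n_trips)
weekday = rng.integers(0, 7, n_trips)
# synthetic rule: destination depends on origin and time of day
dest = (origin + 3 * (hour > 16)) % n_tracts

X = np.c_[origin, hour, weekday]
X_tr, X_te, y_tr, y_te = train_test_split(X, dest, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"held-out accuracy: {rf.score(X_te, y_te):.3f}")
```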

  19. Selective deficits in episodic feeling of knowing in ageing: a novel use of the general knowledge task.

    PubMed

    Morson, Suzannah M; Moulin, Chris J A; Souchay, Céline

    2015-05-01

    Failure to recall an item from memory can be accompanied by the subjective experience that the item is known but currently unavailable for report. The feeling of knowing (FOK) task allows measurement of the predictive accuracy of this reflective judgement. Young and older adults were asked to provide answers to general knowledge questions both prior to and after learning, thus measuring both semantic and episodic memory for the items. FOK judgements were made at each stage for all unrecalled responses, providing a measure of predictive accuracy for semantic and episodic knowledge. Results demonstrated a selective effect of age on episodic FOK resolution, with older adults found to have impaired episodic FOK accuracy while semantic FOK accuracy remained intact. Although recall and recognition measures of episodic memory are equivalent between the two age groups, older adults may have been unable to access contextual details on which to base their FOK judgements. The results suggest that older adults are not able to accurately predict future recognition of unrecalled episodic information, and consequently may have difficulties in monitoring recently encoded memories. Copyright © 2015. Published by Elsevier B.V.

  20. Impaired gas exchange: accuracy of defining characteristics in children with acute respiratory infection

    PubMed Central

    Pascoal, Lívia Maia; Lopes, Marcos Venícios de Oliveira; Chaves, Daniel Bruno Resende; Beltrão, Beatriz Amorim; da Silva, Viviane Martins; Monteiro, Flávia Paula Magalhães

    2015-01-01

    OBJECTIVE: to analyze the accuracy of the defining characteristics of the Impaired gas exchange nursing diagnosis in children with acute respiratory infection. METHOD: open prospective cohort study conducted with 136 children monitored for a consecutive period of at least six days and not more than ten days. An instrument based on the defining characteristics of the Impaired gas exchange diagnosis and on literature addressing pulmonary assessment was used to collect data. The accuracy means of all the defining characteristics under study were computed. RESULTS: the Impaired gas exchange diagnosis was present in 42.6% of the children in the first assessment. Hypoxemia was the characteristic that presented the best measures of accuracy. Abnormal breathing presented high sensitivity, while restlessness, cyanosis, and abnormal skin color showed high specificity. All the characteristics presented negative predictive values of 70% and cyanosis stood out by its high positive predictive value. CONCLUSION: hypoxemia was the defining characteristic that presented the best predictive ability to determine Impaired gas exchange. Studies of this nature enable nurses to minimize variability in clinical situations presented by the patient and to identify more precisely the nursing diagnosis that represents the patient's true clinical condition. PMID:26155010

  1. Predicting Individual Fuel Economy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Zhenhong; Greene, David L

    2011-01-01

    To make informed decisions about travel and vehicle purchases, consumers need unbiased and accurate information on the fuel economy they will actually obtain. In the past, the EPA fuel economy estimates based on its 1984 rules were widely criticized for overestimating on-road fuel economy. In 2008, EPA adopted a new estimation rule. This study compares the usefulness of the EPA's 1984 and 2008 estimates based on their prediction bias and accuracy, and attempts to improve the prediction of on-road fuel economy based on consumer and vehicle attributes. We examine the usefulness of the EPA fuel economy estimates using a large sample of self-reported on-road fuel economy data and develop an Individualized Model for more accurately predicting an individual driver's on-road fuel economy based on easily determined vehicle and driver attributes. Accuracy rather than bias appears to have limited the usefulness of the EPA 1984 estimates in predicting on-road MPG. The EPA 2008 estimates appear to be equally inaccurate and substantially more biased relative to the self-reported data. Furthermore, the 2008 estimates exhibit an underestimation bias that increases with increasing fuel economy, suggesting that the new numbers will tend to underestimate the real-world benefits of fuel economy and emissions standards. By including several simple driver and vehicle attributes, the Individualized Model reduces the unexplained variance by over 55% and the standard error by 33% on an independent test sample. The additional explanatory variables can be easily provided by the individuals.

  2. A Java-based fMRI processing pipeline evaluation system for assessment of univariate general linear model and multivariate canonical variate analysis-based pipelines.

    PubMed

    Zhang, Jing; Liang, Lichen; Anderson, Jon R; Gatewood, Lael; Rottenberg, David A; Strother, Stephen C

    2008-01-01

    As functional magnetic resonance imaging (fMRI) becomes widely used, the demand for evaluation of fMRI processing pipelines and validation of fMRI analysis results is increasing rapidly. The current NPAIRS package, an IDL-based fMRI processing pipeline evaluation framework, lacks system interoperability and the ability to evaluate general linear model (GLM)-based pipelines using prediction metrics; thus, it cannot fully evaluate fMRI analytical software modules such as FSL.FEAT and NPAIRS.GLM. In order to overcome these limitations, a Java-based fMRI processing pipeline evaluation system was developed. It integrated YALE (a machine learning environment) into Fiswidgets (an fMRI software environment) to obtain system interoperability and applied an algorithm to measure GLM prediction accuracy. The results demonstrated that the system can evaluate fMRI processing pipelines with univariate GLM and multivariate canonical variates analysis (CVA)-based models on real fMRI data, based on prediction accuracy (classification accuracy) and statistical parametric image (SPI) reproducibility. In addition, a preliminary study was performed in which four fMRI processing pipelines with GLM and CVA modules, such as FSL.FEAT and NPAIRS.CVA, were evaluated with the system. The results indicated that (1) the system can compare different fMRI processing pipelines with heterogeneous models (NPAIRS.GLM, NPAIRS.CVA, and FSL.FEAT) and rank their performance by automatic performance scoring, and (2) the ranking of pipeline performance is highly dependent on the preprocessing operations. These results suggest that the system will be of value for the comparison, validation, standardization, and optimization of functional neuroimaging software packages and fMRI processing pipelines.

  3. Improved Short-Term Clock Prediction Method for Real-Time Positioning.

    PubMed

    Lv, Yifei; Dai, Zhiqiang; Zhao, Qile; Yang, Sheng; Zhou, Jinning; Liu, Jingnan

    2017-06-06

    The application of real-time precise point positioning (PPP) requires real-time precise orbit and clock products, which must be predicted over a short horizon to compensate for communication delays or data gaps. Unlike orbit corrections, clock corrections are difficult to model and predict. The widely used linear model hardly fits long-period trends with a small data set and exhibits significant accuracy degradation in real-time prediction when a large data set is used. This study proposes a new prediction model for maintaining short-term satellite clocks that meets the high-precision requirements of real-time clocks and provides clock extrapolation without interrupting the real-time data stream. Fast Fourier transform (FFT) is used to analyze the linear prediction residuals of the real-time clocks, and the periodic terms obtained through the FFT are adopted in a sliding-window prediction to achieve a significant improvement in short-term prediction accuracy. This study also analyzes and compares the accuracy of short-term forecasts (less than 3 h) using observations of different lengths. Experimental results obtained from International GNSS Service (IGS) final products and our own real-time clocks show that the 3-h prediction accuracy is better than 0.85 ns, so the new model can replace IGS ultra-rapid products in real-time PPP applications. A positive correlation is also found between the prediction accuracy and the short-term stability of the on-board clocks. Compared with the traditional linear model, the accuracy of static PPP using the new model's 2-h predicted clocks improves by about 50% in the N, E, and U directions, and the static PPP accuracy with 2-h clock products is better than 0.1 m. When an interruption occurs in the real-time stream, the kinematic PPP solution using the 1-h clock prediction product remains better than 0.2 m, without significant accuracy degradation. The model is of practical significance because it addresses interruptions and delays in data broadcast during real-time clock estimation and meets the requirements of real-time PPP.
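
    The trend-plus-periodic idea can be sketched in a few lines: fit a linear trend, take the FFT of the residuals to find the dominant periodic term, and extrapolate both. The series below is a synthetic clock record; the real implementation uses sliding windows and multiple periodic terms.

```python
# Sketch: linear trend + dominant FFT periodic term, extrapolated forward.
import numpy as np

rng = np.random.default_rng(9)
t = np.arange(720, dtype=float)                  # epochs (e.g., 30 s each)
clock = (2e-9 * t + 0.4e-9 * np.sin(2 * np.pi * t / 240)
         + rng.normal(0, 5e-11, 720))            # synthetic clock offsets [s]

# 1) linear fit and residuals
a, b = np.polyfit(t, clock, 1)
resid = clock - (a * t + b)

# 2) dominant periodic term from the FFT of the residuals
spec = np.fft.rfft(resid)
freqs = np.fft.rfftfreq(t.size, d=1.0)
k = np.argmax(np.abs(spec[1:])) + 1              # skip the DC bin
amp, phase, f = 2 * np.abs(spec[k]) / t.size, np.angle(spec[k]), freqs[k]

# 3) extrapolate trend + periodic term over the next epochs
t_fut = np.arange(720, 840, dtype=float)
pred = a * t_fut + b + amp * np.cos(2 * np.pi * f * t_fut + phase)
print(f"dominant period = {1/f:.0f} epochs; first predicted offset = {pred[0]:.3e} s")
```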

  4. Genomic-Enabled Prediction in Maize Using Kernel Models with Genotype × Environment Interaction

    PubMed Central

    Bandeira e Sousa, Massaine; Cuevas, Jaime; de Oliveira Couto, Evellyn Giselly; Pérez-Rodríguez, Paulino; Jarquín, Diego; Fritsche-Neto, Roberto; Burgueño, Juan; Crossa, Jose

    2017-01-01

    Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel, the Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP), having different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and MDs models fitted with the Gaussian kernel (MDe-GK and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied. PMID:28455415
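
    The GB-versus-GK contrast can be illustrated with kernel ridge regression as a simple frequentist stand-in for the Bayesian kernel models of the study: same standardized markers, linear versus Gaussian kernel. Data are simulated, so the gap between kernels is only qualitative.

```python
# Linear (GBLUP-like) vs Gaussian kernel, via kernel ridge regression.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
n, p = 300, 1000
M = rng.integers(0, 3, (n, p)).astype(float)
X = (M - M.mean(0)) / (M.std(0) + 1e-9)           # standardized markers
y = X[:, :40] @ rng.normal(0, .2, 40) + np.sin(X[:, 0]) + rng.normal(0, .5, n)

models = {"GB (linear)": KernelRidge(kernel="linear", alpha=1.0),
          "GK (Gaussian)": KernelRidge(kernel="rbf", gamma=1.0 / p, alpha=1.0)}
for name, kr in models.items():
    r2 = cross_val_score(kr, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")
```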

  5. Genomic-Enabled Prediction in Maize Using Kernel Models with Genotype × Environment Interaction.

    PubMed

    Bandeira E Sousa, Massaine; Cuevas, Jaime; de Oliveira Couto, Evellyn Giselly; Pérez-Rodríguez, Paulino; Jarquín, Diego; Fritsche-Neto, Roberto; Burgueño, Juan; Crossa, Jose

    2017-06-07

    Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel, the Genomic Best Linear Unbiased Predictor (GBLUP; GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP), each with different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and MDs models fitted with the Gaussian kernel (MDe-GK and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK over models with GB were smaller than those achieved in GY. These gains in prediction accuracy also decreased when a more difficult prediction problem was studied. Copyright © 2017 Bandeira e Sousa et al.

  6. Use Of Clinical Decision Analysis In Predicting The Efficacy Of Newer Radiological Imaging Modalities: Radioscintigraphy Versus Single Photon Transverse Section Emission Computed Tomography

    NASA Astrophysics Data System (ADS)

    Prince, John R.

    1982-12-01

    Sensitivity, specificity, and predictive accuracy have been shown to be useful measures of the clinical efficacy of diagnostic tests and can be used to predict the potential improvement in diagnostic certitude resulting from the introduction of a competing technology. This communication demonstrates how the informal use of clinical decision analysis may guide health planners in the allocation of resources, purchasing decisions, and implementation of high technology. For didactic purposes the focus is on a comparison between conventional planar radioscintigraphy (RS) and single photon transverse section emission computed tomography (SPECT). For example, positive predictive accuracy (PPA) for brain RS in a specialist hospital with a 50% disease prevalence is about 95%. SPECT should increase this predicted accuracy to 96%. In a primary care hospital with only a 15% disease prevalence the PPA is only 77%, and SPECT may increase this accuracy to about 79%. Similar calculations based on published data show that marginal improvements are expected with SPECT in the liver. It is concluded that: a) the decision to purchase a high-technology imaging modality such as SPECT for clinical purposes should be analyzed on an individual organ system and institutional basis. High technology may be justified in specialist hospitals but not necessarily in primary care hospitals. This depends more on disease prevalence than on procedure volume; b) it is questionable whether SPECT imaging will be competitive with standard RS procedures. Research should concentrate on the development of different medical applications.
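
    The prevalence dependence described here follows directly from Bayes' rule. A minimal sketch, with sensitivity and specificity values chosen only for illustration:

```python
# Positive predictive accuracy (PPA/PPV) from sensitivity, specificity,
# and disease prevalence via Bayes' rule. Inputs are illustrative.
def ppa(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence              # true-positive mass
    fp = (1 - specificity) * (1 - prevalence)  # false-positive mass
    return tp / (tp + fp)

# Same test at two prevalences: specialist vs primary care hospital.
print(ppa(0.90, 0.95, 0.50))  # ~0.95: high prevalence -> high PPA
print(ppa(0.90, 0.95, 0.15))  # ~0.77: low prevalence -> much lower PPA
```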

  7. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches.

    PubMed

    Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming. Thus, computational approaches can be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, different chemical and structural features such as descriptors and fingerprints were exploited for feature selection, optimization, and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based, and hybrid classification models showed similar accuracy (93%) and Matthews correlation coefficient (0.84). The performances of all three models were comparable (Matthews correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. The random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least squares regression (PLSR) models, whereas the partial least squares based regression model for the prediction of permeability (Caco-2) performed better (R2 = 0.68) than the random forest and MLR based regression models. The performance of the final classification and regression models was evaluated using two validation datasets, including known toxins and commonly used constituents of health products, which attests to their accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.

  8. ToxiM: A Toxicity Prediction Tool for Small Molecules Developed Using Machine Learning and Chemoinformatics Approaches

    PubMed Central

    Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.

    2017-01-01

    The experimental methods for the prediction of molecular toxicity are tedious and time-consuming. Thus, computational approaches can be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, different chemical and structural features such as descriptors and fingerprints were exploited for feature selection, optimization, and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based, and hybrid classification models showed similar accuracy (93%) and Matthews correlation coefficient (0.84). The performances of all three models were comparable (Matthews correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. The random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least squares regression (PLSR) models, whereas the partial least squares based regression model for the prediction of permeability (Caco-2) performed better (R2 = 0.68) than the random forest and MLR based regression models. The performance of the final classification and regression models was evaluated using two validation datasets, including known toxins and commonly used constituents of health products, which attests to their accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969

  9. Modified linear predictive coding approach for moving target tracking by Doppler radar

    NASA Astrophysics Data System (ADS)

    Ding, Yipeng; Lin, Xiaoyi; Sun, Ke-Hui; Xu, Xue-Mei; Liu, Xi-Yao

    2016-07-01

    Doppler radar is a cost-effective tool for moving target tracking, which can support a large range of civilian and military applications. A modified linear predictive coding (LPC) approach is proposed to increase the target localization accuracy of the Doppler radar. Based on time-frequency analysis of the received echo, the proposed approach first estimates the noise statistics in real time and constructs an adaptive filter to intelligently suppress the noise interference. Then, a linear predictive model is applied to extend the available data, which helps improve the resolution of the target localization result. Compared with the traditional LPC method, which decides the extension data length empirically, the proposed approach develops an error array to evaluate the prediction accuracy and thus adjusts the optimum extension data length intelligently. Finally, the prediction error array is superimposed on the predictor output to correct the prediction error. A series of experiments is conducted to illustrate the validity and performance of the proposed techniques.
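
    A hedged sketch of the data-extension step: least-squares linear prediction coefficients are fitted and then used to extrapolate the record. The model order, extension length, and names are illustrative; the paper's adaptive noise filter and error-array correction are not reproduced here.

```python
import numpy as np

def lpc_coefficients(x, order):
    # Least-squares fit of x[n] ~ sum_k a[k] * x[n-k].
    rows = [x[i:i + order][::-1] for i in range(len(x) - order)]
    A, b = np.array(rows), x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a

def lpc_extend(x, order=10, n_extend=64):
    # Extend the record sample by sample with the fitted predictor.
    a = lpc_coefficients(x, order)
    ext = list(x)
    for _ in range(n_extend):
        ext.append(np.dot(a, ext[-order:][::-1]))
    return np.array(ext)
```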

  10. How long will my mouse live? Machine learning approaches for prediction of mouse life span.

    PubMed

    Swindell, William R; Harper, James M; Miller, Richard A

    2008-09-01

    Prediction of individual life span based on characteristics evaluated at middle-age represents a challenging objective for aging research. In this study, we used machine learning algorithms to construct models that predict life span in a stock of genetically heterogeneous mice. Life-span prediction accuracy of 22 algorithms was evaluated using a cross-validation approach, in which models were trained and tested with distinct subsets of data. Using a combination of body weight and T-cell subset measures evaluated before 2 years of age, we show that the life-span quartile to which an individual mouse belongs can be predicted with an accuracy of 35.3% (+/-0.10%). This result provides a new benchmark for the development of life-span-predictive models, but improvement can be expected through identification of new predictor variables and development of computational approaches. Future work in this direction can provide tools for aging research and will shed light on associations between phenotypic traits and longevity.
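
    A minimal sketch of the evaluation protocol on synthetic stand-in data: assign quartile labels from life spans and score a classifier by cross-validation. The random forest and all settings are assumptions rather than a reproduction of any of the paper's 22 algorithms.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))               # mid-life predictors (toy data)
lifespan = X[:, 0] + rng.normal(size=400)   # toy life spans
# Label each mouse with its life-span quartile (0..3).
y = np.digitize(lifespan, np.quantile(lifespan, [0.25, 0.5, 0.75]))

scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=10)
print(f"mean CV accuracy: {scores.mean():.3f}")  # chance level is 0.25
```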

  11. On Predictive Understanding of Extreme Events: Pattern Recognition Approach; Prediction Algorithms; Applications to Disaster Preparedness

    NASA Astrophysics Data System (ADS)

    Keilis-Borok, V. I.; Soloviev, A.; Gabrielov, A.

    2011-12-01

    We describe a uniform approach to predicting different extreme events, also known as critical phenomena, disasters, or crises. The following types of such events are considered: strong earthquakes; economic recessions (their onset and termination); surges of unemployment; surges of crime; and electoral changes of the governing party. A uniform approach is possible due to a common feature of these events: each is generated by a certain hierarchical dissipative complex system. After coarse-graining, such systems exhibit regular behavior patterns; among these we look for "premonitory patterns" that signal the approach of an extreme event. We introduce a methodology, based on optimal control theory, that assists disaster management in choosing the optimal set of preparedness measures undertaken in response to a prediction. Predictions with their currently realistic (limited) accuracy do allow a considerable part of the damage to be prevented by a hierarchy of preparedness measures. The accuracy of a prediction should be known, but need not be high.

  12. Assessing Participation in Community-Based Physical Activity Programs in Brazil

    PubMed Central

    REIS, RODRIGO S.; YAN, YAN; PARRA, DIANA C.; BROWNSON, ROSS C.

    2015-01-01

    Purpose: This study aimed to develop and validate a risk prediction model to examine the characteristics that are associated with participation in community-based physical activity programs in Brazil. Methods: We used pooled data from three surveys conducted from 2007 to 2009 in state capitals of Brazil with 6166 adults. A risk prediction model was built considering program participation as an outcome. The predictive accuracy of the model was quantified through discrimination (C statistic) and calibration (Brier score) properties. Bootstrapping methods were used to validate the predictive accuracy of the final model. Results: The final model showed sex (women: odds ratio [OR] = 3.18, 95% confidence interval [CI] = 2.14–4.71), having less than a high school degree (OR = 1.71, 95% CI = 1.16–2.53), reporting good health (OR = 1.58, 95% CI = 1.02–2.24) or very good/excellent health (OR = 1.62, 95% CI = 1.05–2.51), having any comorbidity (OR = 1.74, 95% CI = 1.26–2.39), and perceiving the environment as safe to walk at night (OR = 1.59, 95% CI = 1.18–2.15) as predictors of participation in physical activity programs. Accuracy indices were adequate (C index = 0.778, Brier score = 0.031) and similar to those obtained from bootstrapping (C index = 0.792, Brier score = 0.030). Conclusions: Sociodemographic and health characteristics as well as perceptions of the environment are strong predictors of participation in community-based programs in selected cities of Brazil. PMID:23846162
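
    A hedged sketch of the validation metrics on synthetic data: discrimination via the C statistic (ROC AUC), calibration via the Brier score, and a simple bootstrap check. The logistic model mirrors the general approach, but the data and settings are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))                    # survey items (toy)
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # program participation

model = LogisticRegression().fit(X, y)
p = model.predict_proba(X)[:, 1]
print("apparent C index:", roc_auc_score(y, p))
print("apparent Brier score:", brier_score_loss(y, p))

# Bootstrap: refit on resamples, score each refit on the original sample.
aucs = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))
    m = LogisticRegression().fit(X[idx], y[idx])
    aucs.append(roc_auc_score(y, m.predict_proba(X)[:, 1]))
print("bootstrap C index:", np.mean(aucs))
```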

  13. EGASP: the human ENCODE Genome Annotation Assessment Project

    PubMed Central

    Guigó, Roderic; Flicek, Paul; Abril, Josep F; Reymond, Alexandre; Lagarde, Julien; Denoeud, France; Antonarakis, Stylianos; Ashburner, Michael; Bajic, Vladimir B; Birney, Ewan; Castelo, Robert; Eyras, Eduardo; Ucla, Catherine; Gingeras, Thomas R; Harrow, Jennifer; Hubbard, Tim; Lewis, Suzanna E; Reese, Martin G

    2006-01-01

    Background We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusion This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence. PMID:16925836

  14. Outcome Prediction in Mathematical Models of Immune Response to Infection.

    PubMed

    Mai, Manuel; Wang, Kun; Huber, Greg; Kirby, Michael; Shattuck, Mark D; O'Hern, Corey S

    2015-01-01

    Clinicians need to predict patient outcomes with high accuracy as early as possible after disease inception. In this manuscript, we show that patient-to-patient variability sets a fundamental limit on outcome prediction accuracy for a general class of mathematical models for the immune response to infection. However, accuracy can be increased at the expense of delayed prognosis. We investigate several systems of ordinary differential equations (ODEs) that model the host immune response to a pathogen load. Advantages of systems of ODEs for investigating the immune response to infection include the ability to collect data on large numbers of 'virtual patients', each with a given set of model parameters, and to obtain many time points during the course of the infection. We implement patient-to-patient variability v in the ODE models by randomly selecting the model parameters from distributions with coefficients of variation v that are centered on physiological values. We use logistic regression with one-versus-all classification to predict the discrete steady-state outcomes of the system. We find that the prediction algorithm achieves near 100% accuracy for v = 0, and the accuracy decreases with increasing v for all ODE models studied. The fact that multiple steady-state outcomes can be obtained for a given initial condition, i.e. that the basins of attraction overlap in the space of initial conditions, limits the prediction accuracy for v > 0. Increasing the elapsed time of the variables used to train and test the classifier increases the prediction accuracy, while adding explicit external noise to the ODE models decreases the prediction accuracy. Our results quantify the competition between early prognosis and high prediction accuracy that is frequently encountered by clinicians.

  15. A function accounting for training set size and marker density to model the average accuracy of genomic prediction.

    PubMed

    Erbe, Malena; Gredler, Birgit; Seefried, Franz Reinhold; Bapst, Beat; Simianer, Henner

    2013-01-01

    Prediction of genomic breeding values is of major practical relevance in dairy cattle breeding. Deterministic equations have been suggested to predict the accuracy of genomic breeding values in a given design which are based on training set size, reliability of phenotypes, and the number of independent chromosome segments ([Formula: see text]). The aim of our study was to find a general deterministic equation for the average accuracy of genomic breeding values that also accounts for marker density and can be fitted empirically. Two data sets of 5'698 Holstein Friesian bulls genotyped with 50 K SNPs and 1'332 Brown Swiss bulls genotyped with 50 K SNPs and imputed to ∼600 K SNPs were available. Different k-fold (k = 2-10, 15, 20) cross-validation scenarios (50 replicates, random assignment) were performed using a genomic BLUP approach. A maximum likelihood approach was used to estimate the parameters of different prediction equations. The highest likelihood was obtained when using a modified form of the deterministic equation of Daetwyler et al. (2010), augmented by a weighting factor (w) based on the assumption that the maximum achievable accuracy is [Formula: see text]. The proportion of genetic variance captured by the complete SNP sets ([Formula: see text]) was 0.76 to 0.82 for Holstein Friesian and 0.72 to 0.75 for Brown Swiss. When modifying the number of SNPs, w was found to be proportional to the log of the marker density up to a limit which is population and trait specific and was found to be reached with ∼20'000 SNPs in the Brown Swiss population studied.
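
    The unmodified Daetwyler et al. (2010) expectation, scaled by the weighting factor w discussed above, can be sketched as follows. The paper's exact modified form and its empirically fitted, density-dependent w are not reproduced, so treat all inputs as illustrative.

```python
import numpy as np

def expected_accuracy(N, h2, Me, w=1.0):
    """Deterministic expectation of genomic prediction accuracy.
    N: training set size; h2: heritability of the phenotype;
    Me: number of independent chromosome segments; w: weighting factor."""
    return w * np.sqrt(N * h2 / (N * h2 + Me))

# Illustrative call only; these are not the paper's fitted values.
print(expected_accuracy(N=5698, h2=0.4, Me=1000, w=0.9))
```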

  16. Accuracy of the Broselow Tape in South Sudan, "The Hungriest Place on Earth".

    PubMed

    Clark, Melissa C; Lewis, Roger J; Fleischman, Ross J; Ogunniyi, Adedamola A; Patel, Dipesh S; Donaldson, Ross I

    2016-01-01

    The Broselow tape is a length-based tool used for the rapid estimation of pediatric weight and was developed to reduce dosage-related errors during emergencies. This study seeks to assess the accuracy of the Broselow tape and age-based formulas in predicting weights of South Sudanese children of varying nutritional status. This was a retrospective, cross-sectional study using data from existing acute malnutrition screening programs for children less than 5 years of age in South Sudan. Using anthropometric measurements, actual weights were compared with estimated weights from the Broselow tape and three age-based formulas. Mid-upper arm circumference was used to determine if each child was malnourished. Broselow accuracy was assessed by the percentage of measured weights falling into the same color zone as the predicted weight. For each method, accuracy was assessed by mean percentage error and percentage of predicted weights falling within 10% of actual weight. All data were analyzed by nutritional status subgroup. Only 10.7% of malnourished and 26.6% of nonmalnourished children had their actual weight fall within the Broselow color zone corresponding to their length. The Broselow method overestimated weight by a mean of 26.6% in malnourished children and 16.6% in nonmalnourished children (p < 0.001). Age-based formulas also overestimated weight, with mean errors ranging from 16.2% over actual weight (Advanced Pediatric Life Support in nonmalnourished children) to 70.9% over actual (Best Guess in severely malnourished children). The Broselow tape and age-based formulas selected for comparison were all markedly inaccurate in both the nonmalnourished and the malnourished populations studied, worsening with increasing malnourishment. Additional studies should explore appropriate methods of weight and dosage estimation for populations of low- and low-to-middle-income countries and regions with a high prevalence of malnutrition. © 2015 by the Society for Academic Emergency Medicine.
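
    A minimal sketch of the two accuracy measures used, with toy numbers in place of the screening data:

```python
import numpy as np

def weight_estimation_accuracy(actual, predicted):
    # Mean percentage error and share of estimates within 10% of actual.
    pct_err = 100 * (predicted - actual) / actual
    within10 = np.mean(np.abs(pct_err) <= 10)
    return pct_err.mean(), within10

actual = np.array([10.2, 12.5, 8.9, 14.1])      # measured weights, kg (toy)
predicted = np.array([12.0, 14.0, 11.5, 15.0])  # tape/formula estimates (toy)
mpe, frac = weight_estimation_accuracy(actual, predicted)
print(f"mean % error: {mpe:.1f}%, within 10%: {frac:.0%}")
```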

  17. Adjusted Clinical Groups: Predictive Accuracy for Medicaid Enrollees in Three States

    PubMed Central

    Adams, E. Kathleen; Bronstein, Janet M.; Raskind-Hood, Cheryl

    2002-01-01

    Actuarial split-sample methods were used to assess predictive accuracy of adjusted clinical groups (ACGs) for Medicaid enrollees in Georgia, Mississippi (lagging in managed care penetration), and California. Accuracy for two non-random groups—high-cost and located in urban poor areas—was assessed. Measures for random groups were derived with and without short-term enrollees to assess the effect of turnover on predictive accuracy. ACGs improved predictive accuracy for high-cost conditions in all States, but did so only for those in Georgia's poorest urban areas. Higher and more unpredictable expenses of short-term enrollees moderated the predictive power of ACGs. This limitation was significant in Mississippi due in part, to that State's very high proportion of short-term enrollees. PMID:12545598

  18. StruLocPred: structure-based protein subcellular localisation prediction using multi-class support vector machine.

    PubMed

    Zhou, Wengang; Dickerson, Julie A

    2012-01-01

    Knowledge of protein subcellular locations can help decipher a protein's biological function. This work proposes new features: one sequence-based, Hybrid Amino Acid Pair (HAAP), and two structure-based, Secondary Structural Element Composition (SSEC) and solvent accessibility state frequency. A multi-class Support Vector Machine is developed to predict the locations. Testing on two established data sets yields better prediction accuracies than the best available systems. Comparisons with existing methods show results comparable to ESLPred2. When StruLocPred is applied to the entire Arabidopsis proteome, over 77% of proteins with known locations match the prediction results. An implementation of this system is at http://wgzhou.ece.iastate.edu/StruLocPred/.

  19. Predicting missing links in complex networks based on common neighbors and distance

    PubMed Central

    Yang, Jinxuan; Zhang, Xiao-Dong

    2016-01-01

    Algorithms that use the common neighbors metric to predict missing links in complex networks are very popular, but most of them do not account for missing links between nodes with no common neighbors. Such methods are not accurate enough to reconstruct networks in some cases, especially when node pairs share few common neighbors. In this paper we propose a new algorithm based on common neighbors and distance to improve the accuracy of link prediction. Our proposed algorithm is remarkably effective in predicting missing links between nodes with no common neighbors and performs better than most existing methods on a variety of real-world networks without increasing complexity. PMID:27905526
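
    A hedged sketch of the general idea: score node pairs by common-neighbor count, falling back on shortest-path distance when that count is zero. The exact combination and weighting in the paper are not specified here; `eps` and the fallback form are assumptions.

```python
import networkx as nx

def cn_distance_score(G, u, v, eps=0.1):
    # Standard common-neighbors score when neighbors are shared.
    cn = len(list(nx.common_neighbors(G, u, v)))
    if cn > 0:
        return cn
    # No common neighbors: fall back on shortest-path distance,
    # so closer pairs still receive a small nonzero score.
    try:
        return eps / nx.shortest_path_length(G, u, v)
    except nx.NetworkXNoPath:
        return 0.0

G = nx.karate_club_graph()
print(cn_distance_score(G, 0, 33))
```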

  20. Assessing the Effectiveness of Statistical Classification Techniques in Predicting Future Employment of Participants in the Temporary Assistance for Needy Families Program

    ERIC Educational Resources Information Center

    Montoya, Isaac D.

    2008-01-01

    Three classification techniques (Chi-square Automatic Interaction Detection [CHAID], Classification and Regression Tree [CART], and discriminant analysis) were tested to determine their accuracy in predicting Temporary Assistance for Needy Families program recipients' future employment. Technique evaluation was based on proportion of correctly…

  1. DEVELOPMENT OF QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIPS (QSARS) TO PREDICT TOXICITY FOR A VARIETY OF HUMAN AND ECOLOGICAL ENDPOINTS

    EPA Science Inventory

    In general, the accuracy of a predicted toxicity value increases with increase in similarity between the query chemical and the chemicals used to develop a QSAR model. A toxicity estimation methodology employing this finding has been developed. A hierarchical based clustering t...

  2. How SNP chips will advance our knowledge of factors controlling puberty and aid in selecting replacement females

    USDA-ARS?s Scientific Manuscript database

    The promise of genomic selection is that genetic potential can be accurately predicted from genotypes. Simple deoxyribonucleic acid (DNA) tests might replace low accuracy predictions based on performance and pedigree for expensive or lowly heritable measures of puberty and fertility. The promise i...

  3. How single nucleotide polymorphism chips will advance our knowledge of factors controlling puberty and aid in selecting replacement beef females

    USDA-ARS?s Scientific Manuscript database

    The promise of genomic selection is accurate prediction of animals' genetic potential from their genotypes. Simple DNA tests might replace low accuracy predictions for expensive or lowly heritable measures of puberty and fertility based on performance and pedigree. Knowing which DNA variants affec...

  4. Fatigue Strength Prediction for Titanium Alloy TiAl6V4 Manufactured by Selective Laser Melting

    NASA Astrophysics Data System (ADS)

    Leuders, Stefan; Vollmer, Malte; Brenne, Florian; Tröster, Thomas; Niendorf, Thomas

    2015-09-01

    Selective laser melting (SLM), as a metalworking additive manufacturing technique, has received considerable attention from industry and academia due to unprecedented design freedom and overall balanced material properties. However, the fatigue behavior of SLM-processed materials often suffers from local imperfections such as micron-sized pores. In order to enable robust designs of SLM components used in an industrial environment, further research regarding process-induced porosity and its impact on fatigue behavior is required. Hence, this study aims at transferring fatigue prediction models, established for conventional process routes, to the field of SLM materials. Using high-resolution computed tomography, load increase tests, and electron microscopy, it is shown that pore-based fatigue strength predictions for the titanium alloy TiAl6V4 have become feasible. However, the obtained accuracies are subject to scatter, which is probably caused by the high defect density present even in SLM materials manufactured following optimized processing routes. Based on thorough examination of crack surfaces and crack initiation sites, implications for optimizing the prediction accuracy of the models in focus are deduced.

  5. High variation subarctic topsoil pollutant concentration prediction using neural network residual kriging

    NASA Astrophysics Data System (ADS)

    Sergeev, A. P.; Tarasov, D. A.; Buevich, A. G.; Subbotina, I. E.; Shichkin, A. V.; Sergeeva, M. V.; Lvova, O. A.

    2017-06-01

    The work deals with the application of neural network residual kriging (NNRK) to the spatial prediction of an abnormally distributed soil pollutant (Cr). It is known that combining geostatistical interpolation approaches (kriging) with neural networks leads to significantly better prediction accuracy and productivity. Generalized regression neural networks and multilayer perceptrons are classes of neural networks widely used for continuous function mapping. Each network has its own pros and cons; however, both demonstrated fast training and good mapping capabilities. In the work, we examined and compared two combined techniques: generalized regression neural network residual kriging (GRNNRK) and multilayer perceptron residual kriging (MLPRK). The case study is based on real data sets on surface contamination by chromium at a particular location in the subarctic city of Novy Urengoy, Russia, obtained during previously conducted screening. The proposed models have been built, implemented, and validated using the ArcGIS and MATLAB environments. The network structures were chosen during a computer simulation based on minimization of the RMSE. MLPRK showed the best predictive accuracy compared to the geostatistical approach (kriging) and even to GRNNRK.
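
    A hedged sketch of residual kriging in Python rather than the paper's ArcGIS/MATLAB setup: a neural network fits the spatial trend, and a Gaussian process (standing in for ordinary kriging) models its residuals. The network size, kernel, and toy data are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(200, 2))                 # sampling locations
conc = np.sin(coords[:, 0]) + 0.1 * rng.normal(size=200)   # toy Cr levels

# Stage 1: neural network captures the large-scale spatial trend.
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                   random_state=0).fit(coords, conc)
resid = conc - net.predict(coords)

# Stage 2: Gaussian process interpolates the residual field.
gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.01)).fit(coords, resid)

new_pts = rng.uniform(0, 10, size=(5, 2))
prediction = net.predict(new_pts) + gp.predict(new_pts)  # trend + kriged residual
```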

  6. Combining Physicochemical and Evolutionary Information for Protein Contact Prediction

    PubMed Central

    Schneider, Michael; Brock, Oliver

    2014-01-01

    We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information—evolutionary and physicochemical—we maintain prediction accuracy even when only few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/. PMID:25338092

  7. A Hybrid Short-Term Traffic Flow Prediction Model Based on Singular Spectrum Analysis and Kernel Extreme Learning Machine.

    PubMed

    Shang, Qiang; Lin, Ciyun; Yang, Zhaosheng; Bing, Qichun; Zhou, Xiyang

    2016-01-01

    Short-term traffic flow prediction is one of the most important issues in the field of intelligent transport systems (ITS). Because of its uncertainty and nonlinearity, short-term traffic flow prediction is a challenging task. In order to improve the accuracy of short-term traffic flow prediction, a hybrid model (SSA-KELM) is proposed based on singular spectrum analysis (SSA) and the kernel extreme learning machine (KELM). SSA is used to filter out the noise of the traffic flow time series. The filtered traffic flow data are then used to train the KELM model; the optimal input form of the proposed model is determined by phase space reconstruction, and the parameters of the model are optimized by a gravitational search algorithm (GSA). Finally, case validation is carried out using measured data from an expressway in Xiamen, China, and the SSA-KELM model is compared with several well-known prediction models, including the support vector machine, the extreme learning machine, and a single KELM model. The experimental results demonstrate that the performance of the proposed model is superior to that of the comparison models. Apart from the accuracy improvement, the proposed model is more robust.
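
    A hedged sketch of the two stages on a univariate series: SSA denoising by truncated SVD of the trajectory matrix with diagonal averaging, followed by RBF kernel ridge regression, whose closed form matches KELM's output-weight solution. The window length, rank, and kernel settings are assumptions, and the GSA optimizer is omitted.

```python
import numpy as np

def ssa_denoise(x, L=20, r=3):
    # Embed the series in an L x K trajectory matrix, keep the r leading
    # singular components, and map back by diagonal averaging.
    N = len(x)
    K = N - L + 1
    T = np.column_stack([x[i:i + L] for i in range(K)])
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Tr = (U[:, :r] * s[:r]) @ Vt[:r]          # rank-r approximation
    out, cnt = np.zeros(N), np.zeros(N)
    for j in range(K):
        out[j:j + L] += Tr[:, j]
        cnt[j:j + L] += 1
    return out / cnt

def kernel_ridge_fit_predict(X, y, Xq, gamma=0.5, lam=1e-2):
    # RBF kernel ridge regression; the same closed form gives KELM's
    # output weights when the hidden layer is replaced by a kernel.
    d2 = lambda A, B: ((A[:, None] - B[None]) ** 2).sum(-1)
    K = np.exp(-gamma * d2(X, X))
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return np.exp(-gamma * d2(Xq, X)) @ alpha
```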

  8. Estimation of Power Consumption in the Circular Sawing of Stone Based on Tangential Force Distribution

    NASA Astrophysics Data System (ADS)

    Huang, Guoqin; Zhang, Meiqin; Huang, Hui; Guo, Hua; Xu, Xipeng

    2018-04-01

    Circular sawing is an important method for the processing of natural stone. The ability to predict sawing power is important in the optimisation, monitoring and control of the sawing process. In this paper, a predictive model (PFD) of sawing power, based on the tangential force distribution at the sawing contact zone, was proposed, experimentally validated and modified. By accounting for the influence of sawing speed on the tangential force distribution, the modified PFD (MPFD) achieved high predictive accuracy across a wide range of sawing parameters, including sawing speed. The mean maximum absolute error rate was within 6.78%, and the maximum absolute error rate was within 11.7%. The practicability of predicting sawing power with the MPFD from few initial experimental samples was proved in case studies. On the premise of high sample measurement accuracy, only two samples are required for a fixed sawing speed. The feasibility of applying the MPFD to optimise sawing parameters while lowering the energy consumption of the sawing system was also validated; the case study shows that energy use was reduced by 28% by optimising the sawing parameters. The MPFD model can be used to predict sawing power, optimise sawing parameters, and control energy consumption.

  9. A Hybrid Short-Term Traffic Flow Prediction Model Based on Singular Spectrum Analysis and Kernel Extreme Learning Machine

    PubMed Central

    Lin, Ciyun; Yang, Zhaosheng; Bing, Qichun; Zhou, Xiyang

    2016-01-01

    Short-term traffic flow prediction is one of the most important issues in the field of intelligent transport systems (ITS). Because of its uncertainty and nonlinearity, short-term traffic flow prediction is a challenging task. In order to improve the accuracy of short-term traffic flow prediction, a hybrid model (SSA-KELM) is proposed based on singular spectrum analysis (SSA) and the kernel extreme learning machine (KELM). SSA is used to filter out the noise of the traffic flow time series. The filtered traffic flow data are then used to train the KELM model; the optimal input form of the proposed model is determined by phase space reconstruction, and the parameters of the model are optimized by a gravitational search algorithm (GSA). Finally, case validation is carried out using measured data from an expressway in Xiamen, China, and the SSA-KELM model is compared with several well-known prediction models, including the support vector machine, the extreme learning machine, and a single KELM model. The experimental results demonstrate that the performance of the proposed model is superior to that of the comparison models. Apart from the accuracy improvement, the proposed model is more robust. PMID:27551829

  10. Predicting the Types of Ion Channel-Targeted Conotoxins Based on AVC-SVM Model.

    PubMed

    Xianfang, Wang; Junmei, Wang; Xiaolei, Wang; Yue, Zhang

    2017-01-01

    The conotoxin proteins are disulfide-rich small peptides. Predicting the types of ion channel-targeted conotoxins has great value in the treatment of chronic diseases, epilepsy, and cardiovascular diseases. To solve the problem of information redundancy in current methods, a new model is presented to predict the types of ion channel-targeted conotoxins based on AVC (Analysis of Variance and Correlation) and SVM (Support Vector Machine). First, the F value is used to measure the significance of each feature for the result, and attributes with smaller F values are filtered out in a rough selection step. Secondly, the degree of redundancy is calculated using the Pearson correlation coefficient, and a threshold is set to filter out attributes with weak independence, yielding the refined feature set. Finally, SVM is used to predict the types of ion channel-targeted conotoxins. The experimental results show the proposed AVC-SVM model reaches an overall accuracy of 91.98% and an average accuracy of 92.17% with a total of only 68 parameters. The proposed model provides highly useful information for further experimental research. The prediction model can be accessed free of charge at our web server.
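
    A hedged sketch of the AVC-style filter: rank features by ANOVA F value, drop the weakest, remove one of any highly correlated pair, and train an SVM on the remainder. Both thresholds and the toy data are assumptions.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.svm import SVC

def avc_select(X, y, f_quantile=0.5, r_max=0.9):
    # Rough selection: keep features with above-median ANOVA F value.
    F, _ = f_classif(X, y)
    keep = np.where(F >= np.quantile(F, f_quantile))[0]
    # Refinement: drop a feature if it is highly correlated with
    # one already selected (weak independence = redundancy).
    selected = []
    for j in keep:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < r_max
               for k in selected):
            selected.append(j)
    return selected

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 40))
y = rng.integers(0, 4, 120)        # toy conotoxin type labels
cols = avc_select(X, y)
clf = SVC().fit(X[:, cols], y)     # final classifier on refined features
```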

  11. Predicting the Types of Ion Channel-Targeted Conotoxins Based on AVC-SVM Model

    PubMed Central

    Xiaolei, Wang

    2017-01-01

    The conotoxin proteins are disulfide-rich small peptides. Predicting the types of ion channel-targeted conotoxins has great value in the treatment of chronic diseases, epilepsy, and cardiovascular diseases. To solve the problem of information redundancy in current methods, a new model is presented to predict the types of ion channel-targeted conotoxins based on AVC (Analysis of Variance and Correlation) and SVM (Support Vector Machine). First, the F value is used to measure the significance of each feature for the result, and attributes with smaller F values are filtered out in a rough selection step. Secondly, the degree of redundancy is calculated using the Pearson correlation coefficient, and a threshold is set to filter out attributes with weak independence, yielding the refined feature set. Finally, SVM is used to predict the types of ion channel-targeted conotoxins. The experimental results show the proposed AVC-SVM model reaches an overall accuracy of 91.98% and an average accuracy of 92.17% with a total of only 68 parameters. The proposed model provides highly useful information for further experimental research. The prediction model can be accessed free of charge at our web server. PMID:28497044

  12. Study on the medical meteorological forecast of the number of hypertension inpatient based on SVR

    NASA Astrophysics Data System (ADS)

    Zhai, Guangyu; Chai, Guorong; Zhang, Haifeng

    2017-06-01

    The purpose of this study is to build a hypertension prediction model by examining the meteorological factors associated with hypertension incidence. Standardized data on relative humidity, air temperature, visibility, wind speed, and air pressure in Lanzhou from 2010 to 2012 (maximum, minimum, and average values computed over 5-day units) were selected as the input variables of Support Vector Regression (SVR), and standardized hypertension incidence data for the same period as the output variables. Optimal prediction parameters were obtained by a cross-validation algorithm, and an SVR forecast model for hypertension incidence was built through SVR learning and training. The results show that the hypertension prediction model comprises 15 input variables, with a training accuracy of 0.005 and a final error of 0.0026389. The forecast accuracy of the SVR model is 97.1429%, higher than that of a statistical forecast equation and a neural network prediction method. It is concluded that the SVR model provides a new method for hypertension prediction, with simple calculation, small error, better fitting of historical samples, and stronger independent-sample forecasting capability.
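
    A minimal sketch of the modeling step with synthetic stand-in data: a grid-searched SVR over scaled meteorological inputs. The parameter grid and all names are assumptions.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 15))   # 15 meteorological input variables (toy)
y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=200)  # admissions (toy)

# Cross-validated parameter search, standing in for the paper's
# cross-validation step that selects the optimal SVR parameters.
pipe = make_pipeline(StandardScaler(), SVR())
grid = GridSearchCV(pipe, {"svr__C": [1, 10, 100],
                           "svr__epsilon": [0.01, 0.1]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```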

  13. Flight Evaluation of Center-TRACON Automation System Trajectory Prediction Process

    NASA Technical Reports Server (NTRS)

    Williams, David H.; Green, Steven M.

    1998-01-01

    Two flight experiments (Phase 1 in October 1992 and Phase 2 in September 1994) were conducted to evaluate the accuracy of the Center-TRACON Automation System (CTAS) trajectory prediction process. The Transport Systems Research Vehicle (TSRV) Boeing 737 based at Langley Research Center flew 57 arrival trajectories that included cruise and descent segments; at the same time, descent clearance advisories from CTAS were followed. Actual trajectories of the airplane were compared with the trajectories predicted by the CTAS trajectory synthesis algorithms and airplane Flight Management System (FMS). Trajectory prediction accuracy was evaluated over several levels of cockpit automation that ranged from a conventional cockpit to performance-based FMS vertical navigation (VNAV). Error sources and their magnitudes were identified and measured from the flight data. The major source of error during these tests was found to be the predicted winds aloft used by CTAS. The most significant effect related to flight guidance was the cross-track and turn-overshoot errors associated with conventional VOR guidance. FMS lateral navigation (LNAV) guidance significantly reduced both the cross-track and turn-overshoot error. Pilot procedures and VNAV guidance were found to significantly reduce the vertical profile errors associated with atmospheric and airplane performance model errors.

  14. Noninvasive scoring system for significant inflammation related to chronic hepatitis B

    NASA Astrophysics Data System (ADS)

    Hong, Mei-Zhu; Ye, Linglong; Jin, Li-Xin; Ren, Yan-Dan; Yu, Xiao-Fang; Liu, Xiao-Bin; Zhang, Ru-Mian; Fang, Kuangnan; Pan, Jin-Shui

    2017-03-01

    Although a liver stiffness measurement-based model can precisely predict significant intrahepatic inflammation, transient elastography is not commonly available in a primary care center. Additionally, high body mass index and bilirubinemia have notable effects on the accuracy of transient elastography. The present study aimed to create a noninvasive scoring system for the prediction of intrahepatic inflammatory activity related to chronic hepatitis B, without the aid of transient elastography. A total of 396 patients with chronic hepatitis B were enrolled in the present study. Liver biopsies were performed, liver histology was scored using the Scheuer scoring system, and serum markers and liver function were investigated. Inflammatory activity scoring models were constructed for both hepatitis B envelope antigen (+) and hepatitis B envelope antigen (-) patients. The sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve were 86.00%, 84.80%, 62.32%, 95.39%, and 0.9219, respectively, in the hepatitis B envelope antigen (+) group and 91.89%, 89.86%, 70.83%, 97.64%, and 0.9691, respectively, in the hepatitis B envelope antigen (-) group. Significant inflammation related to chronic hepatitis B can be predicted with satisfactory accuracy by using our logistic regression-based scoring system.

  15. Reduced kernel recursive least squares algorithm for aero-engine degradation prediction

    NASA Astrophysics Data System (ADS)

    Zhou, Haowen; Huang, Jinquan; Lu, Feng

    2017-10-01

    Kernel adaptive filters (KAFs) generate a radial basis function (RBF) network that grows linearly with the number of training samples, and therefore lack sparseness. To deal with this drawback, traditional sparsification techniques select a subset of the original training data based on a certain criterion to train the network and discard the redundant data directly. Although these methods curb the growth of the network effectively, the information conveyed by the redundant samples is omitted, which may lead to accuracy degradation. In this paper, we present a novel online sparsification method which requires much less training time without sacrificing accuracy. Specifically, a reduced kernel recursive least squares (RKRLS) algorithm is developed based on the reduced technique and linear independency. Unlike conventional methods, our methodology employs the redundant data to update the coefficients of the existing network. Due to this effective utilization of the redundant data, the novel algorithm achieves better accuracy, although the network size is significantly reduced. Experiments on time series prediction and online regression demonstrate that the RKRLS algorithm requires much less computational consumption while maintaining satisfactory accuracy. Finally, we propose an enhanced multi-sensor prognostic model based on RKRLS and a Hidden Markov Model (HMM) for remaining useful life (RUL) estimation. A case study on a turbofan degradation dataset is performed to evaluate the performance of the novel prognostic approach.

  16. Prediction of UT1-UTC, LOD and AAM χ3 by combination of least-squares and multivariate stochastic methods

    NASA Astrophysics Data System (ADS)

    Niedzielski, Tomasz; Kosek, Wiesław

    2008-02-01

    This article presents the application of a multivariate prediction technique for predicting universal time (UT1-UTC), length of day (LOD) and the axial component of atmospheric angular momentum (AAM χ3). The multivariate predictions of LOD and UT1-UTC are generated by means of the combination of (1) least-squares (LS) extrapolation of models for annual, semiannual, 18.6-year, 9.3-year oscillations and for the linear trend, and (2) multivariate autoregressive (MAR) stochastic prediction of LS residuals (LS + MAR). The MAR technique enables the use of the AAM χ3 time series as the explanatory variable for the computation of LOD or UT1-UTC predictions. In order to evaluate the performance of this approach, two other prediction schemes are also applied: (1) LS extrapolation, and (2) combination of LS extrapolation and univariate autoregressive (AR) prediction of LS residuals (LS + AR). The multivariate predictions of AAM χ3 data, however, are computed as a combination of the extrapolation of the LS model for annual and semiannual oscillations and the LS + MAR. The AAM χ3 predictions are also compared with LS extrapolation and LS + AR prediction. It is shown that the predictions of LOD and UT1-UTC based on LS + MAR, taking into account the axial component of AAM, are more accurate than the predictions of LOD and UT1-UTC based on LS extrapolation or on LS + AR. In particular, the UT1-UTC predictions based on LS + MAR during El Niño/La Niña events exhibit considerably smaller prediction errors than those calculated by means of LS or LS + AR. The AAM χ3 time series is predicted using LS + MAR with higher accuracy than applying LS extrapolation itself in the case of medium-term predictions (up to 100 days in the future). However, the predictions of AAM χ3 reveal the best accuracy for LS + AR.
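
    A hedged sketch of the simpler LS + AR variant (the multivariate MAR extension with AAM χ3 as an explanatory series is not shown): a least-squares fit of a trend plus annual and semiannual harmonics, AR forecasting of the residuals, and the sum of the two as the prediction. The AR order and daily sampling are assumptions.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def ls_design(t):
    # Linear trend plus annual and semiannual harmonics; t in days.
    w = 2 * np.pi / 365.25
    return np.column_stack([np.ones_like(t), t,
                            np.sin(w * t), np.cos(w * t),
                            np.sin(2 * w * t), np.cos(2 * w * t)])

def ls_ar_predict(t, x, horizon, ar_order=30):
    # 1. Least-squares fit of the deterministic model.
    A = ls_design(t)
    beta, *_ = np.linalg.lstsq(A, x, rcond=None)
    resid = x - A @ beta
    # 2. AR forecast of the stochastic residuals.
    ar_fc = AutoReg(resid, lags=ar_order).fit().forecast(horizon)
    # 3. LS extrapolation + AR residual forecast.
    t_f = np.arange(t[-1] + 1, t[-1] + 1 + horizon)
    return ls_design(t_f) @ beta + ar_fc
```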

  17. Developing symptom-based predictive models of endometriosis as a clinical screening tool: results from a multicenter study

    PubMed Central

    Nnoaham, Kelechi E.; Hummelshoj, Lone; Kennedy, Stephen H.; Jenkinson, Crispin; Zondervan, Krina T.

    2012-01-01

    Objective To generate and validate symptom-based models to predict endometriosis among symptomatic women prior to undergoing their first laparoscopy. Design Prospective, observational, two-phase study, in which women completed a 25-item questionnaire prior to surgery. Setting Nineteen hospitals in 13 countries. Patient(s) Symptomatic women (n = 1,396) scheduled for laparoscopy without a previous surgical diagnosis of endometriosis. Intervention(s) None. Main Outcome Measure(s) Sensitivity and specificity of endometriosis diagnosis predicted by symptoms and patient characteristics from optimal models developed using multiple logistic regression analyses in one data set (phase I), and independently validated in a second data set (phase II) by receiver operating characteristic (ROC) curve analysis. Result(s) Three hundred sixty (46.7%) women in phase I and 364 (58.2%) in phase II were diagnosed with endometriosis at laparoscopy. Menstrual dyschezia (pain on opening bowels) and a history of benign ovarian cysts most strongly predicted both any and stage III and IV endometriosis in both phases. Prediction of any-stage endometriosis, although improved by ultrasound scan evidence of cyst/nodules, was relatively poor (area under the curve [AUC] = 68.3). Stage III and IV disease was predicted with good accuracy (AUC = 84.9, sensitivity of 82.3% and specificity 75.8% at an optimal cut-off of 0.24). Conclusion(s) Our symptom-based models predict any-stage endometriosis relatively poorly and stage III and IV disease with good accuracy. Predictive tools based on such models could help to prioritize women for surgical investigation in clinical practice and thus contribute to reducing time to diagnosis. We invite other researchers to validate the key models in additional populations. PMID:22657249

  18. Performance and effects of land cover type on synthetic surface reflectance data and NDVI estimates for assessment and monitoring of semi-arid rangeland

    USGS Publications Warehouse

    Olexa, Edward M.; Lawrence, Rick L

    2014-01-01

    Federal land management agencies provide stewardship over much of the rangelands in the arid and semi-arid western United States, but they often lack data of the proper spatiotemporal resolution and extent needed to assess range conditions and monitor trends. Recent advances in the blending of complementary, remotely sensed data could provide public lands managers with the needed information. We applied the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) to five Landsat TM and concurrent Terra MODIS scenes, and used pixel-based regression and difference image analyses to evaluate the quality of synthetic reflectance and NDVI products associated with semi-arid rangeland. Predicted red reflectance data consistently demonstrated higher accuracy, less bias, and stronger correlation with observed data than did analogous near-infrared (NIR) data. The accuracy of both bands tended to decline as the lag between base and prediction dates increased; however, mean absolute errors (MAE) were typically ≤10%. The quality of area-wide NDVI estimates was less consistent than for either spectral band, although the MAE of estimates predicted using early season base pairs were ≤10% throughout the growing season. Correlation between known and predicted NDVI values and agreement with the 1:1 regression line tended to decline as the prediction lag increased. Further analyses of NDVI predictions, based on a 22 June base pair and stratified by land cover/land use (LCLU), revealed accurate estimates through the growing season; however, inter-class performance varied. This work demonstrates the successful application of the STARFM algorithm to semi-arid rangeland; however, we encourage evaluation of STARFM's performance on a per-product basis, stratified by LCLU, with attention given to the influence of base pair selection and the impact of the time lag.

  19. GNSS/Electronic Compass/Road Segment Information Fusion for Vehicle-to-Vehicle Collision Avoidance Application

    PubMed Central

    Cheng, Qi; Xue, Dabin; Wang, Guanyu; Ochieng, Washington Yotto

    2017-01-01

    The increasing number of vehicles in modern cities brings the problem of increasing crashes. One of the applications or services of Intelligent Transportation Systems (ITS) conceived to improve safety and reduce congestion is collision avoidance. This safety critical application requires sub-meter level vehicle state estimation accuracy with very high integrity, continuity and availability, to detect an impending collision and issue a warning or intervene in the case that the warning is not heeded. Because of the challenging city environment, to date there is no approved method capable of delivering this high level of performance in vehicle state estimation. In particular, the current Global Navigation Satellite System (GNSS) based collision avoidance systems have the major limitation that the real-time accuracy of dynamic state estimation deteriorates during abrupt acceleration and deceleration situations, compromising the integrity of collision avoidance. Therefore, to provide the Required Navigation Performance (RNP) for collision avoidance, this paper proposes a novel Particle Filter (PF) based model for the integration or fusion of real-time kinematic (RTK) GNSS position solutions with electronic compass and road segment data used in conjunction with an Autoregressive (AR) motion model. The real-time vehicle state estimates are used together with distance based collision avoidance algorithms to predict potential collisions. The algorithms are tested by simulation and in the field representing a low density urban environment. The results show that the proposed algorithm meets the horizontal positioning accuracy requirement for collision avoidance and is superior to positioning accuracy of GNSS only, traditional Constant Velocity (CV) and Constant Acceleration (CA) based motion models, with a significant improvement in the prediction accuracy of potential collision. PMID:29186851

  20. GNSS/Electronic Compass/Road Segment Information Fusion for Vehicle-to-Vehicle Collision Avoidance Application.

    PubMed

    Sun, Rui; Cheng, Qi; Xue, Dabin; Wang, Guanyu; Ochieng, Washington Yotto

    2017-11-25

    The increasing number of vehicles in modern cities brings the problem of increasing crashes. One of the applications or services of Intelligent Transportation Systems (ITS) conceived to improve safety and reduce congestion is collision avoidance. This safety critical application requires sub-meter level vehicle state estimation accuracy with very high integrity, continuity and availability, to detect an impending collision and issue a warning or intervene in the case that the warning is not heeded. Because of the challenging city environment, to date there is no approved method capable of delivering this high level of performance in vehicle state estimation. In particular, the current Global Navigation Satellite System (GNSS) based collision avoidance systems have the major limitation that the real-time accuracy of dynamic state estimation deteriorates during abrupt acceleration and deceleration situations, compromising the integrity of collision avoidance. Therefore, to provide the Required Navigation Performance (RNP) for collision avoidance, this paper proposes a novel Particle Filter (PF) based model for the integration or fusion of real-time kinematic (RTK) GNSS position solutions with electronic compass and road segment data used in conjunction with an Autoregressive (AR) motion model. The real-time vehicle state estimates are used together with distance based collision avoidance algorithms to predict potential collisions. The algorithms are tested by simulation and in the field representing a low density urban environment. The results show that the proposed algorithm meets the horizontal positioning accuracy requirement for collision avoidance and is superior to positioning accuracy of GNSS only, traditional Constant Velocity (CV) and Constant Acceleration (CA) based motion models, with a significant improvement in the prediction accuracy of potential collision.

  1. A genome-scale metabolic flux model of Escherichia coli K–12 derived from the EcoCyc database

    PubMed Central

    2014-01-01

    Background: Constraint-based models of Escherichia coli metabolic flux have played a key role in computational studies of cellular metabolism at the genome scale. We sought to develop a next-generation constraint-based E. coli model that achieved improved phenotypic prediction accuracy while being frequently updated and easy to use. We also sought to compare model predictions with experimental data to highlight open questions in E. coli biology. Results: We present EcoCyc–18.0–GEM, a genome-scale model of the E. coli K–12 MG1655 metabolic network. The model is automatically generated from the current state of EcoCyc using the MetaFlux software, enabling the release of multiple model updates per year. EcoCyc–18.0–GEM encompasses 1445 genes, 2286 unique metabolic reactions, and 1453 unique metabolites. We demonstrate a three-part validation of the model that breaks new ground in breadth and accuracy: (i) Comparison of simulated growth in aerobic and anaerobic glucose culture with experimental results from chemostat culture and simulation results from the E. coli modeling literature. (ii) Essentiality prediction for the 1445 genes represented in the model, in which EcoCyc–18.0–GEM achieves an improved accuracy of 95.2% in predicting the growth phenotype of experimental gene knockouts. (iii) Nutrient utilization predictions under 431 different media conditions, for which the model achieves an overall accuracy of 80.7%. The model's derivation from EcoCyc enables query and visualization via the EcoCyc website, facilitating model reuse and validation by inspection. We present an extensive investigation of disagreements between EcoCyc–18.0–GEM predictions and experimental data to highlight areas of interest to E. coli modelers and experimentalists, including 70 incorrect predictions of gene essentiality on glucose, 80 incorrect predictions of gene essentiality on glycerol, and 83 incorrect predictions of nutrient utilization. Conclusion: Significant advantages can be derived from the combination of model organism databases and flux balance modeling represented by MetaFlux. Interpretation of the EcoCyc database as a flux balance model results in a highly accurate metabolic model and provides a rigorous consistency check for information stored in the database. PMID:24974895
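
    A minimal sketch of the flux-balance computation underlying such models, on a three-reaction toy network rather than the EcoCyc-derived one: maximize a biomass flux subject to steady-state mass balance (S v = 0) and flux bounds. All names and numbers are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network. Reactions: uptake (-> A), conversion (A -> B), biomass (B ->).
S = np.array([[1, -1,  0],    # metabolite A balance
              [0,  1, -1]])   # metabolite B balance
bounds = [(0, 10), (0, 1000), (0, 1000)]   # uptake capped at 10
c = np.array([0, 0, -1.0])                 # maximize biomass = minimize -v3

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds)
print("growth (biomass flux):", res.x[2])  # 10.0: limited by uptake
# Gene knockouts are simulated by clamping the affected reaction's bounds
# to zero and checking whether the optimum drops below a viability threshold.
```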

  2. Parsimonious data: How a single Facebook like predicts voting behavior in multiparty systems.

    PubMed

    Kristensen, Jakob Bæk; Albrechtsen, Thomas; Dahl-Nielsen, Emil; Jensen, Michael; Skovrind, Magnus; Bornakke, Tobias

    2017-01-01

    This study shows how liking politicians' public Facebook posts can be used as an accurate measure for predicting present-day voter intention in a multiparty system. We highlight that a few carefully selected digital traces produce prediction accuracies on par with or even greater than those of most current approaches based upon bigger and broader datasets. Combining the online and offline, we connect a subsample of surveyed respondents to their public Facebook activity and apply machine learning classifiers to explore the link between their political liking behaviour and actual voting intention. Through this work, we show that even a single selective Facebook like can reveal as much about political voter intention as hundreds of heterogeneous likes. Further, by including the entire political like history of the respondents, our model reaches prediction accuracies above previous multiparty studies (60-70%). The main contribution of this paper is to show how public like-activity on Facebook allows political profiling of individual users in a multiparty system with accuracies above previous studies. Besides increased accuracy, the paper shows how such parsimonious measures allow us to generalize our findings to the entire population of a country and even across national borders, to other political multiparty systems. The approach in this study relies on data that are publicly available, and the simple setup we propose can, with some limitations, be generalized to millions of users in other multiparty systems.
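
    The modeling setup (binary like indicators as features, party choice as label) can be sketched in a few lines. The data below are random stand-ins for the study's survey-linked sample, and plain logistic regression stands in for whatever classifiers the authors used.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n_users, n_posts, n_parties = 500, 40, 5
    party = rng.integers(0, n_parties, n_users)

    # Make likes weakly informative: users like their own party's posts more often.
    base = rng.random((n_users, n_posts)) < 0.05
    own = rng.random((n_users, n_posts)) < 0.4
    post_party = (np.arange(n_posts) % n_parties)[None, :]
    likes = np.where(post_party == party[:, None], own, base).astype(float)

    clf = LogisticRegression(max_iter=1000)
    print("CV accuracy:", cross_val_score(clf, likes, party, cv=5).mean())
    ```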

  3. Feature Selection Methods for Zero-Shot Learning of Neural Activity

    PubMed Central

    Caceres, Carlos A.; Roos, Matthew J.; Rupp, Kyle M.; Milsap, Griffin; Crone, Nathan E.; Wolmetz, Michael E.; Ratto, Christopher R.

    2017-01-01

    Dimensionality poses a serious challenge when making predictions from human neuroimaging data. Across imaging modalities, large pools of potential neural features (e.g., responses from particular voxels, electrodes, and temporal windows) have to be related to typically limited sets of stimuli and samples. In recent years, zero-shot prediction models have been introduced for mapping between neural signals and semantic attributes, which allows for classification of stimulus classes not explicitly included in the training set. While choices about feature selection can have a substantial impact when closed-set accuracy, open-set robustness, and runtime are competing design objectives, no systematic study of feature selection for these models has been reported. Instead, a relatively straightforward feature stability approach has been adopted and successfully applied across models and imaging modalities. To characterize the tradeoffs in feature selection for zero-shot learning, we compared correlation-based stability to several other feature selection techniques on comparable data sets from two distinct imaging modalities: functional Magnetic Resonance Imaging and Electrocorticography. While most of the feature selection methods resulted in similar zero-shot prediction accuracies and spatial/spectral patterns of selected features, there was one exception: a novel feature/attribute correlation approach was able to achieve those accuracies with far fewer features, suggesting the potential for simpler prediction models that yield high zero-shot classification accuracy. PMID:28690513

  4. Combined ECG, Echocardiographic, and Biomarker Criteria for Diagnosing Acute Myocardial Infarction in Out-of-Hospital Cardiac Arrest Patients.

    PubMed

    Lee, Sang-Eun; Uhm, Jae-Sun; Kim, Jong-Youn; Pak, Hui-Nam; Lee, Moon-Hyoung; Joung, Boyoung

    2015-07-01

    Acute coronary lesions commonly trigger out-of-hospital cardiac arrest (OHCA). However, the prevalence of coronary artery disease (CAD) in Asian patients with OHCA, and whether electrocardiogram (ECG) and other findings might predict acute myocardial infarction (AMI), have not been fully elucidated. Of 284 consecutive resuscitated OHCA patients seen between January 2006 and July 2013, we enrolled 135 consecutive patients (median age 54 years, interquartile range 45-65) with sustained return of spontaneous circulation who had undergone coronary evaluation. ECGs, echocardiography, and biomarkers were compared between patients with or without CAD. Sixty-six (45%) patients had CAD. The initial rhythm was shockable and non-shockable in 110 (81%) and 25 (19%) patients, respectively. ST-segment elevation predicted CAD with 42% sensitivity, 87% specificity, and 65% accuracy. ST elevation and/or regional wall motion abnormality (RWMA) showed 68% sensitivity, 52% specificity, and 70% accuracy in the prediction of CAD. Finally, a combination of ST elevation and/or RWMA and/or troponin T elevation predicted CAD with 94% sensitivity, 17% specificity, and 55% accuracy. In patients with OHCA without obvious non-cardiac causes, selection for coronary angiogram based on the combined criterion could detect 94% of CADs. However, compared with ECG-only criteria, the combined criterion failed to improve diagnostic accuracy and had a lower specificity.

  5. A drift line bias estimator: ARMA-based filter or calibration method, and its application in BDS/GPS-based attitude determination

    NASA Astrophysics Data System (ADS)

    Liang, Zhang; Yanqing, Hou; Jie, Wu

    2016-12-01

    The multi-antenna synchronized receiver (using a common clock) is widely applied in GNSS-based attitude determination (AD), terrain deformation monitoring, and many other applications, since the high-accuracy single-differenced carrier phase can be used to improve the positioning or AD accuracy. The line bias (LB) parameter (isolating the fractional bias) must therefore be calibrated in the single-differenced phase equations. In past decades, researchers estimated the LB as a constant parameter in advance and compensated for it in real time. However, the constant-LB assumption is inappropriate in practical applications because of changes in the physical length and permittivity of the cables, caused by environmental temperature variation and the instability of the receiver's inner circuit transmitting delay. Considering the LB drift (or colored LB) in practical circumstances, this paper introduces a real-time estimator using an autoregressive moving average (ARMA)-based prediction/whitening filter model or a moving average (MA)-based constant calibration model. In the ARMA-based filter model, four cases, namely AR(1), ARMA(1, 1), AR(2) and ARMA(2, 1), are applied for LB prediction. The real-time relative positioning model using the ARMA-predicted LB is derived, and it is theoretically proved that its positioning accuracy is better than that of the traditional double-differenced carrier phase (DDCP) model. The drifting LB is defined with a phase-temperature changing-rate integral function, which is a random walk process if the phase-temperature changing rate is white noise; this is validated by analysis of the AR model coefficient. The autocovariance function shows that the LB indeed varies in time and that estimating it as a constant is not safe, which is also demonstrated by analysis of the LB variation of each visible satellite during zero- and short-baseline BDS/GPS experiments. Compared to the DDCP approach, in the zero-baseline experiment, the LB constant calibration (LBCC) and MA approaches improved the positioning accuracy of the vertical component while slightly degrading the accuracy of the horizontal components. The AR(1) model, however, improved the positioning accuracy of all three components, with 40% and 50% improvement of the vertical component for BDS and GPS, respectively. In the short-baseline experiment, compared to the DDCP approach, the LBCC approach yielded poor positioning solutions and degraded the AD accuracy, whereas both the MA and ARMA-based filter approaches improved the AD accuracy. Moreover, the AR(1) and ARMA(1, 1) models showed relatively better performance, improving the elevation-angle accuracy by 55% and 48% for the ARMA(1, 1) and MA models for GPS, respectively. Furthermore, the drifting LB variation is found to be continuous and slowly cumulative; the variation magnitudes, expressed in units of length, are almost identical on carrier phases of different frequencies, and the LB variation does not show an obvious correlation between frequencies. Consequently, the wide-lane LB in units of cycles is very stable, while the narrow-lane LB varies largely in time. This probably also explains the phenomenon that the wide-lane LB originating in the satellites is stable while the narrow-lane LB varies. The results of the ARMA-based filters are better than those of the MA model, which probably implies that modeling the drifting LB can further improve precise point positioning accuracy.
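
    The prediction step can be illustrated with the simplest of the four cases, AR(1): estimate the coefficient from past LB samples and predict one step ahead, compared against a constant-LB (calibration) model. The simulated bias and noise level below are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n = 500
    phi_true = 0.98
    lb = np.zeros(n)
    for t in range(1, n):
        lb[t] = phi_true * lb[t - 1] + rng.normal(0, 0.002)   # slowly drifting bias

    # Yule-Walker estimate of the AR(1) coefficient from the first 400 samples.
    train = lb[:400]
    x = train - train.mean()
    phi_hat = (x[1:] @ x[:-1]) / (x[:-1] @ x[:-1])

    # One-step-ahead predictions over the held-out tail vs. a constant-LB model.
    pred_ar = phi_hat * lb[399:-1]
    pred_const = np.full(100, train.mean())
    actual = lb[400:]
    print(f"phi_hat={phi_hat:.3f}")
    print(f"AR(1) RMSE:    {np.sqrt(np.mean((pred_ar - actual) ** 2)):.5f}")
    print(f"constant RMSE: {np.sqrt(np.mean((pred_const - actual) ** 2)):.5f}")
    ```

    When the bias truly drifts, the AR(1) predictor tracks it while the constant model accumulates error, which is the qualitative result the paper reports for LBCC versus the ARMA filters.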

  6. Evaluation of an ensemble of genetic models for prediction of a quantitative trait.

    PubMed

    Milton, Jacqueline N; Steinberg, Martin H; Sebastiani, Paola

    2014-01-01

    Many genetic markers have been shown to be associated with common quantitative traits in genome-wide association studies. Typically these associated genetic markers have small to modest effect sizes and individually they explain only a small amount of the variability of the phenotype. In order to build a genetic prediction model without fitting a multiple linear regression model with possibly hundreds of genetic markers as predictors, researchers often summarize the joint effect of risk alleles into a genetic score that is used as a covariate in the genetic prediction model. However, the prediction accuracy can be highly variable and selecting the optimal number of markers to be included in the genetic score is challenging. In this manuscript we present a strategy to build an ensemble of genetic prediction models from data and we show that the ensemble-based method makes the challenge of choosing the number of genetic markers more amenable. Using simulated data with varying heritability and number of genetic markers, we compare the predictive accuracy and inclusion of true positive and false positive markers of a single genetic prediction model and our proposed ensemble method. The results show that the ensemble of genetic models tends to include a larger number of genetic variants than a single genetic model and it is more likely to include all of the true genetic markers. This increased sensitivity is obtained at the price of a lower specificity that appears to minimally affect the predictive accuracy of the ensemble.
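
    The ensemble strategy can be sketched as follows: build several genetic-score models that differ in how many top-ranked markers they include, then average their predictions. The simulation below (random genotypes, arbitrary cutoffs, a marginal-correlation ranking) is a stand-in for the manuscript's actual procedure.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(3)
    n, p, n_causal = 400, 200, 20
    geno = rng.binomial(2, 0.3, (n, p)).astype(float)     # SNP dosages 0/1/2
    beta = np.zeros(p)
    beta[:n_causal] = rng.normal(0, 0.3, n_causal)
    y = geno @ beta + rng.normal(0, 1.0, n)               # quantitative trait

    train, test = slice(0, 300), slice(300, n)
    corr = np.array([np.corrcoef(geno[train, j], y[train])[0, 1] for j in range(p)])
    ranked = np.argsort(np.abs(corr))[::-1]

    preds = []
    for k in (5, 10, 20, 50, 100):                        # ensemble over cutoffs
        snps = ranked[:k]
        # Genetic score: signed sum of risk-allele dosages for the top-k markers.
        score = geno[:, snps] @ np.sign(corr[snps])
        m = LinearRegression().fit(score[train, None], y[train])
        preds.append(m.predict(score[test, None]))

    ens = np.mean(preds, axis=0)
    print("ensemble r:", np.corrcoef(ens, y[test])[0, 1].round(3))
    ```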

  7. Genomic prediction of the polled and horned phenotypes in Merino sheep.

    PubMed

    Duijvesteijn, Naomi; Bolormaa, Sunduimijid; Daetwyler, Hans D; van der Werf, Julius H J

    2018-05-22

    In horned sheep breeds, breeding for polledness has been of interest for decades. The objective of this study was to improve prediction of the horned and polled phenotypes using horn scores classified as polled, scurs, knobs or horns. Derived phenotypes polled/non-polled (P/NP) and horned/non-horned (H/NH) were used to test four different strategies for prediction in 4001 purebred Merino sheep. These strategies include the use of genotypes at single 'single nucleotide polymorphisms' (SNPs), multiple-SNP haplotypes, genome-wide and chromosome-wide genomic best linear unbiased prediction, and information from imputed sequence variants from the region including the RXFP2 gene. Low-density genotypes of these animals were imputed to the Illumina Ovine high-density (600k) chip and the 1.78-kb insertion polymorphism in RXFP2 was included in the imputation process to whole-genome sequence. We evaluated the mode of inheritance and validated models by a fivefold cross-validation and across- and between-family prediction. The most significant SNPs for prediction of P/NP and H/NH were OAR10_29546872.1 and OAR10_29458450, respectively, located on chromosome 10 close to the 1.78-kb insertion at 29.5 Mb. The mode of inheritance included an additive effect and a sex-dependent effect for dominance for P/NP and a sex-dependent additive and dominance effect for H/NH. Models with the highest prediction accuracies for H/NH used either single SNPs or 3-SNP haplotypes and included a polygenic effect estimated based on traditional pedigree relationships. Prediction accuracies for H/NH were 0.323 for females and 0.725 for males. For predicting P/NP, the best models were the same as for H/NH but included a genomic relationship matrix, with accuracies of 0.713 for females and 0.620 for males. Our results show that prediction accuracy is high using a single SNP, but does not reach 1 since the causative mutation is not genotyped. Incomplete penetrance or allelic heterogeneity, which can influence expression of the phenotype, may explain why prediction accuracy did not approach 1 with any of the genetic models tested here. Nevertheless, a breeding program to eradicate horns from Merino sheep can be effective by selecting genotypes GG of SNP OAR10_29458450 or TT of SNP OAR10_29546872.1, since all sheep with these genotypes will be non-horned.

  8. Didactic training vs. computer-based self-learning in the prediction of diminutive colon polyp histology by trainees: a randomized controlled study.

    PubMed

    Khan, Taimur; Cinnor, Birtukan; Gupta, Neil; Hosford, Lindsay; Bansal, Ajay; Olyaee, Mojtaba S; Wani, Sachin; Rastogi, Amit

    2017-12-01

    Background and study aim: Experts can accurately predict diminutive polyp histology, but the ideal method to train nonexperts is not known. The aim of the study was to compare accuracy in diminutive polyp histology characterization using narrow-band imaging (NBI) between participants undergoing classroom didactic training vs. computer-based self-learning. Participants and methods: Trainees at two institutions were randomized to classroom didactic training or computer-based self-learning. In didactic training, experienced endoscopists reviewed a presentation on NBI patterns for adenomatous and hyperplastic polyps and 40 NBI videos, along with interactive discussion. The self-learning group reviewed the same presentation of 40 teaching videos independently, without interactive discussion. A total of 40 testing videos of diminutive polyps under NBI were then evaluated by both groups. Performance characteristics were calculated by comparing predicted and actual histology. Fisher's exact test was used and P < 0.05 was considered significant. Results: A total of 17 trainees participated (8 didactic training and 9 self-learning). A larger proportion of polyps were diagnosed with high confidence in the classroom group (66.5% vs. 50.8%; P < 0.01), although sensitivity (86.9% vs. 95.0%) and accuracy (85.7% vs. 93.9%) of high-confidence predictions were higher in the self-learning group. However, there was no difference in overall accuracy of histology characterization (83.4% vs. 87.2%; P = 0.19). Similar results were noted when comparing sensitivity and specificity between the groups. Conclusion: The self-learning group showed results on a par with or, for high-confidence predictions, even slightly superior to classroom didactic training for predicting diminutive polyp histology. This approach can help in widespread training and clinical implementation of real-time polyp histology characterization. © Georg Thieme Verlag KG Stuttgart · New York.

  9. The accuracy of new wheelchair users' predictions about their future wheelchair use.

    PubMed

    Hoenig, Helen; Griffiths, Patricia; Ganesh, Shanti; Caves, Kevin; Harris, Frances

    2012-06-01

    This study examined the accuracy of new wheelchair users' predictions about their future wheelchair use. This was a prospective cohort study of 84 community-dwelling veterans provided with a new manual wheelchair. The association between predicted and actual wheelchair use was strong at 3 mos (ϕ coefficient = 0.56), with 90% of those who anticipated using the wheelchair at 3 mos still using it (i.e., positive predictive value = 0.96) and 60% of those who anticipated not using it indeed no longer using the wheelchair (i.e., negative predictive value = 0.60, overall accuracy = 0.92). Predictive accuracy diminished over time, with overall accuracy declining from 0.92 at 3 mos to 0.66 at 6 mos. At all time points, and for all types of use, patients better predicted use as opposed to disuse, with correspondingly higher positive than negative predictive values. Accuracy of prediction of use in specific indoor and outdoor locations varied according to location. This study demonstrates the importance of better understanding the potential mismatch between the anticipated and actual patterns of wheelchair use. The findings suggest that users can be relied upon to accurately predict their basic wheelchair-related needs in the short term. Further exploration is needed to identify characteristics that will aid users and their providers in more accurately predicting mobility needs for the long term.

  10. Improving consensus contact prediction via server correlation reduction.

    PubMed

    Gao, Xin; Bu, Dongbo; Xu, Jinbo; Li, Ming

    2009-05-06

    Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them. In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method, which assumes that all the individual servers are equally important and independent, the newly developed method evaluates their correlation by using maximum likelihood estimation and extracts independent latent servers from them by using principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated, where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, which is much better than that of the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06; these methods demonstrate average accuracies of 13.0%, 10.8%, 25.8% and 21.2%, respectively. Reducing server correlation and optimally combining independent latent servers show a significant improvement over traditional consensus methods. This approach can hopefully provide a powerful tool for protein structure refinement and prediction.
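
    The decorrelation idea can be sketched with PCA over a matrix of server scores. For brevity, the integer linear programming step that assigns weights to latent servers is replaced here by an ordinary least-squares fit, so this shows only the shape of the method, not the method itself; all data are random stand-ins.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    n_pairs, n_servers = 2000, 8
    truth = rng.random(n_pairs) < 0.1                     # true contacts (~10%)

    # Correlated servers: shared signal + shared noise + per-server noise.
    signal = truth.astype(float)
    shared = rng.normal(0, 1, n_pairs)
    scores = np.stack([0.8 * signal + 0.6 * shared + rng.normal(0, 1, n_pairs)
                       for _ in range(n_servers)], axis=1)

    # PCA to extract (approximately) independent latent servers.
    X = scores - scores.mean(0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    latent = X @ Vt.T

    # Weight latent components against a training half (least squares here,
    # standing in for the paper's integer linear program).
    half = n_pairs // 2
    w, *_ = np.linalg.lstsq(latent[:half], truth[:half].astype(float), rcond=None)
    consensus = latent[half:] @ w
    majority = scores[half:].mean(1)

    def top_l5_precision(score, y, l5=100):
        idx = np.argsort(score)[::-1][:l5]
        return y[idx].mean()

    print("PCA-weighted:", top_l5_precision(consensus, truth[half:]))
    print("majority mean:", top_l5_precision(majority, truth[half:]))
    ```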

  11. Survival prediction of trauma patients: a study on US National Trauma Data Bank.

    PubMed

    Sefrioui, I; Amadini, R; Mauro, J; El Fallahi, A; Gabbrielli, M

    2017-12-01

    Exceptional circumstances like major incidents or natural disasters may cause a huge number of victims who cannot all be saved immediately and simultaneously. In these cases it is important to define priorities, avoiding wasting time and resources on victims who cannot be saved. The Trauma and Injury Severity Score (TRISS) methodology is the standard system commonly used by practitioners to predict the survival probability of trauma patients. However, practitioners have noted that the accuracy of TRISS predictions is unacceptable, especially for severely injured patients. Thus, alternative methods should be proposed. In this work we evaluate different approaches for predicting whether a patient will survive or not according to simple and easily measurable observations. We conducted a rigorous, comparative study based on the most important prediction techniques using real clinical data from the US National Trauma Data Bank. Empirical results show that well-known Machine Learning classifiers can outperform the TRISS methodology. Based on our findings, the best approach we evaluated is Random Forest: it has the best accuracy, area under the curve, and k-statistic, as well as the second-best sensitivity and specificity. It also has a good calibration curve. Furthermore, its performance monotonically increases as the dataset size grows, meaning that it can be very effective at exploiting incoming knowledge. Considering the whole dataset, it is always better than TRISS. Finally, we implemented a new tool to compute the survival of victims. This will help medical practitioners to obtain better accuracy than the TRISS tools provide. Random Forests may be a good candidate solution for improving survival predictions over the standard TRISS methodology.
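
    A minimal version of the comparison, a Random Forest against a fixed-coefficient logistic score standing in for TRISS, might look like the sketch below. Feature names, coefficients, and data are invented; the study itself used the US National Trauma Data Bank.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(5)
    n = 5000
    age = rng.uniform(10, 90, n)
    sbp = rng.normal(120, 25, n)                  # systolic blood pressure
    gcs = rng.integers(3, 16, n)                  # Glasgow Coma Scale
    logit = -2.0 - 0.04 * age + 0.015 * sbp + 0.3 * gcs
    surv = rng.random(n) < 1 / (1 + np.exp(-logit))
    X = np.column_stack([age, sbp, gcs])

    Xtr, Xte, ytr, yte = train_test_split(X, surv, test_size=0.3, random_state=0)
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(Xtr, ytr)
    print("RF AUC:", roc_auc_score(yte, rf.predict_proba(Xte)[:, 1]).round(3))

    # A miscalibrated fixed-coefficient score as the TRISS-like baseline.
    baseline = 1 / (1 + np.exp(-(2.0 - 0.02 * Xte[:, 0] + 0.1 * Xte[:, 2])))
    print("fixed-score AUC:", roc_auc_score(yte, baseline).round(3))
    ```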

  12. Prediction of lithium response in first-episode mania using the LITHium Intelligent Agent (LITHIA): Pilot data and proof-of-concept.

    PubMed

    Fleck, David E; Ernest, Nicholas; Adler, Caleb M; Cohen, Kelly; Eliassen, James C; Norris, Matthew; Komoroski, Richard A; Chu, Wen-Jang; Welge, Jeffrey A; Blom, Thomas J; DelBello, Melissa P; Strakowski, Stephen M

    2017-06-01

    Individualized treatment for bipolar disorder based on neuroimaging treatment targets remains elusive. To address this shortcoming, we developed a linguistic machine learning system based on a cascading genetic fuzzy tree (GFT) design called the LITHium Intelligent Agent (LITHIA). Using multiple objectively defined functional magnetic resonance imaging (fMRI) and proton magnetic resonance spectroscopy (1H-MRS) inputs, we tested whether LITHIA could accurately predict the lithium response in participants with first-episode bipolar mania. We identified 20 subjects with first-episode bipolar mania who received an adequate trial of lithium over 8 weeks and both fMRI and 1H-MRS scans at baseline pre-treatment. We trained LITHIA using 18 1H-MRS and 90 fMRI inputs over four training runs to classify treatment response and predict symptom reductions. Each training run contained a randomly selected 80% of the total sample and was followed by a 20% validation run. Over a different randomly selected distribution of the sample, we then compared LITHIA to eight common classification methods. LITHIA demonstrated nearly perfect classification accuracy and was able to predict post-treatment symptom reductions at 8 weeks with at least 88% accuracy in training and 80% accuracy in validation. Moreover, LITHIA exceeded the predictive capacity of the eight comparator methods and showed little tendency towards overfitting. The results provided proof-of-concept that a novel GFT is capable of providing control to a multidimensional bioinformatics problem, namely prediction of the lithium response, in a pilot data set. Future work on this, and similar machine learning systems, could help assign psychiatric treatments more efficiently, thereby optimizing outcomes and limiting unnecessary treatment. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  13. Paroxysmal atrial fibrillation prediction based on HRV analysis and non-dominated sorting genetic algorithm III.

    PubMed

    Boon, K H; Khalil-Hani, M; Malarvili, M B

    2018-01-01

    This paper presents a method able to predict paroxysmal atrial fibrillation (PAF). The method uses shorter heart rate variability (HRV) signals than existing methods and achieves good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinically important because it increases the possibility of electrically stabilizing the heart and preventing the onset of atrial arrhythmias with different pacing techniques. We propose a multi-objective optimization algorithm based on the non-dominated sorting genetic algorithm III for optimizing the baseline PAF prediction system, which consists of pre-processing, HRV feature extraction, and support vector machine (SVM) stages. The pre-processing stage comprises heart rate correction, interpolation, and signal detrending. Time-domain, frequency-domain, and non-linear HRV features are then extracted from the pre-processed data in the feature extraction stage. These features are used as input to the SVM for predicting the PAF event. The proposed optimization algorithm is used to optimize the parameters and settings of the various HRV feature extraction algorithms, select the best feature subsets, and tune the SVM parameters simultaneously for maximum prediction performance. The proposed method achieves an accuracy rate of 87.7%, which significantly outperforms most previous works. This accuracy is achieved even with the HRV signal length reduced from the typical 30 min to just 5 min (a reduction of 83%). Another significant result is that the sensitivity rate, which is considered more important than other performance metrics in this paper, can be improved at the trade-off of lower specificity. Copyright © 2017 Elsevier B.V. All rights reserved.
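
    The shape of the pipeline (window the RR series, extract HRV features, classify with an SVM) is easy to sketch. The code below uses simulated RR intervals and only three time-domain features, and omits the NSGA-III optimization entirely; the separability assumption built into the simulation is purely illustrative.

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(6)

    def hrv_features(rr):
        """rr: RR intervals in seconds for one 5-minute window."""
        diff = np.diff(rr)
        return [rr.mean(),                      # mean RR
                rr.std(),                       # SDNN
                np.sqrt(np.mean(diff ** 2))]    # RMSSD

    def simulate_window(pre_paf):
        # Assumption for illustration only: pre-PAF windows show higher variability.
        sd = 0.09 if pre_paf else 0.05
        return np.clip(rng.normal(0.8, sd, 300), 0.3, 2.0)

    y = rng.integers(0, 2, 200)                 # 1 = window preceding a PAF event
    X = np.array([hrv_features(simulate_window(label)) for label in y])

    svm = SVC(kernel="rbf", C=1.0, gamma="scale")
    print("CV accuracy:", cross_val_score(svm, X, y, cv=5).mean().round(3))
    ```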

  14. Estimation of the monthly average daily solar radiation using geographic information system and advanced case-based reasoning.

    PubMed

    Koo, Choongwan; Hong, Taehoon; Lee, Minhyun; Park, Hyo Seon

    2013-05-07

    The photovoltaic (PV) system is considered an unlimited source of clean energy, whose amount of electricity generation changes according to the monthly average daily solar radiation (MADSR). It is revealed that the MADSR distribution in South Korea has very diverse patterns due to the country's climatic and geographical characteristics. This study aimed to develop a MADSR estimation model for locations without measured MADSR data, using an advanced case-based reasoning (CBR) model, which is a hybrid methodology combining CBR with artificial neural network, multiregression analysis, and genetic algorithm. The average prediction accuracy of the advanced CBR model was very high at 95.69%, and the standard deviation of the prediction accuracy was 3.67%, showing a significant improvement in prediction accuracy and consistency. A case study was conducted to verify the proposed model. The proposed model could be useful for owners or construction managers in charge of determining whether to introduce a PV system and where to install it. It would also help contractors in a competitive bidding process to estimate the electricity generation of the PV system accurately in advance and to conduct an economic and environmental feasibility study from the life-cycle perspective.

  15. Prediction of Potential Hit Song and Musical Genre Using Artificial Neural Networks

    NASA Astrophysics Data System (ADS)

    Monterola, Christopher; Abundo, Cheryl; Tugaff, Jeric; Venturina, Lorcel Ericka

    Accurately quantifying the goodness of music based on the seemingly subjective taste of the public is a multi-million-dollar industry. Recording companies can make sound decisions on which songs or artists to prioritize if accurate forecasting is achieved. We extract 56 single-valued musical features (e.g. pitch and tempo) from 380 Original Pilipino Music (OPM) songs (190 are hit songs) released from 2004 to 2006. Based on an effect size criterion which measures a variable's discriminating power, the 20 highest-ranked features are fed to a classifier tasked to predict hit songs. We show that regardless of musical genre, a trained feed-forward neural network (NN) can predict potential hit songs with an average accuracy of ΦNN = 81%. This accuracy is about 20 percentage points higher than those of standard classifiers such as linear discriminant analysis (LDA, ΦLDA = 61%) and classification and regression trees (CART, ΦCART = 57%). Both LDA and CART are above the proportional chance criterion (PCC, ΦPCC = 50%) but are slightly below the suggested acceptable classifier requirement of 1.25*ΦPCC = 63%. Utilizing a similar procedure, we demonstrate that different genres (ballad, alternative rock or rock) of OPM songs can be automatically classified with near perfect accuracy using LDA or NN, but only around 77% using CART.
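
    The recipe, rank features by discriminating power, keep the top 20, and train a feed-forward network, can be sketched as below. Random data stand in for the 56 extracted musical features, and an ANOVA F-test stands in for the paper's effect-size criterion.

    ```python
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(7)
    n_songs, n_features = 380, 56
    y = np.repeat([0, 1], n_songs // 2)              # non-hit vs. hit
    X = rng.normal(0, 1, (n_songs, n_features))
    X[y == 1, :10] += 0.8                            # make 10 features informative

    model = make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=20),                # keep the 20 top-ranked features
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    )
    print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
    ```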

  16. Randomized Subspace Learning for Proline Cis-Trans Isomerization Prediction.

    PubMed

    Al-Jarrah, Omar Y; Yoo, Paul D; Taha, Kamal; Muhaidat, Sami; Shami, Abdallah; Zaki, Nazar

    2015-01-01

    Proline residues are a common source of kinetic complications during folding. The X-Pro peptide bond is the only peptide bond for which the stability of the cis and trans conformations is comparable. The cis-trans isomerization (CTI) of X-Pro peptide bonds is a widely recognized rate-limiting factor, which can not only induce additional slow phases in protein folding but also modify the millisecond and sub-millisecond dynamics of the protein. An accurate computational prediction of proline CTI is of great importance for the understanding of protein folding, splicing, cell signaling, and transmembrane active transport in both humans and animals. In our earlier work, we successfully developed a biophysically motivated proline CTI predictor utilizing a novel tree-based consensus model with a powerful metalearning technique, achieving 86.58 percent Q2 accuracy and a Matthews correlation coefficient of 0.74, a better result than the 70-73 percent Q2 accuracies reported in the literature on the well-referenced benchmark dataset. In this paper, we describe experiments with novel randomized subspace learning and bootstrap seeding techniques as an extension to our earlier work, the consensus models as well as entropy-based learning methods, to obtain better accuracy through a precise and robust learning scheme for proline CTI prediction.

  17. Elderly fall risk prediction based on a physiological profile approach using artificial neural networks.

    PubMed

    Razmara, Jafar; Zaboli, Mohammad Hassan; Hassankhani, Hadi

    2016-11-01

    Falls play a critical role in older people's lives, as they are an important source of morbidity and mortality in elders. In this article, elders' fall risk is predicted based on a physiological profile approach using a multilayer neural network with a back-propagation learning algorithm. The personal physiological profiles of 200 elders were collected through a questionnaire and used as the experimental data for learning and testing the neural network. The profile contains a series of simple factors that put elders at risk for falls, such as vision abilities, muscle forces, and other daily activities, grouped into two sets: psychological factors and public factors. The experimental data were investigated to select factors with high impact using principal component analysis. The experimental results show an accuracy of ≈90 percent and ≈87.5 percent for fall prediction among the psychological and public factors, respectively. Furthermore, combining these two datasets yields an accuracy of ≈91 percent, which is better than that of either single dataset. The proposed method suggests a set of valid and reliable measurements that can be employed in a range of health care systems and physical therapy to distinguish people who are at risk for falls.

  18. Tertiary structure-based analysis of microRNA–target interactions

    PubMed Central

    Gan, Hin Hark; Gunsalus, Kristin C.

    2013-01-01

    Current computational analysis of microRNA interactions is based largely on primary and secondary structure analysis. Computationally efficient tertiary structure-based methods are needed to enable more realistic modeling of the molecular interactions underlying miRNA-mediated translational repression. We incorporate algorithms for predicting duplex RNA structures, ionic strength effects, duplex entropy and free energy, and docking of duplex–Argonaute protein complexes into a pipeline to model and predict miRNA–target duplex binding energies. To ensure modeling accuracy and computational efficiency, we use an all-atom description of RNA and a continuum description of ionic interactions using the Poisson–Boltzmann equation. Our method predicts the conformations of two constructs of Caenorhabditis elegans let-7 miRNA–target duplexes to an accuracy of ∼3.8 Å root mean square distance from their NMR structures. We also show that the computed duplex formation enthalpies, entropies, and free energies for eight miRNA–target duplexes agree with titration calorimetry data. Analysis of duplex–Argonaute docking shows that structural distortions arising from single-base-pair mismatches in the seed region influence the activity of the complex by destabilizing both duplex hybridization and its association with Argonaute. Collectively, these results demonstrate that tertiary structure-based modeling of miRNA interactions can reveal structural mechanisms not accessible with current secondary structure-based methods. PMID:23417009

  19. The role of feedback contingency in perceptual category learning.

    PubMed

    Ashby, F Gregory; Vucovich, Lauren E

    2016-11-01

    Feedback is highly contingent on behavior if it eventually becomes easy to predict, and weakly contingent on behavior if it remains difficult or impossible to predict even after learning is complete. Many studies have demonstrated that humans and nonhuman animals are highly sensitive to feedback contingency, but no known studies have examined how feedback contingency affects category learning, and current theories assign little or no importance to this variable. Two experiments examined the effects of contingency degradation on rule-based and information-integration category learning. In rule-based tasks, optimal accuracy is possible with a simple explicit rule, whereas optimal accuracy in information-integration tasks requires integrating information from 2 or more incommensurable perceptual dimensions. In both experiments, participants each learned rule-based or information-integration categories under either high or low levels of feedback contingency. The exact same stimuli were used in all 4 conditions, and optimal accuracy was identical in every condition. Learning was good in both high-contingency conditions, but most participants showed little or no evidence of learning in either low-contingency condition. Possible causes of these effects, as well as their theoretical implications, are discussed. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  20. The Role of Feedback Contingency in Perceptual Category Learning

    PubMed Central

    Ashby, F. Gregory; Vucovich, Lauren E.

    2016-01-01

    Feedback is highly contingent on behavior if it eventually becomes easy to predict, and weakly contingent on behavior if it remains difficult or impossible to predict even after learning is complete. Many studies have demonstrated that humans and nonhuman animals are highly sensitive to feedback contingency, but no known studies have examined how feedback contingency affects category learning, and current theories assign little or no importance to this variable. Two experiments examined the effects of contingency degradation on rule-based and information-integration category learning. In rule-based tasks, optimal accuracy is possible with a simple explicit rule, whereas optimal accuracy in information-integration tasks requires integrating information from two or more incommensurable perceptual dimensions. In both experiments, participants each learned rule-based or information-integration categories under either high or low levels of feedback contingency. The exact same stimuli were used in all four conditions and optimal accuracy was identical in every condition. Learning was good in both high-contingency conditions, but most participants showed little or no evidence of learning in either low-contingency condition. Possible causes of these effects are discussed, as well as their theoretical implications. PMID:27149393

  1. A new computational strategy for predicting essential genes.

    PubMed

    Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng

    2013-12-21

    Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.

  2. Pan-Arctic modelling of net ecosystem exchange of CO2

    PubMed Central

    Shaver, G. R.; Rastetter, E. B.; Salmon, V.; Street, L. E.; van de Weg, M. J.; Rocha, A.; van Wijk, M. T.; Williams, M.

    2013-01-01

    Net ecosystem exchange (NEE) of C varies greatly among Arctic ecosystems. Here, we show that approximately 75 per cent of this variation can be accounted for in a single regression model that predicts NEE as a function of leaf area index (LAI), air temperature and photosynthetically active radiation (PAR). The model was developed in concert with a survey of the light response of NEE in Arctic and subarctic tundras in Alaska, Greenland, Svalbard and Sweden. Model parametrizations based on data collected in one part of the Arctic can be used to predict NEE in other parts of the Arctic with accuracy similar to that of predictions based on data collected in the same site where NEE is predicted. The principal requirement for the dataset is that it should contain a sufficiently wide range of measurements of NEE at both high and low values of LAI, air temperature and PAR, to properly constrain the estimates of model parameters. Canopy N content can also be substituted for leaf area in predicting NEE, with equal or greater accuracy, but substitution of soil temperature for air temperature does not improve predictions. Overall, the results suggest a remarkable convergence in regulation of NEE in diverse ecosystem types throughout the Arctic. PMID:23836790

  3. Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics.

    PubMed

    Mahmood, Khalid; Jung, Chol-Hee; Philip, Gayle; Georgeson, Peter; Chung, Jessica; Pope, Bernard J; Park, Daniel J

    2017-05-16

    Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools. Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than observed for other, potentially more conflicted datasets. These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as previously reported. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools.

  4. Predicting CD4 count changes among patients on antiretroviral treatment: Application of data mining techniques.

    PubMed

    Kebede, Mihiretu; Zegeye, Desalegn Tigabu; Zeleke, Berihun Megabiaw

    2017-12-01

    To monitor the progress of therapy and disease progression, periodic CD4 counts are required throughout the course of HIV/AIDS care and support. The demand for CD4 count measurement has increased as ART programs have expanded over the last decade. This study aimed to predict CD4 count changes and to identify the predictors of CD4 count changes among patients on ART. A cross-sectional study was conducted at the University of Gondar Hospital on 3,104 adult patients on ART with CD4 counts measured at least twice (baseline and most recent). Data were retrieved from the HIV care clinic electronic database and patients' charts. Descriptive data were analyzed with SPSS version 20. The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was followed to undertake the study. WEKA version 3.8 was used to conduct predictive data mining. Before building the predictive data mining models, information gain values and correlation-based feature selection methods were used for attribute selection. Variables were ranked according to their relevance based on their information gain values. J48, Neural Network, and Random Forest algorithms were evaluated to compare model accuracies. The median duration of ART was 191.5 weeks. The mean CD4 count change was 243 (SD 191.14) cells per microliter. Overall, 2427 (78.2%) patients had their CD4 counts increase by at least 100 cells per microliter, while 4% had a decline from the baseline CD4 value. Baseline variables including age, educational status, CD8 count, ART regimen, and hemoglobin levels predicted CD4 count changes, with predictive accuracies of J48, Neural Network, and Random Forest being 87.1%, 83.5%, and 99.8%, respectively. The Random Forest algorithm had superior performance compared to both J48 and the Artificial Neural Network. The precision, sensitivity, and recall values of Random Forest were also more than 99%. Nearly perfect prediction results were obtained using the Random Forest algorithm. This algorithm could be used in a low-resource setting to build a web-based prediction model for CD4 count changes. Copyright © 2017 Elsevier B.V. All rights reserved.
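
    The attribute-selection step, ranking variables by information gain against the outcome, can be sketched with scikit-learn's mutual information estimator. The variables below are invented stand-ins for the cohort's baseline attributes.

    ```python
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    rng = np.random.default_rng(8)
    n = 1000
    outcome = rng.integers(0, 2, n)                      # CD4 rise >= 100 or not
    baseline_cd8 = rng.normal(800, 200, n) + 100 * outcome
    age = rng.normal(35, 10, n)                          # uninformative here
    hemoglobin = rng.normal(13, 1.5, n) + 0.5 * outcome
    X = np.column_stack([baseline_cd8, age, hemoglobin])
    names = ["baseline_cd8", "age", "hemoglobin"]

    # Rank attributes by mutual information (information gain) with the outcome.
    mi = mutual_info_classif(X, outcome, random_state=0)
    for name, score in sorted(zip(names, mi), key=lambda t: -t[1]):
        print(f"{name:>12}: {score:.3f}")
    ```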

  5. Verification and Validation of the New Dynamic Mooring Modules Available in FAST v8: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wendt, Fabian; Robertson, Amy; Jonkman, Jason

    2016-08-01

    The open-source aero-hydro-servo-elastic wind turbine simulation software, FAST v8, was recently coupled to two newly developed mooring dynamics modules: MoorDyn and FEAMooring. MoorDyn is a lumped-mass-based mooring dynamics module developed by the University of Maine, and FEAMooring is a finite-element-based mooring dynamics module developed by Texas A&M University. This paper summarizes the work performed to verify and validate these modules against other mooring models and measured test data to assess their reliability and accuracy. The quality of the fairlead load predictions by the open-source mooring modules MoorDyn and FEAMooring appears to be largely equivalent to what is predicted by the commercial tool OrcaFlex. Both mooring dynamic model predictions agree well with the experimental data, considering the given limitations in the accuracy of the platform hydrodynamic load calculation and the quality of the measurement data.

  6. Verification and Validation of the New Dynamic Mooring Modules Available in FAST v8

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wendt, Fabian F.; Andersen, Morten T.; Robertson, Amy N.

    2016-07-01

    The open-source aero-hydro-servo-elastic wind turbine simulation software, FAST v8, was recently coupled to two newly developed mooring dynamics modules: MoorDyn and FEAMooring. MoorDyn is a lumped-mass-based mooring dynamics module developed by the University of Maine, and FEAMooring is a finite-element-based mooring dynamics module developed by Texas A&M University. This paper summarizes the work performed to verify and validate these modules against other mooring models and measured test data to assess their reliability and accuracy. The quality of the fairlead load predictions by the open-source mooring modules MoorDyn and FEAMooring appears to be largely equivalent to what is predicted by the commercial tool OrcaFlex. Both mooring dynamic model predictions agree well with the experimental data, considering the given limitations in the accuracy of the platform hydrodynamic load calculation and the quality of the measurement data.

  7. Development of a Natural Language Processing Engine to Generate Bladder Cancer Pathology Data for Health Services Research.

    PubMed

    Schroeck, Florian R; Patterson, Olga V; Alba, Patrick R; Pattison, Erik A; Seigne, John D; DuVall, Scott L; Robertson, Douglas J; Sirovich, Brenda; Goodney, Philip P

    2017-12-01

    To take the first step toward assembling population-based cohorts of patients with bladder cancer with longitudinal pathology data, we developed and validated a natural language processing (NLP) engine that abstracts pathology data from full-text pathology reports. Using 600 bladder pathology reports randomly selected from the Department of Veterans Affairs, we developed and validated an NLP engine to abstract data on histology, invasion (presence vs absence and depth), grade, the presence of muscularis propria, and the presence of carcinoma in situ. Our gold standard was based on an independent review of reports by 2 urologists, followed by adjudication. We assessed the NLP performance by calculating the accuracy, the positive predictive value, and the sensitivity. We subsequently applied the NLP engine to pathology reports from 10,725 patients with bladder cancer. When comparing the NLP output to the gold standard, NLP achieved the highest accuracy (0.98) for the presence vs the absence of carcinoma in situ. Accuracy for histology, invasion (presence vs absence), grade, and the presence of muscularis propria ranged from 0.83 to 0.96. The most challenging variable was depth of invasion (accuracy 0.68), with an acceptable positive predictive value for lamina propria (0.82) and for muscularis propria (0.87) invasion. The validated engine was capable of abstracting pathologic characteristics for 99% of the patients with bladder cancer. NLP had high accuracy for 5 of 6 variables and abstracted data for the vast majority of the patients. This now allows for the assembly of population-based cohorts with longitudinal pathology data. Published by Elsevier Inc.
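
    Rule-based abstraction of this kind often reduces to pattern matching with negation handling. The toy sketch below is invented for illustration and is not the validated engine or its rules; real systems handle far more lexical variation and scope.

    ```python
    import re

    # Hypothetical negation cues; word boundaries keep "no" from matching
    # inside words such as "carcinoma".
    NEGATION = r"\b(no|without|negative for|absent)\b"

    def abstract_cis(report: str) -> str:
        """Classify a report as positive/negative for carcinoma in situ (CIS)."""
        text = report.lower()
        if re.search(NEGATION + r"[^.]{0,40}carcinoma in situ", text):
            return "CIS absent"
        if "carcinoma in situ" in text:
            return "CIS present"
        return "CIS not mentioned"

    print(abstract_cis("Urothelial carcinoma, high grade, with carcinoma in situ."))
    print(abstract_cis("Negative for carcinoma in situ; muscularis propria present."))
    ```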

  8. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

    PubMed Central

    2007-01-01

    Background Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction. Results The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction: yielding better accuracy, on average, and requiring significantly fewer computational resources. Conclusion Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer length sequences. The revised Dynalign code is freely available for download. PMID:17445273
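
    The constraint construction can be sketched as follows: combine alignment and insertion posteriors into co-incidence probabilities for every position pair, then threshold them into a binary mask of permissible alignments. The posterior matrices below are random stand-ins for what the hidden Markov model would supply, and the additive combination is deliberately simplified.

    ```python
    import numpy as np

    rng = np.random.default_rng(9)
    n1, n2 = 30, 32
    align_post = rng.dirichlet(np.ones(n2), size=n1)      # P(i aligned to j)
    ins1 = rng.random(n1) * 0.1                           # P(i inserted in seq 1)
    ins2 = rng.random(n2) * 0.1                           # P(j inserted in seq 2)

    # Simplified additive combination of alignment and insertion posteriors
    # into a co-incidence probability for every (i, j) pair.
    co = align_post + 0.5 * (ins1[:, None] + ins2[None, :])

    threshold = 0.05
    allowed = co >= threshold                             # constraint mask M(i, j)
    print(f"allowed pairs: {allowed.sum()} of {n1 * n2}")
    ```

    The folding algorithm then only considers alignments inside the mask, which is where the reported computation and memory savings come from.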

  9. Accuracy of Prediction Instruments for Diagnosing Large Vessel Occlusion in Individuals With Suspected Stroke: A Systematic Review for the 2018 Guidelines for the Early Management of Patients With Acute Ischemic Stroke.

    PubMed

    Smith, Eric E; Kent, David M; Bulsara, Ketan R; Leung, Lester Y; Lichtman, Judith H; Reeves, Mathew J; Towfighi, Amytis; Whiteley, William N; Zahuranec, Darin B

    2018-03-01

    Endovascular thrombectomy is a highly efficacious treatment for large vessel occlusion (LVO). LVO prediction instruments, based on stroke signs and symptoms, have been proposed to identify stroke patients with LVO for rapid transport to endovascular thrombectomy-capable hospitals. This evidence review committee was commissioned by the American Heart Association/American Stroke Association to systematically review evidence for the accuracy of LVO prediction instruments. Medline, Embase, and Cochrane databases were searched on October 27, 2016. Study quality was assessed with the Quality Assessment of Diagnostic Accuracy-2 tool. Thirty-six relevant studies were identified. Most studies (21 of 36) recruited patients with ischemic stroke, with few studies in the prehospital setting (4 of 36) and in populations that included hemorrhagic stroke or stroke mimics (12 of 36). The most frequently studied prediction instrument was the National Institutes of Health Stroke Scale. Most studies had either some risk of bias or unclear risk of bias. Reported discrimination of LVO mostly ranged from 0.70 to 0.85, as measured by the C statistic. In meta-analysis, sensitivity was as high as 87% and specificity was as high as 90%, but no threshold on any instruments predicted LVO with both high sensitivity and specificity. With a positive LVO prediction test, the probability of LVO could be 50% to 60% (depending on the LVO prevalence in the population), but the probability of LVO with a negative test could still be ≥10%. No scale predicted LVO with both high sensitivity and high specificity. Systems that use LVO prediction instruments for triage will miss some patients with LVO and milder stroke. More prospective studies are needed to assess the accuracy of LVO prediction instruments in the prehospital setting in all patients with suspected stroke, including patients with hemorrhagic stroke and stroke mimics. © 2018 American Heart Association, Inc.

  10. GASP: Gapped Ancestral Sequence Prediction for proteins

    PubMed Central

    Edwards, Richard J; Shields, Denis C

    2004-01-01

    Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods, but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to that of three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199

  11. A time series modeling approach in risk appraisal of violent and sexual recidivism.

    PubMed

    Bani-Yaghoub, Majid; Fedoroff, J Paul; Curry, Susan; Amundsen, David E

    2010-10-01

    For over half a century, various clinical and actuarial methods have been employed to assess the likelihood of violent recidivism. Yet there is a need for new methods that can improve the accuracy of recidivism predictions. This study proposes a new time series modeling approach that generates high levels of predictive accuracy over short and long periods of time. The proposed approach outperformed two widely used actuarial instruments (i.e., the Violence Risk Appraisal Guide and the Sex Offender Risk Appraisal Guide). Furthermore, analysis of temporal risk variations based on specific time series models can add valuable information into risk assessment and management of violent offenders.

  12. Clinical models are inaccurate in predicting bile duct stones in situ for patients with gallbladder.

    PubMed

    Topal, B; Fieuws, S; Tomczyk, K; Aerts, R; Van Steenbergen, W; Verslype, C; Penninckx, F

    2009-01-01

    The probability that a patient has common bile duct stones (CBDS) is a key factor in determining diagnostic and treatment strategies. This prospective cohort study evaluated the accuracy of clinical models in predicting CBDS for patients who are to undergo cholecystectomy for lithiasis. From October 2005 until September 2006, 335 consecutive patients with symptoms of gallstone disease underwent cholecystectomy. Statistical analysis was performed on prospective patient data obtained at the time of first presentation to the hospital. Demonstrable CBDS at the time of endoscopic retrograde cholangiopancreatography (ERCP) or intraoperative cholangiography (IOC) was considered the gold standard for the presence of CBDS. Common bile duct stones were demonstrated in 53 patients. For 35 patients, ERCP was performed, with successful stone clearance in 24 of 30 patients who had proven CBDS. In 29 patients, IOC showed CBDS, which were managed successfully via laparoscopic common bile duct exploration, with stone extraction at the time of cholecystectomy. Prospective validation of the existing model for CBDS resulted in a predictive accuracy rate of 73%. The new model showed a predictive accuracy rate of 79%. Clinical models are inaccurate in predicting CBDS in patients with cholelithiasis. Management strategies should be based on the local availability of therapeutic expertise.

  13. Accurate genomic predictions for BCWD resistance in rainbow trout are achieved using low-density SNP panels: Evidence that long-range LD is a major contributing factor.

    PubMed

    Vallejo, Roger L; Silva, Rafael M O; Evenhuis, Jason P; Gao, Guangtu; Liu, Sixin; Parsons, James E; Martin, Kyle E; Wiens, Gregory D; Lourenco, Daniela A L; Leeds, Timothy D; Palti, Yniv

    2018-06-05

    Previously, accurate genomic predictions for bacterial cold water disease (BCWD) resistance in rainbow trout were obtained using a medium-density single nucleotide polymorphism (SNP) array. Here, the impact of lower-density SNP panels on the accuracy of genomic predictions was investigated in a commercial rainbow trout breeding population. Using progeny performance data, the accuracy of genomic breeding values (GEBV) estimated with 35K, 10K, 3K, 1K, 500, 300 and 200 SNP panels, as well as a panel of 70 quantitative trait loci (QTL)-flanking SNPs, was compared. The GEBVs were estimated using the Bayesian method BayesB, single-step GBLUP (ssGBLUP) and weighted ssGBLUP (wssGBLUP). The accuracy of GEBVs remained high despite the sharp reductions in SNP density, and even with 500 SNPs accuracy was higher than that of pedigree-based prediction (0.50-0.56 versus 0.36). Furthermore, the prediction accuracy with the 70 QTL-flanking SNPs (0.65-0.72) was similar to that of the 35K SNP panel (0.65-0.71). Genome-wide linkage disequilibrium (LD) analysis revealed strong LD (r² ≥ 0.25) spanning on average over 1 Mb across the rainbow trout genome. This long-range LD likely contributed to the accurate genomic predictions with the low-density SNP panels. Population structure analysis supported the hypothesis that long-range LD in this population may be caused by admixture. Results suggest that lower-cost, low-density SNP panels can be used for implementing genomic selection for BCWD resistance in rainbow trout breeding programs. © 2018 The Authors. This article is a U.S. Government work and is in the public domain in the USA. Journal of Animal Breeding and Genetics published by Blackwell Verlag GmbH.
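    A minimal sketch of the LD computation referenced above: squared Pearson correlation (r²) between genotype dosages at pairs of SNPs, with the paper's r² ≥ 0.25 threshold for "strong LD". The genotypes here are simulated stand-ins for the trout panel data.

        import numpy as np

        rng = np.random.default_rng(2)
        n_ind, n_snp = 500, 100
        G = rng.integers(0, 3, size=(n_ind, n_snp)).astype(float)  # dosages 0/1/2

        def ld_r2(g1, g2):
            r = np.corrcoef(g1, g2)[0, 1]
            return r * r

        # Fraction of adjacent SNP pairs in "strong LD" (near zero here, since
        # the simulated SNPs are independent; real long-range LD would be higher).
        strong = np.mean([ld_r2(G[:, i], G[:, i + 1]) >= 0.25
                          for i in range(n_snp - 1)])
        print(f"adjacent pairs with r^2 >= 0.25: {strong:.2%}")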

  14. Analysis of Mining-Induced Subsidence Prediction by Exponent Knothe Model Combined with InSAR and Leveling

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Zhang, Liguo; Tang, Yixian; Zhang, Hong

    2018-04-01

    The principle of the exponent Knothe model is introduced in detail, and the variation of mining subsidence with time is analysed based on the model's formulas for subsidence, subsidence velocity and subsidence acceleration. Five scenes of radar images and six levelling measurements were collected to extract ground deformation characteristics in a coal mining area. The unknown parameters of the exponent Knothe model were then estimated by combining the levelling data with line-of-sight deformation information obtained by InSAR. Comparing the fitting and prediction results obtained with combined InSAR and levelling data against those obtained with levelling data alone showed that the combined approach was clearly more accurate. InSAR measurements can therefore significantly improve the fitting and prediction accuracy of the exponent Knothe model.
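    A hedged sketch of the parameter estimation: the exponent Knothe time function is commonly written W(t) = W0·(1 − exp(−c·t))^k, and its parameters can be fitted by least squares to a combined levelling/InSAR subsidence series. The functional form, parameter values and data below are illustrative assumptions, not the paper's results.

        import numpy as np
        from scipy.optimize import curve_fit

        def knothe(t, w0, c, k):
            """Exponent Knothe time function: subsidence at time t."""
            return w0 * (1.0 - np.exp(-c * t)) ** k

        rng = np.random.default_rng(3)
        t_obs = np.linspace(0.0, 24.0, 30)              # months
        w_obs = knothe(t_obs, 1.8, 0.25, 1.6) \
                + rng.normal(0, 0.02, t_obs.size)       # noisy observations

        params, _ = curve_fit(knothe, t_obs, w_obs,
                              p0=(1.0, 0.1, 1.0), bounds=(1e-6, [10., 5., 5.]))
        w0, c, k = params
        print("estimated W0, c, k:", np.round(params, 3))
        # Subsidence velocity is the analytic derivative dW/dt:
        v = w0 * k * c * np.exp(-c * t_obs) * (1 - np.exp(-c * t_obs)) ** (k - 1)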

  15. Complex Questions Asked by Defense Lawyers But Not Prosecutors Predicts Convictions in Child Abuse Trials

    PubMed Central

    Evans, Angela D.; Lyon, Thomas D.

    2010-01-01

    Attorneys’ language has been found to influence the accuracy of a child's testimony, with defense attorneys asking more complex questions than the prosecution (Zajac & Hayne, J. Exp Psychol Appl 9:187–195, 2003; Zajac et al. Psychiatr Psychol Law, 10:199–209, 2003). These complex questions may be used as a strategy to influence the jury's perceived accuracy of child witnesses. However, we currently do not know whether the complexity of attorneys' questions predicts the trial outcome. The present study assesses whether question complexity is related to the trial outcome in 46 child sexual abuse court transcripts using an automated linguistic analysis. Based on the complexity of defense attorneys' questions, the trial verdict was accurately predicted 82.6% of the time. Contrary to our prediction, more complex questions asked by the defense were associated with convictions, not acquittals. PMID:18633698
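    The abstract does not enumerate the linguistic features used, so the sketch below scores question complexity with crude surface proxies (length, subordinating markers, negations) purely to illustrate what an automated complexity measure over transcript questions might look like.

        import re

        def complexity(question: str) -> float:
            """Crude surface-level complexity score for a single question."""
            q = question.lower()
            words = len(q.split())
            subordinators = len(re.findall(
                r"\b(that|which|when|while|whether)\b", q))
            negations = len(re.findall(r"\b(not|never)\b|n't", q))
            return words + 2 * subordinators + 2 * negations

        print(complexity("Did he touch you?"))
        print(complexity("Isn't it true that you never told anyone that, "
                         "when he wasn't there, nothing like what you "
                         "described had ever happened?"))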

  16. Predictive modeling of respiratory tumor motion for real-time prediction of baseline shifts

    NASA Astrophysics Data System (ADS)

    Balasubramanian, A.; Shamsuddin, R.; Prabhakaran, B.; Sawant, A.

    2017-03-01

    Baseline shifts in respiratory patterns can result in significant spatiotemporal changes in patient anatomy (compared to that captured during simulation), in turn causing geometric and dosimetric errors in the administration of thoracic and abdominal radiotherapy. We propose predictive modeling of tumor motion trajectories for predicting a baseline shift ahead of its occurrence. The key idea is to use the features of the tumor motion trajectory over a 1 min window, and predict the occurrence of a baseline shift in the 5 s that immediately follow (lookahead window). In this study, we explored a preliminary trend-based analysis with multi-class annotations as well as a more focused binary classification analysis. In both analyses, a number of different inter-fraction and intra-fraction training strategies were studied, both offline and online, along with data sufficiency and skew compensation for class imbalances. The performance of the different training strategies was compared across multiple machine learning classification algorithms, including nearest neighbor, Naïve Bayes, linear discriminant and ensemble AdaBoost. The prediction performance is evaluated using metrics such as accuracy, precision, recall and the area under the receiver operating characteristic (ROC) curve (AUC). The key results of the trend-based analysis indicate that (i) intra-fraction training strategies achieve the highest prediction accuracies (90.5-91.4%); (ii) the predictive modeling yields the lowest accuracies (50-60%) when the training data do not include any information from the test patient; (iii) the prediction latencies are as low as a few hundred milliseconds, and thus conducive to real-time prediction. The binary classification performance is promising, indicated by high AUCs (0.96-0.98). It also confirms the utility of prior data from previous patients, as well as the necessity of training the classifier on some initial data from the new patient for reasonable prediction performance. The ability to predict a baseline shift with a sufficient lookahead window will enable clinical systems or even human users to hold the treatment beam in such situations, thereby reducing the probability of serious geometric and dosimetric errors.
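    A minimal sketch of the windowed setup described above: simple features from a ~60 s trajectory window feed a classifier that flags an imminent baseline shift. The features, sampling rate and Naïve Bayes choice are assumptions for illustration; the study compares several algorithms and training strategies.

        import numpy as np
        from sklearn.naive_bayes import GaussianNB

        def window_features(x):
            return [x.mean(), x.std(), x[-1] - x[0]]  # level, spread, drift

        rng = np.random.default_rng(4)
        X, y = [], []
        for _ in range(400):
            shift = rng.random() < 0.3
            # 60 s trace at 26 Hz with ~10 breathing cycles
            sig = np.sin(np.linspace(0, 20 * np.pi, 1560))
            if shift:
                sig += np.linspace(0, 1.0, sig.size)   # drifting baseline
            sig += rng.normal(0, 0.1, sig.size)
            X.append(window_features(sig))
            y.append(int(shift))

        clf = GaussianNB().fit(X[:300], y[:300])
        print("holdout accuracy:", clf.score(X[300:], y[300:]))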

  19. DNA methylation-based forensic age prediction using artificial neural networks and next generation sequencing.

    PubMed

    Vidaki, Athina; Ballard, David; Aliferi, Anastasia; Miller, Thomas H; Barron, Leon P; Syndercombe Court, Denise

    2017-05-01

    The ability to estimate the age of the donor from recovered biological material at a crime scene can be of substantial value in forensic investigations. Aging can be complex and is associated with various molecular modifications in cells that accumulate over a person's lifetime, including epigenetic patterns. The aim of this study was to use age-specific DNA methylation patterns to generate an accurate model for the prediction of chronological age using data from whole blood. In total, 45 age-associated CpG sites were selected based on their reported age coefficients in a previous extensive study and investigated using publicly available methylation data obtained from 1156 whole blood samples (aged 2-90 years) analysed with Illumina's genome-wide methylation platforms (27K/450K). Applying stepwise regression for variable selection, 23 of these CpG sites were identified that could significantly contribute to age prediction modelling, and multiple regression analysis carried out with these markers provided an accurate prediction of age (R² = 0.92, mean absolute error (MAE) = 4.6 years). However, applying machine learning, and more specifically a generalised regression neural network model, the age prediction significantly improved (R² = 0.96), with an MAE of 3.3 years for the training set and 4.4 years for a blind test set of 231 cases. The machine learning approach used 16 CpG sites, located in 16 different genomic regions, with the top three predictors of age belonging to the genes NHLRC1, SCGN and CSNK1D. The proposed model was further tested using independent cohorts of 53 monozygotic twins (MAE = 7.1 years) and a cohort of 1011 disease state individuals (MAE = 7.2 years). Furthermore, we highlighted the age markers' potential applicability in samples other than blood by predicting age with similar accuracy in 265 saliva samples (R² = 0.96), with an MAE of 3.2 years (training set) and 4.0 years (blind test). In an attempt to create a sensitive and accurate age prediction test, a next generation sequencing (NGS)-based method able to quantify the methylation status of the selected 16 CpG sites was developed using the Illumina MiSeq® platform. The method was validated using DNA standards of known methylation levels, and the age prediction accuracy was initially assessed in a set of 46 whole blood samples. Although the resulting prediction accuracy using the NGS data was lower than that of the original model (MAE = 7.5 years), it is expected that future optimization of our strategy to account for technical variation, as well as increasing the sample size, will improve both the prediction accuracy and reproducibility. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
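    A hedged sketch of the core modelling step: multiple linear regression of chronological age on CpG methylation beta values, evaluated by MAE on held-out samples. The methylation data here are simulated with a linear age trend; the study's marker selection and neural network stage are not reproduced.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import mean_absolute_error

        rng = np.random.default_rng(5)
        n, n_cpg = 1156, 23
        age = rng.uniform(2, 90, n)
        # CpG beta values drifting linearly with age, plus noise
        slopes = rng.normal(0, 0.004, n_cpg)
        betas = 0.5 + np.outer(age, slopes) + rng.normal(0, 0.03, (n, n_cpg))

        train, test = slice(0, 925), slice(925, n)
        model = LinearRegression().fit(betas[train], age[train])
        mae = mean_absolute_error(age[test], model.predict(betas[test]))
        print(f"MAE on held-out samples: {mae:.1f} years")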

  20. Evaluation of CROES Nephrolithometry Nomogram as a Preoperative Predictive System for Percutaneous Nephrolithotomy Outcomes.

    PubMed

    Kumar, Sumit; Sreenivas, Jayaram; Karthikeyan, Vilvapathy Senguttuvan; Mallya, Ashwin; Keshavamurthy, Ramaiah

    2016-10-01

    Scoring systems have been devised to predict outcomes of percutaneous nephrolithotomy (PCNL). The CROES nephrolithometry nomogram (CNN) is the latest tool devised to predict stone-free rate (SFR). We aimed to compare the predictive accuracy of the CNN against the Guy stone score (GSS) for SFR and postoperative outcomes. Between January 2013 and December 2015, 313 patients undergoing PCNL were analyzed for the predictive accuracy of GSS, CNN, and stone burden (SB) for SFR, complications, operation time (OT), and length of hospitalization (LOH). We further stratified patients into risk groups based on CNN and GSS. Mean ± standard deviation (SD) SB was 298.8 ± 235.75 mm². SB, GSS, and CNN (area under the curve [AUC]: 0.662, 0.660, 0.673) were found to be predictors of SFR. However, predictability for complications was not as good (AUC: SB 0.583, GSS 0.554, CNN 0.580). A single implicated calix (adj. OR 3.644; p = 0.027), absence of staghorn calculus (adj. OR 3.091; p = 0.044), a single stone (adj. OR 3.855; p = 0.002), and a single puncture (adj. OR 2.309; p = 0.048) significantly predicted SFR on multivariate analysis. Charlson comorbidity index (CCI; p = 0.020) and staghorn calculus (p = 0.002) were independent predictors of complications on linear regression. SB and GSS independently predicted OT on multivariate analysis. SB and complications significantly predicted LOH, while GSS and CNN did not. The CNN offered better risk stratification for residual stones than the GSS. The CNN and GSS have good preoperative predictive accuracy for SFR. The number of implicated calices may affect SFR, and CCI affects complications. Future studies should incorporate these factors into scoring systems and assess whether the predictability of PCNL outcomes improves.
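    An illustrative sketch of the AUC evaluation: area under the ROC curve for a continuous preoperative score (stone burden is used here) against the stone-free outcome. The scores and outcomes are simulated, not the study data.

        import numpy as np
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(6)
        n = 313
        stone_free = rng.random(n) < 0.75
        # smaller stone burden tends to go with a stone-free result
        burden = rng.normal(300, 235, n) - 80 * stone_free
        print("AUC:", round(roc_auc_score(stone_free, -burden), 3))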
