Sample records for improved predictive accuracy

  1. Evaluating the accuracy of SHAPE-directed RNA secondary structure predictions

    PubMed Central

    Sükösd, Zsuzsanna; Swenson, M. Shel; Kjems, Jørgen; Heitsch, Christine E.

    2013-01-01

    Recent advances in RNA structure determination include using data from high-throughput probing experiments to improve thermodynamic prediction accuracy. We evaluate the extent and nature of improvements in data-directed predictions for a diverse set of 16S/18S ribosomal sequences using a stochastic model of experimental SHAPE data. The average accuracy for 1000 data-directed predictions always improves over the original minimum free energy (MFE) structure. However, the amount of improvement varies with the sequence, exhibiting a correlation with MFE accuracy. Further analysis of this correlation shows that accurate MFE base pairs are typically preserved in a data-directed prediction, whereas inaccurate ones are not. Thus, the positive predictive value of common base pairs is consistently higher than the directed prediction accuracy. Finally, we confirm sequence dependencies in the directability of thermodynamic predictions and investigate the potential for greater accuracy improvements in the worst performing test sequence. PMID:23325843

  2. Improving the accuracy of camber predictions for precast pretensioned concrete beams : [tech transfer summary].

    DOT National Transportation Integrated Search

    2015-07-01

    Implementing the recommendations of this study is expected to significantly : improve the accuracy of camber measurements and predictions and to : ultimately help reduce construction delays, improve bridge serviceability, : and decrease costs.

  3. Medium- and Long-term Prediction of LOD Change by the Leap-step Autoregressive Model

    NASA Astrophysics Data System (ADS)

    Wang, Qijie

    2015-08-01

    The accuracy of medium- and long-term prediction of length of day (LOD) change base on combined least-square and autoregressive (LS+AR) deteriorates gradually. Leap-step autoregressive (LSAR) model can significantly reduce the edge effect of the observation sequence. Especially, LSAR model greatly improves the resolution of signals’ low-frequency components. Therefore, it can improve the efficiency of prediction. In this work, LSAR is used to forecast the LOD change. The LOD series from EOP 08 C04 provided by IERS is modeled by both the LSAR and AR models. The results of the two models are analyzed and compared. When the prediction length is between 10-30 days, the accuracy improvement is less than 10%. When the prediction length amounts to above 30 day, the accuracy improved obviously, with the maximum being around 19%. The results show that the LSAR model has higher prediction accuracy and stability in medium- and long-term prediction.

  4. PPCM: Combing multiple classifiers to improve protein-protein interaction prediction

    DOE PAGES

    Yao, Jianzhuang; Guo, Hong; Yang, Xiaohan

    2015-08-01

    Determining protein-protein interaction (PPI) in biological systems is of considerable importance, and prediction of PPI has become a popular research area. Although different classifiers have been developed for PPI prediction, no single classifier seems to be able to predict PPI with high confidence. We postulated that by combining individual classifiers the accuracy of PPI prediction could be improved. We developed a method called protein-protein interaction prediction classifiers merger (PPCM), and this method combines output from two PPI prediction tools, GO2PPI and Phyloprof, using Random Forests algorithm. The performance of PPCM was tested by area under the curve (AUC) using anmore » assembled Gold Standard database that contains both positive and negative PPI pairs. Our AUC test showed that PPCM significantly improved the PPI prediction accuracy over the corresponding individual classifiers. We found that additional classifiers incorporated into PPCM could lead to further improvement in the PPI prediction accuracy. Furthermore, cross species PPCM could achieve competitive and even better prediction accuracy compared to the single species PPCM. This study established a robust pipeline for PPI prediction by integrating multiple classifiers using Random Forests algorithm. Ultimately, this pipeline will be useful for predicting PPI in nonmodel species.« less

  5. Improving Fermi Orbit Determination and Prediction in an Uncertain Atmospheric Drag Environment

    NASA Technical Reports Server (NTRS)

    Vavrina, Matthew A.; Newman, Clark P.; Slojkowski, Steven E.; Carpenter, J. Russell

    2014-01-01

    Orbit determination and prediction of the Fermi Gamma-ray Space Telescope trajectory is strongly impacted by the unpredictability and variability of atmospheric density and the spacecraft's ballistic coefficient. Operationally, Global Positioning System point solutions are processed with an extended Kalman filter for orbit determination, and predictions are generated for conjunction assessment with secondary objects. When these predictions are compared to Joint Space Operations Center radar-based solutions, the close approach distance between the two predictions can greatly differ ahead of the conjunction. This work explores strategies for improving prediction accuracy and helps to explain the prediction disparities. Namely, a tuning analysis is performed to determine atmospheric drag modeling and filter parameters that can improve orbit determination as well as prediction accuracy. A 45% improvement in three-day prediction accuracy is realized by tuning the ballistic coefficient and atmospheric density stochastic models, measurement frequency, and other modeling and filter parameters.

  6. Research on Improved Depth Belief Network-Based Prediction of Cardiovascular Diseases

    PubMed Central

    Zhang, Hongpo

    2018-01-01

    Quantitative analysis and prediction can help to reduce the risk of cardiovascular disease. Quantitative prediction based on traditional model has low accuracy. The variance of model prediction based on shallow neural network is larger. In this paper, cardiovascular disease prediction model based on improved deep belief network (DBN) is proposed. Using the reconstruction error, the network depth is determined independently, and unsupervised training and supervised optimization are combined. It ensures the accuracy of model prediction while guaranteeing stability. Thirty experiments were performed independently on the Statlog (Heart) and Heart Disease Database data sets in the UCI database. Experimental results showed that the mean of prediction accuracy was 91.26% and 89.78%, respectively. The variance of prediction accuracy was 5.78 and 4.46, respectively. PMID:29854369

  7. Improved method for predicting protein fold patterns with ensemble classifiers.

    PubMed

    Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C

    2012-01-27

    Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.

  8. A Final Approach Trajectory Model for Current Operations

    NASA Technical Reports Server (NTRS)

    Gong, Chester; Sadovsky, Alexander

    2010-01-01

    Predicting accurate trajectories with limited intent information is a challenge faced by air traffic management decision support tools in operation today. One such tool is the FAA's Terminal Proximity Alert system which is intended to assist controllers in maintaining safe separation of arrival aircraft during final approach. In an effort to improve the performance of such tools, two final approach trajectory models are proposed; one based on polynomial interpolation, the other on the Fourier transform. These models were tested against actual traffic data and used to study effects of the key final approach trajectory modeling parameters of wind, aircraft type, and weight class, on trajectory prediction accuracy. Using only the limited intent data available to today's ATM system, both the polynomial interpolation and Fourier transform models showed improved trajectory prediction accuracy over a baseline dead reckoning model. Analysis of actual arrival traffic showed that this improved trajectory prediction accuracy leads to improved inter-arrival separation prediction accuracy for longer look ahead times. The difference in mean inter-arrival separation prediction error between the Fourier transform and dead reckoning models was 0.2 nmi for a look ahead time of 120 sec, a 33 percent improvement, with a corresponding 32 percent improvement in standard deviation.

  9. The Upper and Lower Bounds of the Prediction Accuracies of Ensemble Methods for Binary Classification

    PubMed Central

    Wang, Xueyi; Davidson, Nicholas J.

    2011-01-01

    Ensemble methods have been widely used to improve prediction accuracy over individual classifiers. In this paper, we achieve a few results about the prediction accuracies of ensemble methods for binary classification that are missed or misinterpreted in previous literature. First we show the upper and lower bounds of the prediction accuracies (i.e. the best and worst possible prediction accuracies) of ensemble methods. Next we show that an ensemble method can achieve > 0.5 prediction accuracy, while individual classifiers have < 0.5 prediction accuracies. Furthermore, for individual classifiers with different prediction accuracies, the average of the individual accuracies determines the upper and lower bounds. We perform two experiments to verify the results and show that it is hard to achieve the upper and lower bounds accuracies by random individual classifiers and better algorithms need to be developed. PMID:21853162

  10. Improved Short-Term Clock Prediction Method for Real-Time Positioning.

    PubMed

    Lv, Yifei; Dai, Zhiqiang; Zhao, Qile; Yang, Sheng; Zhou, Jinning; Liu, Jingnan

    2017-06-06

    The application of real-time precise point positioning (PPP) requires real-time precise orbit and clock products that should be predicted within a short time to compensate for the communication delay or data gap. Unlike orbit correction, clock correction is difficult to model and predict. The widely used linear model hardly fits long periodic trends with a small data set and exhibits significant accuracy degradation in real-time prediction when a large data set is used. This study proposes a new prediction model for maintaining short-term satellite clocks to meet the high-precision requirements of real-time clocks and provide clock extrapolation without interrupting the real-time data stream. Fast Fourier transform (FFT) is used to analyze the linear prediction residuals of real-time clocks. The periodic terms obtained through FFT are adopted in the sliding window prediction to achieve a significant improvement in short-term prediction accuracy. This study also analyzes and compares the accuracy of short-term forecasts (less than 3 h) by using different length observations. Experimental results obtained from International GNSS Service (IGS) final products and our own real-time clocks show that the 3-h prediction accuracy is better than 0.85 ns. The new model can replace IGS ultra-rapid products in the application of real-time PPP. It is also found that there is a positive correlation between the prediction accuracy and the short-term stability of on-board clocks. Compared with the accuracy of the traditional linear model, the accuracy of the static PPP using the new model of the 2-h prediction clock in N, E, and U directions is improved by about 50%. Furthermore, the static PPP accuracy of 2-h clock products is better than 0.1 m. When an interruption occurs in the real-time model, the accuracy of the kinematic PPP solution using 1-h clock prediction product is better than 0.2 m, without significant accuracy degradation. This model is of practical significance because it solves the problems of interruption and delay in data broadcast in real-time clock estimation and can meet the requirements of real-time PPP.

  11. Psychopathy, IQ, and Violence in European American and African American County Jail Inmates

    ERIC Educational Resources Information Center

    Walsh, Zach; Swogger, Marc T.; Kosson, David S.

    2004-01-01

    The accuracy of the prediction of criminal violence may be improved by combining psychopathy with other variables that have been found to predict violence. Research has suggested that assessing intelligence (i.e., IQ) as well as psychopathy improves the accuracy of violence prediction. In the present study, the authors tested this hypothesis by…

  12. Parameter prediction based on Improved Process neural network and ARMA error compensation in Evaporation Process

    NASA Astrophysics Data System (ADS)

    Qian, Xiaoshan

    2018-01-01

    The traditional model of evaporation process parameters have continuity and cumulative characteristics of the prediction error larger issues, based on the basis of the process proposed an adaptive particle swarm neural network forecasting method parameters established on the autoregressive moving average (ARMA) error correction procedure compensated prediction model to predict the results of the neural network to improve prediction accuracy. Taking a alumina plant evaporation process to analyze production data validation, and compared with the traditional model, the new model prediction accuracy greatly improved, can be used to predict the dynamic process of evaporation of sodium aluminate solution components.

  13. Artificial neural network prediction of ischemic tissue fate in acute stroke imaging

    PubMed Central

    Huang, Shiliang; Shen, Qiang; Duong, Timothy Q

    2010-01-01

    Multimodal magnetic resonance imaging of acute stroke provides predictive value that can be used to guide stroke therapy. A flexible artificial neural network (ANN) algorithm was developed and applied to predict ischemic tissue fate on three stroke groups: 30-, 60-minute, and permanent middle cerebral artery occlusion in rats. Cerebral blood flow (CBF), apparent diffusion coefficient (ADC), and spin–spin relaxation time constant (T2) were acquired during the acute phase up to 3 hours and again at 24 hours followed by histology. Infarct was predicted on a pixel-by-pixel basis using only acute (30-minute) stroke data. In addition, neighboring pixel information and infarction incidence were also incorporated into the ANN model to improve prediction accuracy. Receiver-operating characteristic analysis was used to quantify prediction accuracy. The major findings were the following: (1) CBF alone poorly predicted the final infarct across three experimental groups; (2) ADC alone adequately predicted the infarct; (3) CBF+ADC improved the prediction accuracy; (4) inclusion of neighboring pixel information and infarction incidence further improved the prediction accuracy; and (5) prediction was more accurate for permanent occlusion, followed by 60- and 30-minute occlusion. The ANN predictive model could thus provide a flexible and objective framework for clinicians to evaluate stroke treatment options on an individual patient basis. PMID:20424631

  14. Bio-knowledge based filters improve residue-residue contact prediction accuracy.

    PubMed

    Wozniak, P P; Pelc, J; Skrzypecki, M; Vriend, G; Kotulska, M

    2018-05-29

    Residue-residue contact prediction through direct coupling analysis has reached impressive accuracy, but yet higher accuracy will be needed to allow for routine modelling of protein structures. One way to improve the prediction accuracy is to filter predicted contacts using knowledge about the particular protein of interest or knowledge about protein structures in general. We focus on the latter and discuss a set of filters that can be used to remove false positive contact predictions. Each filter depends on one or a few cut-off parameters for which the filter performance was investigated. Combining all filters while using default parameters resulted for a test-set of 851 protein domains in the removal of 29% of the predictions of which 92% were indeed false positives. All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/FPfilter/. malgorzata.kotulska@pwr.edu.pl. Supplementary data are available at Bioinformatics online.

  15. Protein docking prediction using predicted protein-protein interface.

    PubMed

    Li, Bin; Kihara, Daisuke

    2012-01-10

    Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within top ranks among alternative conformations. We present a novel protein docking algorithm that utilizes imperfect protein-protein binding interface prediction for guiding protein docking. Since the accuracy of protein binding site prediction varies depending on cases, the challenge is to develop a method which does not deteriorate but improves docking results by using a binding site prediction which may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pair wise protein docking prediction algorithm, LZerD, which we have developed earlier. PI-LZerD starts from performing docking prediction using the provided protein-protein binding interface prediction as constraints, which is followed by the second round of docking with updated docking interface information to further improve docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves the docking prediction accuracy as compared with docking without using binding site prediction or using the binding site prediction as post-filtering. We have developed PI-LZerD, a pairwise docking algorithm, which uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy over alternative methods in the series of benchmark experiments including docking using actual docking interface site predictions as well as unbound docking cases.

  16. Genetic algorithm based adaptive neural network ensemble and its application in predicting carbon flux

    USGS Publications Warehouse

    Xue, Y.; Liu, S.; Hu, Y.; Yang, J.; Chen, Q.

    2007-01-01

    To improve the accuracy in prediction, Genetic Algorithm based Adaptive Neural Network Ensemble (GA-ANNE) is presented. Intersections are allowed between different training sets based on the fuzzy clustering analysis, which ensures the diversity as well as the accuracy of individual Neural Networks (NNs). Moreover, to improve the accuracy of the adaptive weights of individual NNs, GA is used to optimize the cluster centers. Empirical results in predicting carbon flux of Duke Forest reveal that GA-ANNE can predict the carbon flux more accurately than Radial Basis Function Neural Network (RBFNN), Bagging NN ensemble, and ANNE. ?? 2007 IEEE.

  17. On the accuracy of ERS-1 orbit predictions

    NASA Technical Reports Server (NTRS)

    Koenig, Rolf; Li, H.; Massmann, Franz-Heinrich; Raimondo, J. C.; Rajasenan, C.; Reigber, C.

    1993-01-01

    Since the launch of ERS-1, the D-PAF (German Processing and Archiving Facility) provides regularly orbit predictions for the worldwide SLR (Satellite Laser Ranging) tracking network. The weekly distributed orbital elements are so called tuned IRV's and tuned SAO-elements. The tuning procedure, designed to improve the accuracy of the recovery of the orbit at the stations, is discussed based on numerical results. This shows that tuning of elements is essential for ERS-1 with the currently applied tracking procedures. The orbital elements are updated by daily distributed time bias functions. The generation of the time bias function is explained. Problems and numerical results are presented. The time bias function increases the prediction accuracy considerably. Finally, the quality assessment of ERS-1 orbit predictions is described. The accuracy is compiled for about 250 days since launch. The average accuracy lies in the range of 50-100 ms and has considerably improved.

  18. Canopy Temperature and Vegetation Indices from High-Throughput Phenotyping Improve Accuracy of Pedigree and Genomic Selection for Grain Yield in Wheat

    PubMed Central

    Rutkoski, Jessica; Poland, Jesse; Mondal, Suchismita; Autrique, Enrique; Pérez, Lorena González; Crossa, José; Reynolds, Matthew; Singh, Ravi

    2016-01-01

    Genomic selection can be applied prior to phenotyping, enabling shorter breeding cycles and greater rates of genetic gain relative to phenotypic selection. Traits measured using high-throughput phenotyping based on proximal or remote sensing could be useful for improving pedigree and genomic prediction model accuracies for traits not yet possible to phenotype directly. We tested if using aerial measurements of canopy temperature, and green and red normalized difference vegetation index as secondary traits in pedigree and genomic best linear unbiased prediction models could increase accuracy for grain yield in wheat, Triticum aestivum L., using 557 lines in five environments. Secondary traits on training and test sets, and grain yield on the training set were modeled as multivariate, and compared to univariate models with grain yield on the training set only. Cross validation accuracies were estimated within and across-environment, with and without replication, and with and without correcting for days to heading. We observed that, within environment, with unreplicated secondary trait data, and without correcting for days to heading, secondary traits increased accuracies for grain yield by 56% in pedigree, and 70% in genomic prediction models, on average. Secondary traits increased accuracy slightly more when replicated, and considerably less when models corrected for days to heading. In across-environment prediction, trends were similar but less consistent. These results show that secondary traits measured in high-throughput could be used in pedigree and genomic prediction to improve accuracy. This approach could improve selection in wheat during early stages if validated in early-generation breeding plots. PMID:27402362

  19. Dissolved oxygen content prediction in crab culture using a hybrid intelligent method

    PubMed Central

    Yu, Huihui; Chen, Yingyi; Hassan, ShahbazGul; Li, Daoliang

    2016-01-01

    A precise predictive model is needed to obtain a clear understanding of the changing dissolved oxygen content in outdoor crab ponds, to assess how to reduce risk and to optimize water quality management. The uncertainties in the data from multiple sensors are a significant factor when building a dissolved oxygen content prediction model. To increase prediction accuracy, a new hybrid dissolved oxygen content forecasting model based on the radial basis function neural networks (RBFNN) data fusion method and a least squares support vector machine (LSSVM) with an optimal improved particle swarm optimization(IPSO) is developed. In the modelling process, the RBFNN data fusion method is used to improve information accuracy and provide more trustworthy training samples for the IPSO-LSSVM prediction model. The LSSVM is a powerful tool for achieving nonlinear dissolved oxygen content forecasting. In addition, an improved particle swarm optimization algorithm is developed to determine the optimal parameters for the LSSVM with high accuracy and generalizability. In this study, the comparison of the prediction results of different traditional models validates the effectiveness and accuracy of the proposed hybrid RBFNN-IPSO-LSSVM model for dissolved oxygen content prediction in outdoor crab ponds. PMID:27270206

  20. Dissolved oxygen content prediction in crab culture using a hybrid intelligent method.

    PubMed

    Yu, Huihui; Chen, Yingyi; Hassan, ShahbazGul; Li, Daoliang

    2016-06-08

    A precise predictive model is needed to obtain a clear understanding of the changing dissolved oxygen content in outdoor crab ponds, to assess how to reduce risk and to optimize water quality management. The uncertainties in the data from multiple sensors are a significant factor when building a dissolved oxygen content prediction model. To increase prediction accuracy, a new hybrid dissolved oxygen content forecasting model based on the radial basis function neural networks (RBFNN) data fusion method and a least squares support vector machine (LSSVM) with an optimal improved particle swarm optimization(IPSO) is developed. In the modelling process, the RBFNN data fusion method is used to improve information accuracy and provide more trustworthy training samples for the IPSO-LSSVM prediction model. The LSSVM is a powerful tool for achieving nonlinear dissolved oxygen content forecasting. In addition, an improved particle swarm optimization algorithm is developed to determine the optimal parameters for the LSSVM with high accuracy and generalizability. In this study, the comparison of the prediction results of different traditional models validates the effectiveness and accuracy of the proposed hybrid RBFNN-IPSO-LSSVM model for dissolved oxygen content prediction in outdoor crab ponds.

  1. Integrated Strategy Improves the Prediction Accuracy of miRNA in Large Dataset

    PubMed Central

    Lipps, David; Devineni, Sree

    2016-01-01

    MiRNAs are short non-coding RNAs of about 22 nucleotides, which play critical roles in gene expression regulation. The biogenesis of miRNAs is largely determined by the sequence and structural features of their parental RNA molecules. Based on these features, multiple computational tools have been developed to predict if RNA transcripts contain miRNAs or not. Although being very successful, these predictors started to face multiple challenges in recent years. Many predictors were optimized using datasets of hundreds of miRNA samples. The sizes of these datasets are much smaller than the number of known miRNAs. Consequently, the prediction accuracy of these predictors in large dataset becomes unknown and needs to be re-tested. In addition, many predictors were optimized for either high sensitivity or high specificity. These optimization strategies may bring in serious limitations in applications. Moreover, to meet continuously raised expectations on these computational tools, improving the prediction accuracy becomes extremely important. In this study, a meta-predictor mirMeta was developed by integrating a set of non-linear transformations with meta-strategy. More specifically, the outputs of five individual predictors were first preprocessed using non-linear transformations, and then fed into an artificial neural network to make the meta-prediction. The prediction accuracy of meta-predictor was validated using both multi-fold cross-validation and independent dataset. The final accuracy of meta-predictor in newly-designed large dataset is improved by 7% to 93%. The meta-predictor is also proved to be less dependent on datasets, as well as has refined balance between sensitivity and specificity. This study has two folds of importance: First, it shows that the combination of non-linear transformations and artificial neural networks improves the prediction accuracy of individual predictors. Second, a new miRNA predictor with significantly improved prediction accuracy is developed for the community for identifying novel miRNAs and the complete set of miRNAs. Source code is available at: https://github.com/xueLab/mirMeta PMID:28002428

  2. Correcting Memory Improves Accuracy of Predicted Task Duration

    ERIC Educational Resources Information Center

    Roy, Michael M.; Mitten, Scott T.; Christenfeld, Nicholas J. S.

    2008-01-01

    People are often inaccurate in predicting task duration. The memory bias explanation holds that this error is due to people having incorrect memories of how long previous tasks have taken, and these biased memories cause biased predictions. Therefore, the authors examined the effect on increasing predictive accuracy of correcting memory through…

  3. Assessing Predictive Properties of Genome-Wide Selection in Soybeans

    PubMed Central

    Xavier, Alencar; Muir, William M.; Rainey, Katy Martin

    2016-01-01

    Many economically important traits in plant breeding have low heritability or are difficult to measure. For these traits, genomic selection has attractive features and may boost genetic gains. Our goal was to evaluate alternative scenarios to implement genomic selection for yield components in soybean (Glycine max L. merr). We used a nested association panel with cross validation to evaluate the impacts of training population size, genotyping density, and prediction model on the accuracy of genomic prediction. Our results indicate that training population size was the factor most relevant to improvement in genome-wide prediction, with greatest improvement observed in training sets up to 2000 individuals. We discuss assumptions that influence the choice of the prediction model. Although alternative models had minor impacts on prediction accuracy, the most robust prediction model was the combination of reproducing kernel Hilbert space regression and BayesB. Higher genotyping density marginally improved accuracy. Our study finds that breeding programs seeking efficient genomic selection in soybeans would best allocate resources by investing in a representative training set. PMID:27317786

  4. Validity of Predictive Equations for Resting Energy Expenditure Developed for Obese Patients: Impact of Body Composition Method

    PubMed Central

    Achamrah, Najate; Jésus, Pierre; Grigioni, Sébastien; Rimbert, Agnès; Petit, André; Déchelotte, Pierre; Folope, Vanessa; Coëffier, Moïse

    2018-01-01

    Predictive equations have been specifically developed for obese patients to estimate resting energy expenditure (REE). Body composition (BC) assessment is needed for some of these equations. We assessed the impact of BC methods on the accuracy of specific predictive equations developed in obese patients. REE was measured (mREE) by indirect calorimetry and BC assessed by bioelectrical impedance analysis (BIA) and dual-energy X-ray absorptiometry (DXA). mREE, percentages of prediction accuracy (±10% of mREE) were compared. Predictive equations were studied in 2588 obese patients. Mean mREE was 1788 ± 6.3 kcal/24 h. Only the Müller (BIA) and Harris & Benedict (HB) equations provided REE with no difference from mREE. The Huang, Müller, Horie-Waitzberg, and HB formulas provided a higher accurate prediction (>60% of cases). The use of BIA provided better predictions of REE than DXA for the Huang and Müller equations. Inversely, the Horie-Waitzberg and Lazzer formulas provided a higher accuracy using DXA. Accuracy decreased when applied to patients with BMI ≥ 40, except for the Horie-Waitzberg and Lazzer (DXA) formulas. Müller equations based on BIA provided a marked improvement of REE prediction accuracy than equations not based on BC. The interest of BC to improve REE predictive equations accuracy in obese patients should be confirmed. PMID:29320432

  5. Analysis of spatial distribution of land cover maps accuracy

    NASA Astrophysics Data System (ADS)

    Khatami, R.; Mountrakis, G.; Stehman, S. V.

    2017-12-01

    Land cover maps have become one of the most important products of remote sensing science. However, classification errors will exist in any classified map and affect the reliability of subsequent map usage. Moreover, classification accuracy often varies over different regions of a classified map. These variations of accuracy will affect the reliability of subsequent analyses of different regions based on the classified maps. The traditional approach of map accuracy assessment based on an error matrix does not capture the spatial variation in classification accuracy. Here, per-pixel accuracy prediction methods are proposed based on interpolating accuracy values from a test sample to produce wall-to-wall accuracy maps. Different accuracy prediction methods were developed based on four factors: predictive domain (spatial versus spectral), interpolation function (constant, linear, Gaussian, and logistic), incorporation of class information (interpolating each class separately versus grouping them together), and sample size. Incorporation of spectral domain as explanatory feature spaces of classification accuracy interpolation was done for the first time in this research. Performance of the prediction methods was evaluated using 26 test blocks, with 10 km × 10 km dimensions, dispersed throughout the United States. The performance of the predictions was evaluated using the area under the curve (AUC) of the receiver operating characteristic. Relative to existing accuracy prediction methods, our proposed methods resulted in improvements of AUC of 0.15 or greater. Evaluation of the four factors comprising the accuracy prediction methods demonstrated that: i) interpolations should be done separately for each class instead of grouping all classes together; ii) if an all-classes approach is used, the spectral domain will result in substantially greater AUC than the spatial domain; iii) for the smaller sample size and per-class predictions, the spectral and spatial domain yielded similar AUC; iv) for the larger sample size (i.e., very dense spatial sample) and per-class predictions, the spatial domain yielded larger AUC; v) increasing the sample size improved accuracy predictions with a greater benefit accruing to the spatial domain; and vi) the function used for interpolation had the smallest effect on AUC.

  6. Protein contact prediction using patterns of correlation.

    PubMed

    Hamilton, Nicholas; Burrage, Kevin; Ragan, Mark A; Huber, Thomas

    2004-09-01

    We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. Copyright 2004 Wiley-Liss, Inc.

  7. Adjusted Clinical Groups: Predictive Accuracy for Medicaid Enrollees in Three States

    PubMed Central

    Adams, E. Kathleen; Bronstein, Janet M.; Raskind-Hood, Cheryl

    2002-01-01

    Actuarial split-sample methods were used to assess predictive accuracy of adjusted clinical groups (ACGs) for Medicaid enrollees in Georgia, Mississippi (lagging in managed care penetration), and California. Accuracy for two non-random groups—high-cost and located in urban poor areas—was assessed. Measures for random groups were derived with and without short-term enrollees to assess the effect of turnover on predictive accuracy. ACGs improved predictive accuracy for high-cost conditions in all States, but did so only for those in Georgia's poorest urban areas. Higher and more unpredictable expenses of short-term enrollees moderated the predictive power of ACGs. This limitation was significant in Mississippi due in part, to that State's very high proportion of short-term enrollees. PMID:12545598

  8. Prediction of response factors for gas chromatography with flame ionization detection: Algorithm improvement, extension to silylated compounds, and application to the quantification of metabolites

    PubMed Central

    de Saint Laumer, Jean‐Yves; Leocata, Sabine; Tissot, Emeline; Baroux, Lucie; Kampf, David M.; Merle, Philippe; Boschung, Alain; Seyfried, Markus

    2015-01-01

    We previously showed that the relative response factors of volatile compounds were predictable from either combustion enthalpies or their molecular formulae only 1. We now extend this prediction to silylated derivatives by adding an increment in the ab initio calculation of combustion enthalpies. The accuracy of the experimental relative response factors database was also improved and its population increased to 490 values. In particular, more brominated compounds were measured, and their prediction accuracy was improved by adding a correction factor in the algorithm. The correlation coefficient between predicted and measured values increased from 0.936 to 0.972, leading to a mean prediction accuracy of ± 6%. Thus, 93% of the relative response factors values were predicted with an accuracy of better than ± 10%. The capabilities of the extended algorithm are exemplified by (i) the quick and accurate quantification of hydroxylated metabolites resulting from a biodegradation test after silylation and prediction of their relative response factors, without having the reference substances available; and (ii) the rapid purity determinations of volatile compounds. This study confirms that Gas chromatography with a flame ionization detector and using predicted relative response factors is one of the few techniques that enables quantification of volatile compounds without calibrating the instrument with the pure reference substance. PMID:26179324

  9. Accuracies of univariate and multivariate genomic prediction models in African cassava.

    PubMed

    Okeke, Uche Godfrey; Akdemir, Deniz; Rabbi, Ismail; Kulakow, Peter; Jannink, Jean-Luc

    2017-12-04

    Genomic selection (GS) promises to accelerate genetic gain in plant breeding programs especially for crop species such as cassava that have long breeding cycles. Practically, to implement GS in cassava breeding, it is necessary to evaluate different GS models and to develop suitable models for an optimized breeding pipeline. In this paper, we compared (1) prediction accuracies from a single-trait (uT) and a multi-trait (MT) mixed model for a single-environment genetic evaluation (Scenario 1), and (2) accuracies from a compound symmetric multi-environment model (uE) parameterized as a univariate multi-kernel model to a multivariate (ME) multi-environment mixed model that accounts for genotype-by-environment interaction for multi-environment genetic evaluation (Scenario 2). For these analyses, we used 16 years of public cassava breeding data for six target cassava traits and a fivefold cross-validation scheme with 10-repeat cycles to assess model prediction accuracies. In Scenario 1, the MT models had higher prediction accuracies than the uT models for all traits and locations analyzed, which amounted to on average a 40% improved prediction accuracy. For Scenario 2, we observed that the ME model had on average (across all locations and traits) a 12% improved prediction accuracy compared to the uE model. We recommend the use of multivariate mixed models (MT and ME) for cassava genetic evaluation. These models may be useful for other plant species.

  10. Boosted classification trees result in minor to modest improvement in the accuracy in classifying cardiovascular outcomes compared to conventional classification trees

    PubMed Central

    Austin, Peter C; Lee, Douglas S

    2011-01-01

    Purpose: Classification trees are increasingly being used to classifying patients according to the presence or absence of a disease or health outcome. A limitation of classification trees is their limited predictive accuracy. In the data-mining and machine learning literature, boosting has been developed to improve classification. Boosting with classification trees iteratively grows classification trees in a sequence of reweighted datasets. In a given iteration, subjects that were misclassified in the previous iteration are weighted more highly than subjects that were correctly classified. Classifications from each of the classification trees in the sequence are combined through a weighted majority vote to produce a final classification. The authors' objective was to examine whether boosting improved the accuracy of classification trees for predicting outcomes in cardiovascular patients. Methods: We examined the utility of boosting classification trees for classifying 30-day mortality outcomes in patients hospitalized with either acute myocardial infarction or congestive heart failure. Results: Improvements in the misclassification rate using boosted classification trees were at best minor compared to when conventional classification trees were used. Minor to modest improvements to sensitivity were observed, with only a negligible reduction in specificity. For predicting cardiovascular mortality, boosted classification trees had high specificity, but low sensitivity. Conclusions: Gains in predictive accuracy for predicting cardiovascular outcomes were less impressive than gains in performance observed in the data mining literature. PMID:22254181

  11. Assessing the accuracy of predictive models for numerical data: Not r nor r2, why not? Then what?

    PubMed Central

    2017-01-01

    Assessing the accuracy of predictive models is critical because predictive models have been increasingly used across various disciplines and predictive accuracy determines the quality of resultant predictions. Pearson product-moment correlation coefficient (r) and the coefficient of determination (r2) are among the most widely used measures for assessing predictive models for numerical data, although they are argued to be biased, insufficient and misleading. In this study, geometrical graphs were used to illustrate what were used in the calculation of r and r2 and simulations were used to demonstrate the behaviour of r and r2 and to compare three accuracy measures under various scenarios. Relevant confusions about r and r2, has been clarified. The calculation of r and r2 is not based on the differences between the predicted and observed values. The existing error measures suffer various limitations and are unable to tell the accuracy. Variance explained by predictive models based on cross-validation (VEcv) is free of these limitations and is a reliable accuracy measure. Legates and McCabe’s efficiency (E1) is also an alternative accuracy measure. The r and r2 do not measure the accuracy and are incorrect accuracy measures. The existing error measures suffer limitations. VEcv and E1 are recommended for assessing the accuracy. The applications of these accuracy measures would encourage accuracy-improved predictive models to be developed to generate predictions for evidence-informed decision-making. PMID:28837692

  12. Improving transmembrane protein consensus topology prediction using inter-helical interaction.

    PubMed

    Wang, Han; Zhang, Chao; Shi, Xiaohu; Zhang, Li; Zhou, You

    2012-11-01

    Alpha helix transmembrane proteins (αTMPs) represent roughly 30% of all open reading frames (ORFs) in a typical genome and are involved in many critical biological processes. Due to the special physicochemical properties, it is hard to crystallize and obtain high resolution structures experimentally, thus, sequence-based topology prediction is highly desirable for the study of transmembrane proteins (TMPs), both in structure prediction and function prediction. Various model-based topology prediction methods have been developed, but the accuracy of those individual predictors remain poor due to the limitation of the methods or the features they used. Thus, the consensus topology prediction method becomes practical for high accuracy applications by combining the advances of the individual predictors. Here, based on the observation that inter-helical interactions are commonly found within the transmembrane helixes (TMHs) and strongly indicate the existence of them, we present a novel consensus topology prediction method for αTMPs, CNTOP, which incorporates four top leading individual topology predictors, and further improves the prediction accuracy by using the predicted inter-helical interactions. The method achieved 87% prediction accuracy based on a benchmark dataset and 78% accuracy based on a non-redundant dataset which is composed of polytopic αTMPs. Our method derives the highest topology accuracy than any other individual predictors and consensus predictors, at the same time, the TMHs are more accurately predicted in their length and locations, where both the false positives (FPs) and the false negatives (FNs) decreased dramatically. The CNTOP is available at: http://ccst.jlu.edu.cn/JCSB/cntop/CNTOP.html. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. Predictive accuracy of combined genetic and environmental risk scores.

    PubMed

    Dudbridge, Frank; Pashayan, Nora; Yang, Jian

    2018-02-01

    The substantial heritability of most complex diseases suggests that genetic data could provide useful risk prediction. To date the performance of genetic risk scores has fallen short of the potential implied by heritability, but this can be explained by insufficient sample sizes for estimating highly polygenic models. When risk predictors already exist based on environment or lifestyle, two key questions are to what extent can they be improved by adding genetic information, and what is the ultimate potential of combined genetic and environmental risk scores? Here, we extend previous work on the predictive accuracy of polygenic scores to allow for an environmental score that may be correlated with the polygenic score, for example when the environmental factors mediate the genetic risk. We derive common measures of predictive accuracy and improvement as functions of the training sample size, chip heritabilities of disease and environmental score, and genetic correlation between disease and environmental risk factors. We consider simple addition of the two scores and a weighted sum that accounts for their correlation. Using examples from studies of cardiovascular disease and breast cancer, we show that improvements in discrimination are generally small but reasonable degrees of reclassification could be obtained with current sample sizes. Correlation between genetic and environmental scores has only minor effects on numerical results in realistic scenarios. In the longer term, as the accuracy of polygenic scores improves they will come to dominate the predictive accuracy compared to environmental scores. © 2017 WILEY PERIODICALS, INC.

  14. Predictive accuracy of combined genetic and environmental risk scores

    PubMed Central

    Pashayan, Nora; Yang, Jian

    2017-01-01

    ABSTRACT The substantial heritability of most complex diseases suggests that genetic data could provide useful risk prediction. To date the performance of genetic risk scores has fallen short of the potential implied by heritability, but this can be explained by insufficient sample sizes for estimating highly polygenic models. When risk predictors already exist based on environment or lifestyle, two key questions are to what extent can they be improved by adding genetic information, and what is the ultimate potential of combined genetic and environmental risk scores? Here, we extend previous work on the predictive accuracy of polygenic scores to allow for an environmental score that may be correlated with the polygenic score, for example when the environmental factors mediate the genetic risk. We derive common measures of predictive accuracy and improvement as functions of the training sample size, chip heritabilities of disease and environmental score, and genetic correlation between disease and environmental risk factors. We consider simple addition of the two scores and a weighted sum that accounts for their correlation. Using examples from studies of cardiovascular disease and breast cancer, we show that improvements in discrimination are generally small but reasonable degrees of reclassification could be obtained with current sample sizes. Correlation between genetic and environmental scores has only minor effects on numerical results in realistic scenarios. In the longer term, as the accuracy of polygenic scores improves they will come to dominate the predictive accuracy compared to environmental scores. PMID:29178508

  15. Improving the Accuracy of Software-Based Energy Analysis for Residential Buildings (Presentation)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Polly, B.

    2011-09-01

    This presentation describes the basic components of software-based energy analysis for residential buildings, explores the concepts of 'error' and 'accuracy' when analysis predictions are compared to measured data, and explains how NREL is working to continuously improve the accuracy of energy analysis methods.

  16. Exploring the genetic architecture and improving genomic prediction accuracy for mastitis and milk production traits in dairy cattle by mapping variants to hepatic transcriptomic regions responsive to intra-mammary infection.

    PubMed

    Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter

    2017-05-12

    A better understanding of the genetic architecture of complex traits can contribute to improve genomic prediction. We hypothesized that genomic variants associated with mastitis and milk production traits in dairy cattle are enriched in hepatic transcriptomic regions that are responsive to intra-mammary infection (IMI). Genomic markers [e.g. single nucleotide polymorphisms (SNPs)] from those regions, if included, may improve the predictive ability of a genomic model. We applied a genomic feature best linear unbiased prediction model (GFBLUP) to implement the above strategy by considering the hepatic transcriptomic regions responsive to IMI as genomic features. GFBLUP, an extension of GBLUP, includes a separate genomic effect of SNPs within a genomic feature, and allows differential weighting of the individual marker relationships in the prediction equation. Since GFBLUP is computationally intensive, we investigated whether a SNP set test could be a computationally fast way to preselect predictive genomic features. The SNP set test assesses the association between a genomic feature and a trait based on single-SNP genome-wide association studies. We applied these two approaches to mastitis and milk production traits (milk, fat and protein yield) in Holstein (HOL, n = 5056) and Jersey (JER, n = 1231) cattle. We observed that a majority of genomic features were enriched in genomic variants that were associated with mastitis and milk production traits. Compared to GBLUP, the accuracy of genomic prediction with GFBLUP was marginally improved (3.2 to 3.9%) in within-breed prediction. The highest increase (164.4%) in prediction accuracy was observed in across-breed prediction. The significance of genomic features based on the SNP set test were correlated with changes in prediction accuracy of GFBLUP (P < 0.05). GFBLUP provides a framework for integrating multiple layers of biological knowledge to provide novel insights into the biological basis of complex traits, and to improve the accuracy of genomic prediction. The SNP set test might be used as a first-step to improve GFBLUP models. Approaches like GFBLUP and SNP set test will become increasingly useful, as the functional annotations of genomes keep accumulating for a range of species and traits.

  17. Mean Expected Error in Prediction of Total Body Water: A True Accuracy Comparison between Bioimpedance Spectroscopy and Single Frequency Regression Equations

    PubMed Central

    Abtahi, Shirin; Abtahi, Farhad; Ellegård, Lars; Johannsson, Gudmundur; Bosaeus, Ingvar

    2015-01-01

    For several decades electrical bioimpedance (EBI) has been used to assess body fluid distribution and body composition. Despite the development of several different approaches for assessing total body water (TBW), it remains uncertain whether bioimpedance spectroscopic (BIS) approaches are more accurate than single frequency regression equations. The main objective of this study was to answer this question by calculating the expected accuracy of a single measurement for different EBI methods. The results of this study showed that all methods produced similarly high correlation and concordance coefficients, indicating good accuracy as a method. Even the limits of agreement produced from the Bland-Altman analysis indicated that the performance of single frequency, Sun's prediction equations, at population level was close to the performance of both BIS methods; however, when comparing the Mean Absolute Percentage Error value between the single frequency prediction equations and the BIS methods, a significant difference was obtained, indicating slightly better accuracy for the BIS methods. Despite the higher accuracy of BIS methods over 50 kHz prediction equations at both population and individual level, the magnitude of the improvement was small. Such slight improvement in accuracy of BIS methods is suggested insufficient to warrant their clinical use where the most accurate predictions of TBW are required, for example, when assessing over-fluidic status on dialysis. To reach expected errors below 4-5%, novel and individualized approaches must be developed to improve the accuracy of bioimpedance-based methods for the advent of innovative personalized health monitoring applications. PMID:26137489

  18. Improved Displacement Transfer Functions for Structure Deformed Shape Predictions Using Discretely Distributed Surface Strains

    NASA Technical Reports Server (NTRS)

    Ko, William L.; Fleischer, Van Tran

    2012-01-01

    In the formulations of earlier Displacement Transfer Functions for structure shape predictions, the surface strain distributions, along a strain-sensing line, were represented with piecewise linear functions. To improve the shape-prediction accuracies, Improved Displacement Transfer Functions were formulated using piecewise nonlinear strain representations. Through discretization of an embedded beam (depth-wise cross section of a structure along a strain-sensing line) into multiple small domains, piecewise nonlinear functions were used to describe the surface strain distributions along the discretized embedded beam. Such piecewise approach enabled the piecewise integrations of the embedded beam curvature equations to yield slope and deflection equations in recursive forms. The resulting Improved Displacement Transfer Functions, written in summation forms, were expressed in terms of beam geometrical parameters and surface strains along the strain-sensing line. By feeding the surface strains into the Improved Displacement Transfer Functions, structural deflections could be calculated at multiple points for mapping out the overall structural deformed shapes for visual display. The shape-prediction accuracies of the Improved Displacement Transfer Functions were then examined in view of finite-element-calculated deflections using different tapered cantilever tubular beams. It was found that by using the piecewise nonlinear strain representations, the shape-prediction accuracies could be greatly improved, especially for highly-tapered cantilever tubular beams.

  19. Prediction of Industrial Electric Energy Consumption in Anhui Province Based on GA-BP Neural Network

    NASA Astrophysics Data System (ADS)

    Zhang, Jiajing; Yin, Guodong; Ni, Youcong; Chen, Jinlan

    2018-01-01

    In order to improve the prediction accuracy of industrial electrical energy consumption, a prediction model of industrial electrical energy consumption was proposed based on genetic algorithm and neural network. The model use genetic algorithm to optimize the weights and thresholds of BP neural network, and the model is used to predict the energy consumption of industrial power in Anhui Province, to improve the prediction accuracy of industrial electric energy consumption in Anhui province. By comparing experiment of GA-BP prediction model and BP neural network model, the GA-BP model is more accurate with smaller number of neurons in the hidden layer.

  20. A new software for prediction of femoral neck fractures.

    PubMed

    Testi, Debora; Cappello, Angelo; Sgallari, Fiorella; Rumpf, Martin; Viceconti, Marco

    2004-08-01

    Femoral neck fractures are an important clinical, social and economic problem. Even if many different attempts have been carried out to improve the accuracy predicting the fracture risk, it was demonstrated in retrospective studies that the standard clinical protocol achieves an accuracy of about 65%. A new procedure was developed including for the prediction not only bone mineral density but also geometric and femoral strength information and achieving an accuracy of about 80% in a previous retrospective study. Aim of the present work was to re-engineer research-based procedures and develop a real-time software for the prediction of the risk for femoral fracture. The result was efficient, repeatable and easy to use software for the evaluation of the femoral neck fracture risk to be inserted in the daily clinical practice providing a useful tool for the improvement of fracture prediction.

  1. Predicting Earth orientation changes from global forecasts of atmosphere-hydrosphere dynamics

    NASA Astrophysics Data System (ADS)

    Dobslaw, Henryk; Dill, Robert

    2018-02-01

    Effective Angular Momentum (EAM) functions obtained from global numerical simulations of atmosphere, ocean, and land surface dynamics are routinely processed by the Earth System Modelling group at Deutsches GeoForschungsZentrum. EAM functions are available since January 1976 with up to 3 h temporal resolution. Additionally, 6 days-long EAM forecasts are routinely published every day. Based on hindcast experiments with 305 individual predictions distributed over 15 months, we demonstrate that EAM forecasts improve the prediction accuracy of the Earth Orientation Parameters at all forecast horizons between 1 and 6 days. At day 6, prediction accuracy improves down to 1.76 mas for the terrestrial pole offset, and 2.6 mas for Δ UT1, which correspond to an accuracy increase of about 41% over predictions published in Bulletin A by the International Earth Rotation and Reference System Service.

  2. Improving Prediction Accuracy for WSN Data Reduction by Applying Multivariate Spatio-Temporal Correlation

    PubMed Central

    Carvalho, Carlos; Gomes, Danielo G.; Agoulmine, Nazim; de Souza, José Neuman

    2011-01-01

    This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSN). Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, it may not be very accurate. Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher correlation between gathered inputs when compared to time, which is an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate one. In addition to that, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, we believe that we are probably the first to address prediction based on multivariate correlation for WSN data reduction. PMID:22346626

  3. Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins

    PubMed Central

    Yang, Jing; He, Bao-Ji; Jang, Richard; Zhang, Yang; Shen, Hong-Bin

    2015-01-01

    Abstract Motivation: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g. >3 bonds, is too low to effectively assist structure assembly simulations. Results: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ Contact: zhng@umich.edu or hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254435

  4. Accuracy Analysis of a Box-wing Theoretical SRP Model

    NASA Astrophysics Data System (ADS)

    Wang, Xiaoya; Hu, Xiaogong; Zhao, Qunhe; Guo, Rui

    2016-07-01

    For Beidou satellite navigation system (BDS) a high accuracy SRP model is necessary for high precise applications especially with Global BDS establishment in future. The BDS accuracy for broadcast ephemeris need be improved. So, a box-wing theoretical SRP model with fine structure and adding conical shadow factor of earth and moon were established. We verified this SRP model by the GPS Block IIF satellites. The calculation was done with the data of PRN 1, 24, 25, 27 satellites. The results show that the physical SRP model for POD and forecast for GPS IIF satellite has higher accuracy with respect to Bern empirical model. The 3D-RMS of orbit is about 20 centimeters. The POD accuracy for both models is similar but the prediction accuracy with the physical SRP model is more than doubled. We tested 1-day 3-day and 7-day orbit prediction. The longer is the prediction arc length, the more significant is the improvement. The orbit prediction accuracy with the physical SRP model for 1-day, 3-day and 7-day arc length are 0.4m, 2.0m, 10.0m respectively. But they are 0.9m, 5.5m and 30m with Bern empirical model respectively. We apply this means to the BDS and give out a SRP model for Beidou satellites. Then we test and verify the model with Beidou data of one month only for test. Initial results show the model is good but needs more data for verification and improvement. The orbit residual RMS is similar to that with our empirical force model which only estimate the force for along track, across track direction and y-bias. But the orbit overlap and SLR observation evaluation show some improvement. The remaining empirical force is reduced significantly for present Beidou constellation.

  5. Testing the applicability of artificial intelligence techniques to the subject of erythemal ultraviolet solar radiation. Part two: an intelligent system based on multi-classifier technique.

    PubMed

    Elminir, Hamdy K; Own, Hala S; Azzam, Yosry A; Riad, A M

    2008-03-28

    The problem we address here describes the on-going research effort that takes place to shed light on the applicability of using artificial intelligence techniques to predict the local noon erythemal UV irradiance in the plain areas of Egypt. In light of this fact, we use the bootstrap aggregating (bagging) algorithm to improve the prediction accuracy reported by a multi-layer perceptron (MLP) network. The results showed that, the overall prediction accuracy for the MLP network was only 80.9%. When bagging algorithm is used, the accuracy reached 94.8%; an improvement of about 13.9% was achieved. These improvements demonstrate the efficiency of the bagging procedure, and may be used as a promising tool at least for the plain areas of Egypt.

  6. Impact of fitting dominance and additive effects on accuracy of genomic prediction of breeding values in layers.

    PubMed

    Heidaritabar, M; Wolc, A; Arango, J; Zeng, J; Settar, P; Fulton, J E; O'Sullivan, N P; Bastiaansen, J W M; Fernando, R L; Garrick, D J; Dekkers, J C M

    2016-10-01

    Most genomic prediction studies fit only additive effects in models to estimate genomic breeding values (GEBV). However, if dominance genetic effects are an important source of variation for complex traits, accounting for them may improve the accuracy of GEBV. We investigated the effect of fitting dominance and additive effects on the accuracy of GEBV for eight egg production and quality traits in a purebred line of brown layers using pedigree or genomic information (42K single-nucleotide polymorphism (SNP) panel). Phenotypes were corrected for the effect of hatch date. Additive and dominance genetic variances were estimated using genomic-based [genomic best linear unbiased prediction (GBLUP)-REML and BayesC] and pedigree-based (PBLUP-REML) methods. Breeding values were predicted using a model that included both additive and dominance effects and a model that included only additive effects. The reference population consisted of approximately 1800 animals hatched between 2004 and 2009, while approximately 300 young animals hatched in 2010 were used for validation. Accuracy of prediction was computed as the correlation between phenotypes and estimated breeding values of the validation animals divided by the square root of the estimate of heritability in the whole population. The proportion of dominance variance to total phenotypic variance ranged from 0.03 to 0.22 with PBLUP-REML across traits, from 0 to 0.03 with GBLUP-REML and from 0.01 to 0.05 with BayesC. Accuracies of GEBV ranged from 0.28 to 0.60 across traits. Inclusion of dominance effects did not improve the accuracy of GEBV, and differences in their accuracies between genomic-based methods were small (0.01-0.05), with GBLUP-REML yielding higher prediction accuracies than BayesC for egg production, egg colour and yolk weight, while BayesC yielded higher accuracies than GBLUP-REML for the other traits. In conclusion, fitting dominance effects did not impact accuracy of genomic prediction of breeding values in this population. © 2016 Blackwell Verlag GmbH.

  7. Prediction Accuracy: The Role of Feedback in 6th Graders' Recall Predictions

    ERIC Educational Resources Information Center

    Al-Harthy, Ibrahim S.

    2016-01-01

    The current study focused on the role of feedback on students' prediction accuracy (calibration). This phenomenon has been widely studied, but questions remain about how best to improve it. In the current investigation, fifty-seven students from sixth grade were randomly assigned to control and experimental groups. Thirty pictures were chosen from…

  8. Genomic Prediction Accounting for Residual Heteroskedasticity

    PubMed Central

    Ou, Zhining; Tempelman, Robert J.; Steibel, Juan P.; Ernst, Catherine W.; Bates, Ronald O.; Bello, Nora M.

    2015-01-01

    Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit. PMID:26564950

  9. Improved accuracy of intraocular lens power calculation with the Zeiss IOLMaster.

    PubMed

    Olsen, Thomas

    2007-02-01

    This study aimed to demonstrate how the level of accuracy in intraocular lens (IOL) power calculation can be improved with optical biometry using partial optical coherence interferometry (PCI) (Zeiss IOLMaster) and current anterior chamber depth (ACD) prediction algorithms. Intraocular lens power in 461 consecutive cataract operations was calculated using both PCI and ultrasound and the accuracy of the results of each technique were compared. To illustrate the importance of ACD prediction per se, predictions were calculated using both a recently published 5-variable method and the Haigis 2-variable method and the results compared. All calculations were optimized in retrospect to account for systematic errors, including IOL constants and other off-set errors. The average absolute IOL prediction error (observed minus expected refraction) was 0.65 dioptres with ultrasound and 0.43 D with PCI using the 5-variable ACD prediction method (p < 0.00001). The number of predictions within +/- 0.5 D, +/- 1.0 D and +/- 2.0 D of the expected outcome was 62.5%, 92.4% and 99.9% with PCI, compared with 45.5%, 77.3% and 98.4% with ultrasound, respectively (p < 0.00001). The 2-variable ACD method resulted in an average error in PCI predictions of 0.46 D, which was significantly higher than the error in the 5-variable method (p < 0.001). The accuracy of IOL power calculation can be significantly improved using calibrated axial length readings obtained with PCI and modern IOL power calculation formulas incorporating the latest generation ACD prediction algorithms.

  10. Can machine-learning improve cardiovascular risk prediction using routine clinical data?

    PubMed Central

    Kai, Joe; Garibaldi, Jonathan M.; Qureshi, Nadeem

    2017-01-01

    Background Current approaches to predict cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine-learning offers opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine-learning can improve cardiovascular risk prediction. Methods Prospective cohort study using routine clinical data of 378,256 patients from UK family practices, free from cardiovascular disease at outset. Four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) were compared to an established algorithm (American College of Cardiology guidelines) to predict first cardiovascular event over 10-years. Predictive accuracy was assessed by area under the ‘receiver operating curve’ (AUC); and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) to predict 7.5% cardiovascular risk (threshold for initiating statins). Findings 24,970 incident cardiovascular events (6.6%) occurred. Compared to the established risk prediction algorithm (AUC 0.728, 95% CI 0.723–0.735), machine-learning algorithms improved prediction: random forest +1.7% (AUC 0.745, 95% CI 0.739–0.750), logistic regression +3.2% (AUC 0.760, 95% CI 0.755–0.766), gradient boosting +3.3% (AUC 0.761, 95% CI 0.755–0.766), neural networks +3.6% (AUC 0.764, 95% CI 0.759–0.769). The highest achieving (neural networks) algorithm predicted 4,998/7,404 cases (sensitivity 67.5%, PPV 18.4%) and 53,458/75,585 non-cases (specificity 70.7%, NPV 95.7%), correctly predicting 355 (+7.6%) more patients who developed cardiovascular disease compared to the established algorithm. Conclusions Machine-learning significantly improves accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others. PMID:28376093

  11. Can machine-learning improve cardiovascular risk prediction using routine clinical data?

    PubMed

    Weng, Stephen F; Reps, Jenna; Kai, Joe; Garibaldi, Jonathan M; Qureshi, Nadeem

    2017-01-01

    Current approaches to predict cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine-learning offers opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine-learning can improve cardiovascular risk prediction. Prospective cohort study using routine clinical data of 378,256 patients from UK family practices, free from cardiovascular disease at outset. Four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) were compared to an established algorithm (American College of Cardiology guidelines) to predict first cardiovascular event over 10-years. Predictive accuracy was assessed by area under the 'receiver operating curve' (AUC); and sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) to predict 7.5% cardiovascular risk (threshold for initiating statins). 24,970 incident cardiovascular events (6.6%) occurred. Compared to the established risk prediction algorithm (AUC 0.728, 95% CI 0.723-0.735), machine-learning algorithms improved prediction: random forest +1.7% (AUC 0.745, 95% CI 0.739-0.750), logistic regression +3.2% (AUC 0.760, 95% CI 0.755-0.766), gradient boosting +3.3% (AUC 0.761, 95% CI 0.755-0.766), neural networks +3.6% (AUC 0.764, 95% CI 0.759-0.769). The highest achieving (neural networks) algorithm predicted 4,998/7,404 cases (sensitivity 67.5%, PPV 18.4%) and 53,458/75,585 non-cases (specificity 70.7%, NPV 95.7%), correctly predicting 355 (+7.6%) more patients who developed cardiovascular disease compared to the established algorithm. Machine-learning significantly improves accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others.

  12. Assessing genomic selection prediction accuracy in a dynamic barley breeding

    USDA-ARS?s Scientific Manuscript database

    Genomic selection is a method to improve quantitative traits in crops and livestock by estimating breeding values of selection candidates using phenotype and genome-wide marker data sets. Prediction accuracy has been evaluated through simulation and cross-validation, however validation based on prog...

  13. Linkage disequilibrium among commonly genotyped SNP and variants detected from bull sequence

    USDA-ARS?s Scientific Manuscript database

    Genomic prediction utilizing causal variants could increase selection accuracy above that achieved with SNP genotyped by commercial assays. A number of variants detected from sequencing influential sires are likely to be causal, but noticable improvements in prediction accuracy using imputed sequen...

  14. Improving the spectral measurement accuracy based on temperature distribution and spectra-temperature relationship

    NASA Astrophysics Data System (ADS)

    Li, Zhe; Feng, Jinchao; Liu, Pengyu; Sun, Zhonghua; Li, Gang; Jia, Kebin

    2018-05-01

    Temperature is usually considered as a fluctuation in near-infrared spectral measurement. Chemometric methods were extensively studied to correct the effect of temperature variations. However, temperature can be considered as a constructive parameter that provides detailed chemical information when systematically changed during the measurement. Our group has researched the relationship between temperature-induced spectral variation (TSVC) and normalized squared temperature. In this study, we focused on the influence of temperature distribution in calibration set. Multi-temperature calibration set selection (MTCS) method was proposed to improve the prediction accuracy by considering the temperature distribution of calibration samples. Furthermore, double-temperature calibration set selection (DTCS) method was proposed based on MTCS method and the relationship between TSVC and normalized squared temperature. We compare the prediction performance of PLS models based on random sampling method and proposed methods. The results from experimental studies showed that the prediction performance was improved by using proposed methods. Therefore, MTCS method and DTCS method will be the alternative methods to improve prediction accuracy in near-infrared spectral measurement.

  15. Checking the predictive accuracy of basic symptoms against ultra high-risk criteria and testing of a multivariable prediction model: Evidence from a prospective three-year observational study of persons at clinical high-risk for psychosis.

    PubMed

    Hengartner, M P; Heekeren, K; Dvorsky, D; Walitza, S; Rössler, W; Theodoridou, A

    2017-09-01

    The aim of this study was to critically examine the prognostic validity of various clinical high-risk (CHR) criteria alone and in combination with additional clinical characteristics. A total of 188 CHR positive persons from the region of Zurich, Switzerland (mean age 20.5 years; 60.2% male), meeting ultra high-risk (UHR) and/or basic symptoms (BS) criteria, were followed over three years. The test battery included the Structured Interview for Prodromal Syndromes (SIPS), verbal IQ and many other screening tools. Conversion to psychosis was defined according to ICD-10 criteria for schizophrenia (F20) or brief psychotic disorder (F23). Altogether n=24 persons developed manifest psychosis within three years and according to Kaplan-Meier survival analysis, the projected conversion rate was 17.5%. The predictive accuracy of UHR was statistically significant but poor (area under the curve [AUC]=0.65, P<.05), whereas BS did not predict psychosis beyond mere chance (AUC=0.52, P=.730). Sensitivity and specificity were 0.83 and 0.47 for UHR, and 0.96 and 0.09 for BS. UHR plus BS achieved an AUC=0.66, with sensitivity and specificity of 0.75 and 0.56. In comparison, baseline antipsychotic medication yielded a predictive accuracy of AUC=0.62 (sensitivity=0.42; specificity=0.82). A multivariable prediction model comprising continuous measures of positive symptoms and verbal IQ achieved a substantially improved prognostic accuracy (AUC=0.85; sensitivity=0.86; specificity=0.85; positive predictive value=0.54; negative predictive value=0.97). We showed that BS have no predictive accuracy beyond chance, while UHR criteria poorly predict conversion to psychosis. Combining BS with UHR criteria did not improve the predictive accuracy of UHR alone. In contrast, dimensional measures of both positive symptoms and verbal IQ showed excellent prognostic validity. A critical re-thinking of binary at-risk criteria is necessary in order to improve the prognosis of psychotic disorders. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  16. Correlation of chemical shifts predicted by molecular dynamics simulations for partially disordered proteins.

    PubMed

    Karp, Jerome M; Eryilmaz, Ertan; Erylimaz, Ertan; Cowburn, David

    2015-01-01

    There has been a longstanding interest in being able to accurately predict NMR chemical shifts from structural data. Recent studies have focused on using molecular dynamics (MD) simulation data as input for improved prediction. Here we examine the accuracy of chemical shift prediction for intein systems, which have regions of intrinsic disorder. We find that using MD simulation data as input for chemical shift prediction does not consistently improve prediction accuracy over use of a static X-ray crystal structure. This appears to result from the complex conformational ensemble of the disordered protein segments. We show that using accelerated molecular dynamics (aMD) simulations improves chemical shift prediction, suggesting that methods which better sample the conformational ensemble like aMD are more appropriate tools for use in chemical shift prediction for proteins with disordered regions. Moreover, our study suggests that data accurately reflecting protein dynamics must be used as input for chemical shift prediction in order to correctly predict chemical shifts in systems with disorder.

  17. Improvement of PM concentration predictability using WRF-CMAQ-DLM coupled system and its applications

    NASA Astrophysics Data System (ADS)

    Lee, Soon Hwan; Kim, Ji Sun; Lee, Kang Yeol; Shon, Keon Tae

    2017-04-01

    Air quality due to increasing Particulate Matter(PM) in Korea in Asia is getting worse. At present, the PM forecast is announced based on the PM concentration predicted from the air quality prediction numerical model. However, forecast accuracy is not as high as expected due to various uncertainties for PM physical and chemical characteristics. The purpose of this study was to develop a numerical-statistically ensemble models to improve the accuracy of prediction of PM10 concentration. Numerical models used in this study are the three dimensional atmospheric model Weather Research and Forecasting(WRF) and the community multiscale air quality model (CMAQ). The target areas for the PM forecast are Seoul, Busan, Daegu, and Daejeon metropolitan areas in Korea. The data used in the model development are PM concentration and CMAQ predictions and the data period is 3 months (March 1 - May 31, 2014). The dynamic-statistical technics for reducing the systematic error of the CMAQ predictions was applied to the dynamic linear model(DLM) based on the Baysian Kalman filter technic. As a result of applying the metrics generated from the dynamic linear model to the forecasting of PM concentrations accuracy was improved. Especially, at the high PM concentration where the damage is relatively large, excellent improvement results are shown.

  18. EVALUATING RISK-PREDICTION MODELS USING DATA FROM ELECTRONIC HEALTH RECORDS.

    PubMed

    Wang, L E; Shaw, Pamela A; Mathelier, Hansie M; Kimmel, Stephen E; French, Benjamin

    2016-03-01

    The availability of data from electronic health records facilitates the development and evaluation of risk-prediction models, but estimation of prediction accuracy could be limited by outcome misclassification, which can arise if events are not captured. We evaluate the robustness of prediction accuracy summaries, obtained from receiver operating characteristic curves and risk-reclassification methods, if events are not captured (i.e., "false negatives"). We derive estimators for sensitivity and specificity if misclassification is independent of marker values. In simulation studies, we quantify the potential for bias in prediction accuracy summaries if misclassification depends on marker values. We compare the accuracy of alternative prognostic models for 30-day all-cause hospital readmission among 4548 patients discharged from the University of Pennsylvania Health System with a primary diagnosis of heart failure. Simulation studies indicate that if misclassification depends on marker values, then the estimated accuracy improvement is also biased, but the direction of the bias depends on the direction of the association between markers and the probability of misclassification. In our application, 29% of the 1143 readmitted patients were readmitted to a hospital elsewhere in Pennsylvania, which reduced prediction accuracy. Outcome misclassification can result in erroneous conclusions regarding the accuracy of risk-prediction models.

  19. Researches on High Accuracy Prediction Methods of Earth Orientation Parameters

    NASA Astrophysics Data System (ADS)

    Xu, X. Q.

    2015-09-01

    The Earth rotation reflects the coupling process among the solid Earth, atmosphere, oceans, mantle, and core of the Earth on multiple spatial and temporal scales. The Earth rotation can be described by the Earth's orientation parameters, which are abbreviated as EOP (mainly including two polar motion components PM_X and PM_Y, and variation in the length of day ΔLOD). The EOP is crucial in the transformation between the terrestrial and celestial reference systems, and has important applications in many areas such as the deep space exploration, satellite precise orbit determination, and astrogeodynamics. However, the EOP products obtained by the space geodetic technologies generally delay by several days to two weeks. The growing demands for modern space navigation make high-accuracy EOP prediction be a worthy topic. This thesis is composed of the following three aspects, for the purpose of improving the EOP forecast accuracy. (1) We analyze the relation between the length of the basic data series and the EOP forecast accuracy, and compare the EOP prediction accuracy for the linear autoregressive (AR) model and the nonlinear artificial neural network (ANN) method by performing the least squares (LS) extrapolations. The results show that the high precision forecast of EOP can be realized by appropriate selection of the basic data series length according to the required time span of EOP prediction: for short-term prediction, the basic data series should be shorter, while for the long-term prediction, the series should be longer. The analysis also showed that the LS+AR model is more suitable for the short-term forecasts, while the LS+ANN model shows the advantages in the medium- and long-term forecasts. (2) We develop for the first time a new method which combines the autoregressive model and Kalman filter (AR+Kalman) in short-term EOP prediction. The equations of observation and state are established using the EOP series and the autoregressive coefficients respectively, which are used to improve/re-evaluate the AR model. Comparing to the single AR model, the AR+Kalman method performs better in the prediction of UT1-UTC and ΔLOD, and the improvement in the prediction of the polar motion is significant. (3) Following the successful Earth Orientation Parameter Prediction Comparison Campaign (EOP PCC), the Earth Orientation Parameter Combination of Prediction Pilot Project (EOPC PPP) was sponsored in 2010. As one of the participants from China, we update and submit the short- and medium-term (1 to 90 days) EOP predictions every day. From the current comparative statistics, our prediction accuracy is on the medium international level. We will carry out more innovative researches to improve the EOP forecast accuracy and enhance our level in EOP forecast.

  20. Orbit Determination for the Lunar Reconnaissance Orbiter Using an Extended Kalman Filter

    NASA Technical Reports Server (NTRS)

    Slojkowski, Steven; Lowe, Jonathan; Woodburn, James

    2015-01-01

    Orbit determination (OD) analysis results are presented for the Lunar Reconnaissance Orbiter (LRO) using a commercially available Extended Kalman Filter, Analytical Graphics' Orbit Determination Tool Kit (ODTK). Process noise models for lunar gravity and solar radiation pressure (SRP) are described and OD results employing the models are presented. Definitive accuracy using ODTK meets mission requirements and is better than that achieved using the operational LRO OD tool, the Goddard Trajectory Determination System (GTDS). Results demonstrate that a Vasicek stochastic model produces better estimates of the coefficient of solar radiation pressure than a Gauss-Markov model, and prediction accuracy using a Vasicek model meets mission requirements over the analysis span. Modeling the effect of antenna motion on range-rate tracking considerably improves residuals and filter-smoother consistency. Inclusion of off-axis SRP process noise and generalized process noise improves filter performance for both definitive and predicted accuracy. Definitive accuracy from the smoother is better than achieved using GTDS and is close to that achieved by precision OD methods used to generate definitive science orbits. Use of a multi-plate dynamic spacecraft area model with ODTK's force model plugin capability provides additional improvements in predicted accuracy.

  1. Dispersal and extrapolation on the accuracy of temporal predictions from distribution models for the Darwin's frog.

    PubMed

    Uribe-Rivera, David E; Soto-Azat, Claudio; Valenzuela-Sánchez, Andrés; Bizama, Gustavo; Simonetti, Javier A; Pliscoff, Patricio

    2017-07-01

    Climate change is a major threat to biodiversity; the development of models that reliably predict its effects on species distributions is a priority for conservation biogeography. Two of the main issues for accurate temporal predictions from Species Distribution Models (SDM) are model extrapolation and unrealistic dispersal scenarios. We assessed the consequences of these issues on the accuracy of climate-driven SDM predictions for the dispersal-limited Darwin's frog Rhinoderma darwinii in South America. We calibrated models using historical data (1950-1975) and projected them across 40 yr to predict distribution under current climatic conditions, assessing predictive accuracy through the area under the ROC curve (AUC) and True Skill Statistics (TSS), contrasting binary model predictions against temporal-independent validation data set (i.e., current presences/absences). To assess the effects of incorporating dispersal processes we compared the predictive accuracy of dispersal constrained models with no dispersal limited SDMs; and to assess the effects of model extrapolation on the predictive accuracy of SDMs, we compared this between extrapolated and no extrapolated areas. The incorporation of dispersal processes enhanced predictive accuracy, mainly due to a decrease in the false presence rate of model predictions, which is consistent with discrimination of suitable but inaccessible habitat. This also had consequences on range size changes over time, which is the most used proxy for extinction risk from climate change. The area of current climatic conditions that was absent in the baseline conditions (i.e., extrapolated areas) represents 39% of the study area, leading to a significant decrease in predictive accuracy of model predictions for those areas. Our results highlight (1) incorporating dispersal processes can improve predictive accuracy of temporal transference of SDMs and reduce uncertainties of extinction risk assessments from global change; (2) as geographical areas subjected to novel climates are expected to arise, they must be reported as they show less accurate predictions under future climate scenarios. Consequently, environmental extrapolation and dispersal processes should be explicitly incorporated to report and reduce uncertainties in temporal predictions of SDMs, respectively. Doing so, we expect to improve the reliability of the information we provide for conservation decision makers under future climate change scenarios. © 2017 by the Ecological Society of America.

  2. Physiologically-based, predictive analytics using the heart-rate-to-Systolic-Ratio significantly improves the timeliness and accuracy of sepsis prediction compared to SIRS.

    PubMed

    Danner, Omar K; Hendren, Sandra; Santiago, Ethel; Nye, Brittany; Abraham, Prasad

    2017-04-01

    Enhancing the efficiency of diagnosis and treatment of severe sepsis by using physiologically-based, predictive analytical strategies has not been fully explored. We hypothesize assessment of heart-rate-to-systolic-ratio significantly increases the timeliness and accuracy of sepsis prediction after emergency department (ED) presentation. We evaluated the records of 53,313 ED patients from a large, urban teaching hospital between January and June 2015. The HR-to-systolic ratio was compared to SIRS criteria for sepsis prediction. There were 884 patients with discharge diagnoses of sepsis, severe sepsis, and/or septic shock. Variations in three presenting variables, heart rate, systolic BP and temperature were determined to be primary early predictors of sepsis with a 74% (654/884) accuracy compared to 34% (304/884) using SIRS criteria (p < 0.0001)in confirmed septic patients. Physiologically-based predictive analytics improved the accuracy and expediency of sepsis identification via detection of variations in HR-to-systolic ratio. This approach may lead to earlier sepsis workup and life-saving interventions. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier

    NASA Astrophysics Data System (ADS)

    Wang, Leilei; Cheng, Jinyong

    2018-03-01

    Protein secondary structure prediction is belong to bioinformatics,and it's important in research area. In this paper, we propose a new prediction way of protein using bayes classifier and autoEncoder network. Our experiments show some algorithms including the construction of the model, the classification of parameters and so on. The data set is a typical CB513 data set for protein. In terms of accuracy, the method is the cross validation based on the 3-fold. Then we can get the Q3 accuracy. Paper results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.

  4. Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

    NASA Astrophysics Data System (ADS)

    Sarojini, Balakrishnan; Ramaraj, Narayanasamy; Nickolas, Savarimuthu

    Medical Data mining is the search for relationships and patterns within the medical datasets that could provide useful knowledge for effective clinical decisions. The inclusion of irrelevant, redundant and noisy features in the process model results in poor predictive accuracy. Much research work in data mining has gone into improving the predictive accuracy of the classifiers by applying the techniques of feature selection. Feature selection in medical data mining is appreciable as the diagnosis of the disease could be done in this patient-care activity with minimum number of significant features. The objective of this work is to show that selecting the more significant features would improve the performance of the classifier. We empirically evaluate the classification effectiveness of LibSVM classifier on the reduced feature subset of diabetes dataset. The evaluations suggest that the feature subset selected improves the predictive accuracy of the classifier and reduce false negatives and false positives.

  5. Towards an Online Seizure Advisory System-An Adaptive Seizure Prediction Framework Using Active Learning Heuristics.

    PubMed

    Karuppiah Ramachandran, Vignesh Raja; Alblas, Huibert J; Le, Duc V; Meratnia, Nirvana

    2018-05-24

    In the last decade, seizure prediction systems have gained a lot of attention because of their enormous potential to largely improve the quality-of-life of the epileptic patients. The accuracy of the prediction algorithms to detect seizure in real-world applications is largely limited because the brain signals are inherently uncertain and affected by various factors, such as environment, age, drug intake, etc., in addition to the internal artefacts that occur during the process of recording the brain signals. To deal with such ambiguity, researchers transitionally use active learning, which selects the ambiguous data to be annotated by an expert and updates the classification model dynamically. However, selecting the particular data from a pool of large ambiguous datasets to be labelled by an expert is still a challenging problem. In this paper, we propose an active learning-based prediction framework that aims to improve the accuracy of the prediction with a minimum number of labelled data. The core technique of our framework is employing the Bernoulli-Gaussian Mixture model (BGMM) to determine the feature samples that have the most ambiguity to be annotated by an expert. By doing so, our approach facilitates expert intervention as well as increasing medical reliability. We evaluate seven different classifiers in terms of the classification time and memory required. An active learning framework built on top of the best performing classifier is evaluated in terms of required annotation effort to achieve a high level of prediction accuracy. The results show that our approach can achieve the same accuracy as a Support Vector Machine (SVM) classifier using only 20 % of the labelled data and also improve the prediction accuracy even under the noisy condition.

  6. Comparison of Three Risk Scores to Predict Outcomes of Severe Lower Gastrointestinal Bleeding

    PubMed Central

    Camus, Marine; Jensen, Dennis M.; Ohning, Gordon V.; Kovacs, Thomas O.; Jutabha, Rome; Ghassemi, Kevin A.; Machicado, Gustavo A.; Dulai, Gareth S.; Jensen, Mary Ellen; Gornbein, Jeffrey A.

    2014-01-01

    Background & aims Improved medical decisions by using a score at the initial patient triage level may lead to improvements in patient management, outcomes, and resource utilization. There is no validated score for management of lower gastrointestinal bleeding (LGIB) unlike for upper GIB. The aim of our study was to compare the accuracies of 3 different prognostic scores (CURE Hemostasis prognosis score, Charlston index and ASA score) for the prediction of 30 day rebleeding, surgery and death in severe LGIB. Methods Data on consecutive patients hospitalized with severe GI bleeding from January 2006 to October 2011 in our two-tertiary academic referral centers were prospectively collected. Sensitivities, specificities, accuracies and area under the receiver operating characteristic (AUROC) were computed for three scores for predictions of rebleeding, surgery and mortality at 30 days. Results 235 consecutive patients with LGIB were included between 2006 and 2011. 23% of patients rebled, 6% had surgery, and 7.7% of patients died. The accuracies of each score never reached 70% for predicting rebleeding or surgery in either. The ASA score had a highest accuracy for predicting mortality within 30 days (83.5%) whereas the CURE Hemostasis prognosis score and the Charlson index both had accuracies less than 75% for the prediction of death within 30 days. Conclusions ASA score could be useful to predict death within 30 days. However a new score is still warranted to predict all 30 days outcomes (rebleeding, surgery and death) in LGIB. PMID:25599218

  7. Lunar Reconnaissance Orbiter Orbit Determination Accuracy Analysis

    NASA Technical Reports Server (NTRS)

    Slojkowski, Steven E.

    2014-01-01

    Results from operational OD produced by the NASA Goddard Flight Dynamics Facility for the LRO nominal and extended mission are presented. During the LRO nominal mission, when LRO flew in a low circular orbit, orbit determination requirements were met nearly 100% of the time. When the extended mission began, LRO returned to a more elliptical frozen orbit where gravity and other modeling errors caused numerous violations of mission accuracy requirements. Prediction accuracy is particularly challenged during periods when LRO is in full-Sun. A series of improvements to LRO orbit determination are presented, including implementation of new lunar gravity models, improved spacecraft solar radiation pressure modeling using a dynamic multi-plate area model, a shorter orbit determination arc length, and a constrained plane method for estimation. The analysis presented in this paper shows that updated lunar gravity models improved accuracy in the frozen orbit, and a multiplate dynamic area model improves prediction accuracy during full-Sun orbit periods. Implementation of a 36-hour tracking data arc and plane constraints during edge-on orbit geometry also provide benefits. A comparison of the operational solutions to precision orbit determination solutions shows agreement on a 100- to 250-meter level in definitive accuracy.

  8. Model training across multiple breeding cycles significantly improves genomic prediction accuracy in rye (Secale cereale L.).

    PubMed

    Auinger, Hans-Jürgen; Schönleben, Manfred; Lehermeier, Christina; Schmidt, Malthe; Korzun, Viktor; Geiger, Hartwig H; Piepho, Hans-Peter; Gordillo, Andres; Wilde, Peer; Bauer, Eva; Schön, Chris-Carolin

    2016-11-01

    Genomic prediction accuracy can be significantly increased by model calibration across multiple breeding cycles as long as selection cycles are connected by common ancestors. In hybrid rye breeding, application of genome-based prediction is expected to increase selection gain because of long selection cycles in population improvement and development of hybrid components. Essentially two prediction scenarios arise: (1) prediction of the genetic value of lines from the same breeding cycle in which model training is performed and (2) prediction of lines from subsequent cycles. It is the latter from which a reduction in cycle length and consequently the strongest impact on selection gain is expected. We empirically investigated genome-based prediction of grain yield, plant height and thousand kernel weight within and across four selection cycles of a hybrid rye breeding program. Prediction performance was assessed using genomic and pedigree-based best linear unbiased prediction (GBLUP and PBLUP). A total of 1040 S 2 lines were genotyped with 16 k SNPs and each year testcrosses of 260 S 2 lines were phenotyped in seven or eight locations. The performance gap between GBLUP and PBLUP increased significantly for all traits when model calibration was performed on aggregated data from several cycles. Prediction accuracies obtained from cross-validation were in the order of 0.70 for all traits when data from all cycles (N CS  = 832) were used for model training and exceeded within-cycle accuracies in all cases. As long as selection cycles are connected by a sufficient number of common ancestors and prediction accuracy has not reached a plateau when increasing sample size, aggregating data from several preceding cycles is recommended for predicting genetic values in subsequent cycles despite decreasing relatedness over time.

  9. Bridge Structure Deformation Prediction Based on GNSS Data Using Kalman-ARIMA-GARCH Model

    PubMed Central

    Li, Xiaoqing; Wang, Yu

    2018-01-01

    Bridges are an essential part of the ground transportation system. Health monitoring is fundamentally important for the safety and service life of bridges. A large amount of structural information is obtained from various sensors using sensing technology, and the data processing has become a challenging issue. To improve the prediction accuracy of bridge structure deformation based on data mining and to accurately evaluate the time-varying characteristics of bridge structure performance evolution, this paper proposes a new method for bridge structure deformation prediction, which integrates the Kalman filter, autoregressive integrated moving average model (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Firstly, the raw deformation data is directly pre-processed using the Kalman filter to reduce the noise. After that, the linear recursive ARIMA model is established to analyze and predict the structure deformation. Finally, the nonlinear recursive GARCH model is introduced to further improve the accuracy of the prediction. Simulation results based on measured sensor data from the Global Navigation Satellite System (GNSS) deformation monitoring system demonstrated that: (1) the Kalman filter is capable of denoising the bridge deformation monitoring data; (2) the prediction accuracy of the proposed Kalman-ARIMA-GARCH model is satisfactory, where the mean absolute error increases only from 3.402 mm to 5.847 mm with the increment of the prediction step; and (3) in comparision to the Kalman-ARIMA model, the Kalman-ARIMA-GARCH model results in superior prediction accuracy as it includes partial nonlinear characteristics (heteroscedasticity); the mean absolute error of five-step prediction using the proposed model is improved by 10.12%. This paper provides a new way for structural behavior prediction based on data processing, which can lay a foundation for the early warning of bridge health monitoring system based on sensor data using sensing technology. PMID:29351254

  10. Bridge Structure Deformation Prediction Based on GNSS Data Using Kalman-ARIMA-GARCH Model.

    PubMed

    Xin, Jingzhou; Zhou, Jianting; Yang, Simon X; Li, Xiaoqing; Wang, Yu

    2018-01-19

    Bridges are an essential part of the ground transportation system. Health monitoring is fundamentally important for the safety and service life of bridges. A large amount of structural information is obtained from various sensors using sensing technology, and the data processing has become a challenging issue. To improve the prediction accuracy of bridge structure deformation based on data mining and to accurately evaluate the time-varying characteristics of bridge structure performance evolution, this paper proposes a new method for bridge structure deformation prediction, which integrates the Kalman filter, autoregressive integrated moving average model (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Firstly, the raw deformation data is directly pre-processed using the Kalman filter to reduce the noise. After that, the linear recursive ARIMA model is established to analyze and predict the structure deformation. Finally, the nonlinear recursive GARCH model is introduced to further improve the accuracy of the prediction. Simulation results based on measured sensor data from the Global Navigation Satellite System (GNSS) deformation monitoring system demonstrated that: (1) the Kalman filter is capable of denoising the bridge deformation monitoring data; (2) the prediction accuracy of the proposed Kalman-ARIMA-GARCH model is satisfactory, where the mean absolute error increases only from 3.402 mm to 5.847 mm with the increment of the prediction step; and (3) in comparision to the Kalman-ARIMA model, the Kalman-ARIMA-GARCH model results in superior prediction accuracy as it includes partial nonlinear characteristics (heteroscedasticity); the mean absolute error of five-step prediction using the proposed model is improved by 10.12%. This paper provides a new way for structural behavior prediction based on data processing, which can lay a foundation for the early warning of bridge health monitoring system based on sensor data using sensing technology.

  11. Genomic Prediction Accounting for Residual Heteroskedasticity.

    PubMed

    Ou, Zhining; Tempelman, Robert J; Steibel, Juan P; Ernst, Catherine W; Bates, Ronald O; Bello, Nora M

    2015-11-12

    Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit. Copyright © 2016 Ou et al.

  12. Response Latency as a Predictor of the Accuracy of Children's Reports

    ERIC Educational Resources Information Center

    Ackerman, Rakefet; Koriat, Asher

    2011-01-01

    Researchers have explored various diagnostic cues to the accuracy of information provided by child eyewitnesses. Previous studies indicated that children's confidence in their reports predicts the relative accuracy of these reports, and that the confidence-accuracy relationship generally improves as children grow older. In this study, we examined…

  13. Adaptive Trajectory Prediction Algorithm for Climbing Flights

    NASA Technical Reports Server (NTRS)

    Schultz, Charles Alexander; Thipphavong, David P.; Erzberger, Heinz

    2012-01-01

    Aircraft climb trajectories are difficult to predict, and large errors in these predictions reduce the potential operational benefits of some advanced features for NextGen. The algorithm described in this paper improves climb trajectory prediction accuracy by adjusting trajectory predictions based on observed track data. It utilizes rate-of-climb and airspeed measurements derived from position data to dynamically adjust the aircraft weight modeled for trajectory predictions. In simulations with weight uncertainty, the algorithm is able to adapt to within 3 percent of the actual gross weight within two minutes of the initial adaptation. The root-mean-square of altitude errors for five-minute predictions was reduced by 73 percent. Conflict detection performance also improved, with a 15 percent reduction in missed alerts and a 10 percent reduction in false alerts. In a simulation with climb speed capture intent and weight uncertainty, the algorithm improved climb trajectory prediction accuracy by up to 30 percent and conflict detection performance, reducing missed and false alerts by up to 10 percent.

  14. Improving orbit prediction accuracy through supervised machine learning

    NASA Astrophysics Data System (ADS)

    Peng, Hao; Bai, Xiaoli

    2018-05-01

    Due to the lack of information such as the space environment condition and resident space objects' (RSOs') body characteristics, current orbit predictions that are solely grounded on physics-based models may fail to achieve required accuracy for collision avoidance and have led to satellite collisions already. This paper presents a methodology to predict RSOs' trajectories with higher accuracy than that of the current methods. Inspired by the machine learning (ML) theory through which the models are learned based on large amounts of observed data and the prediction is conducted without explicitly modeling space objects and space environment, the proposed ML approach integrates physics-based orbit prediction algorithms with a learning-based process that focuses on reducing the prediction errors. Using a simulation-based space catalog environment as the test bed, the paper demonstrates three types of generalization capability for the proposed ML approach: (1) the ML model can be used to improve the same RSO's orbit information that is not available during the learning process but shares the same time interval as the training data; (2) the ML model can be used to improve predictions of the same RSO at future epochs; and (3) the ML model based on a RSO can be applied to other RSOs that share some common features.

  15. Improving risk prediction accuracy for new soldiers in the U.S. Army by adding self-report survey data to administrative data.

    PubMed

    Bernecker, Samantha L; Rosellini, Anthony J; Nock, Matthew K; Chiu, Wai Tat; Gutierrez, Peter M; Hwang, Irving; Joiner, Thomas E; Naifeh, James A; Sampson, Nancy A; Zaslavsky, Alan M; Stein, Murray B; Ursano, Robert J; Kessler, Ronald C

    2018-04-03

    High rates of mental disorders, suicidality, and interpersonal violence early in the military career have raised interest in implementing preventive interventions with high-risk new enlistees. The Army Study to Assess Risk and Resilience in Servicemembers (STARRS) developed risk-targeting systems for these outcomes based on machine learning methods using administrative data predictors. However, administrative data omit many risk factors, raising the question whether risk targeting could be improved by adding self-report survey data to prediction models. If so, the Army may gain from routinely administering surveys that assess additional risk factors. The STARRS New Soldier Survey was administered to 21,790 Regular Army soldiers who agreed to have survey data linked to administrative records. As reported previously, machine learning models using administrative data as predictors found that small proportions of high-risk soldiers accounted for high proportions of negative outcomes. Other machine learning models using self-report survey data as predictors were developed previously for three of these outcomes: major physical violence and sexual violence perpetration among men and sexual violence victimization among women. Here we examined the extent to which this survey information increases prediction accuracy, over models based solely on administrative data, for those three outcomes. We used discrete-time survival analysis to estimate a series of models predicting first occurrence, assessing how model fit improved and concentration of risk increased when adding the predicted risk score based on survey data to the predicted risk score based on administrative data. The addition of survey data improved prediction significantly for all outcomes. In the most extreme case, the percentage of reported sexual violence victimization among the 5% of female soldiers with highest predicted risk increased from 17.5% using only administrative predictors to 29.4% adding survey predictors, a 67.9% proportional increase in prediction accuracy. Other proportional increases in concentration of risk ranged from 4.8% to 49.5% (median = 26.0%). Data from an ongoing New Soldier Survey could substantially improve accuracy of risk models compared to models based exclusively on administrative predictors. Depending upon the characteristics of interventions used, the increase in targeting accuracy from survey data might offset survey administration costs.

  16. Comparison of univariate and multivariate models for prediction of major and minor elements from laser-induced breakdown spectra with and without masking

    NASA Astrophysics Data System (ADS)

    Dyar, M. Darby; Fassett, Caleb I.; Giguere, Stephen; Lepore, Kate; Byrne, Sarah; Boucher, Thomas; Carey, CJ; Mahadevan, Sridhar

    2016-09-01

    This study uses 1356 spectra from 452 geologically-diverse samples, the largest suite of LIBS rock spectra ever assembled, to compare the accuracy of elemental predictions in models that use only spectral regions thought to contain peaks arising from the element of interest versus those that use information in the entire spectrum. Results show that for the elements Si, Al, Ti, Fe, Mg, Ca, Na, K, Ni, Mn, Cr, Co, and Zn, univariate predictions based on single emission lines are by far the least accurate, no matter how carefully the region of channels/wavelengths is chosen and despite the prominence of the selected emission lines. An automated iterative algorithm was developed to sweep through all 5485 channels of data and select the single region that produces the optimal prediction accuracy for each element using univariate analysis. For the eight major elements, use of this technique results in a 35% improvement in prediction accuracy; for minors, the improvement is 13%. The best wavelength region choice for any given univariate analysis is likely to be an inherent property of the specific training set that cannot be generalized. In comparison, multivariate analysis using partial least-squares (PLS) almost universally outperforms univariate analysis. PLS using all the same wavelength regions from the univariate analysis produces results that improve in accuracy by 63% for major elements and 3% for minor element. This difference is likely a reflection of signal to noise ratios, which are far better for major elements than for minor elements, and likely limit their prediction accuracy by any technique. We also compare predictions using specific wavelength ranges for each element against those employing all channels. Masking out channels to focus on emission lines from a specific element that occurs decreases prediction accuracy for major elements but is useful for minor elements with low signals and proportionally much higher noise; use of PLS rather than univariate analysis is still recommended. Finally, we tested the generalizability of our results by analyzing a second data set from a different instrument. Overall prediction accuracies for the mixed data sets are higher than for either set alone for all major and minor elements except Ni, Cr, and Co, where results are roughly comparable.

  17. Improved Genetic Profiling of Anthropometric Traits Using a Big Data Approach.

    PubMed

    Canela-Xandri, Oriol; Rawlik, Konrad; Woolliams, John A; Tenesa, Albert

    2016-01-01

    Genome-wide association studies (GWAS) promised to translate their findings into clinically beneficial improvements of patient management by tailoring disease management to the individual through the prediction of disease risk. However, the ability to translate genetic findings from GWAS into predictive tools that are of clinical utility and which may inform clinical practice has, so far, been encouraging but limited. Here we propose to use a more powerful statistical approach, the use of which has traditionally been limited due to computational requirements and lack of sufficiently large individual level genotyped cohorts, but which improve the prediction of multiple medically relevant phenotypes using the same panel of SNPs. As a proof of principle, we used a shared panel of 319,038 common SNPs with MAF > 0.05 to train the prediction models in 114,264 unrelated White-British individuals for height and four obesity related traits (body mass index, basal metabolic rate, body fat percentage, and waist-to-hip ratio). We obtained prediction accuracies that ranged between 46% and 75% of the maximum achievable given the captured heritable component. For height, this represents an improvement in prediction accuracy of up to 68% (184% more phenotypic variance explained) over SNPs reported to be robustly associated with height in a previous GWAS meta-analysis of similar size. Across-population predictions in White non-British individuals were similar to those in White-British whilst those in Asian and Black individuals were informative but less accurate. We estimate that the genotyping of circa 500,000 unrelated individuals will yield predictions between 66% and 82% of the SNP-heritability captured by common variants in our array. Prediction accuracies did not improve when including rarer SNPs or when fitting multiple traits jointly in multivariate models.

  18. Comparison of Three Risk Scores to Predict Outcomes of Severe Lower Gastrointestinal Bleeding.

    PubMed

    Camus, Marine; Jensen, Dennis M; Ohning, Gordon V; Kovacs, Thomas O; Jutabha, Rome; Ghassemi, Kevin A; Machicado, Gustavo A; Dulai, Gareth S; Jensen, Mary E; Gornbein, Jeffrey A

    2016-01-01

    Improved medical decisions by using a score at the initial patient triage level may lead to improvements in patient management, outcomes, and resource utilization. There is no validated score for management of lower gastrointestinal bleeding (LGIB) unlike for upper gastrointestinal bleeding. The aim of our study was to compare the accuracies of 3 different prognostic scores [Center for Ulcer Research and Education Hemostasis prognosis score, Charlson index, and American Society of Anesthesiologists (ASA) score] for the prediction of 30-day rebleeding, surgery, and death in severe LGIB. Data on consecutive patients hospitalized with severe gastrointestinal bleeding from January 2006 to October 2011 in our 2 tertiary academic referral centers were prospectively collected. Sensitivities, specificities, accuracies, and area under the receiver operator characteristic curve were computed for 3 scores for predictions of rebleeding, surgery, and mortality at 30 days. Two hundred thirty-five consecutive patients with LGIB were included between 2006 and 2011. Twenty-three percent of patients rebled, 6% had surgery, and 7.7% of patients died. The accuracies of each score never reached 70% for predicting rebleeding or surgery in either. The ASA score had a highest accuracy for predicting mortality within 30 days (83.5%), whereas the Center for Ulcer Research and Education Hemostasis prognosis score and the Charlson index both had accuracies <75% for the prediction of death within 30 days. ASA score could be useful to predict death within 30 days. However, a new score is still warranted to predict all 30 days outcomes (rebleeding, surgery, and death) in LGIB.

  19. Data Prediction for Public Events in Professional Domains Based on Improved RNN- LSTM

    NASA Astrophysics Data System (ADS)

    Song, Bonan; Fan, Chunxiao; Wu, Yuexin; Sun, Juanjuan

    2018-02-01

    The traditional data services of prediction for emergency or non-periodic events usually cannot generate satisfying result or fulfill the correct prediction purpose. However, these events are influenced by external causes, which mean certain a priori information of these events generally can be collected through the Internet. This paper studied the above problems and proposed an improved model—LSTM (Long Short-term Memory) dynamic prediction and a priori information sequence generation model by combining RNN-LSTM and public events a priori information. In prediction tasks, the model is qualified for determining trends, and its accuracy also is validated. This model generates a better performance and prediction results than the previous one. Using a priori information can increase the accuracy of prediction; LSTM can better adapt to the changes of time sequence; LSTM can be widely applied to the same type of prediction tasks, and other prediction tasks related to time sequence.

  20. Short communication: Improving the accuracy of genomic prediction of body conformation traits in Chinese Holsteins using markers derived from high-density marker panels.

    PubMed

    Song, H; Li, L; Ma, P; Zhang, S; Su, G; Lund, M S; Zhang, Q; Ding, X

    2018-06-01

    This study investigated the efficiency of genomic prediction with adding the markers identified by genome-wide association study (GWAS) using a data set of imputed high-density (HD) markers from 54K markers in Chinese Holsteins. Among 3,056 Chinese Holsteins with imputed HD data, 2,401 individuals born before October 1, 2009, were used for GWAS and a reference population for genomic prediction, and the 220 younger cows were used as a validation population. In total, 1,403, 1,536, and 1,383 significant single nucleotide polymorphisms (SNP; false discovery rate at 0.05) associated with conformation final score, mammary system, and feet and legs were identified, respectively. About 2 to 3% genetic variance of 3 traits was explained by these significant SNP. Only a very small proportion of significant SNP identified by GWAS was included in the 54K marker panel. Three new marker sets (54K+) were herein produced by adding significant SNP obtained by linear mixed model for each trait into the 54K marker panel. Genomic breeding values were predicted using a Bayesian variable selection (BVS) model. The accuracies of genomic breeding value by BVS based on the 54K+ data were 2.0 to 5.2% higher than those based on the 54K data. The imputed HD markers yielded 1.4% higher accuracy on average (BVS) than the 54K data. Both the 54K+ and HD data generated lower bias of genomic prediction, and the 54K+ data yielded the lowest bias in all situations. Our results show that the imputed HD data were not very useful for improving the accuracy of genomic prediction and that adding the significant markers derived from the imputed HD marker panel could improve the accuracy of genomic prediction and decrease the bias of genomic prediction. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  1. Monitoring and regulation of learning in medical education: the need for predictive cues.

    PubMed

    de Bruin, Anique B H; Dunlosky, John; Cavalcanti, Rodrigo B

    2017-06-01

    Being able to accurately monitor learning activities is a key element in self-regulated learning in all settings, including medical schools. Yet students' ability to monitor their progress is often limited, leading to inefficient use of study time. Interventions that improve the accuracy of students' monitoring can optimise self-regulated learning, leading to higher achievement. This paper reviews findings from cognitive psychology and explores potential applications in medical education, as well as areas for future research. Effective monitoring depends on students' ability to generate information ('cues') that accurately reflects their knowledge and skills. The ability of these 'cues' to predict achievement is referred to as 'cue diagnosticity'. Interventions that improve the ability of students to elicit predictive cues typically fall into two categories: (i) self-generation of cues and (ii) generation of cues that is delayed after self-study. Providing feedback and support is useful when cues are predictive but may be too complex to be readily used. Limited evidence exists about interventions to improve the accuracy of self-monitoring among medical students or trainees. Developing interventions that foster use of predictive cues can enhance the accuracy of self-monitoring, thereby improving self-study and clinical reasoning. First, insight should be gained into the characteristics of predictive cues used by medical students and trainees. Next, predictive cue prompts should be designed and tested to improve monitoring and regulation of learning. Finally, the use of predictive cues should be explored in relation to teaching and learning clinical reasoning. Improving self-regulated learning is important to help medical students and trainees efficiently acquire knowledge and skills necessary for clinical practice. Interventions that help students generate and use predictive cues hold the promise of improved self-regulated learning and achievement. This framework is applicable to learning in several areas, including the development of clinical reasoning. © 2017 The Authors Medical Education published by Association for the Study of Medical Education and John Wiley & Sons Ltd.

  2. Proposal for a New Predictive Model of Short-Term Mortality After Living Donor Liver Transplantation due to Acute Liver Failure.

    PubMed

    Chung, Hyun Sik; Lee, Yu Jung; Jo, Yun Sung

    2017-02-21

    BACKGROUND Acute liver failure (ALF) is known to be a rapidly progressive and fatal disease. Various models which could help to estimate the post-transplant outcome for ALF have been developed; however, none of them have been proved to be the definitive predictive model of accuracy. We suggest a new predictive model, and investigated which model has the highest predictive accuracy for the short-term outcome in patients who underwent living donor liver transplantation (LDLT) due to ALF. MATERIAL AND METHODS Data from a total 88 patients were collected retrospectively. King's College Hospital criteria (KCH), Child-Turcotte-Pugh (CTP) classification, and model for end-stage liver disease (MELD) score were calculated. Univariate analysis was performed, and then multivariate statistical adjustment for preoperative variables of ALF prognosis was performed. A new predictive model was developed, called the MELD conjugated serum phosphorus model (MELD-p). The individual diagnostic accuracy and cut-off value of models in predicting 3-month post-transplant mortality were evaluated using the area under the receiver operating characteristic curve (AUC). The difference in AUC between MELD-p and the other models was analyzed. The diagnostic improvement in MELD-p was assessed using the net reclassification improvement (NRI) and integrated discrimination improvement (IDI). RESULTS The MELD-p and MELD scores had high predictive accuracy (AUC >0.9). KCH and serum phosphorus had an acceptable predictive ability (AUC >0.7). The CTP classification failed to show discriminative accuracy in predicting 3-month post-transplant mortality. The difference in AUC between MELD-p and the other models had statistically significant associations with CTP and KCH. The cut-off value of MELD-p was 3.98 for predicting 3-month post-transplant mortality. The NRI was 9.9% and the IDI was 2.9%. CONCLUSIONS MELD-p score can predict 3-month post-transplant mortality better than other scoring systems after LDLT due to ALF. The recommended cut-off value of MELD-p is 3.98.

  3. Improving Prediction Accuracy of “Central Line-Associated Blood Stream Infections” Using Data Mining Models

    PubMed Central

    Noaman, Amin Y.; Jamjoom, Arwa; Al-Abdullah, Nabeela; Nasir, Mahreen; Ali, Anser G.

    2017-01-01

    Prediction of nosocomial infections among patients is an important part of clinical surveillance programs to enable the related personnel to take preventive actions in advance. Designing a clinical surveillance program with capability of predicting nosocomial infections is a challenging task due to several reasons, including high dimensionality of medical data, heterogenous data representation, and special knowledge required to extract patterns for prediction. In this paper, we present details of six data mining methods implemented using cross industry standard process for data mining to predict central line-associated blood stream infections. For our study, we selected datasets of healthcare-associated infections from US National Healthcare Safety Network and consumer survey data from Hospital Consumer Assessment of Healthcare Providers and Systems. Our experiments show that central line-associated blood stream infections (CLABSIs) can be successfully predicted using AdaBoost method with an accuracy up to 89.7%. This will help in implementing effective clinical surveillance programs for infection control, as well as improving the accuracy detection of CLABSIs. Also, this reduces patients' hospital stay cost and maintains patients' safety. PMID:29085836

  4. Adaptive time-variant models for fuzzy-time-series forecasting.

    PubMed

    Wong, Wai-Keung; Bai, Enjian; Chu, Alice Wai-Ching

    2010-12-01

    A fuzzy time series has been applied to the prediction of enrollment, temperature, stock indices, and other domains. Related studies mainly focus on three factors, namely, the partition of discourse, the content of forecasting rules, and the methods of defuzzification, all of which greatly influence the prediction accuracy of forecasting models. These studies use fixed analysis window sizes for forecasting. In this paper, an adaptive time-variant fuzzy-time-series forecasting model (ATVF) is proposed to improve forecasting accuracy. The proposed model automatically adapts the analysis window size of fuzzy time series based on the prediction accuracy in the training phase and uses heuristic rules to generate forecasting values in the testing phase. The performance of the ATVF model is tested using both simulated and actual time series including the enrollments at the University of Alabama, Tuscaloosa, and the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX). The experiment results show that the proposed ATVF model achieves a significant improvement in forecasting accuracy as compared to other fuzzy-time-series forecasting models.

  5. Study design requirements for RNA sequencing-based breast cancer diagnostics.

    PubMed

    Mer, Arvind Singh; Klevebring, Daniel; Grönberg, Henrik; Rantalainen, Mattias

    2016-02-01

    Sequencing-based molecular characterization of tumors provides information required for individualized cancer treatment. There are well-defined molecular subtypes of breast cancer that provide improved prognostication compared to routine biomarkers. However, molecular subtyping is not yet implemented in routine breast cancer care. Clinical translation is dependent on subtype prediction models providing high sensitivity and specificity. In this study we evaluate sample size and RNA-sequencing read requirements for breast cancer subtyping to facilitate rational design of translational studies. We applied subsampling to ascertain the effect of training sample size and the number of RNA sequencing reads on classification accuracy of molecular subtype and routine biomarker prediction models (unsupervised and supervised). Subtype classification accuracy improved with increasing sample size up to N = 750 (accuracy = 0.93), although with a modest improvement beyond N = 350 (accuracy = 0.92). Prediction of routine biomarkers achieved accuracy of 0.94 (ER) and 0.92 (Her2) at N = 200. Subtype classification improved with RNA-sequencing library size up to 5 million reads. Development of molecular subtyping models for cancer diagnostics requires well-designed studies. Sample size and the number of RNA sequencing reads directly influence accuracy of molecular subtyping. Results in this study provide key information for rational design of translational studies aiming to bring sequencing-based diagnostics to the clinic.

  6. Improving satellite-driven PM2.5 models with Moderate Resolution Imaging Spectroradiometer fire counts in the southeastern U.S.

    PubMed

    Hu, Xuefei; Waller, Lance A; Lyapustin, Alexei; Wang, Yujie; Liu, Yang

    2014-10-16

    Multiple studies have developed surface PM 2.5 (particle size less than 2.5 µm in aerodynamic diameter) prediction models using satellite-derived aerosol optical depth as the primary predictor and meteorological and land use variables as secondary variables. To our knowledge, satellite-retrieved fire information has not been used for PM 2.5 concentration prediction in statistical models. Fire data could be a useful predictor since fires are significant contributors of PM 2.5 . In this paper, we examined whether remotely sensed fire count data could improve PM 2.5 prediction accuracy in the southeastern U.S. in a spatial statistical model setting. A sensitivity analysis showed that when the radius of the buffer zone centered at each PM 2.5 monitoring site reached 75 km, fire count data generally have the greatest predictive power of PM 2.5 across the models considered. Cross validation (CV) generated an R 2 of 0.69, a mean prediction error of 2.75 µg/m 3 , and root-mean-square prediction errors (RMSPEs) of 4.29 µg/m 3 , indicating a good fit between the dependent and predictor variables. A comparison showed that the prediction accuracy was improved more substantially from the nonfire model to the fire model at sites with higher fire counts. With increasing fire counts, CV RMSPE decreased by values up to 1.5 µg/m 3 , exhibiting a maximum improvement of 13.4% in prediction accuracy. Fire count data were shown to have better performance in southern Georgia and in the spring season due to higher fire occurrence. Our findings indicate that fire count data provide a measurable improvement in PM 2.5 concentration estimation, especially in areas and seasons prone to fire events.

  7. Improving satellite-driven PM2.5 models with Moderate Resolution Imaging Spectroradiometer fire counts in the southeastern U.S

    PubMed Central

    Hu, Xuefei; Waller, Lance A.; Lyapustin, Alexei; Wang, Yujie; Liu, Yang

    2017-01-01

    Multiple studies have developed surface PM2.5 (particle size less than 2.5 µm in aerodynamic diameter) prediction models using satellite-derived aerosol optical depth as the primary predictor and meteorological and land use variables as secondary variables. To our knowledge, satellite-retrieved fire information has not been used for PM2.5 concentration prediction in statistical models. Fire data could be a useful predictor since fires are significant contributors of PM2.5. In this paper, we examined whether remotely sensed fire count data could improve PM2.5 prediction accuracy in the southeastern U.S. in a spatial statistical model setting. A sensitivity analysis showed that when the radius of the buffer zone centered at each PM2.5 monitoring site reached 75 km, fire count data generally have the greatest predictive power of PM2.5 across the models considered. Cross validation (CV) generated an R2 of 0.69, a mean prediction error of 2.75 µg/m3, and root-mean-square prediction errors (RMSPEs) of 4.29 µg/m3, indicating a good fit between the dependent and predictor variables. A comparison showed that the prediction accuracy was improved more substantially from the nonfire model to the fire model at sites with higher fire counts. With increasing fire counts, CV RMSPE decreased by values up to 1.5 µg/m3, exhibiting a maximum improvement of 13.4% in prediction accuracy. Fire count data were shown to have better performance in southern Georgia and in the spring season due to higher fire occurrence. Our findings indicate that fire count data provide a measurable improvement in PM2.5 concentration estimation, especially in areas and seasons prone to fire events. PMID:28967648

  8. Predicting protein subcellular locations using hierarchical ensemble of Bayesian classifiers based on Markov chains.

    PubMed

    Bulashevska, Alla; Eils, Roland

    2006-06-14

    The subcellular location of a protein is closely related to its function. It would be worthwhile to develop a method to predict the subcellular location for a given protein when only the amino acid sequence of the protein is known. Although many efforts have been made to predict subcellular location from sequence information only, there is the need for further research to improve the accuracy of prediction. A novel method called HensBC is introduced to predict protein subcellular location. HensBC is a recursive algorithm which constructs a hierarchical ensemble of classifiers. The classifiers used are Bayesian classifiers based on Markov chain models. We tested our method on six various datasets; among them are Gram-negative bacteria dataset, data for discriminating outer membrane proteins and apoptosis proteins dataset. We observed that our method can predict the subcellular location with high accuracy. Another advantage of the proposed method is that it can improve the accuracy of the prediction of some classes with few sequences in training and is therefore useful for datasets with imbalanced distribution of classes. This study introduces an algorithm which uses only the primary sequence of a protein to predict its subcellular location. The proposed recursive scheme represents an interesting methodology for learning and combining classifiers. The method is computationally efficient and competitive with the previously reported approaches in terms of prediction accuracies as empirical results indicate. The code for the software is available upon request.

  9. Risk-adjusted capitation based on the Diagnostic Cost Group Model: an empirical evaluation with health survey information.

    PubMed Central

    Lamers, L M

    1999-01-01

    OBJECTIVE: To evaluate the predictive accuracy of the Diagnostic Cost Group (DCG) model using health survey information. DATA SOURCES/STUDY SETTING: Longitudinal data collected for a sample of members of a Dutch sickness fund. In the Netherlands the sickness funds provide compulsory health insurance coverage for the 60 percent of the population in the lowest income brackets. STUDY DESIGN: A demographic model and DCG capitation models are estimated by means of ordinary least squares, with an individual's annual healthcare expenditures in 1994 as the dependent variable. For subgroups based on health survey information, costs predicted by the models are compared with actual costs. Using stepwise regression procedures a subset of relevant survey variables that could improve the predictive accuracy of the three-year DCG model was identified. Capitation models were extended with these variables. DATA COLLECTION/EXTRACTION METHODS: For the empirical analysis, panel data of sickness fund members were used that contained demographic information, annual healthcare expenditures, and diagnostic information from hospitalizations for each member. In 1993, a mailed health survey was conducted among a random sample of 15,000 persons in the panel data set, with a 70 percent response rate. PRINCIPAL FINDINGS: The predictive accuracy of the demographic model improves when it is extended with diagnostic information from prior hospitalizations (DCGs). A subset of survey variables further improves the predictive accuracy of the DCG capitation models. The predictable profits and losses based on survey information for the DCG models are smaller than for the demographic model. Most persons with predictable losses based on health survey information were not hospitalized in the preceding year. CONCLUSIONS: The use of diagnostic information from prior hospitalizations is a promising option for improving the demographic capitation payment formula. This study suggests that diagnostic information from outpatient utilization is complementary to DCGs in predicting future costs. PMID:10029506

  10. Accuracy of algorithms to predict accessory pathway location in children with Wolff-Parkinson-White syndrome.

    PubMed

    Wren, Christopher; Vogel, Melanie; Lord, Stephen; Abrams, Dominic; Bourke, John; Rees, Philip; Rosenthal, Eric

    2012-02-01

    The aim of this study was to examine the accuracy in predicting pathway location in children with Wolff-Parkinson-White syndrome for each of seven published algorithms. ECGs from 100 consecutive children with Wolff-Parkinson-White syndrome undergoing electrophysiological study were analysed by six investigators using seven published algorithms, six of which had been developed in adult patients. Accuracy and concordance of predictions were adjusted for the number of pathway locations. Accessory pathways were left-sided in 49, septal in 20 and right-sided in 31 children. Overall accuracy of prediction was 30-49% for the exact location and 61-68% including adjacent locations. Concordance between investigators varied between 41% and 86%. No algorithm was better at predicting septal pathways (accuracy 5-35%, improving to 40-78% including adjacent locations), but one was significantly worse. Predictive accuracy was 24-53% for the exact location of right-sided pathways (50-71% including adjacent locations) and 32-55% for the exact location of left-sided pathways (58-73% including adjacent locations). All algorithms were less accurate in our hands than in other authors' own assessment. None performed well in identifying midseptal or right anteroseptal accessory pathway locations.

  11. Sharing reference data and including cows in the reference population improve genomic predictions in Danish Jersey

    USDA-ARS?s Scientific Manuscript database

    Small reference populations limit the accuracy of genomic prediction in numerically small breeds, such as the Danish Jersey. The objective of this study was to investigate two approaches to improve genomic prediction by increasing the size of the reference population for Danish Jerseys. The first ap...

  12. Correlation of ground tests and analyses of a dynamically scaled Space Station model configuration

    NASA Technical Reports Server (NTRS)

    Javeed, Mehzad; Edighoffer, Harold H.; Mcgowan, Paul E.

    1993-01-01

    Verification of analytical models through correlation with ground test results of a complex space truss structure is demonstrated. A multi-component, dynamically scaled space station model configuration is the focus structure for this work. Previously established test/analysis correlation procedures are used to develop improved component analytical models. Integrated system analytical models, consisting of updated component analytical models, are compared with modal test results to establish the accuracy of system-level dynamic predictions. Design sensitivity model updating methods are shown to be effective for providing improved component analytical models. Also, the effects of component model accuracy and interface modeling fidelity on the accuracy of integrated model predictions is examined.

  13. Effect of genetic architecture on the prediction accuracy of quantitative traits in samples of unrelated individuals.

    PubMed

    Morgante, Fabio; Huang, Wen; Maltecca, Christian; Mackay, Trudy F C

    2018-06-01

    Predicting complex phenotypes from genomic data is a fundamental aim of animal and plant breeding, where we wish to predict genetic merits of selection candidates; and of human genetics, where we wish to predict disease risk. While genomic prediction models work well with populations of related individuals and high linkage disequilibrium (LD) (e.g., livestock), comparable models perform poorly for populations of unrelated individuals and low LD (e.g., humans). We hypothesized that low prediction accuracies in the latter situation may occur when the genetics architecture of the trait departs from the infinitesimal and additive architecture assumed by most prediction models. We used simulated data for 10,000 lines based on sequence data from a population of unrelated, inbred Drosophila melanogaster lines to evaluate this hypothesis. We show that, even in very simplified scenarios meant as a stress test of the commonly used Genomic Best Linear Unbiased Predictor (G-BLUP) method, using all common variants yields low prediction accuracy regardless of the trait genetic architecture. However, prediction accuracy increases when predictions are informed by the genetic architecture inferred from mapping the top variants affecting main effects and interactions in the training data, provided there is sufficient power for mapping. When the true genetic architecture is largely or partially due to epistatic interactions, the additive model may not perform well, while models that account explicitly for interactions generally increase prediction accuracy. Our results indicate that accounting for genetic architecture can improve prediction accuracy for quantitative traits.

  14. Optically Tuned MM-Wave IMPATT Source.

    DTIC Science & Technology

    1987-07-01

    phase of the work has been extended and generalised. Accuracy of the theory in predicting tuning at the higher oscillator voltage swings has been greatly...Accuracy of the theory in predicting tuning at the higher oscillator voltage swings has been greatly improved by reformulating the Bessel function...voltage modulation and a peak optically injected locking current of 100 pA the predicted ftl locking range would be 540MHz, a practicaUy useful value. 4

  15. Bayesian model aggregation for ensemble-based estimates of protein pKa values

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gosink, Luke J.; Hogan, Emilie A.; Pulsipher, Trenton C.

    2014-03-01

    This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pmore » $$K_a$$ predictions. Structure-based p$$K_a$$ calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exist for p$$K_a$$ prediction, ranging from empirical statistical models to {\\it ab initio} quantum mechanical approaches. However, each of these methods are based on a set of assumptions that have inherent bias and sensitivities that can effect a model's accuracy and generalizability for p$$K_a$$ prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative and the pKa measurements are based on experimental work conducted by the Garc{\\'i}a-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study with improvements from 40-70\\% over other method classes. This work illustrates a new possible mechanism for improving the accuracy of p$$K_a$$ prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.« less

  16. Software reliability studies

    NASA Technical Reports Server (NTRS)

    Hoppa, Mary Ann; Wilson, Larry W.

    1994-01-01

    There are many software reliability models which try to predict future performance of software based on data generated by the debugging process. Our research has shown that by improving the quality of the data one can greatly improve the predictions. We are working on methodologies which control some of the randomness inherent in the standard data generation processes in order to improve the accuracy of predictions. Our contribution is twofold in that we describe an experimental methodology using a data structure called the debugging graph and apply this methodology to assess the robustness of existing models. The debugging graph is used to analyze the effects of various fault recovery orders on the predictive accuracy of several well-known software reliability algorithms. We found that, along a particular debugging path in the graph, the predictive performance of different models can vary greatly. Similarly, just because a model 'fits' a given path's data well does not guarantee that the model would perform well on a different path. Further we observed bug interactions and noted their potential effects on the predictive process. We saw that not only do different faults fail at different rates, but that those rates can be affected by the particular debugging stage at which the rates are evaluated. Based on our experiment, we conjecture that the accuracy of a reliability prediction is affected by the fault recovery order as well as by fault interaction.

  17. Impacts of Earth rotation parameters on GNSS ultra-rapid orbit prediction: Derivation and real-time correction

    NASA Astrophysics Data System (ADS)

    Wang, Qianxin; Hu, Chao; Xu, Tianhe; Chang, Guobin; Hernández Moraleda, Alberto

    2017-12-01

    Analysis centers (ACs) for global navigation satellite systems (GNSSs) cannot accurately obtain real-time Earth rotation parameters (ERPs). Thus, the prediction of ultra-rapid orbits in the international terrestrial reference system (ITRS) has to utilize the predicted ERPs issued by the International Earth Rotation and Reference Systems Service (IERS) or the International GNSS Service (IGS). In this study, the accuracy of ERPs predicted by IERS and IGS is analyzed. The error of the ERPs predicted for one day can reach 0.15 mas and 0.053 ms in polar motion and UT1-UTC direction, respectively. Then, the impact of ERP errors on ultra-rapid orbit prediction by GNSS is studied. The methods for orbit integration and frame transformation in orbit prediction with introduced ERP errors dominate the accuracy of the predicted orbit. Experimental results show that the transformation from the geocentric celestial references system (GCRS) to ITRS exerts the strongest effect on the accuracy of the predicted ultra-rapid orbit. To obtain the most accurate predicted ultra-rapid orbit, a corresponding real-time orbit correction method is developed. First, orbits without ERP-related errors are predicted on the basis of ITRS observed part of ultra-rapid orbit for use as reference. Then, the corresponding predicted orbit is transformed from GCRS to ITRS to adjust for the predicted ERPs. Finally, the corrected ERPs with error slopes are re-introduced to correct the predicted orbit in ITRS. To validate the proposed method, three experimental schemes are designed: function extrapolation, simulation experiments, and experiments with predicted ultra-rapid orbits and international GNSS Monitoring and Assessment System (iGMAS) products. Experimental results show that using the proposed correction method with IERS products considerably improved the accuracy of ultra-rapid orbit prediction (except the geosynchronous BeiDou orbits). The accuracy of orbit prediction is enhanced by at least 50% (error related to ERP) when a highly accurate observed orbit is used with the correction method. For iGMAS-predicted orbits, the accuracy improvement ranges from 8.5% for the inclined BeiDou orbits to 17.99% for the GPS orbits. This demonstrates that the correction method proposed by this study can optimize the ultra-rapid orbit prediction.

  18. The effect of using genealogy-based haplotypes for genomic prediction

    PubMed Central

    2013-01-01

    Background Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individuals markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971

  19. The effect of using genealogy-based haplotypes for genomic prediction.

    PubMed

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individuals markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

  20. CD-Based Indices for Link Prediction in Complex Network.

    PubMed

    Wang, Tao; Wang, Hongjue; Wang, Xiaoxia

    2016-01-01

    Lots of similarity-based algorithms have been designed to deal with the problem of link prediction in the past decade. In order to improve prediction accuracy, a novel cosine similarity index CD based on distance between nodes and cosine value between vectors is proposed in this paper. Firstly, node coordinate matrix can be obtained by node distances which are different from distance matrix and row vectors of the matrix are regarded as coordinates of nodes. Then, cosine value between node coordinates is used as their similarity index. A local community density index LD is also proposed. Then, a series of CD-based indices include CD-LD-k, CD*LD-k, CD-k and CDI are presented and applied in ten real networks. Experimental results demonstrate the effectiveness of CD-based indices. The effects of network clustering coefficient and assortative coefficient on prediction accuracy of indices are analyzed. CD-LD-k and CD*LD-k can improve prediction accuracy without considering the assortative coefficient of network is negative or positive. According to analysis of relative precision of each method on each network, CD-LD-k and CD*LD-k indices have excellent average performance and robustness. CD and CD-k indices perform better on positive assortative networks than on negative assortative networks. For negative assortative networks, we improve and refine CD index, referred as CDI index, combining the advantages of CD index and evolutionary mechanism of the network model BA. Experimental results reveal that CDI index can increase prediction accuracy of CD on negative assortative networks.

  1. CD-Based Indices for Link Prediction in Complex Network

    PubMed Central

    Wang, Tao; Wang, Hongjue; Wang, Xiaoxia

    2016-01-01

    Lots of similarity-based algorithms have been designed to deal with the problem of link prediction in the past decade. In order to improve prediction accuracy, a novel cosine similarity index CD based on distance between nodes and cosine value between vectors is proposed in this paper. Firstly, node coordinate matrix can be obtained by node distances which are different from distance matrix and row vectors of the matrix are regarded as coordinates of nodes. Then, cosine value between node coordinates is used as their similarity index. A local community density index LD is also proposed. Then, a series of CD-based indices include CD-LD-k, CD*LD-k, CD-k and CDI are presented and applied in ten real networks. Experimental results demonstrate the effectiveness of CD-based indices. The effects of network clustering coefficient and assortative coefficient on prediction accuracy of indices are analyzed. CD-LD-k and CD*LD-k can improve prediction accuracy without considering the assortative coefficient of network is negative or positive. According to analysis of relative precision of each method on each network, CD-LD-k and CD*LD-k indices have excellent average performance and robustness. CD and CD-k indices perform better on positive assortative networks than on negative assortative networks. For negative assortative networks, we improve and refine CD index, referred as CDI index, combining the advantages of CD index and evolutionary mechanism of the network model BA. Experimental results reveal that CDI index can increase prediction accuracy of CD on negative assortative networks. PMID:26752405

  2. Short-arc measurement and fitting based on the bidirectional prediction of observed data

    NASA Astrophysics Data System (ADS)

    Fei, Zhigen; Xu, Xiaojie; Georgiadis, Anthimos

    2016-02-01

    To measure a short arc is a notoriously difficult problem. In this study, the bidirectional prediction method based on the Radial Basis Function Neural Network (RBFNN) to the observed data distributed along a short arc is proposed to increase the corresponding arc length, and thus improve its fitting accuracy. Firstly, the rationality of regarding observed data as a time series is discussed in accordance with the definition of a time series. Secondly, the RBFNN is constructed to predict the observed data where the interpolation method is used for enlarging the size of training examples in order to improve the learning accuracy of the RBFNN’s parameters. Finally, in the numerical simulation section, we focus on simulating how the size of the training sample and noise level influence the learning error and prediction error of the built RBFNN. Typically, the observed data coming from a 5{}^\\circ short arc are used to evaluate the performance of the Hyper method known as the ‘unbiased fitting method of circle’ with a different noise level before and after prediction. A number of simulation experiments reveal that the fitting stability and accuracy of the Hyper method after prediction are far superior to the ones before prediction.

  3. Parametric Bayesian priors and better choice of negative examples improve protein function prediction.

    PubMed

    Youngs, Noah; Penfold-Brown, Duncan; Drew, Kevin; Shasha, Dennis; Bonneau, Richard

    2013-05-01

    Computational biologists have demonstrated the utility of using machine learning methods to predict protein function from an integration of multiple genome-wide data types. Yet, even the best performing function prediction algorithms rely on heuristics for important components of the algorithm, such as choosing negative examples (proteins without a given function) or determining key parameters. The improper choice of negative examples, in particular, can hamper the accuracy of protein function prediction. We present a novel approach for choosing negative examples, using a parameterizable Bayesian prior computed from all observed annotation data, which also generates priors used during function prediction. We incorporate this new method into the GeneMANIA function prediction algorithm and demonstrate improved accuracy of our algorithm over current top-performing function prediction methods on the yeast and mouse proteomes across all metrics tested. Code and Data are available at: http://bonneaulab.bio.nyu.edu/funcprop.html

  4. Accuracy of Predicted Genomic Breeding Values in Purebred and Crossbred Pigs.

    PubMed

    Hidalgo, André M; Bastiaansen, John W M; Lopes, Marcos S; Harlizius, Barbara; Groenen, Martien A M; de Koning, Dirk-Jan

    2015-05-26

    Genomic selection has been widely implemented in dairy cattle breeding when the aim is to improve performance of purebred animals. In pigs, however, the final product is a crossbred animal. This may affect the efficiency of methods that are currently implemented for dairy cattle. Therefore, the objective of this study was to determine the accuracy of predicted breeding values in crossbred pigs using purebred genomic and phenotypic data. A second objective was to compare the predictive ability of SNPs when training is done in either single or multiple populations for four traits: age at first insemination (AFI); total number of piglets born (TNB); litter birth weight (LBW); and litter variation (LVR). We performed marker-based and pedigree-based predictions. Within-population predictions for the four traits ranged from 0.21 to 0.72. Multi-population prediction yielded accuracies ranging from 0.18 to 0.67. Predictions across purebred populations as well as predicting genetic merit of crossbreds from their purebred parental lines for AFI performed poorly (not significantly different from zero). In contrast, accuracies of across-population predictions and accuracies of purebred to crossbred predictions for LBW and LVR ranged from 0.08 to 0.31 and 0.11 to 0.31, respectively. Accuracy for TNB was zero for across-population prediction, whereas for purebred to crossbred prediction it ranged from 0.08 to 0.22. In general, marker-based outperformed pedigree-based prediction across populations and traits. However, in some cases pedigree-based prediction performed similarly or outperformed marker-based prediction. There was predictive ability when purebred populations were used to predict crossbred genetic merit using an additive model in the populations studied. AFI was the only exception, indicating that predictive ability depends largely on the genetic correlation between PB and CB performance, which was 0.31 for AFI. Multi-population prediction was no better than within-population prediction for the purebred validation set. Accuracy of prediction was very trait-dependent. Copyright © 2015 Hidalgo et al.

  5. Small angle X-ray scattering and cross-linking for data assisted protein structure prediction in CASP 12 with prospects for improved accuracy.

    PubMed

    Ogorzalek, Tadeusz L; Hura, Greg L; Belsom, Adam; Burnett, Kathryn H; Kryshtafovych, Andriy; Tainer, John A; Rappsilber, Juri; Tsutakawa, Susan E; Fidelis, Krzysztof

    2018-03-01

    Experimental data offers empowering constraints for structure prediction. These constraints can be used to filter equivalently scored models or more powerfully within optimization functions toward prediction. In CASP12, Small Angle X-ray Scattering (SAXS) and Cross-Linking Mass Spectrometry (CLMS) data, measured on an exemplary set of novel fold targets, were provided to the CASP community of protein structure predictors. As solution-based techniques, SAXS and CLMS can efficiently measure states of the full-length sequence in its native solution conformation and assembly. However, this experimental data did not substantially improve prediction accuracy judged by fits to crystallographic models. One issue, beyond intrinsic limitations of the algorithms, was a disconnect between crystal structures and solution-based measurements. Our analyses show that many targets had substantial percentages of disordered regions (up to 40%) or were multimeric or both. Thus, solution measurements of flexibility and assembly support variations that may confound prediction algorithms trained on crystallographic data and expecting globular fully-folded monomeric proteins. Here, we consider the CLMS and SAXS data collected, the information in these solution measurements, and the challenges in incorporating them into computational prediction. As improvement opportunities were only partly realized in CASP12, we provide guidance on how data from the full-length biological unit and the solution state can better aid prediction of the folded monomer or subunit. We furthermore describe strategic integrations of solution measurements with computational prediction programs with the aim of substantially improving foundational knowledge and the accuracy of computational algorithms for biologically-relevant structure predictions for proteins in solution. © 2018 Wiley Periodicals, Inc.

  6. Iowa calibration of MEPDG performance prediction models.

    DOT National Transportation Integrated Search

    2013-06-01

    This study aims to improve the accuracy of AASHTO Mechanistic-Empirical Pavement Design Guide (MEPDG) pavement : performance predictions for Iowa pavement systems through local calibration of MEPDG prediction models. A total of 130 : representative p...

  7. Asymmetric bagging and feature selection for activities prediction of drug molecules.

    PubMed

    Li, Guo-Zheng; Meng, Hao-Hua; Lu, Wen-Cong; Yang, Jack Y; Yang, Mary Qu

    2008-05-28

    Activities of drug molecules can be predicted by QSAR (quantitative structure activity relationship) models, which overcomes the disadvantages of high cost and long cycle by employing the traditional experimental method. With the fact that the number of drug molecules with positive activity is rather fewer than that of negatives, it is important to predict molecular activities considering such an unbalanced situation. Here, asymmetric bagging and feature selection are introduced into the problem and asymmetric bagging of support vector machines (asBagging) is proposed on predicting drug activities to treat the unbalanced problem. At the same time, the features extracted from the structures of drug molecules affect prediction accuracy of QSAR models. Therefore, a novel algorithm named PRIFEAB is proposed, which applies an embedded feature selection method to remove redundant and irrelevant features for asBagging. Numerical experimental results on a data set of molecular activities show that asBagging improve the AUC and sensitivity values of molecular activities and PRIFEAB with feature selection further helps to improve the prediction ability. Asymmetric bagging can help to improve prediction accuracy of activities of drug molecules, which can be furthermore improved by performing feature selection to select relevant features from the drug molecules data sets.

  8. Prediction of welding shrinkage deformation of bridge steel box girder based on wavelet neural network

    NASA Astrophysics Data System (ADS)

    Tao, Yulong; Miao, Yunshui; Han, Jiaqi; Yan, Feiyun

    2018-05-01

    Aiming at the low accuracy of traditional forecasting methods such as linear regression method, this paper presents a prediction method for predicting the relationship between bridge steel box girder and its displacement with wavelet neural network. Compared with traditional forecasting methods, this scheme has better local characteristics and learning ability, which greatly improves the prediction ability of deformation. Through analysis of the instance and found that after compared with the traditional prediction method based on wavelet neural network, the rigid beam deformation prediction accuracy is higher, and is superior to the BP neural network prediction results, conform to the actual demand of engineering design.

  9. Boomerang: A method for recursive reclassification.

    PubMed

    Devlin, Sean M; Ostrovnaya, Irina; Gönen, Mithat

    2016-09-01

    While there are many validated prognostic classifiers used in practice, often their accuracy is modest and heterogeneity in clinical outcomes exists in one or more risk subgroups. Newly available markers, such as genomic mutations, may be used to improve the accuracy of an existing classifier by reclassifying patients from a heterogenous group into a higher or lower risk category. The statistical tools typically applied to develop the initial classifiers are not easily adapted toward this reclassification goal. In this article, we develop a new method designed to refine an existing prognostic classifier by incorporating new markers. The two-stage algorithm called Boomerang first searches for modifications of the existing classifier that increase the overall predictive accuracy and then merges to a prespecified number of risk groups. Resampling techniques are proposed to assess the improvement in predictive accuracy when an independent validation data set is not available. The performance of the algorithm is assessed under various simulation scenarios where the marker frequency, degree of censoring, and total sample size are varied. The results suggest that the method selects few false positive markers and is able to improve the predictive accuracy of the classifier in many settings. Lastly, the method is illustrated on an acute myeloid leukemia data set where a new refined classifier incorporates four new mutations into the existing three category classifier and is validated on an independent data set. © 2016, The International Biometric Society.

  10. Boomerang: A Method for Recursive Reclassification

    PubMed Central

    Devlin, Sean M.; Ostrovnaya, Irina; Gönen, Mithat

    2016-01-01

    Summary While there are many validated prognostic classifiers used in practice, often their accuracy is modest and heterogeneity in clinical outcomes exists in one or more risk subgroups. Newly available markers, such as genomic mutations, may be used to improve the accuracy of an existing classifier by reclassifying patients from a heterogenous group into a higher or lower risk category. The statistical tools typically applied to develop the initial classifiers are not easily adapted towards this reclassification goal. In this paper, we develop a new method designed to refine an existing prognostic classifier by incorporating new markers. The two-stage algorithm called Boomerang first searches for modifications of the existing classifier that increase the overall predictive accuracy and then merges to a pre-specified number of risk groups. Resampling techniques are proposed to assess the improvement in predictive accuracy when an independent validation data set is not available. The performance of the algorithm is assessed under various simulation scenarios where the marker frequency, degree of censoring, and total sample size are varied. The results suggest that the method selects few false positive markers and is able to improve the predictive accuracy of the classifier in many settings. Lastly, the method is illustrated on an acute myeloid leukemia dataset where a new refined classifier incorporates four new mutations into the existing three category classifier and is validated on an independent dataset. PMID:26754051

  11. Utilizing multiple scale models to improve predictions of extra-axial hemorrhage in the immature piglet.

    PubMed

    Scott, Gregory G; Margulies, Susan S; Coats, Brittany

    2016-10-01

    Traumatic brain injury (TBI) is a leading cause of death and disability in the USA. To help understand and better predict TBI, researchers have developed complex finite element (FE) models of the head which incorporate many biological structures such as scalp, skull, meninges, brain (with gray/white matter differentiation), and vasculature. However, most models drastically simplify the membranes and substructures between the pia and arachnoid membranes. We hypothesize that substructures in the pia-arachnoid complex (PAC) contribute substantially to brain deformation following head rotation, and that when included in FE models accuracy of extra-axial hemorrhage prediction improves. To test these hypotheses, microscale FE models of the PAC were developed to span the variability of PAC substructure anatomy and regional density. The constitutive response of these models were then integrated into an existing macroscale FE model of the immature piglet brain to identify changes in cortical stress distribution and predictions of extra-axial hemorrhage (EAH). Incorporating regional variability of PAC substructures substantially altered the distribution of principal stress on the cortical surface of the brain compared to a uniform representation of the PAC. Simulations of 24 non-impact rapid head rotations in an immature piglet animal model resulted in improved accuracy of EAH prediction (to 94 % sensitivity, 100 % specificity), as well as a high accuracy in regional hemorrhage prediction (to 82-100 % sensitivity, 100 % specificity). We conclude that including a biofidelic PAC substructure variability in FE models of the head is essential for improved predictions of hemorrhage at the brain/skull interface.

  12. Efficient differentially private learning improves drug sensitivity prediction.

    PubMed

    Honkela, Antti; Das, Mrinal; Nieminen, Arttu; Dikmen, Onur; Kaski, Samuel

    2018-02-06

    Users of a personalised recommendation system face a dilemma: recommendations can be improved by learning from data, but only if other users are willing to share their private information. Good personalised predictions are vitally important in precision medicine, but genomic information on which the predictions are based is also particularly sensitive, as it directly identifies the patients and hence cannot easily be anonymised. Differential privacy has emerged as a potentially promising solution: privacy is considered sufficient if presence of individual patients cannot be distinguished. However, differentially private learning with current methods does not improve predictions with feasible data sizes and dimensionalities. We show that useful predictors can be learned under powerful differential privacy guarantees, and even from moderately-sized data sets, by demonstrating significant improvements in the accuracy of private drug sensitivity prediction with a new robust private regression method. Our method matches the predictive accuracy of the state-of-the-art non-private lasso regression using only 4x more samples under relatively strong differential privacy guarantees. Good performance with limited data is achieved by limiting the sharing of private information by decreasing the dimensionality and by projecting outliers to fit tighter bounds, therefore needing to add less noise for equal privacy. The proposed differentially private regression method combines theoretical appeal and asymptotic efficiency with good prediction accuracy even with moderate-sized data. As already the simple-to-implement method shows promise on the challenging genomic data, we anticipate rapid progress towards practical applications in many fields. This article was reviewed by Zoltan Gaspari and David Kreil.

  13. Effects of Recovery Behavior and Strain-Rate Dependence of Stress-Strain Curve on Prediction Accuracy of Thermal Stress Analysis During Casting

    NASA Astrophysics Data System (ADS)

    Motoyama, Yuichi; Shiga, Hidetoshi; Sato, Takeshi; Kambe, Hiroshi; Yoshida, Makoto

    2017-06-01

    Recovery behavior (recovery) and strain-rate dependence of the stress-strain curve (strain-rate dependence) are incorporated into constitutive equations of alloys to predict residual stress and thermal stress during casting. Nevertheless, few studies have systematically investigated the effects of these metallurgical phenomena on the prediction accuracy of thermal stress in a casting. This study compares the thermal stress analysis results with in situ thermal stress measurement results of an Al-Si-Cu specimen during casting. The results underscore the importance for the alloy constitutive equation of incorporating strain-rate dependence to predict thermal stress that develops at high temperatures where the alloy shows strong strain-rate dependence of the stress-strain curve. However, the prediction accuracy of the thermal stress developed at low temperatures did not improve by considering the strain-rate dependence. Incorporating recovery into the constitutive equation improved the accuracy of the simulated thermal stress at low temperatures. Results of comparison implied that the constitutive equation should include strain-rate dependence to simulate defects that develop from thermal stress at high temperatures, such as hot tearing and hot cracking. Recovery should be incorporated into the alloy constitutive equation to predict the casting residual stress and deformation caused by the thermal stress developed mainly in the low temperature range.

  14. RNA secondary structure prediction with pseudoknots: Contribution of algorithm versus energy model.

    PubMed

    Jabbari, Hosna; Wark, Ian; Montemagno, Carlo

    2018-01-01

    RNA is a biopolymer with various applications inside the cell and in biotechnology. Structure of an RNA molecule mainly determines its function and is essential to guide nanostructure design. Since experimental structure determination is time-consuming and expensive, accurate computational prediction of RNA structure is of great importance. Prediction of RNA secondary structure is relatively simpler than its tertiary structure and provides information about its tertiary structure, therefore, RNA secondary structure prediction has received attention in the past decades. Numerous methods with different folding approaches have been developed for RNA secondary structure prediction. While methods for prediction of RNA pseudoknot-free structure (structures with no crossing base pairs) have greatly improved in terms of their accuracy, methods for prediction of RNA pseudoknotted secondary structure (structures with crossing base pairs) still have room for improvement. A long-standing question for improving the prediction accuracy of RNA pseudoknotted secondary structure is whether to focus on the prediction algorithm or the underlying energy model, as there is a trade-off on computational cost of the prediction algorithm versus the generality of the method. The aim of this work is to argue when comparing different methods for RNA pseudoknotted structure prediction, the combination of algorithm and energy model should be considered and a method should not be considered superior or inferior to others if they do not use the same scoring model. We demonstrate that while the folding approach is important in structure prediction, it is not the only important factor in prediction accuracy of a given method as the underlying energy model is also as of great value. Therefore we encourage researchers to pay particular attention in comparing methods with different energy models.

  15. Genomic prediction of piglet response to infection with one of two porcine reproductive and respiratory syndrome virus isolates.

    PubMed

    Waide, Emily H; Tuggle, Christopher K; Serão, Nick V L; Schroyen, Martine; Hess, Andrew; Rowland, Raymond R R; Lunney, Joan K; Plastow, Graham; Dekkers, Jack C M

    2018-02-01

    Genomic prediction of the pig's response to the porcine reproductive and respiratory syndrome (PRRS) virus (PRRSV) would be a useful tool in the swine industry. This study investigated the accuracy of genomic prediction based on porcine SNP60 Beadchip data using training and validation datasets from populations with different genetic backgrounds that were challenged with different PRRSV isolates. Genomic prediction accuracy averaged 0.34 for viral load (VL) and 0.23 for weight gain (WG) following experimental PRRSV challenge, which demonstrates that genomic selection could be used to improve response to PRRSV infection. Training on WG data during infection with a less virulent PRRSV, KS06, resulted in poor accuracy of prediction for WG during infection with a more virulent PRRSV, NVSL. Inclusion of single nucleotide polymorphisms (SNPs) that are in linkage disequilibrium with a major quantitative trait locus (QTL) on chromosome 4 was vital for accurate prediction of VL. Overall, SNPs that were significantly associated with either trait in single SNP genome-wide association analysis were unable to predict the phenotypes with an accuracy as high as that obtained by using all genotyped SNPs across the genome. Inclusion of data from close relatives into the training population increased whole genome prediction accuracy by 33% for VL and by 37% for WG but did not affect the accuracy of prediction when using only SNPs in the major QTL region. Results show that genomic prediction of response to PRRSV infection is moderately accurate and, when using all SNPs on the porcine SNP60 Beadchip, is not very sensitive to differences in virulence of the PRRSV in training and validation populations. Including close relatives in the training population increased prediction accuracy when using the whole genome or SNPs other than those near a major QTL.

  16. The wisdom of crowds in action: Forecasting epidemic diseases with a web-based prediction market system.

    PubMed

    Li, Eldon Y; Tung, Chen-Yuan; Chang, Shu-Hsun

    2016-08-01

    The quest for an effective system capable of monitoring and predicting the trends of epidemic diseases is a critical issue for communities worldwide. With the prevalence of Internet access, more and more researchers today are using data from both search engines and social media to improve the prediction accuracy. In particular, a prediction market system (PMS) exploits the wisdom of crowds on the Internet to effectively accomplish relatively high accuracy. This study presents the architecture of a PMS and demonstrates the matching mechanism of logarithmic market scoring rules. The system was implemented to predict infectious diseases in Taiwan with the wisdom of crowds in order to improve the accuracy of epidemic forecasting. The PMS architecture contains three design components: database clusters, market engine, and Web applications. The system accumulated knowledge from 126 health professionals for 31 weeks to predict five disease indicators: the confirmed cases of dengue fever, the confirmed cases of severe and complicated influenza, the rate of enterovirus infections, the rate of influenza-like illnesses, and the confirmed cases of severe and complicated enterovirus infection. Based on the winning ratio, the PMS predicts the trends of three out of five disease indicators more accurately than does the existing system that uses the five-year average values of historical data for the same weeks. In addition, the PMS with the matching mechanism of logarithmic market scoring rules is easy to understand for health professionals and applicable to predict all the five disease indicators. The PMS architecture of this study affords organizations and individuals to implement it for various purposes in our society. The system can continuously update the data and improve prediction accuracy in monitoring and forecasting the trends of epidemic diseases. Future researchers could replicate and apply the PMS demonstrated in this study to more infectious diseases and wider geographical areas, especially the under-developed countries across Asia and Africa. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  17. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

    PubMed Central

    2007-01-01

    Background Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction. Results The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction: yielding better accuracy, on average, and requiring significantly lesser computational resources. Conclusion Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer length sequences. The revised Dynalign code is freely available for download. PMID:17445273

  18. Efficiency of using first-generation information during second-generation selection: results of computer simulation.

    Treesearch

    T.Z. Ye; K.J.S. Jayawickrama; G.R. Johnson

    2004-01-01

    BLUP (Best linear unbiased prediction) method has been widely used in forest tree improvement programs. Since one of the properties of BLUP is that related individuals contribute to the predictions of each other, it seems logical that integrating data from all generations and from all populations would improve both the precision and accuracy in predicting genetic...

  19. A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures

    PubMed Central

    2014-01-01

    Background Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. Results We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. Conclusions Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php. PMID:24884954

  20. A Coupled Surface Nudging Scheme for use in Retrospective ...

    EPA Pesticide Factsheets

    A surface analysis nudging scheme coupling atmospheric and land surface thermodynamic parameters has been implemented into WRF v3.8 (latest version) for use with retrospective weather and climate simulations, as well as for applications in air quality, hydrology, and ecosystem modeling. This scheme is known as the flux-adjusting surface data assimilation system (FASDAS) developed by Alapaty et al. (2008). This scheme provides continuous adjustments for soil moisture and temperature (via indirect nudging) and for surface air temperature and water vapor mixing ratio (via direct nudging). The simultaneous application of indirect and direct nudging maintains greater consistency between the soil temperature–moisture and the atmospheric surface layer mass-field variables. The new method, FASDAS, consistently improved the accuracy of the model simulations at weather prediction scales for different horizontal grid resolutions, as well as for high resolution regional climate predictions. This new capability has been released in WRF Version 3.8 as option grid_sfdda = 2. This new capability increased the accuracy of atmospheric inputs for use air quality, hydrology, and ecosystem modeling research to improve the accuracy of respective end-point research outcome. IMPACT: A new method, FASDAS, was implemented into the WRF model to consistently improve the accuracy of the model simulations at weather prediction scales for different horizontal grid resolutions, as wel

  1. Adjusting the Stems Regional Forest Growth Model to Improve Local Predictions

    Treesearch

    W. Brad Smith

    1983-01-01

    A simple procedure using double sampling is described for adjusting growth in the STEMS regional forest growth model to compensate for subregional variations. Predictive accuracy of the STEMS model (a distance-independent, individual tree growth model for Lake States forests) was improved by using this procedure

  2. Does Incorporating Change in APRI or FIB-4 Indices Over Time Improve the Accuracy of a Single Index for Identifying Liver Fibrosis in Persons With Chronic Hepatitis C Virus Infection?

    PubMed

    Gounder, Prabhu P; Haering, Celia; Bruden, Dana J T; Townshend-Bulson, Lisa; Simons, Brenna C; Spradling, Philip R; McMahon, Brian J

    2018-01-01

    The aspartate aminotransferase-to-platelet ratio index (APRI) and a fibrosis index calculated using platelets (FIB-4) have been proposed as noninvasive markers of liver fibrosis. To determine APRI/FIB-4 accuracy for predicting histologic liver fibrosis and evaluate whether incorporating change in index improves test accuracy in hepatitis C virus (HCV)-infected Alaska Native persons. Using liver histology as the gold standard, we determined the test characteristics of APRI to predict Metavir ≥F2 fibrosis and FIB-4 to predict Metavir ≥F3 fibrosis. Index discrimination was measured as the area under the receiver operator characteristic curve. We fit a logistic regression model to determine whether incorporating change in APRI/FIB-4 over time improved index discrimination. Among 283 participants, 46% were female, 48% had a body mass index >30, 11% had diabetes mellitus, 8% reported current heavy alcohol use. Participants were infected with HCV genotypes 1 (68%), 2 (17%), or 3 (15%). On liver histology, 30% of study participants had ≥F2 fibrosis and 15% had ≥F3 fibrosis. The positive predictive value of an APRI>1.5/FIB-4>3.25 for identifying fibrosis was 77%/78%. The negative predictive value of an APRI<0.5/FIB-4<1.45 was 91%/87%. The area under the receiver operator characteristic curve of an APRI/FIB-4 for identifying fibrosis was 0.82/0.84. Incorporating change in APRI/FIB-4 did not improve index discrimination. The accuracy of APRI/FIB-4 for identifying liver fibrosis in HCV-infected Alaska Native persons is similar to that reported in other populations and could help prioritize patients for treatment living in areas without access to liver biopsy. Change in APRI/FIB-4 was not predictive of degree of fibrosis.

  3. Genomic prediction using phenotypes from pedigreed lines with no marker data

    USDA-ARS?s Scientific Manuscript database

    Until now genomic prediction in plant breeding has only used information from individuals that have been genotyped. In practice, information from non-genotyped relatives of genotyped individuals can be used to improve the genomic prediction accuracy. Single-step genomic prediction integrates all the...

  4. Accuracy of predicting genomic breeding values for residual feed intake in Angus and Charolais beef cattle.

    PubMed

    Chen, L; Schenkel, F; Vinsky, M; Crews, D H; Li, C

    2013-10-01

    In beef cattle, phenotypic data that are difficult and/or costly to measure, such as feed efficiency, and DNA marker genotypes are usually available on a small number of animals of different breeds or populations. To achieve a maximal accuracy of genomic prediction using the phenotype and genotype data, strategies for forming a training population to predict genomic breeding values (GEBV) of the selection candidates need to be evaluated. In this study, we examined the accuracy of predicting GEBV for residual feed intake (RFI) based on 522 Angus and 395 Charolais steers genotyped on SNP with the Illumina Bovine SNP50 Beadchip for 3 training population forming strategies: within breed, across breed, and by pooling data from the 2 breeds (i.e., combined). Two other scenarios with the training and validation data split by birth year and by sire family within a breed were also investigated to assess the impact of genetic relationships on the accuracy of genomic prediction. Three statistical methods including the best linear unbiased prediction with the relationship matrix defined based on the pedigree (PBLUP), based on the SNP genotypes (GBLUP), and a Bayesian method (BayesB) were used to predict the GEBV. The results showed that the accuracy of the GEBV prediction was the highest when the prediction was within breed and when the validation population had greater genetic relationships with the training population, with a maximum of 0.58 for Angus and 0.64 for Charolais. The within-breed prediction accuracies dropped to 0.29 and 0.38, respectively, when the validation populations had a minimal pedigree link with the training population. When the training population of a different breed was used to predict the GEBV of the validation population, that is, across-breed genomic prediction, the accuracies were further reduced to 0.10 to 0.22, depending on the prediction method used. Pooling data from the 2 breeds to form the training population resulted in accuracies increased to 0.31 and 0.43, respectively, for the Angus and Charolais validation populations. The results suggested that the genetic relationship of selection candidates with the training population has a greater impact on the accuracy of GEBV using the Illumina Bovine SNP50 Beadchip. Pooling data from different breeds to form the training population will improve the accuracy of across breed genomic prediction for RFI in beef cattle.

  5. Estimation of genomic prediction accuracy from reference populations with varying degrees of relationship.

    PubMed

    Lee, S Hong; Clark, Sam; van der Werf, Julius H J

    2017-01-01

    Genomic prediction is emerging in a wide range of fields including animal and plant breeding, risk prediction in human precision medicine and forensic. It is desirable to establish a theoretical framework for genomic prediction accuracy when the reference data consists of information sources with varying degrees of relationship to the target individuals. A reference set can contain both close and distant relatives as well as 'unrelated' individuals from the wider population in the genomic prediction. The various sources of information were modeled as different populations with different effective population sizes (Ne). Both the effective number of chromosome segments (Me) and Ne are considered to be a function of the data used for prediction. We validate our theory with analyses of simulated as well as real data, and illustrate that the variation in genomic relationships with the target is a predictor of the information content of the reference set. With a similar amount of data available for each source, we show that close relatives can have a substantially larger effect on genomic prediction accuracy than lesser related individuals. We also illustrate that when prediction relies on closer relatives, there is less improvement in prediction accuracy with an increase in training data or marker panel density. We release software that can estimate the expected prediction accuracy and power when combining different reference sources with various degrees of relationship to the target, which is useful when planning genomic prediction (before or after collecting data) in animal, plant and human genetics.

  6. Calibration of PMIS pavement performance prediction models.

    DOT National Transportation Integrated Search

    2012-02-01

    Improve the accuracy of TxDOTs existing pavement performance prediction models through calibrating these models using actual field data obtained from the Pavement Management Information System (PMIS). : Ensure logical performance superiority patte...

  7. Sixty-five years of the long march in protein secondary structure prediction: the final stretch?

    PubMed Central

    Yang, Yuedong; Gao, Jianzhao; Wang, Jihua; Heffernan, Rhys; Hanson, Jack; Paliwal, Kuldip; Zhou, Yaoqi

    2018-01-01

    Abstract Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82–84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we are approaching to the theoretical limit of three-state prediction (88–90%), alternative to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only has more room for further improvement but also allows direct prediction of three-dimensional fragment structures with constantly improved accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins have <6 Å root-mean-squared distance from the native conformations by SPIDER2. More powerful deep learning methods with improved capability of capturing long-range interactions begin to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction. PMID:28040746

  8. Entropy-based link prediction in weighted networks

    NASA Astrophysics Data System (ADS)

    Xu, Zhongqi; Pu, Cunlai; Ramiz Sharafat, Rajput; Li, Lunbo; Yang, Jian

    2017-01-01

    Information entropy has been proved to be an effective tool to quantify the structural importance of complex networks. In the previous work (Xu et al, 2016 \\cite{xu2016}), we measure the contribution of a path in link prediction with information entropy. In this paper, we further quantify the contribution of a path with both path entropy and path weight, and propose a weighted prediction index based on the contributions of paths, namely Weighted Path Entropy (WPE), to improve the prediction accuracy in weighted networks. Empirical experiments on six weighted real-world networks show that WPE achieves higher prediction accuracy than three typical weighted indices.

  9. Dynamic Filtering Improves Attentional State Prediction with fNIRS

    NASA Technical Reports Server (NTRS)

    Harrivel, Angela R.; Weissman, Daniel H.; Noll, Douglas C.; Huppert, Theodore; Peltier, Scott J.

    2016-01-01

    Brain activity can predict a person's level of engagement in an attentional task. However, estimates of brain activity are often confounded by measurement artifacts and systemic physiological noise. The optimal method for filtering this noise - thereby increasing such state prediction accuracy - remains unclear. To investigate this, we asked study participants to perform an attentional task while we monitored their brain activity with functional near infrared spectroscopy (fNIRS). We observed higher state prediction accuracy when noise in the fNIRS hemoglobin [Hb] signals was filtered with a non-stationary (adaptive) model as compared to static regression (84% +/- 6% versus 72% +/- 15%).

  10. Implementation of Chaotic Gaussian Particle Swarm Optimization for Optimize Learning-to-Rank Software Defect Prediction Model Construction

    NASA Astrophysics Data System (ADS)

    Buchari, M. A.; Mardiyanto, S.; Hendradjaya, B.

    2018-03-01

    Finding the existence of software defect as early as possible is the purpose of research about software defect prediction. Software defect prediction activity is required to not only state the existence of defects, but also to be able to give a list of priorities which modules require a more intensive test. Therefore, the allocation of test resources can be managed efficiently. Learning to rank is one of the approach that can provide defect module ranking data for the purposes of software testing. In this study, we propose a meta-heuristic chaotic Gaussian particle swarm optimization to improve the accuracy of learning to rank software defect prediction approach. We have used 11 public benchmark data sets as experimental data. Our overall results has demonstrated that the prediction models construct using Chaotic Gaussian Particle Swarm Optimization gets better accuracy on 5 data sets, ties in 5 data sets and gets worse in 1 data sets. Thus, we conclude that the application of Chaotic Gaussian Particle Swarm Optimization in Learning-to-Rank approach can improve the accuracy of the defect module ranking in data sets that have high-dimensional features.

  11. Genomic selection across multiple breeding cycles in applied bread wheat breeding.

    PubMed

    Michel, Sebastian; Ametz, Christian; Gungor, Huseyin; Epure, Doru; Grausgruber, Heinrich; Löschenberger, Franziska; Buerstmayr, Hermann

    2016-06-01

    We evaluated genomic selection across five breeding cycles of bread wheat breeding. Bias of within-cycle cross-validation and methods for improving the prediction accuracy were assessed. The prospect of genomic selection has been frequently shown by cross-validation studies using the same genetic material across multiple environments, but studies investigating genomic selection across multiple breeding cycles in applied bread wheat breeding are lacking. We estimated the prediction accuracy of grain yield, protein content and protein yield of 659 inbred lines across five independent breeding cycles and assessed the bias of within-cycle cross-validation. We investigated the influence of outliers on the prediction accuracy and predicted protein yield by its components traits. A high average heritability was estimated for protein content, followed by grain yield and protein yield. The bias of the prediction accuracy using populations from individual cycles using fivefold cross-validation was accordingly substantial for protein yield (17-712 %) and less pronounced for protein content (8-86 %). Cross-validation using the cycles as folds aimed to avoid this bias and reached a maximum prediction accuracy of [Formula: see text] = 0.51 for protein content, [Formula: see text] = 0.38 for grain yield and [Formula: see text] = 0.16 for protein yield. Dropping outlier cycles increased the prediction accuracy of grain yield to [Formula: see text] = 0.41 as estimated by cross-validation, while dropping outlier environments did not have a significant effect on the prediction accuracy. Independent validation suggests, on the other hand, that careful consideration is necessary before an outlier correction is undertaken, which removes lines from the training population. Predicting protein yield by multiplying genomic estimated breeding values of grain yield and protein content raised the prediction accuracy to [Formula: see text] = 0.19 for this derived trait.

  12. A new ionospheric index MF2

    NASA Astrophysics Data System (ADS)

    Mikhailov, A. V.; Mikhailov, V. V.

    1995-02-01

    A new ionospheric index MF2 to improve monthly median foF2 regression and prediction accuracy is proposed. The interhemispheric magnetic conjunction of the F2-region was used to derive this index for the northern hemisphere. Since the monthly MF2 index varies in regular way with the season and in the course of solar cycle this allows an easy long-term prediction. Using MF2 instead of direct solar R12 index considerably improves the quality of the foF2 versus solar activity level regression (by 30% for middle, and by 10% for high latitudes.) For the rising phase of solar cycle 22, MF2 yields much better foF2 prediction accuracy than Consultative Committee on International Radiopropagation (CCIR) numerical maps can achieve.

  13. Improvement of shallow landslide prediction accuracy using soil parameterisation for a granite area in South Korea

    NASA Astrophysics Data System (ADS)

    Kim, M. S.; Onda, Y.; Kim, J. K.

    2015-01-01

    SHALSTAB model applied to shallow landslides induced by rainfall to evaluate soil properties related with the effect of soil depth for a granite area in Jinbu region, Republic of Korea. Soil depth measured by a knocking pole test and two soil parameters from direct shear test (a and b) as well as one soil parameters from a triaxial compression test (c) were collected to determine the input parameters for the model. Experimental soil data were used for the first simulation (Case I) and, soil data represented the effect of measured soil depth and average soil depth from soil data of Case I were used in the second (Case II) and third simulations (Case III), respectively. All simulations were analysed using receiver operating characteristic (ROC) analysis to determine the accuracy of prediction. ROC analysis results for first simulation showed the low ROC values under 0.75 may be due to the internal friction angle and particularly the cohesion value. Soil parameters calculated from a stochastic hydro-geomorphological model were applied to the SHALSTAB model. The accuracy of Case II and Case III using ROC analysis showed higher accuracy values rather than first simulation. Our results clearly demonstrate that the accuracy of shallow landslide prediction can be improved when soil parameters represented the effect of soil thickness.

  14. Genomic selection models double the accuracy of predicted breeding values for bacterial cold water disease resistance compared to a traditional pedigree-based model in rainbow trout aquaculture.

    PubMed

    Vallejo, Roger L; Leeds, Timothy D; Gao, Guangtu; Parsons, James E; Martin, Kyle E; Evenhuis, Jason P; Fragomeni, Breno O; Wiens, Gregory D; Palti, Yniv

    2017-02-01

    Previously, we have shown that bacterial cold water disease (BCWD) resistance in rainbow trout can be improved using traditional family-based selection, but progress has been limited to exploiting only between-family genetic variation. Genomic selection (GS) is a new alternative that enables exploitation of within-family genetic variation. We compared three GS models [single-step genomic best linear unbiased prediction (ssGBLUP), weighted ssGBLUP (wssGBLUP), and BayesB] to predict genomic-enabled breeding values (GEBV) for BCWD resistance in a commercial rainbow trout population, and compared the accuracy of GEBV to traditional estimates of breeding values (EBV) from a pedigree-based BLUP (P-BLUP) model. We also assessed the impact of sampling design on the accuracy of GEBV predictions. For these comparisons, we used BCWD survival phenotypes recorded on 7893 fish from 102 families, of which 1473 fish from 50 families had genotypes [57 K single nucleotide polymorphism (SNP) array]. Naïve siblings of the training fish (n = 930 testing fish) were genotyped to predict their GEBV and mated to produce 138 progeny testing families. In the following generation, 9968 progeny were phenotyped to empirically assess the accuracy of GEBV predictions made on their non-phenotyped parents. The accuracy of GEBV from all tested GS models were substantially higher than the P-BLUP model EBV. The highest increase in accuracy relative to the P-BLUP model was achieved with BayesB (97.2 to 108.8%), followed by wssGBLUP at iteration 2 (94.4 to 97.1%) and 3 (88.9 to 91.2%) and ssGBLUP (83.3 to 85.3%). Reducing the training sample size to n = ~1000 had no negative impact on the accuracy (0.67 to 0.72), but with n = ~500 the accuracy dropped to 0.53 to 0.61 if the training and testing fish were full-sibs, and even substantially lower, to 0.22 to 0.25, when they were not full-sibs. Using progeny performance data, we showed that the accuracy of genomic predictions is substantially higher than estimates obtained from the traditional pedigree-based BLUP model for BCWD resistance. Overall, we found that using a much smaller training sample size compared to similar studies in livestock, GS can substantially improve the selection accuracy and genetic gains for this trait in a commercial rainbow trout breeding population.

  15. Prediction of clinical behaviour and treatment for cancers.

    PubMed

    Futschik, Matthias E; Sullivan, Mike; Reeve, Anthony; Kasabov, Nikola

    2003-01-01

    Prediction of clinical behaviour and treatment for cancers is based on the integration of clinical and pathological parameters. Recent reports have demonstrated that gene expression profiling provides a powerful new approach for determining disease outcome. If clinical and microarray data each contain independent information then it should be possible to combine these datasets to gain more accurate prognostic information. Here, we have used existing clinical information and microarray data to generate a combined prognostic model for outcome prediction for diffuse large B-cell lymphoma (DLBCL). A prediction accuracy of 87.5% was achieved. This constitutes a significant improvement compared to the previously most accurate prognostic model with an accuracy of 77.6%. The model introduced here may be generally applicable to the combination of various types of molecular and clinical data for improving medical decision support systems and individualising patient care.

  16. Machine Learning Algorithms Outperform Conventional Regression Models in Predicting Development of Hepatocellular Carcinoma

    PubMed Central

    Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K

    2015-01-01

    Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95%CI 0.50-0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC. PMID:24169273

  17. Mapping water table depth using geophysical and environmental variables.

    PubMed

    Buchanan, S; Triantafilis, J

    2009-01-01

    Despite its importance, accurate representation of the spatial distribution of water table depth remains one of the greatest deficiencies in many hydrological investigations. Historically, both inverse distance weighting (IDW) and ordinary kriging (OK) have been used to interpolate depths. These methods, however, have major limitations: namely they require large numbers of measurements to represent the spatial variability of water table depth and they do not represent the variation between measurement points. We address this issue by assessing the benefits of using stepwise multiple linear regression (MLR) with three different ancillary data sets to predict the water table depth at 100-m intervals. The ancillary data sets used are Electromagnetic (EM34 and EM38), gamma radiometric: potassium (K), uranium (eU), thorium (eTh), total count (TC), and morphometric data. Results show that MLR offers significant precision and accuracy benefits over OK and IDW. Inclusion of the morphometric data set yielded the greatest (16%) improvement in prediction accuracy compared with IDW, followed by the electromagnetic data set (5%). Use of the gamma radiometric data set showed no improvement. The greatest improvement, however, resulted when all data sets were combined (37% increase in prediction accuracy over IDW). Significantly, however, the use of MLR also allows for prediction in variations in water table depth between measurement points, which is crucial for land management.

  18. A Technical Analysis Information Fusion Approach for Stock Price Analysis and Modeling

    NASA Astrophysics Data System (ADS)

    Lahmiri, Salim

    In this paper, we address the problem of technical analysis information fusion in improving stock market index-level prediction. We present an approach for analyzing stock market price behavior based on different categories of technical analysis metrics and a multiple predictive system. Each category of technical analysis measures is used to characterize stock market price movements. The presented predictive system is based on an ensemble of neural networks (NN) coupled with particle swarm intelligence for parameter optimization where each single neural network is trained with a specific category of technical analysis measures. The experimental evaluation on three international stock market indices and three individual stocks show that the presented ensemble-based technical indicators fusion system significantly improves forecasting accuracy in comparison with single NN. Also, it outperforms the classical neural network trained with index-level lagged values and NN trained with stationary wavelet transform details and approximation coefficients. As a result, technical information fusion in NN ensemble architecture helps improving prediction accuracy.

  19. Short communication: Improving accuracy of predicting breeding values in Brazilian Holstein population by adding data from Nordic and French Holstein populations.

    PubMed

    Li, X; Lund, M S; Zhang, Q; Costa, C N; Ducrocq, V; Su, G

    2016-06-01

    The present study investigated the improvement of prediction reliabilities for 3 production traits in Brazilian Holsteins that had no genotype information by adding information from Nordic and French Holstein bulls that had genotypes. The estimated across-country genetic correlations (ranging from 0.604 to 0.726) indicated that an important genotype by environment interaction exists between Brazilian and Nordic (or Nordic and French) populations. Prediction reliabilities for Brazilian genotyped bulls were greatly increased by including data of Nordic and French bulls, and a 2-trait single-step genomic BLUP performed much better than the corresponding pedigree-based BLUP. However, only a minor improvement in prediction reliabilities was observed in nongenotyped Brazilian cows. The results indicate that although there is a large genotype by environment interaction, inclusion of a foreign reference population can improve accuracy of genetic evaluation for the Brazilian Holstein population. However, a Brazilian reference population is necessary to obtain a more accurate genomic evaluation. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  20. Genotyping by sequencing for genomic prediction in a soybean breeding population.

    PubMed

    Jarquín, Diego; Kocak, Kyle; Posadas, Luis; Hyma, Katie; Jedlicka, Joseph; Graef, George; Lorenz, Aaron

    2014-08-29

    Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive to reduce breeding cycle times and costs associated with phenotyping. Genomic prediction and selection has been studied in several crop species, but no reports exist in soybean. The objectives of this study were (i) evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program were genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations. Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100, but accuracy steadily climbed until the largest possible size was used in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller. Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.

  1. Mining HIV protease cleavage data using genetic programming with a sum-product function.

    PubMed

    Yang, Zheng Rong; Dalby, Andrew R; Qiu, Jing

    2004-12-12

    In order to design effective HIV inhibitors, studying and understanding the mechanism of HIV protease cleavage specification is critical. Various methods have been developed to explore the specificity of HIV protease cleavage activity. However, success in both extracting discriminant rules and maintaining high prediction accuracy is still challenging. The earlier study had employed genetic programming with a min-max scoring function to extract discriminant rules with success. However, the decision will finally be degenerated to one residue making further improvement of the prediction accuracy difficult. The challenge of revising the min-max scoring function so as to improve the prediction accuracy motivated this study. This paper has designed a new scoring function called a sum-product function for extracting HIV protease cleavage discriminant rules using genetic programming methods. The experiments show that the new scoring function is superior to the min-max scoring function. The software package can be obtained by request to Dr Zheng Rong Yang.

  2. An ensemble framework for identifying essential proteins.

    PubMed

    Zhang, Xue; Xiao, Wangxin; Acencio, Marcio Luis; Lemke, Ney; Wang, Xujing

    2016-08-25

    Many centrality measures have been proposed to mine and characterize the correlations between network topological properties and protein essentiality. However, most of them show limited prediction accuracy, and the number of common predicted essential proteins by different methods is very small. In this paper, an ensemble framework is proposed which integrates gene expression data and protein-protein interaction networks (PINs). It aims to improve the prediction accuracy of basic centrality measures. The idea behind this ensemble framework is that different protein-protein interactions (PPIs) may show different contributions to protein essentiality. Five standard centrality measures (degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and subgraph centrality) are integrated into the ensemble framework respectively. We evaluated the performance of the proposed ensemble framework using yeast PINs and gene expression data. The results show that it can considerably improve the prediction accuracy of the five centrality measures individually. It can also remarkably increase the number of common predicted essential proteins among those predicted by each centrality measure individually and enable each centrality measure to find more low-degree essential proteins. This paper demonstrates that it is valuable to differentiate the contributions of different PPIs for identifying essential proteins based on network topological characteristics. The proposed ensemble framework is a successful paradigm to this end.

  3. Can plantar soft tissue mechanics enhance prognosis of diabetic foot ulcer?

    PubMed

    Naemi, R; Chatzistergos, P; Suresh, S; Sundar, L; Chockalingam, N; Ramachandran, A

    2017-04-01

    To investigate if the assessment of the mechanical properties of plantar soft tissue can increase the accuracy of predicting Diabetic Foot Ulceration (DFU). 40 patients with diabetic neuropathy and no DFU were recruited. Commonly assessed clinical parameters along with plantar soft tissue stiffness and thickness were measured at baseline using ultrasound elastography technique. 7 patients developed foot ulceration during a 12months follow-up. Logistic regression was used to identify parameters that contribute to predicting the DFU incidence. The effect of using parameters related to the mechanical behaviour of plantar soft tissue on the specificity, sensitivity, prediction strength and accuracy of the predicting models for DFU was assessed. Patients with higher plantar soft tissue thickness and lower stiffness at the 1st Metatarsal head area showed an increased risk of DFU. Adding plantar soft tissue stiffness and thickness to the model improved its specificity (by 3%), sensitivity (by 14%), prediction accuracy (by 5%) and prognosis strength (by 1%). The model containing all predictors was able to effectively (χ 2 (8, N=40)=17.55, P<0.05) distinguish between the patients with and without DFU incidence. The mechanical properties of plantar soft tissue can be used to improve the predictability of DFU in moderate/high risk patients. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Improving the prediction of going concern of Taiwanese listed companies using a hybrid of LASSO with data mining techniques.

    PubMed

    Goo, Yeung-Ja James; Chi, Der-Jang; Shen, Zong-De

    2016-01-01

    The purpose of this study is to establish rigorous and reliable going concern doubt (GCD) prediction models. This study first uses the least absolute shrinkage and selection operator (LASSO) to select variables and then applies data mining techniques to establish prediction models, such as neural network (NN), classification and regression tree (CART), and support vector machine (SVM). The samples of this study include 48 GCD listed companies and 124 NGCD (non-GCD) listed companies from 2002 to 2013 in the TEJ database. We conduct fivefold cross validation in order to identify the prediction accuracy. According to the empirical results, the prediction accuracy of the LASSO-NN model is 88.96 % (Type I error rate is 12.22 %; Type II error rate is 7.50 %), the prediction accuracy of the LASSO-CART model is 88.75 % (Type I error rate is 13.61 %; Type II error rate is 14.17 %), and the prediction accuracy of the LASSO-SVM model is 89.79 % (Type I error rate is 10.00 %; Type II error rate is 15.83 %).

  5. New insights from cluster analysis methods for RNA secondary structure prediction

    PubMed Central

    Rogers, Emily; Heitsch, Christine

    2016-01-01

    A widening gap exists between the best practices for RNA secondary structure prediction developed by computational researchers and the methods used in practice by experimentalists. Minimum free energy (MFE) predictions, although broadly used, are outperformed by methods which sample from the Boltzmann distribution and data mine the results. In particular, moving beyond the single structure prediction paradigm yields substantial gains in accuracy. Furthermore, the largest improvements in accuracy and precision come from viewing secondary structures not at the base pair level but at lower granularity/higher abstraction. This suggests that random errors affecting precision and systematic ones affecting accuracy are both reduced by this “fuzzier” view of secondary structures. Thus experimentalists who are willing to adopt a more rigorous, multilayered approach to secondary structure prediction by iterating through these levels of granularity will be much better able to capture fundamental aspects of RNA base pairing. PMID:26971529

  6. Use of Cell Viability Assay Data Improves the Prediction Accuracy of Conventional Quantitative Structure–Activity Relationship Models of Animal Carcinogenicity

    PubMed Central

    Zhu, Hao; Rusyn, Ivan; Richard, Ann; Tropsha, Alexander

    2008-01-01

    Background To develop efficient approaches for rapid evaluation of chemical toxicity and human health risk of environmental compounds, the National Toxicology Program (NTP) in collaboration with the National Center for Chemical Genomics has initiated a project on high-throughput screening (HTS) of environmental chemicals. The first HTS results for a set of 1,408 compounds tested for their effects on cell viability in six different cell lines have recently become available via PubChem. Objectives We have explored these data in terms of their utility for predicting adverse health effects of the environmental agents. Methods and results Initially, the classification k nearest neighbor (kNN) quantitative structure–activity relationship (QSAR) modeling method was applied to the HTS data only, for a curated data set of 384 compounds. The resulting models had prediction accuracies for training, test (containing 275 compounds together), and external validation (109 compounds) sets as high as 89%, 71%, and 74%, respectively. We then asked if HTS results could be of value in predicting rodent carcinogenicity. We identified 383 compounds for which data were available from both the Berkeley Carcinogenic Potency Database and NTP–HTS studies. We found that compounds classified by HTS as “actives” in at least one cell line were likely to be rodent carcinogens (sensitivity 77%); however, HTS “inactives” were far less informative (specificity 46%). Using chemical descriptors only, kNN QSAR modeling resulted in 62.3% prediction accuracy for rodent carcinogenicity applied to this data set. Importantly, the prediction accuracy of the model was significantly improved (72.7%) when chemical descriptors were augmented by HTS data, which were regarded as biological descriptors. Conclusions Our studies suggest that combining NTP–HTS profiles with conventional chemical descriptors could considerably improve the predictive power of computational approaches in toxicology. PMID:18414635

  7. Improving the Yule-Nielsen modified Neugebauer model by dot surface coverages depending on the ink superposition conditions

    NASA Astrophysics Data System (ADS)

    Hersch, Roger David; Crété, Frédérique

    2004-12-01

    Dot gain is different when dots are printed alone, printed in superposition with one ink or printed in superposition with two inks. In addition, the dot gain may also differ depending on which solid ink the considered halftone layer is superposed. In a previous research project, we developed a model for computing the effective surface coverage of a dot according to its superposition conditions. In the present contribution, we improve the Yule-Nielsen modified Neugebauer model by integrating into it our effective dot surface coverage computation model. Calibration of the reproduction curves mapping nominal to effective surface coverages in every superposition condition is carried out by fitting effective dot surfaces which minimize the sum of square differences between the measured reflection density spectra and reflection density spectra predicted according to the Yule-Nielsen modified Neugebauer model. In order to predict the reflection spectrum of a patch, its known nominal surface coverage values are converted into effective coverage values by weighting the contributions from different reproduction curves according to the weights of the contributing superposition conditions. We analyze the colorimetric prediction improvement brought by our extended dot surface coverage model for clustered-dot offset prints, thermal transfer prints and ink-jet prints. The color differences induced by the differences between measured reflection spectra and reflection spectra predicted according to the new dot surface estimation model are quantified on 729 different cyan, magenta, yellow patches covering the full color gamut. As a reference, these differences are also computed for the classical Yule-Nielsen modified spectral Neugebauer model incorporating a single halftone reproduction curve for each ink. Taking into account dot surface coverages according to different superposition conditions considerably improves the predictions of the Yule-Nielsen modified Neugebauer model. In the case of offset prints, the mean difference between predictions and measurements expressed in CIE-LAB CIE-94 ΔE94 values is reduced at 100 lpi from 1.54 to 0.90 (accuracy improvement factor: 1.7) and at 150 lpi it is reduced from 1.87 to 1.00 (accuracy improvement factor: 1.8). Similar improvements have been observed for a thermal transfer printer at 600 dpi, at lineatures of 50 and 75 lpi. In the case of an ink-jet printer at 600 dpi, the mean ΔE94 value is reduced at 75 lpi from 3.03 to 0.90 (accuracy improvement factor: 3.4) and at 100 lpi from 3.08 to 0.91 (accuracy improvement factor: 3.4).

  8. Improving the Yule-Nielsen modified Neugebauer model by dot surface coverages depending on the ink superposition conditions

    NASA Astrophysics Data System (ADS)

    Hersch, Roger David; Crete, Frederique

    2005-01-01

    Dot gain is different when dots are printed alone, printed in superposition with one ink or printed in superposition with two inks. In addition, the dot gain may also differ depending on which solid ink the considered halftone layer is superposed. In a previous research project, we developed a model for computing the effective surface coverage of a dot according to its superposition conditions. In the present contribution, we improve the Yule-Nielsen modified Neugebauer model by integrating into it our effective dot surface coverage computation model. Calibration of the reproduction curves mapping nominal to effective surface coverages in every superposition condition is carried out by fitting effective dot surfaces which minimize the sum of square differences between the measured reflection density spectra and reflection density spectra predicted according to the Yule-Nielsen modified Neugebauer model. In order to predict the reflection spectrum of a patch, its known nominal surface coverage values are converted into effective coverage values by weighting the contributions from different reproduction curves according to the weights of the contributing superposition conditions. We analyze the colorimetric prediction improvement brought by our extended dot surface coverage model for clustered-dot offset prints, thermal transfer prints and ink-jet prints. The color differences induced by the differences between measured reflection spectra and reflection spectra predicted according to the new dot surface estimation model are quantified on 729 different cyan, magenta, yellow patches covering the full color gamut. As a reference, these differences are also computed for the classical Yule-Nielsen modified spectral Neugebauer model incorporating a single halftone reproduction curve for each ink. Taking into account dot surface coverages according to different superposition conditions considerably improves the predictions of the Yule-Nielsen modified Neugebauer model. In the case of offset prints, the mean difference between predictions and measurements expressed in CIE-LAB CIE-94 ΔE94 values is reduced at 100 lpi from 1.54 to 0.90 (accuracy improvement factor: 1.7) and at 150 lpi it is reduced from 1.87 to 1.00 (accuracy improvement factor: 1.8). Similar improvements have been observed for a thermal transfer printer at 600 dpi, at lineatures of 50 and 75 lpi. In the case of an ink-jet printer at 600 dpi, the mean ΔE94 value is reduced at 75 lpi from 3.03 to 0.90 (accuracy improvement factor: 3.4) and at 100 lpi from 3.08 to 0.91 (accuracy improvement factor: 3.4).

  9. Revolutionizing Toxicity Testing For Predicting Developmental Outcomes (DNT4)

    EPA Science Inventory

    Characterizing risk from environmental chemical exposure currently requires extensive animal testing; however, alternative approaches are being researched to increase throughput of chemicals screened, decrease reliance on animal testing, and improve accuracy in predicting adverse...

  10. Orion Pad Abort 1 Flight Test: Simulation Predictions Versus Flight Data

    NASA Technical Reports Server (NTRS)

    Stillwater, Ryan Allanque; Merritt, Deborah S.

    2011-01-01

    The presentation covers the pre-flight simulation predictions of the Orion Pad Abort 1. The pre-flight simulation predictions are compared to the Orion Pad Abort 1 flight test data. Finally the flight test data is compared to the updated simulation predictions, which show a ove rall improvement in the accuracy of the simulation predictions.

  11. Improved fibrosis staging by elastometry and blood test in chronic hepatitis C.

    PubMed

    Calès, Paul; Boursier, Jérôme; Ducancelle, Alexandra; Oberti, Frédéric; Hubert, Isabelle; Hunault, Gilles; de Lédinghen, Victor; Zarski, Jean-Pierre; Salmon, Dominique; Lunel, Françoise

    2014-07-01

    Our main objective was to improve non-invasive fibrosis staging accuracy by resolving the limits of previous methods via new test combinations. Our secondary objectives were to improve staging precision, by developing a detailed fibrosis classification, and reliability (personalized accuracy) determination. All patients (729) included in the derivation population had chronic hepatitis C, liver biopsy, 6 blood tests and Fibroscan. Validation populations included 1584 patients. The most accurate combination was provided by using most markers of FibroMeter and Fibroscan results targeted for significant fibrosis, i.e. 'E-FibroMeter'. Its classification accuracy (91.7%) and precision (assessed by F difference with Metavir: 0.62 ± 0.57) were better than those of FibroMeter (84.1%, P < 0.001; 0.72 ± 0.57, P < 0.001), Fibroscan (88.2%, P = 0.011; 0.68 ± 0.57, P = 0.020), and a previous CSF-SF classification of FibroMeter + Fibroscan (86.7%, P < 0.001; 0.65 ± 0.57, P = 0.044). The accuracy for fibrosis absence (F0) was increased, e.g. from 16.0% with Fibroscan to 75.0% with E-FibroMeter (P < 0.001). Cirrhosis sensitivity was improved, e.g. E-FibroMeter: 92.7% vs. Fibroscan: 83.3%, P = 0.004. The combination improved reliability by deleting unreliable results (accuracy <50%) observed with a single test (1.2% of patients) and increasing optimal reliability (accuracy ≥85%) from 80.4% of patients with Fibroscan (accuracy: 90.9%) to 94.2% of patients with E-FibroMeter (accuracy: 92.9%), P < 0.001. The patient rate with 100% predictive values for cirrhosis by the best combination was twice (36.2%) that of the best single test (FibroMeter: 16.2%, P < 0.001). The new test combination increased: accuracy, globally and especially in patients without fibrosis, staging precision, cirrhosis prediction, and even reliability, thus offering improved fibrosis staging. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  12. Faster Trees: Strategies for Accelerated Training and Prediction of Random Forests for Classification of Polsar Images

    NASA Astrophysics Data System (ADS)

    Hänsch, Ronny; Hellwich, Olaf

    2018-04-01

    Random Forests have continuously proven to be one of the most accurate, robust, as well as efficient methods for the supervised classification of images in general and polarimetric synthetic aperture radar data in particular. While the majority of previous work focus on improving classification accuracy, we aim for accelerating the training of the classifier as well as its usage during prediction while maintaining its accuracy. Unlike other approaches we mainly consider algorithmic changes to stay as much as possible independent of platform and programming language. The final model achieves an approximately 60 times faster training and a 500 times faster prediction, while the accuracy is only marginally decreased by roughly 1 %.

  13. Genomic selection accuracies within and between environments and small breeding groups in white spruce.

    PubMed

    Beaulieu, Jean; Doerksen, Trevor K; MacKay, John; Rainville, André; Bousquet, Jean

    2014-12-02

    Genomic selection (GS) may improve selection response over conventional pedigree-based selection if markers capture more detailed information than pedigrees in recently domesticated tree species and/or make it more cost effective. Genomic prediction accuracies using 1748 trees and 6932 SNPs representative of as many distinct gene loci were determined for growth and wood traits in white spruce, within and between environments and breeding groups (BG), each with an effective size of Ne ≈ 20. Marker subsets were also tested. Model fits and/or cross-validation (CV) prediction accuracies for ridge regression (RR) and the least absolute shrinkage and selection operator models approached those of pedigree-based models. With strong relatedness between CV sets, prediction accuracies for RR within environment and BG were high for wood (r = 0.71-0.79) and moderately high for growth (r = 0.52-0.69) traits, in line with trends in heritabilities. For both classes of traits, these accuracies achieved between 83% and 92% of those obtained with phenotypes and pedigree information. Prediction into untested environments remained moderately high for wood (r ≥ 0.61) but dropped significantly for growth (r ≥ 0.24) traits, emphasizing the need to phenotype in all test environments and model genotype-by-environment interactions for growth traits. Removing relatedness between CV sets sharply decreased prediction accuracies for all traits and subpopulations, falling near zero between BGs with no known shared ancestry. For marker subsets, similar patterns were observed but with lower prediction accuracies. Given the need for high relatedness between CV sets to obtain good prediction accuracies, we recommend to build GS models for prediction within the same breeding population only. Breeding groups could be merged to build genomic prediction models as long as the total effective population size does not exceed 50 individuals in order to obtain high prediction accuracy such as that obtained in the present study. A number of markers limited to a few hundred would not negatively impact prediction accuracies, but these could decrease more rapidly over generations. The most promising short-term approach for genomic selection would likely be the selection of superior individuals within large full-sib families vegetatively propagated to implement multiclonal forestry.

  14. COMPASS: A computational model to predict changes in MMSE scores 24-months after initial assessment of Alzheimer's disease.

    PubMed

    Zhu, Fan; Panwar, Bharat; Dodge, Hiroko H; Li, Hongdong; Hampstead, Benjamin M; Albin, Roger L; Paulson, Henry L; Guan, Yuanfang

    2016-10-05

    We present COMPASS, a COmputational Model to Predict the development of Alzheimer's diSease Spectrum, to model Alzheimer's disease (AD) progression. This was the best-performing method in recent crowdsourcing benchmark study, DREAM Alzheimer's Disease Big Data challenge to predict changes in Mini-Mental State Examination (MMSE) scores over 24-months using standardized data. In the present study, we conducted three additional analyses beyond the DREAM challenge question to improve the clinical contribution of our approach, including: (1) adding pre-validated baseline cognitive composite scores of ADNI-MEM and ADNI-EF, (2) identifying subjects with significant declines in MMSE scores, and (3) incorporating SNPs of top 10 genes connected to APOE identified from functional-relationship network. For (1) above, we significantly improved predictive accuracy, especially for the Mild Cognitive Impairment (MCI) group. For (2), we achieved an area under ROC of 0.814 in predicting significant MMSE decline: our model has 100% precision at 5% recall, and 91% accuracy at 10% recall. For (3), "genetic only" model has Pearson's correlation of 0.15 to predict progression in the MCI group. Even though addition of this limited genetic model to COMPASS did not improve prediction of progression of MCI group, the predictive ability of SNP information extended beyond well-known APOE allele.

  15. Evaluating Word Prediction Software for Students with Physical Disabilities

    ERIC Educational Resources Information Center

    Mezei, Peter; Heller, Kathryn Wolff

    2005-01-01

    Although word prediction software was originally developed for individuals with physical disabilities, little research has been conducted featuring participants with physical disabilities. Using the Co:Writer 4000 word prediction software, three participants with physical disabilities improved typing rate and spelling accuracy, and two of these…

  16. Use Of Clinical Decision Analysis In Predicting The Efficacy Of Newer Radiological Imaging Modalities: Radioscintigraphy Versus Single Photon Transverse Section Emission Computed Tomography

    NASA Astrophysics Data System (ADS)

    Prince, John R.

    1982-12-01

    Sensitivity, specificity, and predictive accuracy have been shown to be useful measures of the clinical efficacy of diagnostic tests and can be used to predict the potential improvement in diagnostic certitude resulting from the introduction of a competing technology. This communication demonstrates how the informal use of clinical decision analysis may guide health planners in the allocation of resources, purchasing decisions, and implementation of high technology. For didactic purposes the focus is on a comparison between conventional planar radioscintigraphy (RS) and single photon transverse section emission conputed tomography (SPECT). For example, positive predictive accuracy (PPA) for brain RS in a specialist hospital with a 50% disease prevalance is about 95%. SPECT should increase this predicted accuracy to 96%. In a primary care hospital with only a 15% disease prevalance the PPA is only 77% and SPECT may increase this accuracy to about 79%. Similar calculations based on published data show that marginal improvements are expected with SPECT in the liver. It is concluded that: a) The decision to purchase a high technology imaging modality such as SPECT for clinical purposes should be analyzed on an individual organ system and institutional basis. High technology may be justified in specialist hospitals but not necessarily in primary care hospitals. This is more dependent on disease prevalance than procedure volume; b) It is questionable whether SPECT imaging will be competitive with standard RS procedures. Research should concentrate on the development of different medical applications.

  17. FMRI Is a Valid Noninvasive Alternative to Wada Testing

    PubMed Central

    Binder, Jeffrey R.

    2010-01-01

    Partial removal of the anterior temporal lobe (ATL) is a highly effective surgical treatment for intractable temporal lobe epilepsy, yet roughly half of patients who undergo left ATL resection show decline in language or verbal memory function postoperatively. Two recent studies demonstrate that preoperative fMRI can predict postoperative naming and verbal memory changes in such patients. Most importantly, fMRI significantly improves the accuracy of prediction relative to other noninvasive measures used alone. Addition of language and memory lateralization data from the intracarotid amobarbital (Wada) test did not improve prediction accuracy in these studies. Thus, fMRI provides patients and practitioners with a safe, non-invasive, and well-validated tool for making better-informed decisions regarding elective surgery based on a quantitative assessment of cognitive risk. PMID:20850386

  18. Dynamic filtering improves attentional state prediction with fNIRS

    PubMed Central

    Harrivel, Angela R.; Weissman, Daniel H.; Noll, Douglas C.; Huppert, Theodore; Peltier, Scott J.

    2016-01-01

    Brain activity can predict a person’s level of engagement in an attentional task. However, estimates of brain activity are often confounded by measurement artifacts and systemic physiological noise. The optimal method for filtering this noise – thereby increasing such state prediction accuracy – remains unclear. To investigate this, we asked study participants to perform an attentional task while we monitored their brain activity with functional near infrared spectroscopy (fNIRS). We observed higher state prediction accuracy when noise in the fNIRS hemoglobin [Hb] signals was filtered with a non-stationary (adaptive) model as compared to static regression (84% ± 6% versus 72% ± 15%). PMID:27231602

  19. Mammographic density, breast cancer risk and risk prediction

    PubMed Central

    Vachon, Celine M; van Gils, Carla H; Sellers, Thomas A; Ghosh, Karthik; Pruthi, Sandhya; Brandt, Kathleen R; Pankratz, V Shane

    2007-01-01

    In this review, we examine the evidence for mammographic density as an independent risk factor for breast cancer, describe the risk prediction models that have incorporated density, and discuss the current and future implications of using mammographic density in clinical practice. Mammographic density is a consistent and strong risk factor for breast cancer in several populations and across age at mammogram. Recently, this risk factor has been added to existing breast cancer risk prediction models, increasing the discriminatory accuracy with its inclusion, albeit slightly. With validation, these models may replace the existing Gail model for clinical risk assessment. However, absolute risk estimates resulting from these improved models are still limited in their ability to characterize an individual's probability of developing cancer. Promising new measures of mammographic density, including volumetric density, which can be standardized using full-field digital mammography, will likely result in a stronger risk factor and improve accuracy of risk prediction models. PMID:18190724

  20. In-class didactic versus self-directed teaching of the probe-based confocal laser endomicroscopy (pCLE) criteria for Barrett's esophagus.

    PubMed

    Rzouq, Fadi; Vennalaganti, Prashanth; Pakseresht, Kavous; Kanakadandi, Vijay; Parasa, Sravanthi; Mathur, Sharad C; Alsop, Benjamin R; Hornung, Benjamin; Gupta, Neil; Sharma, Prateek

    2016-02-01

    Optimal teaching methods for disease recognition using probe-based confocal laser endomicroscopy (pCLE) have not been developed. Our aim was to compare in-class didactic teaching vs. self-directed teaching of Barrett's neoplasia diagnosis using pCLE. This randomized controlled trial was conducted at a tertiary academic center. Study participants with no prior pCLE experience were randomized to in-class didactic (group 1) or self-directed teaching groups (group 2). For group 1, an expert conducted a classroom teaching session using standardized educational material. Participants in group 2 were provided with the same material on an audio PowerPoint. After initial training, all participants graded an initial set of 20 pCLE videos and reviewed correct responses with the expert (group 1) or on audio PowerPoint (group 2). Finally, all participants completed interpretations of a further 40 videos. Eighteen trainees (8 medical students, 10 gastroenterology trainees) participated in the study. Overall diagnostic accuracy for neoplasia prediction by pCLE was 77 % (95 % confidence interval [CI] 74.0 % - 79.2 %); of predictions made with high confidence (53 %), the accuracy was 85 % (95 %CI 81.8 % - 87.8 %). The overall accuracy and interobserver agreement was significantly higher in group 1 than in group 2 for all predictions (80.4 % vs. 73 %; P = 0.005) and for high confidence predictions (90 % vs. 80 %; P < 0.001). Following feedback (after the initial 20 videos), the overall accuracy improved from 73 % to 79 % (P = 0.04), mainly driven by a significant improvement in group 1 (74 % to 84 %; P < 0.01). Accuracy of prediction significantly improved with time in endoscopy training (72 % students, 77 % FY1, 82 % FY2, and 85 % FY3; P = 0.003). For novice trainees, in-class didactic teaching enables significantly better recognition of the pCLE features of Barrett's esophagus than self-directed teaching. The in-class didactic group had a shorter learning curve and were able to achieve 90 % accuracy for their high confidence predictions. © Georg Thieme Verlag KG Stuttgart · New York.

  1. Comparison of baseline removal methods for laser-induced breakdown spectroscopy of geological samples

    NASA Astrophysics Data System (ADS)

    Dyar, M. Darby; Giguere, Stephen; Carey, CJ; Boucher, Thomas

    2016-12-01

    This project examines the causes, effects, and optimization of continuum removal in laser-induced breakdown spectroscopy (LIBS) to produce the best possible prediction accuracy of elemental composition in geological samples. We compare prediction accuracy resulting from several different techniques for baseline removal, including asymmetric least squares (ALS), adaptive iteratively reweighted penalized least squares (Air-PLS), fully automatic baseline correction (FABC), continuous wavelet transformation, median filtering, polynomial fitting, the iterative thresholding Dietrich method, convex hull/rubber band techniques, and a newly-developed technique for Custom baseline removal (BLR). We assess the predictive performance of these methods using partial least-squares analysis for 13 elements of geological interest, expressed as the weight percentages of SiO2, Al2O3, TiO2, FeO, MgO, CaO, Na2O, K2O, and the parts per million concentrations of Ni, Cr, Zn, Mn, and Co. We find that previously published methods for baseline subtraction generally produce equivalent prediction accuracies for major elements. When those pre-existing methods are used, automated optimization of their adjustable parameters is always necessary to wring the best predictive accuracy out of a data set; ideally, it should be done for each individual variable. The new technique of Custom BLR produces significant improvements in prediction accuracy over existing methods across varying geological data sets, instruments, and varying analytical conditions. These results also demonstrate the dual objectives of the continuum removal problem: removing a smooth underlying signal to fit individual peaks (univariate analysis) versus using feature selection to select only those channels that contribute to best prediction accuracy for multivariate analyses. Overall, the current practice of using generalized, one-method-fits-all-spectra baseline removal results in poorer predictive performance for all methods. The extra steps needed to optimize baseline removal for each predicted variable and empower multivariate techniques with the best possible input data for optimal prediction accuracy are shown to be well worth the slight increase in necessary computations and complexity.

  2. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computation biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated features of conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). High prediction accuracy and successful prediction performance suggested that our method can be a useful approach to identify RNA-binding proteins from sequence information.

  3. Population variation in isotopic composition of shorebird feathers: Implications for determining molting grounds

    USGS Publications Warehouse

    Torres-Dowdall, J.; Farmer, A.H.; Bucher, E.H.; Rye, R.O.; Landis, G.

    2009-01-01

    Stable isotope analyses have revolutionized the study of migratory connectivity. However, as with all tools, their limitations must be understood in order to derive the maximum benefit of a particular application. The goal of this study was to evaluate the efficacy of stable isotopes of C, N, H, O and S for assigning known-origin feathers to the molting sites of migrant shorebird species wintering and breeding in Argentina. Specific objectives were to: 1) compare the efficacy of the technique for studying shorebird species with different migration patterns, life histories and habitat-use patterns; 2) evaluate the grouping of species with similar migration and habitat use patterns in a single analysis to potentially improve prediction accuracy; and 3) evaluate the potential gains in prediction accuracy that might be achieved from using multiple stable isotopes. The efficacy of stable isotope ratios to determine origin was found to vary with species. While one species (White-rumped Sandpiper, Calidris fuscicollis) had high levels of accuracy assigning samples to known origin (91% of samples correctly assigned), another (Collared Plover, Charadrius collaris) showed low levels of accuracy (52% of samples correctly assigned). Intra-individual variability may account for this difference in efficacy. The prediction model for three species with similar migration and habitat-use patterns performed poorly compared with the model for just one of the species (71% versus 91% of samples correctly assigned). Thus, combining multiple sympatric species may not improve model prediction accuracy. Increasing the number of stable isotopes in the analyses increased the accuracy of assigning shorebirds to their molting origin, but the best combination - involving a subset of all the isotopes analyzed - varied among species.

  4. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.

  5. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616

  6. Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins.

    PubMed

    He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L

    2018-04-01

    SNP chips are commonly used for genotyping animals in genomic selection but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all the three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications on the design of LD SNP chips for imputation-enabled genomic prediction.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moslehi, Salim; Reddy, T. Agami; Katipamula, Srinivas

    This research was undertaken to evaluate different inverse models for predicting power output of solar photovoltaic (PV) systems under different practical scenarios. In particular, we have investigated whether PV power output prediction accuracy can be improved if module/cell temperature was measured in addition to climatic variables, and also the extent to which prediction accuracy degrades if solar irradiation is not measured on the plane of array but only on a horizontal surface. We have also investigated the significance of different independent or regressor variables, such as wind velocity and incident angle modifier in predicting PV power output and cell temperature.more » The inverse regression model forms have been evaluated both in terms of their goodness-of-fit, and their accuracy and robustness in terms of their predictive performance. Given the accuracy of the measurements, expected CV-RMSE of hourly power output prediction over the year varies between 3.2% and 8.6% when only climatic data are used. Depending on what type of measured climatic and PV performance data is available, different scenarios have been identified and the corresponding appropriate modeling pathways have been proposed. The corresponding models are to be implemented on a controller platform for optimum operational planning of microgrids and integrated energy systems.« less

  8. High accuracy prediction of beta-turns and their types using propensities and multiple alignments.

    PubMed

    Fuchs, Patrick F J; Alix, Alain J P

    2005-06-01

    We have developed a method that predicts both the presence and the type of beta-turns, using a straightforward approach based on propensities and multiple alignments. The propensities were calculated classically, but the way to use them for prediction was completely new: starting from a tetrapeptide sequence on which one wants to evaluate the presence of a beta-turn, the propensity for a given residue is modified by taking into account all the residues present in the multiple alignment at this position. The evaluation of a score is then done by weighting these propensities by the use of Position-specific score matrices generated by PSI-BLAST. The introduction of secondary structure information predicted by PSIPRED or SSPRO2 as well as taking into account the flanking residues around the tetrapeptide improved the accuracy greatly. This latter evaluated on a database of 426 reference proteins (previously used on other studies) by a sevenfold crossvalidation gave very good results with a Matthews Correlation Coefficient (MCC) of 0.42 and an overall prediction accuracy of 74.8%; this places our method among the best ones. A jackknife test was also done, which gave results within the same range. This shows that it is possible to reach neural networks accuracy with considerably less computional cost and complexity. Furthermore, propensities remain excellent descriptors of amino acid tendencies to belong to beta-turns, which can be useful for peptide or protein engineering and design. For beta-turn type prediction, we reached the best accuracy ever published in terms of MCC (except for the irregular type IV) in the range of 0.25-0.30 for types I, II, and I' and 0.13-0.15 for types VIII, II', and IV. To our knowledge, our method is the only one available on the Web that predicts types I' and II'. The accuracy evaluated on two larger databases of 547 and 823 proteins was not improved significantly. All of this was implemented into a Web server called COUDES (French acronym for: Chercher Ou Une Deviation Existe Surement), which is available at the following URL: http://bioserv.rpbs.jussieu.fr/Coudes/index.html within the new bioinformatics platform RPBS.

  9. Using Demographic Subgroup and Dummy Variable Equations to Predict College Freshman Grade Average.

    ERIC Educational Resources Information Center

    Sawyer, Richard

    1986-01-01

    This study was designed to determine whether adjustments for the differential prediction observed among sex, racial/ethnic, or age subgroups in one freshman class at a college could be used to improve prediction accuracy for these subgroups in future freshman classes. (Author/LMO)

  10. SSGP: SNP-set based genomic prediction to incorporate biological information

    USDA-ARS?s Scientific Manuscript database

    Genomic prediction has emerged as an effective approach in plant and animal breeding and in precision medicine. Much research has been devoted to an improved accuracy in genomic prediction, and one of the potential ways is to incorporate biological information. Due to the statistical and computation...

  11. Development of a two-fluid drag law for clustered particles using direct numerical simulation and validation through experiments

    NASA Astrophysics Data System (ADS)

    Abbasi Baharanchi, Ahmadreza

    This dissertation focused on development and utilization of numerical and experimental approaches to improve the CFD modeling of fluidization flow of cohesive micron size particles. The specific objectives of this research were: (1) Developing a cluster prediction mechanism applicable to Two-Fluid Modeling (TFM) of gas-solid systems (2) Developing more accurate drag models for Two-Fluid Modeling (TFM) of gas-solid fluidization flow with the presence of cohesive interparticle forces (3) using the developed model to explore the improvement of accuracy of TFM in simulation of fluidization flow of cohesive powders (4) Understanding the causes and influential factor which led to improvements and quantification of improvements (5) Gathering data from a fast fluidization flow and use these data for benchmark validations. Simulation results with two developed cluster-aware drag models showed that cluster prediction could effectively influence the results in both the first and second cluster-aware models. It was proven that improvement of accuracy of TFM modeling using three versions of the first hybrid model was significant and the best improvements were obtained by using the smallest values of the switch parameter which led to capturing the smallest chances of cluster prediction. In the case of the second hybrid model, dependence of critical model parameter on only Reynolds number led to the fact that improvement of accuracy was significant only in dense section of the fluidized bed. This finding may suggest that a more sophisticated particle resolved DNS model, which can span wide range of solid volume fraction, can be used in the formulation of the cluster-aware drag model. The results of experiment suing high speed imaging indicated the presence of particle clusters in the fluidization flow of FCC inside the riser of FIU-CFB facility. In addition, pressure data was successfully captured along the fluidization column of the facility and used as benchmark validation data for the second hybrid model developed in the present dissertation. It was shown the second hybrid model could predict the pressure data in the dense section of the fluidization column with better accuracy.

  12. Size at emergence improves accuracy of age estimates in forensically-useful beetle Creophilus maxillosus L. (Staphylinidae).

    PubMed

    Matuszewski, Szymon; Frątczak-Łagiewska, Katarzyna

    2018-02-05

    Insects colonizing human or animal cadavers may be used to estimate post-mortem interval (PMI) usually by aging larvae or pupae sampled on a crime scene. The accuracy of insect age estimates in a forensic context is reduced by large intraspecific variation in insect development time. Here we test the concept that insect size at emergence may be used to predict insect physiological age and accordingly to improve the accuracy of age estimates in forensic entomology. Using results of laboratory study on development of forensically-useful beetle Creophilus maxillosus (Linnaeus, 1758) (Staphylinidae) we demonstrate that its physiological age at emergence [i.e. thermal summation value (K) needed for emergence] fall with an increase of beetle size. In the validation study it was found that K estimated based on the adult insect size was significantly closer to the true K as compared to K from the general thermal summation model. Using beetle length at emergence as a predictor variable and male or female specific model regressing K against beetle length gave the most accurate predictions of age. These results demonstrate that size of C. maxillosus at emergence improves accuracy of age estimates in a forensic context.

  13. Accuracy of Genomic Prediction in Switchgrass (Panicum virgatum L.) Improved by Accounting for Linkage Disequilibrium

    PubMed Central

    Ramstein, Guillaume P.; Evans, Joseph; Kaeppler, Shawn M.; Mitchell, Robert B.; Vogel, Kenneth P.; Buell, C. Robin; Casler, Michael D.

    2016-01-01

    Switchgrass is a relatively high-yielding and environmentally sustainable biomass crop, but further genetic gains in biomass yield must be achieved to make it an economically viable bioenergy feedstock. Genomic selection (GS) is an attractive technology to generate rapid genetic gains in switchgrass, and meet the goals of a substantial displacement of petroleum use with biofuels in the near future. In this study, we empirically assessed prediction procedures for genomic selection in two different populations, consisting of 137 and 110 half-sib families of switchgrass, tested in two locations in the United States for three agronomic traits: dry matter yield, plant height, and heading date. Marker data were produced for the families’ parents by exome capture sequencing, generating up to 141,030 polymorphic markers with available genomic-location and annotation information. We evaluated prediction procedures that varied not only by learning schemes and prediction models, but also by the way the data were preprocessed to account for redundancy in marker information. More complex genomic prediction procedures were generally not significantly more accurate than the simplest procedure, likely due to limited population sizes. Nevertheless, a highly significant gain in prediction accuracy was achieved by transforming the marker data through a marker correlation matrix. Our results suggest that marker-data transformations and, more generally, the account of linkage disequilibrium among markers, offer valuable opportunities for improving prediction procedures in GS. Some of the achieved prediction accuracies should motivate implementation of GS in switchgrass breeding programs. PMID:26869619

  14. Analysis of energy-based algorithms for RNA secondary structure prediction

    PubMed Central

    2012-01-01

    Background RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Conclusions Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets. PMID:22296803

  15. Analysis of energy-based algorithms for RNA secondary structure prediction.

    PubMed

    Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H

    2012-02-01

    RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.

  16. Learning Linear Spatial-Numeric Associations Improves Accuracy of Memory for Numbers

    PubMed Central

    Thompson, Clarissa A.; Opfer, John E.

    2016-01-01

    Memory for numbers improves with age and experience. One potential source of improvement is a logarithmic-to-linear shift in children’s representations of magnitude. To test this, Kindergartners and second graders estimated the location of numbers on number lines and recalled numbers presented in vignettes (Study 1). Accuracy at number-line estimation predicted memory accuracy on a numerical recall task after controlling for the effect of age and ability to approximately order magnitudes (mapper status). To test more directly whether linear numeric magnitude representations caused improvements in memory, half of children were given feedback on their number-line estimates (Study 2). As expected, learning linear representations was again linked to memory for numerical information even after controlling for age and mapper status. These results suggest that linear representations of numerical magnitude may be a causal factor in development of numeric recall accuracy. PMID:26834688

  17. Learning Linear Spatial-Numeric Associations Improves Accuracy of Memory for Numbers.

    PubMed

    Thompson, Clarissa A; Opfer, John E

    2016-01-01

    Memory for numbers improves with age and experience. One potential source of improvement is a logarithmic-to-linear shift in children's representations of magnitude. To test this, Kindergartners and second graders estimated the location of numbers on number lines and recalled numbers presented in vignettes (Study 1). Accuracy at number-line estimation predicted memory accuracy on a numerical recall task after controlling for the effect of age and ability to approximately order magnitudes (mapper status). To test more directly whether linear numeric magnitude representations caused improvements in memory, half of children were given feedback on their number-line estimates (Study 2). As expected, learning linear representations was again linked to memory for numerical information even after controlling for age and mapper status. These results suggest that linear representations of numerical magnitude may be a causal factor in development of numeric recall accuracy.

  18. Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Jing; Li, Yuan-Yuan; Shanghai Center for Bioinformation Technology, Shanghai 200235

    2012-03-02

    Highlights: Black-Right-Pointing-Pointer Proper dataset partition can improve the prediction of deleterious nsSNPs. Black-Right-Pointing-Pointer Partition according to original residue type at nsSNP is a good criterion. Black-Right-Pointing-Pointer Similar strategy is supposed promising in other machine learning problems. -- Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNPmore » site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.« less

  19. A general strategy for performing temperature-programming in high performance liquid chromatography--further improvements in the accuracy of retention time predictions of segmented temperature gradients.

    PubMed

    Wiese, Steffen; Teutenberg, Thorsten; Schmidt, Torsten C

    2012-01-27

    In the present work it is shown that the linear elution strength (LES) model which was adapted from temperature-programming gas chromatography (GC) can also be employed for systematic method development in high-temperature liquid chromatography (HT-HPLC). The ability to predict isothermal retention times based on temperature-gradient as well as isothermal input data was investigated. For a small temperature interval of ΔT=40°C, both approaches result in very similar predictions. Average relative errors of predicted retention times of 2.7% and 1.9% were observed for simulations based on isothermal and temperature-gradient measurements, respectively. Concurrently, it was investigated whether the accuracy of retention time predictions of segmented temperature gradients can be further improved by temperature dependent calculation of the parameter S(T) of the LES relationship. It was found that the accuracy of retention time predictions of multi-step temperature gradients can be improved to around 1.5%, if S(T) was also calculated temperature dependent. The adjusted experimental design making use of four temperature-gradient measurements was applied for systematic method development of selected food additives by high-temperature liquid chromatography. Method development was performed within a temperature interval from 40°C to 180°C using water as mobile phase. Two separation methods were established where selected food additives were baseline separated. In addition, a good agreement between simulation and experiment was observed, because an average relative error of predicted retention times of complex segmented temperature gradients less than 5% was observed. Finally, a schedule of recommendations to assist the practitioner during systematic method development in high-temperature liquid chromatography was established. Copyright © 2011 Elsevier B.V. All rights reserved.

  20. Clinical time series prediction: Toward a hierarchical dynamical system framework.

    PubMed

    Liu, Zitao; Hauskrecht, Milos

    2015-09-01

    Developing machine learning and data mining algorithms for building temporal models of clinical time series is important for understanding of the patient condition, the dynamics of a disease, effect of various patient management interventions and clinical decision making. In this work, we propose and develop a novel hierarchical framework for modeling clinical time series data of varied length and with irregularly sampled observations. Our hierarchical dynamical system framework for modeling clinical time series combines advantages of the two temporal modeling approaches: the linear dynamical system and the Gaussian process. We model the irregularly sampled clinical time series by using multiple Gaussian process sequences in the lower level of our hierarchical framework and capture the transitions between Gaussian processes by utilizing the linear dynamical system. The experiments are conducted on the complete blood count (CBC) panel data of 1000 post-surgical cardiac patients during their hospitalization. Our framework is evaluated and compared to multiple baseline approaches in terms of the mean absolute prediction error and the absolute percentage error. We tested our framework by first learning the time series model from data for the patients in the training set, and then using it to predict future time series values for the patients in the test set. We show that our model outperforms multiple existing models in terms of its predictive accuracy. Our method achieved a 3.13% average prediction accuracy improvement on ten CBC lab time series when it was compared against the best performing baseline. A 5.25% average accuracy improvement was observed when only short-term predictions were considered. A new hierarchical dynamical system framework that lets us model irregularly sampled time series data is a promising new direction for modeling clinical time series and for improving their predictive performance. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. Health-based risk adjustment: is inpatient and outpatient diagnostic information sufficient?

    PubMed

    Lamers, L M

    Adequate risk adjustment is critical to the success of market-oriented health care reforms in many countries. Currently used risk adjusters based on demographic and diagnostic cost groups (DCGs) do not reflect expected costs accurately. This study examines the simultaneous predictive accuracy of inpatient and outpatient morbidity measures and prior costs. DCGs, pharmacy cost groups (PCGs), and prior year's costs improve the predictive accuracy of the demographic model substantially. DCGs and PCGs seem complementary in their ability to predict future costs. However, this study shows that the combination of DCGs and PCGs still leaves room for cream skimming.

  2. Evaluation of Data-Driven Models for Predicting Solar Photovoltaics Power Output

    DOE PAGES

    Moslehi, Salim; Reddy, T. Agami; Katipamula, Srinivas

    2017-09-10

    This research was undertaken to evaluate different inverse models for predicting power output of solar photovoltaic (PV) systems under different practical scenarios. In particular, we have investigated whether PV power output prediction accuracy can be improved if module/cell temperature was measured in addition to climatic variables, and also the extent to which prediction accuracy degrades if solar irradiation is not measured on the plane of array but only on a horizontal surface. We have also investigated the significance of different independent or regressor variables, such as wind velocity and incident angle modifier in predicting PV power output and cell temperature.more » The inverse regression model forms have been evaluated both in terms of their goodness-of-fit, and their accuracy and robustness in terms of their predictive performance. Given the accuracy of the measurements, expected CV-RMSE of hourly power output prediction over the year varies between 3.2% and 8.6% when only climatic data are used. Depending on what type of measured climatic and PV performance data is available, different scenarios have been identified and the corresponding appropriate modeling pathways have been proposed. The corresponding models are to be implemented on a controller platform for optimum operational planning of microgrids and integrated energy systems.« less

  3. Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction.

    PubMed

    He, Dan; Kuhn, David; Parida, Laxmi

    2016-06-15

    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least square error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or directly from the referred authors, such as MALSAR (http://www.public.asu.edu/~jye02/Software/MALSAR/) package. The Avocado data set has not been published yet and is available upon request. dhe@us.ibm.com. © The Author 2016. Published by Oxford University Press.

  4. Modeling Soil Organic Carbon at Regional Scale by Combining Multi-Spectral Images with Laboratory Spectra.

    PubMed

    Peng, Yi; Xiong, Xiong; Adhikari, Kabindra; Knadel, Maria; Grunwald, Sabine; Greve, Mogens Humlekrog

    2015-01-01

    There is a great challenge in combining soil proximal spectra and remote sensing spectra to improve the accuracy of soil organic carbon (SOC) models. This is primarily because mixing of spectral data from different sources and technologies to improve soil models is still in its infancy. The first objective of this study was to integrate information of SOC derived from visible near-infrared reflectance (Vis-NIR) spectra in the laboratory with remote sensing (RS) images to improve predictions of topsoil SOC in the Skjern river catchment, Denmark. The second objective was to improve SOC prediction results by separately modeling uplands and wetlands. A total of 328 topsoil samples were collected and analyzed for SOC. Satellite Pour l'Observation de la Terre (SPOT5), Landsat Data Continuity Mission (Landsat 8) images, laboratory Vis-NIR and other ancillary environmental data including terrain parameters and soil maps were compiled to predict topsoil SOC using Cubist regression and Bayesian kriging. The results showed that the model developed from RS data, ancillary environmental data and laboratory spectral data yielded a lower root mean square error (RMSE) (2.8%) and higher R2 (0.59) than the model developed from only RS data and ancillary environmental data (RMSE: 3.6%, R2: 0.46). Plant-available water (PAW) was the most important predictor for all the models because of its close relationship with soil organic matter content. Moreover, vegetation indices, such as the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI), were very important predictors in SOC spatial models. Furthermore, the 'upland model' was able to more accurately predict SOC compared with the 'upland & wetland model'. However, the separately calibrated 'upland and wetland model' did not improve the prediction accuracy for wetland sites, since it was not possible to adequately discriminate the vegetation in the RS summer images. We conclude that laboratory Vis-NIR spectroscopy adds critical information that significantly improves the prediction accuracy of SOC compared to using RS data alone. We recommend the incorporation of laboratory spectra with RS data and other environmental data to improve soil spatial modeling and digital soil mapping (DSM).

  5. Modeling Soil Organic Carbon at Regional Scale by Combining Multi-Spectral Images with Laboratory Spectra

    PubMed Central

    Peng, Yi; Xiong, Xiong; Adhikari, Kabindra; Knadel, Maria; Grunwald, Sabine; Greve, Mogens Humlekrog

    2015-01-01

    There is a great challenge in combining soil proximal spectra and remote sensing spectra to improve the accuracy of soil organic carbon (SOC) models. This is primarily because mixing of spectral data from different sources and technologies to improve soil models is still in its infancy. The first objective of this study was to integrate information of SOC derived from visible near-infrared reflectance (Vis-NIR) spectra in the laboratory with remote sensing (RS) images to improve predictions of topsoil SOC in the Skjern river catchment, Denmark. The second objective was to improve SOC prediction results by separately modeling uplands and wetlands. A total of 328 topsoil samples were collected and analyzed for SOC. Satellite Pour l’Observation de la Terre (SPOT5), Landsat Data Continuity Mission (Landsat 8) images, laboratory Vis-NIR and other ancillary environmental data including terrain parameters and soil maps were compiled to predict topsoil SOC using Cubist regression and Bayesian kriging. The results showed that the model developed from RS data, ancillary environmental data and laboratory spectral data yielded a lower root mean square error (RMSE) (2.8%) and higher R2 (0.59) than the model developed from only RS data and ancillary environmental data (RMSE: 3.6%, R2: 0.46). Plant-available water (PAW) was the most important predictor for all the models because of its close relationship with soil organic matter content. Moreover, vegetation indices, such as the Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI), were very important predictors in SOC spatial models. Furthermore, the ‘upland model’ was able to more accurately predict SOC compared with the ‘upland & wetland model’. However, the separately calibrated ‘upland and wetland model’ did not improve the prediction accuracy for wetland sites, since it was not possible to adequately discriminate the vegetation in the RS summer images. We conclude that laboratory Vis-NIR spectroscopy adds critical information that significantly improves the prediction accuracy of SOC compared to using RS data alone. We recommend the incorporation of laboratory spectra with RS data and other environmental data to improve soil spatial modeling and digital soil mapping (DSM). PMID:26555071

  6. Comparison of Models and Whole-Genome Profiling Approaches for Genomic-Enabled Prediction of Septoria Tritici Blotch, Stagonospora Nodorum Blotch, and Tan Spot Resistance in Wheat.

    PubMed

    Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E

    2017-07-01

    The leaf spotting diseases in wheat that include Septoria tritici blotch (STB) caused by , Stagonospora nodorum blotch (SNB) caused by , and tan spot (TS) caused by pose challenges to breeding programs in selecting for resistance. A promising approach that could enable selection prior to phenotyping is genomic selection that uses genome-wide markers to estimate breeding values (BVs) for quantitative traits. To evaluate this approach for seedling and/or adult plant resistance (APR) to STB, SNB, and TS, we compared the predictive ability of least-squares (LS) approach with genomic-enabled prediction models including genomic best linear unbiased predictor (GBLUP), Bayesian ridge regression (BRR), Bayes A (BA), Bayes B (BB), Bayes Cπ (BC), Bayesian least absolute shrinkage and selection operator (BL), and reproducing kernel Hilbert spaces markers (RKHS-M), a pedigree-based model (RKHS-P) and RKHS markers and pedigree (RKHS-MP). We observed that LS gave the lowest prediction accuracies and RKHS-MP, the highest. The genomic-enabled prediction models and RKHS-P gave similar accuracies. The increase in accuracy using genomic prediction models over LS was 48%. The mean genomic prediction accuracies were 0.45 for STB (APR), 0.55 for SNB (seedling), 0.66 for TS (seedling) and 0.48 for TS (APR). We also compared markers from two whole-genome profiling approaches: genotyping by sequencing (GBS) and diversity arrays technology sequencing (DArTseq) for prediction. While, GBS markers performed slightly better than DArTseq, combining markers from the two approaches did not improve accuracies. We conclude that implementing GS in breeding for these diseases would help to achieve higher accuracies and rapid gains from selection. Copyright © 2017 Crop Science Society of America.

  7. Hydrometeorological model for streamflow prediction

    USGS Publications Warehouse

    Tangborn, Wendell V.

    1979-01-01

    The hydrometeorological model described in this manual was developed to predict seasonal streamflow from water in storage in a basin using streamflow and precipitation data. The model, as described, applies specifically to the Skokomish, Nisqually, and Cowlitz Rivers, in Washington State, and more generally to streams in other regions that derive seasonal runoff from melting snow. Thus the techniques demonstrated for these three drainage basins can be used as a guide for applying this method to other streams. Input to the computer program consists of daily averages of gaged runoff of these streams, and daily values of precipitation collected at Longmire, Kid Valley, and Cushman Dam. Predictions are based on estimates of the absolute storage of water, predominately as snow: storage is approximately equal to basin precipitation less observed runoff. A pre-forecast test season is used to revise the storage estimate and improve the prediction accuracy. To obtain maximum prediction accuracy for operational applications with this model , a systematic evaluation of several hydrologic and meteorologic variables is first necessary. Six input options to the computer program that control prediction accuracy are developed and demonstrated. Predictions of streamflow can be made at any time and for any length of season, although accuracy is usually poor for early-season predictions (before December 1) or for short seasons (less than 15 days). The coefficient of prediction (CP), the chief measure of accuracy used in this manual, approaches zero during the late autumn and early winter seasons and reaches a maximum of about 0.85 during the spring snowmelt season. (Kosco-USGS)

  8. Research on cardiovascular disease prediction based on distance metric learning

    NASA Astrophysics Data System (ADS)

    Ni, Zhuang; Liu, Kui; Kang, Guixia

    2018-04-01

    Distance metric learning algorithm has been widely applied to medical diagnosis and exhibited its strengths in classification problems. The k-nearest neighbour (KNN) is an efficient method which treats each feature equally. The large margin nearest neighbour classification (LMNN) improves the accuracy of KNN by learning a global distance metric, which did not consider the locality of data distributions. In this paper, we propose a new distance metric algorithm adopting cosine metric and LMNN named COS-SUBLMNN which takes more care about local feature of data to overcome the shortage of LMNN and improve the classification accuracy. The proposed methodology is verified on CVDs patient vector derived from real-world medical data. The Experimental results show that our method provides higher accuracy than KNN and LMNN did, which demonstrates the effectiveness of the Risk predictive model of CVDs based on COS-SUBLMNN.

  9. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    PubMed Central

    2011-01-01

    Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most sensitivity was around or even lower than a median value of 0.5. Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing. PMID:21849043

  10. Voxel inversion of airborne electromagnetic data for improved groundwater model construction and prediction accuracy

    NASA Astrophysics Data System (ADS)

    Kruse Christensen, Nikolaj; Ferre, Ty Paul A.; Fiandaca, Gianluca; Christensen, Steen

    2017-03-01

    We present a workflow for efficient construction and calibration of large-scale groundwater models that includes the integration of airborne electromagnetic (AEM) data and hydrological data. In the first step, the AEM data are inverted to form a 3-D geophysical model. In the second step, the 3-D geophysical model is translated, using a spatially dependent petrophysical relationship, to form a 3-D hydraulic conductivity distribution. The geophysical models and the hydrological data are used to estimate spatially distributed petrophysical shape factors. The shape factors primarily work as translators between resistivity and hydraulic conductivity, but they can also compensate for structural defects in the geophysical model. The method is demonstrated for a synthetic case study with sharp transitions among various types of deposits. Besides demonstrating the methodology, we demonstrate the importance of using geophysical regularization constraints that conform well to the depositional environment. This is done by inverting the AEM data using either smoothness (smooth) constraints or minimum gradient support (sharp) constraints, where the use of sharp constraints conforms best to the environment. The dependency on AEM data quality is also tested by inverting the geophysical model using data corrupted with four different levels of background noise. Subsequently, the geophysical models are used to construct competing groundwater models for which the shape factors are calibrated. The performance of each groundwater model is tested with respect to four types of prediction that are beyond the calibration base: a pumping well's recharge area and groundwater age, respectively, are predicted by applying the same stress as for the hydrologic model calibration; and head and stream discharge are predicted for a different stress situation. As expected, in this case the predictive capability of a groundwater model is better when it is based on a sharp geophysical model instead of a smoothness constraint. This is true for predictions of recharge area, head change, and stream discharge, while we find no improvement for prediction of groundwater age. Furthermore, we show that the model prediction accuracy improves with AEM data quality for predictions of recharge area, head change, and stream discharge, while there appears to be no accuracy improvement for the prediction of groundwater age.

  11. Accuracy of active chirp linearization for broadband frequency modulated continuous wave ladar.

    PubMed

    Barber, Zeb W; Babbitt, Wm Randall; Kaylor, Brant; Reibel, Randy R; Roos, Peter A

    2010-01-10

    As the bandwidth and linearity of frequency modulated continuous wave chirp ladar increase, the resulting range resolution, precisions, and accuracy are improved correspondingly. An analysis of a very broadband (several THz) and linear (<1 ppm) chirped ladar system based on active chirp linearization is presented. Residual chirp nonlinearity and material dispersion are analyzed as to their effect on the dynamic range, precision, and accuracy of the system. Measurement precision and accuracy approaching the part per billion level is predicted.

  12. The novel application of artificial neural network on bioelectrical impedance analysis to assess the body composition in elderly

    PubMed Central

    2013-01-01

    Background This study aims to improve accuracy of Bioelectrical Impedance Analysis (BIA) prediction equations for estimating fat free mass (FFM) of the elderly by using non-linear Back Propagation Artificial Neural Network (BP-ANN) model and to compare the predictive accuracy with the linear regression model by using energy dual X-ray absorptiometry (DXA) as reference method. Methods A total of 88 Taiwanese elderly adults were recruited in this study as subjects. Linear regression equations and BP-ANN prediction equation were developed using impedances and other anthropometrics for predicting the reference FFM measured by DXA (FFMDXA) in 36 male and 26 female Taiwanese elderly adults. The FFM estimated by BIA prediction equations using traditional linear regression model (FFMLR) and BP-ANN model (FFMANN) were compared to the FFMDXA. The measuring results of an additional 26 elderly adults were used to validate than accuracy of the predictive models. Results The results showed the significant predictors were impedance, gender, age, height and weight in developed FFMLR linear model (LR) for predicting FFM (coefficient of determination, r2 = 0.940; standard error of estimate (SEE) = 2.729 kg; root mean square error (RMSE) = 2.571kg, P < 0.001). The above predictors were set as the variables of the input layer by using five neurons in the BP-ANN model (r2 = 0.987 with a SD = 1.192 kg and relatively lower RMSE = 1.183 kg), which had greater (improved) accuracy for estimating FFM when compared with linear model. The results showed a better agreement existed between FFMANN and FFMDXA than that between FFMLR and FFMDXA. Conclusion When compared the performance of developed prediction equations for estimating reference FFMDXA, the linear model has lower r2 with a larger SD in predictive results than that of BP-ANN model, which indicated ANN model is more suitable for estimating FFM. PMID:23388042

  13. Improved Prediction of Blood-Brain Barrier Permeability Through Machine Learning with Combined Use of Molecular Property-Based Descriptors and Fingerprints.

    PubMed

    Yuan, Yaxia; Zheng, Fang; Zhan, Chang-Guo

    2018-03-21

    Blood-brain barrier (BBB) permeability of a compound determines whether the compound can effectively enter the brain. It is an essential property which must be accounted for in drug discovery with a target in the brain. Several computational methods have been used to predict the BBB permeability. In particular, support vector machine (SVM), which is a kernel-based machine learning method, has been used popularly in this field. For SVM training and prediction, the compounds are characterized by molecular descriptors. Some SVM models were based on the use of molecular property-based descriptors (including 1D, 2D, and 3D descriptors) or fragment-based descriptors (known as the fingerprints of a molecule). The selection of descriptors is critical for the performance of a SVM model. In this study, we aimed to develop a generally applicable new SVM model by combining all of the features of the molecular property-based descriptors and fingerprints to improve the accuracy for the BBB permeability prediction. The results indicate that our SVM model has improved accuracy compared to the currently available models of the BBB permeability prediction.

  14. Ensemble-based prediction of RNA secondary structures.

    PubMed

    Aghaeepour, Nima; Hoos, Holger H

    2013-04-24

    Accurate structure prediction methods play an important role for the understanding of RNA function. Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatile approaches, and improved methods for this task have received much attention over the past five years. Despite the impressive progress that as been achieved in this area, existing evaluations of the prediction accuracy achieved by various algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there is increasing evidence that no prediction algorithm consistently outperforms all others, no work has been done to exploit the complementary strengths of multiple approaches. In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we use state-of-the-art, resampling-based statistical methods together with a previously published and increasingly widely used dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondary structure prediction procedures. The results from this evaluation clarify the performance relationship between ten well-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearly demonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic and powerful method for combining a set of existing secondary structure prediction procedures into an ensemble-based method that achieves significantly higher prediction accuracies than obtained from any of its component procedures. Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based, pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existing prediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNA allows an intuitive and effective control of the trade-off between false negative and false positive base pair predictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and can therefore be used to leverage improvements in prediction accuracy offered by algorithms and energy models developed in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available at http://www.cs.ubc.ca/labs/beta/Software/AveRNA.

  15. Presymptomatic electrophysiological tests predict clinical onset and survival in SOD1(G93A) ALS mice.

    PubMed

    Mancuso, Renzo; Osta, Rosario; Navarro, Xavier

    2014-12-01

    We assessed the predictive value of electrophysiological tests as a marker of clinical disease onset and survival in superoxide-dismutase 1 (SOD1)(G93A) mice. We evaluated the accuracy of electrophysiological tests in differentiating transgenic versus wild-type mice. We made a correlation analysis of electrophysiological parameters and the onset of symptoms, survival, and number of spinal motoneurons. Presymptomatic electrophysiological tests show great accuracy in differentiating transgenic versus wild-type mice, with the most sensitive parameter being the tibialis anterior compound muscle action potential (CMAP) amplitude. The CMAP amplitude at age 10 weeks correlated significantly with clinical disease onset and survival. Electrophysiological tests increased their survival prediction accuracy when evaluated at later stages of the disease and also predicted the amount of lumbar spinal motoneuron preservation. Electrophysiological tests predict clinical disease onset, survival, and spinal motoneuron preservation in SOD1(G93A) mice. This is a methodological improvement for preclinical studies. © 2014 Wiley Periodicals, Inc.

  16. Retweets as a Predictor of Relationships among Users on Social Media.

    PubMed

    Tsugawa, Sho; Kito, Kosuke

    2017-01-01

    Link prediction is the problem of detecting missing links or predicting future link formation in a network. Application of link prediction to social media, such as Twitter and Facebook, is useful both for developing novel services and for sociological analyses. While most existing research on link prediction uses only the social network topology for the prediction, in social media, records of user activities such as posting, replying, and reposting are available. These records are expected to reflect user interest, and so incorporating them should improve link prediction. However, research into link prediction using the records of user activities is still in its infancy, and the effectiveness of such records for link prediction has not been fully explored. In this study, we focus in particular on records of reposting as a promising source that could be useful for link prediction, and investigate their effectiveness for link prediction on the popular social media platform Twitter. Our results show that (1) the prediction accuracy of techniques using reposting records is higher than that of popular topology-based techniques such as common neighbors and resource allocation for actively retweeting users, (2) the accuracy of link prediction techniques that use network topology alone can be improved by incorporating reposting records.

  17. Retweets as a Predictor of Relationships among Users on Social Media

    PubMed Central

    Kito, Kosuke

    2017-01-01

    Link prediction is the problem of detecting missing links or predicting future link formation in a network. Application of link prediction to social media, such as Twitter and Facebook, is useful both for developing novel services and for sociological analyses. While most existing research on link prediction uses only the social network topology for the prediction, in social media, records of user activities such as posting, replying, and reposting are available. These records are expected to reflect user interest, and so incorporating them should improve link prediction. However, research into link prediction using the records of user activities is still in its infancy, and the effectiveness of such records for link prediction has not been fully explored. In this study, we focus in particular on records of reposting as a promising source that could be useful for link prediction, and investigate their effectiveness for link prediction on the popular social media platform Twitter. Our results show that (1) the prediction accuracy of techniques using reposting records is higher than that of popular topology-based techniques such as common neighbors and resource allocation for actively retweeting users, (2) the accuracy of link prediction techniques that use network topology alone can be improved by incorporating reposting records. PMID:28107489

  18. CSmetaPred: a consensus method for prediction of catalytic residues.

    PubMed

    Choudhary, Preeti; Kumar, Shailesh; Bachhawat, Anand Kumar; Pandit, Shashi Bhushan

    2017-12-22

    Knowledge of catalytic residues can play an essential role in elucidating mechanistic details of an enzyme. However, experimental identification of catalytic residues is a tedious and time-consuming task, which can be expedited by computational predictions. Despite significant development in active-site prediction methods, one of the remaining issues is ranked positions of putative catalytic residues among all ranked residues. In order to improve ranking of catalytic residues and their prediction accuracy, we have developed a meta-approach based method CSmetaPred. In this approach, residues are ranked based on the mean of normalized residue scores derived from four well-known catalytic residue predictors. The mean residue score of CSmetaPred is combined with predicted pocket information to improve prediction performance in meta-predictor, CSmetaPred_poc. Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision Recall (PR) curves. The visual and quantitative analysis of ROC and PR curves shows that meta-predictors outperform their constituent methods and CSmetaPred_poc is the best of evaluated methods. For instance, on CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves highest Mean Average Specificity (MAS), a scalar measure for ROC curve, of 0.97 (0.96). Importantly, median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. Considering residues ranked ≤20 classified as true positive in binary classification, CSmetaPred_poc achieves prediction accuracy of 0.94 on CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc predicts all catalytic residues within top 20 ranks for ~73% of enzymes. Furthermore, benchmarking of prediction on comparative modelled structures showed that models result in better prediction than only sequence based predictions. These analyses suggest that CSmetaPred_poc is able to rank putative catalytic residues at lower (better) ranked positions, which can facilitate and expedite their experimental characterization. The benchmarking studies showed that employing meta-approach in combining residue-level scores derived from well-known catalytic residue predictors can improve prediction accuracy as well as provide improved ranked positions of known catalytic residues. Hence, such predictions can assist experimentalist to prioritize residues for mutational studies in their efforts to characterize catalytic residues. Both meta-predictors are available as webserver at: http://14.139.227.206/csmetapred/ .

  19. Integrative Approaches for Predicting in vivo Effects of Chemicals from their Structural Descriptors and the Results of Short-term Biological Assays

    PubMed Central

    Low, Yen S.; Sedykh, Alexander; Rusyn, Ivan; Tropsha, Alexander

    2017-01-01

    Cheminformatics approaches such as Quantitative Structure Activity Relationship (QSAR) modeling have been used traditionally for predicting chemical toxicity. In recent years, high throughput biological assays have been increasingly employed to elucidate mechanisms of chemical toxicity and predict toxic effects of chemicals in vivo. The data generated in such assays can be considered as biological descriptors of chemicals that can be combined with molecular descriptors and employed in QSAR modeling to improve the accuracy of toxicity prediction. In this review, we discuss several approaches for integrating chemical and biological data for predicting biological effects of chemicals in vivo and compare their performance across several data sets. We conclude that while no method consistently shows superior performance, the integrative approaches rank consistently among the best yet offer enriched interpretation of models over those built with either chemical or biological data alone. We discuss the outlook for such interdisciplinary methods and offer recommendations to further improve the accuracy and interpretability of computational models that predict chemical toxicity. PMID:24805064

  20. Effects of field plot size on prediction accuracy of aboveground biomass in airborne laser scanning-assisted inventories in tropical rain forests of Tanzania.

    PubMed

    Mauya, Ernest William; Hansen, Endre Hofstad; Gobakken, Terje; Bollandsås, Ole Martin; Malimbwi, Rogers Ernest; Næsset, Erik

    2015-12-01

    Airborne laser scanning (ALS) has recently emerged as a promising tool to acquire auxiliary information for improving aboveground biomass (AGB) estimation in sample-based forest inventories. Under design-based and model-assisted inferential frameworks, the estimation relies on a model that relates the auxiliary ALS metrics to AGB estimated on ground plots. The size of the field plots has been identified as one source of model uncertainty because of the so-called boundary effects which increases with decreasing plot size. Recent research in tropical forests has aimed to quantify the boundary effects on model prediction accuracy, but evidence of the consequences for the final AGB estimates is lacking. In this study we analyzed the effect of field plot size on model prediction accuracy and its implication when used in a model-assisted inferential framework. The results showed that the prediction accuracy of the model improved as the plot size increased. The adjusted R 2 increased from 0.35 to 0.74 while the relative root mean square error decreased from 63.6 to 29.2%. Indicators of boundary effects were identified and confirmed to have significant effects on the model residuals. Variance estimates of model-assisted mean AGB relative to corresponding variance estimates of pure field-based AGB, decreased with increasing plot size in the range from 200 to 3000 m 2 . The variance ratio of field-based estimates relative to model-assisted variance ranged from 1.7 to 7.7. This study showed that the relative improvement in precision of AGB estimation when increasing field-plot size, was greater for an ALS-assisted inventory compared to that of a pure field-based inventory.

  1. Radiomics-based Prognosis Analysis for Non-Small Cell Lung Cancer

    NASA Astrophysics Data System (ADS)

    Zhang, Yucheng; Oikonomou, Anastasia; Wong, Alexander; Haider, Masoom A.; Khalvati, Farzad

    2017-04-01

    Radiomics characterizes tumor phenotypes by extracting large numbers of quantitative features from radiological images. Radiomic features have been shown to provide prognostic value in predicting clinical outcomes in several studies. However, several challenges including feature redundancy, unbalanced data, and small sample sizes have led to relatively low predictive accuracy. In this study, we explore different strategies for overcoming these challenges and improving predictive performance of radiomics-based prognosis for non-small cell lung cancer (NSCLC). CT images of 112 patients (mean age 75 years) with NSCLC who underwent stereotactic body radiotherapy were used to predict recurrence, death, and recurrence-free survival using a comprehensive radiomics analysis. Different feature selection and predictive modeling techniques were used to determine the optimal configuration of prognosis analysis. To address feature redundancy, comprehensive analysis indicated that Random Forest models and Principal Component Analysis were optimum predictive modeling and feature selection methods, respectively, for achieving high prognosis performance. To address unbalanced data, Synthetic Minority Over-sampling technique was found to significantly increase predictive accuracy. A full analysis of variance showed that data endpoints, feature selection techniques, and classifiers were significant factors in affecting predictive accuracy, suggesting that these factors must be investigated when building radiomics-based predictive models for cancer prognosis.

  2. Diagnostic accuracy of tuberculous lymphadenitis fine needle aspiration biopsy confirmed by PCR as gold standard

    NASA Astrophysics Data System (ADS)

    DSuryadi; Delyuzar; Soekimin

    2018-03-01

    Indonesia is the second country with the TB (tuberculosis) burden in the world. Improvement in controlling TB and reducing the complications can accelerate early diagnosis and correct treatment. PCR test is a gold standard. However, it is quite expensive for routine diagnosis. Therefore, an accurate and cheaper diagnostic method such as fine needle aspiration biopsy is needed. The study aimsto determine the accuracy of fine needle aspiration biopsy cytology in the diagnosis of tuberculous lymphadenitis. A cross-sectional analytic study was conducted to the samples from patients suspected with tuberculous lymphadenitis. The fine needle aspiration biopsy (FNAB)test was performed and confirmed by PCR test.There is a comparison to the sensitivity, specificity, accuracy, positive predictive value and negative predictive value of both methods. Sensitivity (92.50%), specificity (96.49%), accuracy (94.85%), positive predictive value (94.87%) and negative predictive value (94.83%) were in FNAB test compared to gold standard. We concluded that fine needle aspiration biopsy is a recommendation for a cheaper and accurate diagnostic test for tuberculous lymphadenitis diagnosis.

  3. Prediction of soil properties using imaging spectroscopy: Considering fractional vegetation cover to improve accuracy

    NASA Astrophysics Data System (ADS)

    Franceschini, M. H. D.; Demattê, J. A. M.; da Silva Terra, F.; Vicente, L. E.; Bartholomeus, H.; de Souza Filho, C. R.

    2015-06-01

    Spectroscopic techniques have become attractive to assess soil properties because they are fast, require little labor and may reduce the amount of laboratory waste produced when compared to conventional methods. Imaging spectroscopy (IS) can have further advantages compared to laboratory or field proximal spectroscopic approaches such as providing spatially continuous information with a high density. However, the accuracy of IS derived predictions decreases when the spectral mixture of soil with other targets occurs. This paper evaluates the use of spectral data obtained by an airborne hyperspectral sensor (ProSpecTIR-VS - Aisa dual sensor) for prediction of physical and chemical properties of Brazilian highly weathered soils (i.e., Oxisols). A methodology to assess the soil spectral mixture is adapted and a progressive spectral dataset selection procedure, based on bare soil fractional cover, is proposed and tested. Satisfactory performances are obtained specially for the quantification of clay, sand and CEC using airborne sensor data (R2 of 0.77, 0.79 and 0.54; RPD of 2.14, 2.22 and 1.50, respectively), after spectral data selection is performed; although results obtained for laboratory data are more accurate (R2 of 0.92, 0.85 and 0.75; RPD of 3.52, 2.62 and 2.04, for clay, sand and CEC, respectively). Most importantly, predictions based on airborne-derived spectra for which the bare soil fractional cover is not taken into account show considerable lower accuracy, for example for clay, sand and CEC (RPD of 1.52, 1.64 and 1.16, respectively). Therefore, hyperspectral remotely sensed data can be used to predict topsoil properties of highly weathered soils, although spectral mixture of bare soil with vegetation must be considered in order to achieve an improved prediction accuracy.

  4. Genetic Variance Partitioning and Genome-Wide Prediction with Allele Dosage Information in Autotetraploid Potato.

    PubMed

    Endelman, Jeffrey B; Carley, Cari A Schmitz; Bethke, Paul C; Coombs, Joseph J; Clough, Mark E; da Silva, Washington L; De Jong, Walter S; Douches, David S; Frederick, Curtis M; Haynes, Kathleen G; Holm, David G; Miller, J Creighton; Muñoz, Patricio R; Navarro, Felix M; Novy, Richard G; Palta, Jiwan P; Porter, Gregory A; Rak, Kyle T; Sathuvalli, Vidyasagar R; Thompson, Asunta L; Yencho, G Craig

    2018-05-01

    As one of the world's most important food crops, the potato ( Solanum tuberosum L.) has spurred innovation in autotetraploid genetics, including in the use of SNP arrays to determine allele dosage at thousands of markers. By combining genotype and pedigree information with phenotype data for economically important traits, the objectives of this study were to (1) partition the genetic variance into additive vs. nonadditive components, and (2) determine the accuracy of genome-wide prediction. Between 2012 and 2017, a training population of 571 clones was evaluated for total yield, specific gravity, and chip fry color. Genomic covariance matrices for additive ( G ), digenic dominant ( D ), and additive × additive epistatic ( G # G ) effects were calculated using 3895 markers, and the numerator relationship matrix ( A ) was calculated from a 13-generation pedigree. Based on model fit and prediction accuracy, mixed model analysis with G was superior to A for yield and fry color but not specific gravity. The amount of additive genetic variance captured by markers was 20% of the total genetic variance for specific gravity, compared to 45% for yield and fry color. Within the training population, including nonadditive effects improved accuracy and/or bias for all three traits when predicting total genotypic value. When six F 1 populations were used for validation, prediction accuracy ranged from 0.06 to 0.63 and was consistently lower (0.13 on average) without allele dosage information. We conclude that genome-wide prediction is feasible in potato and that it will improve selection for breeding value given the substantial amount of nonadditive genetic variance in elite germplasm. Copyright © 2018 by the Genetics Society of America.

  5. Operationalizing the Diagnostic Criteria for Mild Cognitive Impairment: The Salience of Objective Measures in Predicting Incident Dementia.

    PubMed

    Brodaty, Henry; Aerts, Liesbeth; Crawford, John D; Heffernan, Megan; Kochan, Nicole A; Reppermund, Simone; Kang, Kristan; Maston, Kate; Draper, Brian; Trollor, Julian N; Sachdev, Perminder S

    2017-05-01

    Mild cognitive impairment (MCI) is considered an intermediate stage between normal aging and dementia. It is diagnosed in the presence of subjective cognitive decline and objective cognitive impairment without significant functional impairment, although there are no standard operationalizations for each of these criteria. The objective of this study is to determine which operationalization of the MCI criteria is most accurate at predicting dementia. Six-year longitudinal study, part of the Sydney Memory and Ageing Study. Community-based. 873 community-dwelling dementia-free adults between 70 and 90 years of age. Persons from a non-English speaking background were excluded. Seven different operationalizations for subjective cognitive decline and eight measures of objective cognitive impairment (resulting in 56 different MCI operational algorithms) were applied. The accuracy of each algorithm to predict progression to dementia over 6 years was examined for 618 individuals. Baseline MCI prevalence varied between 0.4% and 30.2% and dementia conversion between 15.9% and 61.9% across different algorithms. The predictive accuracy for progression to dementia was poor. The highest accuracy was achieved based on objective cognitive impairment alone. Inclusion of subjective cognitive decline or mild functional impairment did not improve dementia prediction accuracy. Not MCI, but objective cognitive impairment alone, is the best predictor for progression to dementia in a community sample. Nevertheless, clinical assessment procedures need to be refined to improve the identification of pre-dementia individuals. Copyright © 2016 American Association for Geriatric Psychiatry. Published by Elsevier Inc. All rights reserved.

  6. Moderate efficiency of clinicians' predictions decreased for blurred clinical conditions and benefits from the use of BRASS index. A longitudinal study on geriatric patients' outcomes.

    PubMed

    Signorini, Giulia; Dagani, Jessica; Bulgari, Viola; Ferrari, Clarissa; de Girolamo, Giovanni

    2016-01-01

    Accurate prognosis is an essential aspect of good clinical practice and efficient health services, particularly for chronic and disabling diseases, as in geriatric populations. This study aims to examine the accuracy of clinical prognostic predictions and to devise prediction models combining clinical variables and clinicians' prognosis for a geriatric patient sample. In a sample of 329 consecutive older patients admitted to 10 geriatric units, we evaluated the accuracy of clinicians' prognosis regarding three outcomes at discharge: global functioning, length of stay (LoS) in hospital, and destination at discharge (DD). A comprehensive set of sociodemographic, clinical, and treatment-related information were also collected. Moderate predictive performance was found for all three outcomes: area under receiver operating characteristic curve of 0.79 and 0.78 for functioning and LoS, respectively, and moderate concordance, Cohen's K = 0.45, between predicted and observed DD. Predictive models found the Blaylock Risk Assessment Screening Score together with clinicians' judgment relevant to improve predictions for all outcomes (absolute improvement in adjusted and pseudo-R(2) up to 19%). Although the clinicians' estimates were important factors in predicting global functioning, LoS, and DD, more research is needed regarding both methodological aspects and clinical measurements, to improve prognostic clinical indices. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Genomic predictions across Nordic Holstein and Nordic Red using the genomic best linear unbiased prediction model with different genomic relationship matrices.

    PubMed

    Zhou, L; Lund, M S; Wang, Y; Su, G

    2014-08-01

    This study investigated genomic predictions across Nordic Holstein and Nordic Red using various genomic relationship matrices. Different sources of information, such as consistencies of linkage disequilibrium (LD) phase and marker effects, were used to construct the genomic relationship matrices (G-matrices) across these two breeds. Single-trait genomic best linear unbiased prediction (GBLUP) model and two-trait GBLUP model were used for single-breed and two-breed genomic predictions. The data included 5215 Nordic Holstein bulls and 4361 Nordic Red bulls, which was composed of three populations: Danish Red, Swedish Red and Finnish Ayrshire. The bulls were genotyped with 50 000 SNP chip. Using the two-breed predictions with a joint Nordic Holstein and Nordic Red reference population, accuracies increased slightly for all traits in Nordic Red, but only for some traits in Nordic Holstein. Among the three subpopulations of Nordic Red, accuracies increased more for Danish Red than for Swedish Red and Finnish Ayrshire. This is because closer genetic relationships exist between Danish Red and Nordic Holstein. Among Danish Red, individuals with higher genomic relationship coefficients with Nordic Holstein showed more increased accuracies in the two-breed predictions. Weighting the two-breed G-matrices by LD phase consistencies, marker effects or both did not further improve accuracies of the two-breed predictions. © 2014 Blackwell Verlag GmbH.

  8. Predicting phenotype from genotype: Improving accuracy through more robust experimental and computational modeling

    PubMed Central

    Gallion, Jonathan; Koire, Amanda; Katsonis, Panagiotis; Schoenegge, Anne‐Marie; Bouvier, Michel

    2017-01-01

    Abstract Computational prediction yields efficient and scalable initial assessments of how variants of unknown significance may affect human health. However, when discrepancies between these predictions and direct experimental measurements of functional impact arise, inaccurate computational predictions are frequently assumed as the source. Here, we present a methodological analysis indicating that shortcomings in both computational and biological data can contribute to these disagreements. We demonstrate that incomplete assaying of multifunctional proteins can affect the strength of correlations between prediction and experiments; a variant's full impact on function is better quantified by considering multiple assays that probe an ensemble of protein functions. Additionally, many variants predictions are sensitive to protein alignment construction and can be customized to maximize relevance of predictions to a specific experimental question. We conclude that inconsistencies between computation and experiment can often be attributed to the fact that they do not test identical hypotheses. Aligning the design of the computational input with the design of the experimental output will require cooperation between computational and biological scientists, but will also lead to improved estimations of computational prediction accuracy and a better understanding of the genotype–phenotype relationship. PMID:28230923

  9. Predicting phenotype from genotype: Improving accuracy through more robust experimental and computational modeling.

    PubMed

    Gallion, Jonathan; Koire, Amanda; Katsonis, Panagiotis; Schoenegge, Anne-Marie; Bouvier, Michel; Lichtarge, Olivier

    2017-05-01

    Computational prediction yields efficient and scalable initial assessments of how variants of unknown significance may affect human health. However, when discrepancies between these predictions and direct experimental measurements of functional impact arise, inaccurate computational predictions are frequently assumed as the source. Here, we present a methodological analysis indicating that shortcomings in both computational and biological data can contribute to these disagreements. We demonstrate that incomplete assaying of multifunctional proteins can affect the strength of correlations between prediction and experiments; a variant's full impact on function is better quantified by considering multiple assays that probe an ensemble of protein functions. Additionally, many variants predictions are sensitive to protein alignment construction and can be customized to maximize relevance of predictions to a specific experimental question. We conclude that inconsistencies between computation and experiment can often be attributed to the fact that they do not test identical hypotheses. Aligning the design of the computational input with the design of the experimental output will require cooperation between computational and biological scientists, but will also lead to improved estimations of computational prediction accuracy and a better understanding of the genotype-phenotype relationship. © 2017 The Authors. **Human Mutation published by Wiley Periodicals, Inc.

  10. Improved numerical methods for turbulent viscous recirculating flows

    NASA Technical Reports Server (NTRS)

    Turan, A.; Vandoormaal, J. P.

    1988-01-01

    The performance of discrete methods for the prediction of fluid flows can be enhanced by improving the convergence rate of solvers and by increasing the accuracy of the discrete representation of the equations of motion. This report evaluates the gains in solver performance that are available when various acceleration methods are applied. Various discretizations are also examined and two are recommended because of their accuracy and robustness. Insertion of the improved discretization and solver accelerator into a TEACH mode, that has been widely applied to combustor flows, illustrates the substantial gains to be achieved.

  11. Requirements for an Advanced Low Earth Orbit (LEO) Sounder (ALS) for Improved Regional Weather Prediction and Monitoring of Greenhouse Gases

    NASA Technical Reports Server (NTRS)

    Pagano, Thomas S.; Chahine, Moustafa T.; Susskind, Joel

    2008-01-01

    Hyperspectral infrared atmospheric sounders (e.g., the Atmospheric Infrared Sounder (AIRS) on Aqua and the Infrared Atmospheric Sounding Interferometer (IASI) on Met Op) provide highly accurate temperature and water vapor profiles in the lower to upper troposphere. These systems are vital operational components of our National Weather Prediction system and the AIRS has demonstrated over 6 hrs of forecast improvement on the 5 day operational forecast. Despite the success in the mid troposphere to lower stratosphere, a reduction in sensitivity and accuracy has been seen in these systems in the boundary layer over land. In this paper we demonstrate the potential improvement associated with higher spatial resolution (1 km vs currently 13.5 km) on the accuracy of boundary layer products with an added consequence of higher yield of cloud free scenes. This latter feature is related to the number of samples that can be assimilated and has also shown to have a significant impact on improving forecast accuracy. We also present a set of frequencies and resolutions that will improve vertical resolution of temperature and water vapor and trace gas species throughout the atmosphere. Development of an Advanced Low Earth Orbit (LEO) Sounder (ALS) with these improvements will improve weather forecast at the regional scale and of tropical storms and hurricanes. Improvements are also expected in the accuracy of the water vapor and cloud properties products, enhancing process studies and providing a better match to the resolution of future climate models. The improvements of technology required for the ALS are consistent with the current state of technology as demonstrated in NASA Instrument Incubator Program and NOAA's Hyperspectral Environmental Suite (HES) formulation phase development programs.

  12. Prediction and validation of residual feed intake and dry matter intake in Danish lactating dairy cows using mid-infrared spectroscopy of milk.

    PubMed

    Shetty, N; Løvendahl, P; Lund, M S; Buitenhuis, A J

    2017-01-01

    The present study explored the effectiveness of Fourier transform mid-infrared (FT-IR) spectral profiles as a predictor for dry matter intake (DMI) and residual feed intake (RFI). The partial least squares regression method was used to develop the prediction models. The models were validated using different external test sets, one randomly leaving out 20% of the records (validation A), the second randomly leaving out 20% of cows (validation B), and a third (for DMI prediction models) randomly leaving out one cow (validation C). The data included 1,044 records from 140 cows; 97 were Danish Holstein and 43 Danish Jersey. Results showed better accuracies for validation A compared with other validation methods. Milk yield (MY) contributed largely to DMI prediction; MY explained 59% of the variation and the validated model error root mean square error of prediction (RMSEP) was 2.24kg. The model was improved by adding live weight (LW) as an additional predictor trait, where the accuracy R 2 increased from 0.59 to 0.72 and error RMSEP decreased from 2.24 to 1.83kg. When only the milk FT-IR spectral profile was used in DMI prediction, a lower prediction ability was obtained, with R 2 =0.30 and RMSEP=2.91kg. However, once the spectral information was added, along with MY and LW as predictors, model accuracy improved and R 2 increased to 0.81 and RMSEP decreased to 1.49kg. Prediction accuracies of RFI changed throughout lactation. The RFI prediction model for the early-lactation stage was better compared with across lactation or mid- and late-lactation stages, with R 2 =0.46 and RMSEP=1.70. The most important spectral wavenumbers that contributed to DMI and RFI prediction models included fat, protein, and lactose peaks. Comparable prediction results were obtained when using infrared-predicted fat, protein, and lactose instead of full spectra, indicating that FT-IR spectral data do not add significant new information to improve DMI and RFI prediction models. Therefore, in practice, if full FT-IR spectral data are not stored, it is possible to achieve similar DMI or RFI prediction results based on standard milk control data. For DMI, the milk fat region was responsible for the major variation in milk spectra; for RFI, the major variation in milk spectra was within the milk protein region. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  13. Accuracy of genomic prediction using deregressed breeding values estimated from purebred and crossbred offspring phenotypes in pigs.

    PubMed

    Hidalgo, A M; Bastiaansen, J W M; Lopes, M S; Veroneze, R; Groenen, M A M; de Koning, D-J

    2015-07-01

    Genomic selection is applied to dairy cattle breeding to improve the genetic progress of purebred (PB) animals, whereas in pigs and poultry the target is a crossbred (CB) animal for which a different strategy appears to be needed. The source of information used to estimate the breeding values, i.e., using phenotypes of CB or PB animals, may affect the accuracy of prediction. The objective of our study was to assess the direct genomic value (DGV) accuracy of CB and PB pigs using different sources of phenotypic information. Data used were from 3 populations: 2,078 Dutch Landrace-based, 2,301 Large White-based, and 497 crossbreds from an F1 cross between the 2 lines. Two female reproduction traits were analyzed: gestation length (GLE) and total number of piglets born (TNB). Phenotypes used in the analyses originated from offspring of genotyped individuals. Phenotypes collected on CB and PB animals were analyzed as separate traits using a single-trait model. Breeding values were estimated separately for each trait in a pedigree BLUP analysis and subsequently deregressed. Deregressed EBV for each trait originating from different sources (CB or PB offspring) were used to study the accuracy of genomic prediction. Accuracy of prediction was computed as the correlation between DGV and the DEBV of the validation population. Accuracy of prediction within PB populations ranged from 0.43 to 0.62 across GLE and TNB. Accuracies to predict genetic merit of CB animals with one PB population in the training set ranged from 0.12 to 0.28, with the exception of using the CB offspring phenotype of the Dutch Landrace that resulted in an accuracy estimate around 0 for both traits. Accuracies to predict genetic merit of CB animals with both parental PB populations in the training set ranged from 0.17 to 0.30. We conclude that prediction within population and trait had good predictive ability regardless of the trait being the PB or CB performance, whereas using PB population(s) to predict genetic merit of CB animals had zero to moderate predictive ability. We observed that the DGV accuracy of CB animals when training on PB data was greater than or equal to training on CB data. However, when results are corrected for the different levels of reliabilities in the PB and CB training data, we showed that training on CB data does outperform PB data for the prediction of CB genetic merit, indicating that more CB animals should be phenotyped to increase the reliability and, consequently, accuracy of DGV for CB genetic merit.

  14. An Experimental Study in Determining Energy Expenditure from Treadmill Walking using Hip-Worn Inertial Sensors

    PubMed Central

    Vathsangam, Harshvardhan; Emken, Adar; Schroeder, E. Todd; Spruijt-Metz, Donna; Sukhatme, Gaurav S.

    2011-01-01

    This paper describes an experimental study in estimating energy expenditure from treadmill walking using a single hip-mounted triaxial inertial sensor comprised of a triaxial accelerometer and a triaxial gyroscope. Typical physical activity characterization using accelerometer generated counts suffers from two drawbacks - imprecison (due to proprietary counts) and incompleteness (due to incomplete movement description). We address these problems in the context of steady state walking by directly estimating energy expenditure with data from a hip-mounted inertial sensor. We represent the cyclic nature of walking with a Fourier transform of sensor streams and show how one can map this representation to energy expenditure (as measured by V O2 consumption, mL/min) using three regression techniques - Least Squares Regression (LSR), Bayesian Linear Regression (BLR) and Gaussian Process Regression (GPR). We perform a comparative analysis of the accuracy of sensor streams in predicting energy expenditure (measured by RMS prediction accuracy). Triaxial information is more accurate than uniaxial information. LSR based approaches are prone to outlier sensitivity and overfitting. Gyroscopic information showed equivalent if not better prediction accuracy as compared to accelerometers. Combining accelerometer and gyroscopic information provided better accuracy than using either sensor alone. We also analyze the best algorithmic approach among linear and nonlinear methods as measured by RMS prediction accuracy and run time. Nonlinear regression methods showed better prediction accuracy but required an order of magnitude of run time. This paper emphasizes the role of probabilistic techniques in conjunction with joint modeling of triaxial accelerations and rotational rates to improve energy expenditure prediction for steady-state treadmill walking. PMID:21690001

  15. Optimization of rotamers prior to template minimization improves stability predictions made by computational protein design.

    PubMed

    Davey, James A; Chica, Roberto A

    2015-04-01

    Computational protein design (CPD) predictions are highly dependent on the structure of the input template used. However, it is unclear how small differences in template geometry translate to large differences in stability prediction accuracy. Herein, we explored how structural changes to the input template affect the outcome of stability predictions by CPD. To do this, we prepared alternate templates by Rotamer Optimization followed by energy Minimization (ROM) and used them to recapitulate the stability of 84 protein G domain β1 mutant sequences. In the ROM process, side-chain rotamers for wild-type (WT) or mutant sequences are optimized on crystal or nuclear magnetic resonance (NMR) structures prior to template minimization, resulting in alternate structures termed ROM templates. We show that use of ROM templates prepared from sequences known to be stable results predominantly in improved prediction accuracy compared to using the minimized crystal or NMR structures. Conversely, ROM templates prepared from sequences that are less stable than the WT reduce prediction accuracy by increasing the number of false positives. These observed changes in prediction outcomes are attributed to differences in side-chain contacts made by rotamers in ROM templates. Finally, we show that ROM templates prepared from sequences that are unfolded or that adopt a nonnative fold result in the selective enrichment of sequences that are also unfolded or that adopt a nonnative fold, respectively. Our results demonstrate the existence of a rotamer bias caused by the input template that can be harnessed to skew predictions toward sequences displaying desired characteristics. © 2014 The Protein Society.

  16. Accuracy of genomic prediction in switchgrass ( Panicum virgatum L.) improved by accounting for linkage disequilibrium

    DOE PAGES

    Ramstein, Guillaume P.; Evans, Joseph; Kaeppler, Shawn M.; ...

    2016-02-11

    Switchgrass is a relatively high-yielding and environmentally sustainable biomass crop, but further genetic gains in biomass yield must be achieved to make it an economically viable bioenergy feedstock. Genomic selection (GS) is an attractive technology to generate rapid genetic gains in switchgrass, and meet the goals of a substantial displacement of petroleum use with biofuels in the near future. In this study, we empirically assessed prediction procedures for genomic selection in two different populations, consisting of 137 and 110 half-sib families of switchgrass, tested in two locations in the United States for three agronomic traits: dry matter yield, plant height,more » and heading date. Marker data were produced for the families’ parents by exome capture sequencing, generating up to 141,030 polymorphic markers with available genomic-location and annotation information. We evaluated prediction procedures that varied not only by learning schemes and prediction models, but also by the way the data were preprocessed to account for redundancy in marker information. More complex genomic prediction procedures were generally not significantly more accurate than the simplest procedure, likely due to limited population sizes. Nevertheless, a highly significant gain in prediction accuracy was achieved by transforming the marker data through a marker correlation matrix. Our results suggest that marker-data transformations and, more generally, the account of linkage disequilibrium among markers, offer valuable opportunities for improving prediction procedures in GS. Furthermore, some of the achieved prediction accuracies should motivate implementation of GS in switchgrass breeding programs.« less

  17. Accuracy of genomic prediction in switchgrass ( Panicum virgatum L.) improved by accounting for linkage disequilibrium

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ramstein, Guillaume P.; Evans, Joseph; Kaeppler, Shawn M.

    Switchgrass is a relatively high-yielding and environmentally sustainable biomass crop, but further genetic gains in biomass yield must be achieved to make it an economically viable bioenergy feedstock. Genomic selection (GS) is an attractive technology to generate rapid genetic gains in switchgrass, and meet the goals of a substantial displacement of petroleum use with biofuels in the near future. In this study, we empirically assessed prediction procedures for genomic selection in two different populations, consisting of 137 and 110 half-sib families of switchgrass, tested in two locations in the United States for three agronomic traits: dry matter yield, plant height,more » and heading date. Marker data were produced for the families’ parents by exome capture sequencing, generating up to 141,030 polymorphic markers with available genomic-location and annotation information. We evaluated prediction procedures that varied not only by learning schemes and prediction models, but also by the way the data were preprocessed to account for redundancy in marker information. More complex genomic prediction procedures were generally not significantly more accurate than the simplest procedure, likely due to limited population sizes. Nevertheless, a highly significant gain in prediction accuracy was achieved by transforming the marker data through a marker correlation matrix. Our results suggest that marker-data transformations and, more generally, the account of linkage disequilibrium among markers, offer valuable opportunities for improving prediction procedures in GS. Furthermore, some of the achieved prediction accuracies should motivate implementation of GS in switchgrass breeding programs.« less

  18. Accuracy of genomic selection in European maize elite breeding populations.

    PubMed

    Zhao, Yusheng; Gowda, Manje; Liu, Wenxin; Würschum, Tobias; Maurer, Hans P; Longin, Friedrich H; Ranc, Nicolas; Reif, Jochen C

    2012-03-01

    Genomic selection is a promising breeding strategy for rapid improvement of complex traits. The objective of our study was to investigate the prediction accuracy of genomic breeding values through cross validation. The study was based on experimental data of six segregating populations from a half-diallel mating design with 788 testcross progenies from an elite maize breeding program. The plants were intensively phenotyped in multi-location field trials and fingerprinted with 960 SNP markers. We used random regression best linear unbiased prediction in combination with fivefold cross validation. The prediction accuracy across populations was higher for grain moisture (0.90) than for grain yield (0.58). The accuracy of genomic selection realized for grain yield corresponds to the precision of phenotyping at unreplicated field trials in 3-4 locations. As for maize up to three generations are feasible per year, selection gain per unit time is high and, consequently, genomic selection holds great promise for maize breeding programs.

  19. The predictability of consumer visitation patterns

    NASA Astrophysics Data System (ADS)

    Krumme, Coco; Llorente, Alejandro; Cebrian, Manuel; Pentland, Alex ("Sandy"); Moro, Esteban

    2013-04-01

    We consider hundreds of thousands of individual economic transactions to ask: how predictable are consumers in their merchant visitation patterns? Our results suggest that, in the long-run, much of our seemingly elective activity is actually highly predictable. Notwithstanding a wide range of individual preferences, shoppers share regularities in how they visit merchant locations over time. Yet while aggregate behavior is largely predictable, the interleaving of shopping events introduces important stochastic elements at short time scales. These short- and long-scale patterns suggest a theoretical upper bound on predictability, and describe the accuracy of a Markov model in predicting a person's next location. We incorporate population-level transition probabilities in the predictive models, and find that in many cases these improve accuracy. While our results point to the elusiveness of precise predictions about where a person will go next, they suggest the existence, at large time-scales, of regularities across the population.

  20. The predictability of consumer visitation patterns

    PubMed Central

    Krumme, Coco; Llorente, Alejandro; Cebrian, Manuel; Pentland, Alex ("Sandy"); Moro, Esteban

    2013-01-01

    We consider hundreds of thousands of individual economic transactions to ask: how predictable are consumers in their merchant visitation patterns? Our results suggest that, in the long-run, much of our seemingly elective activity is actually highly predictable. Notwithstanding a wide range of individual preferences, shoppers share regularities in how they visit merchant locations over time. Yet while aggregate behavior is largely predictable, the interleaving of shopping events introduces important stochastic elements at short time scales. These short- and long-scale patterns suggest a theoretical upper bound on predictability, and describe the accuracy of a Markov model in predicting a person's next location. We incorporate population-level transition probabilities in the predictive models, and find that in many cases these improve accuracy. While our results point to the elusiveness of precise predictions about where a person will go next, they suggest the existence, at large time-scales, of regularities across the population. PMID:23598917

  1. Assessing the capacity of social determinants of health data to augment predictive models identifying patients in need of wraparound social services.

    PubMed

    Kasthurirathne, Suranga N; Vest, Joshua R; Menachemi, Nir; Halverson, Paul K; Grannis, Shaun J

    2018-01-01

    A growing variety of diverse data sources is emerging to better inform health care delivery and health outcomes. We sought to evaluate the capacity for clinical, socioeconomic, and public health data sources to predict the need for various social service referrals among patients at a safety-net hospital. We integrated patient clinical data and community-level data representing patients' social determinants of health (SDH) obtained from multiple sources to build random forest decision models to predict the need for any, mental health, dietitian, social work, or other SDH service referrals. To assess the impact of SDH on improving performance, we built separate decision models using clinical and SDH determinants and clinical data only. Decision models predicting the need for any, mental health, and dietitian referrals yielded sensitivity, specificity, and accuracy measures ranging between 60% and 75%. Specificity and accuracy scores for social work and other SDH services ranged between 67% and 77%, while sensitivity scores were between 50% and 63%. Area under the receiver operating characteristic curve values for the decision models ranged between 70% and 78%. Models for predicting the need for any services reported positive predictive values between 65% and 73%. Positive predictive values for predicting individual outcomes were below 40%. The need for various social service referrals can be predicted with considerable accuracy using a wide range of readily available clinical and community data that measure socioeconomic and public health conditions. While the use of SDH did not result in significant performance improvements, our approach represents a novel and important application of risk predictive modeling. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  2. Improving CSF biomarker accuracy in predicting prevalent and incident Alzheimer disease

    PubMed Central

    Fagan, A.M.; Williams, M.M.; Ghoshal, N.; Aeschleman, M.; Grant, E.A.; Marcus, D.S.; Mintun, M.A.; Holtzman, D.M.; Morris, J.C.

    2011-01-01

    Objective: To investigate factors, including cognitive and brain reserve, which may independently predict prevalent and incident dementia of the Alzheimer type (DAT) and to determine whether inclusion of identified factors increases the predictive accuracy of the CSF biomarkers Aβ42, tau, ptau181, tau/Aβ42, and ptau181/Aβ42. Methods: Logistic regression identified variables that predicted prevalent DAT when considered together with each CSF biomarker in a cross-sectional sample of 201 participants with normal cognition and 46 with DAT. The area under the receiver operating characteristic curve (AUC) from the resulting model was compared with the AUC generated using the biomarker alone. In a second sample with normal cognition at baseline and longitudinal data available (n = 213), Cox proportional hazards models identified variables that predicted incident DAT together with each biomarker, and the models' concordance probability estimate (CPE), which was compared to the CPE generated using the biomarker alone. Results: APOE genotype including an ε4 allele, male gender, and smaller normalized whole brain volumes (nWBV) were cross-sectionally associated with DAT when considered together with every biomarker. In the longitudinal sample (mean follow-up = 3.2 years), 14 participants (6.6%) developed DAT. Older age predicted a faster time to DAT in every model, and greater education predicted a slower time in 4 of 5 models. Inclusion of ancillary variables resulted in better cross-sectional prediction of DAT for all biomarkers (p < 0.0021), and better longitudinal prediction for 4 of 5 biomarkers (p < 0.0022). Conclusions: The predictive accuracy of CSF biomarkers is improved by including age, education, and nWBV in analyses. PMID:21228296

  3. A review of propeller noise prediction methodology: 1919-1994

    NASA Technical Reports Server (NTRS)

    Metzger, F. Bruce

    1995-01-01

    This report summarizes a review of the literature regarding propeller noise prediction methods. The review is divided into six sections: (1) early methods; (2) more recent methods based on earlier theory; (3) more recent methods based on the Acoustic Analogy; (4) more recent methods based on Computational Acoustics; (5) empirical methods; and (6) broadband methods. The report concludes that there are a large number of noise prediction procedures available which vary markedly in complexity. Deficiencies in accuracy of methods in many cases may be related, not to the methods themselves, but the accuracy and detail of the aerodynamic inputs used to calculate noise. The steps recommended in the report to provide accurate and easy to use prediction methods are: (1) identify reliable test data; (2) define and conduct test programs to fill gaps in the existing data base; (3) identify the most promising prediction methods; (4) evaluate promising prediction methods relative to the data base; (5) identify and correct the weaknesses in the prediction methods, including lack of user friendliness, and include features now available only in research codes; (6) confirm the accuracy of improved prediction methods to the data base; and (7) make the methods widely available and provide training in their use.

  4. A novel hybrid decomposition-and-ensemble model based on CEEMD and GWO for short-term PM2.5 concentration forecasting

    NASA Astrophysics Data System (ADS)

    Niu, Mingfei; Wang, Yufang; Sun, Shaolong; Li, Yongwu

    2016-06-01

    To enhance prediction reliability and accuracy, a hybrid model based on the promising principle of "decomposition and ensemble" and a recently proposed meta-heuristic called grey wolf optimizer (GWO) is introduced for daily PM2.5 concentration forecasting. Compared with existing PM2.5 forecasting methods, this proposed model has improved the prediction accuracy and hit rates of directional prediction. The proposed model involves three main steps, i.e., decomposing the original PM2.5 series into several intrinsic mode functions (IMFs) via complementary ensemble empirical mode decomposition (CEEMD) for simplifying the complex data; individually predicting each IMF with support vector regression (SVR) optimized by GWO; integrating all predicted IMFs for the ensemble result as the final prediction by another SVR optimized by GWO. Seven benchmark models, including single artificial intelligence (AI) models, other decomposition-ensemble models with different decomposition methods and models with the same decomposition-ensemble method but optimized by different algorithms, are considered to verify the superiority of the proposed hybrid model. The empirical study indicates that the proposed hybrid decomposition-ensemble model is remarkably superior to all considered benchmark models for its higher prediction accuracy and hit rates of directional prediction.

  5. BIG DATA ANALYTICS AND PRECISION ANIMAL AGRICULTURE SYMPOSIUM: Data to decisions.

    PubMed

    White, B J; Amrine, D E; Larson, R L

    2018-04-14

    Big data are frequently used in many facets of business and agronomy to enhance knowledge needed to improve operational decisions. Livestock operations collect data of sufficient quantity to perform predictive analytics. Predictive analytics can be defined as a methodology and suite of data evaluation techniques to generate a prediction for specific target outcomes. The objective of this manuscript is to describe the process of using big data and the predictive analytic framework to create tools to drive decisions in livestock production, health, and welfare. The predictive analytic process involves selecting a target variable, managing the data, partitioning the data, then creating algorithms, refining algorithms, and finally comparing accuracy of the created classifiers. The partitioning of the datasets allows model building and refining to occur prior to testing the predictive accuracy of the model with naive data to evaluate overall accuracy. Many different classification algorithms are available for predictive use and testing multiple algorithms can lead to optimal results. Application of a systematic process for predictive analytics using data that is currently collected or that could be collected on livestock operations will facilitate precision animal management through enhanced livestock operational decisions.

  6. A Prediction Algorithm for Drug Response in Patients with Mesial Temporal Lobe Epilepsy Based on Clinical and Genetic Information

    PubMed Central

    Carvalho, Benilton S.; Bilevicius, Elizabeth; Alvim, Marina K. M.; Lopes-Cendes, Iscia

    2017-01-01

    Mesial temporal lobe epilepsy is the most common form of adult epilepsy in surgical series. Currently, the only characteristic used to predict poor response to clinical treatment in this syndrome is the presence of hippocampal sclerosis. Single nucleotide polymorphisms (SNPs) located in genes encoding drug transporter and metabolism proteins could influence response to therapy. Therefore, we aimed to evaluate whether combining information from clinical variables as well as SNPs in candidate genes could improve the accuracy of predicting response to drug therapy in patients with mesial temporal lobe epilepsy. For this, we divided 237 patients into two groups: 75 responsive and 162 refractory to antiepileptic drug therapy. We genotyped 119 SNPs in ABCB1, ABCC2, CYP1A1, CYP1A2, CYP1B1, CYP2C9, CYP2C19, CYP2D6, CYP2E1, CYP3A4, and CYP3A5 genes. We used 98 additional SNPs to evaluate population stratification. We assessed a first scenario using only clinical variables and a second one including SNP information. The random forests algorithm combined with leave-one-out cross-validation was used to identify the best predictive model in each scenario and compared their accuracies using the area under the curve statistic. Additionally, we built a variable importance plot to present the set of most relevant predictors on the best model. The selected best model included the presence of hippocampal sclerosis and 56 SNPs. Furthermore, including SNPs in the model improved accuracy from 0.4568 to 0.8177. Our findings suggest that adding genetic information provided by SNPs, located on drug transport and metabolism genes, can improve the accuracy for predicting which patients with mesial temporal lobe epilepsy are likely to be refractory to drug treatment, making it possible to identify patients who may benefit from epilepsy surgery sooner. PMID:28052106

  7. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness

    PubMed Central

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods that are variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF) were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models. PMID:26890307

  8. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness.

    PubMed

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods that are variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF) were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models.

  9. Effects of number of training generations on genomic prediction for various traits in a layer chicken population.

    PubMed

    Weng, Ziqing; Wolc, Anna; Shen, Xia; Fernando, Rohan L; Dekkers, Jack C M; Arango, Jesus; Settar, Petek; Fulton, Janet E; O'Sullivan, Neil P; Garrick, Dorian J

    2016-03-19

    Genomic estimated breeding values (GEBV) based on single nucleotide polymorphism (SNP) genotypes are widely used in animal improvement programs. It is typically assumed that the larger the number of animals is in the training set, the higher is the prediction accuracy of GEBV. The aim of this study was to quantify genomic prediction accuracy depending on the number of ancestral generations included in the training set, and to determine the optimal number of training generations for different traits in an elite layer breeding line. Phenotypic records for 16 traits on 17,793 birds were used. All parents and some selection candidates from nine non-overlapping generations were genotyped for 23,098 segregating SNPs. An animal model with pedigree relationships (PBLUP) and the BayesB genomic prediction model were applied to predict EBV or GEBV at each validation generation (progeny of the most recent training generation) based on varying numbers of immediately preceding ancestral generations. Prediction accuracy of EBV or GEBV was assessed as the correlation between EBV and phenotypes adjusted for fixed effects, divided by the square root of trait heritability. The optimal number of training generations that resulted in the greatest prediction accuracy of GEBV was determined for each trait. The relationship between optimal number of training generations and heritability was investigated. On average, accuracies were higher with the BayesB model than with PBLUP. Prediction accuracies of GEBV increased as the number of closely-related ancestral generations included in the training set increased, but reached an asymptote or slightly decreased when distant ancestral generations were used in the training set. The optimal number of training generations was 4 or more for high heritability traits but less than that for low heritability traits. For less heritable traits, limiting the training datasets to individuals closely related to the validation population resulted in the best predictions. The effect of adding distant ancestral generations in the training set on prediction accuracy differed between traits and the optimal number of necessary training generations is associated with the heritability of traits.

  10. Best Practices for Mudweight Window Generation and Accuracy Assessment between Seismic Based Pore Pressure Prediction Methodologies for a Near-Salt Field in Mississippi Canyon, Gulf of Mexico

    NASA Astrophysics Data System (ADS)

    Mannon, Timothy Patrick, Jr.

    Improving well design has and always will be the primary goal in drilling operations in the oil and gas industry. Oil and gas plays are continuing to move into increasingly hostile drilling environments, including near and/or sub-salt proximities. The ability to reduce the risk and uncertainly involved in drilling operations in unconventional geologic settings starts with improving the techniques for mudweight window modeling. To address this issue, an analysis of wellbore stability and well design improvement has been conducted. This study will show a systematic approach to well design by focusing on best practices for mudweight window projection for a field in Mississippi Canyon, Gulf of Mexico. The field includes depleted reservoirs and is in close proximity of salt intrusions. Analysis of offset wells has been conducted in the interest of developing an accurate picture of the subsurface environment by making connections between depth, non-productive time (NPT) events, and mudweights used. Commonly practiced petrophysical methods of pore pressure, fracture pressure, and shear failure gradient prediction have been applied to key offset wells in order to enhance the well design for two proposed wells. For the first time in the literature, the accuracy of the commonly accepted, seismic interval velocity based and the relatively new, seismic frequency based methodologies for pore pressure prediction are qualitatively and quantitatively compared for accuracy. Accuracy standards will be based on the agreement of the seismic outputs to pressure data obtained while drilling and petrophysically based pore pressure outputs for each well. The results will show significantly higher accuracy for the seismic frequency based approach in wells that were in near/sub-salt environments and higher overall accuracy for all of the wells in the study as a whole.

  11. Integrated Computational Solution for Predicting Skin Sensitization Potential of Molecules

    PubMed Central

    Desai, Aarti; Singh, Vivek K.; Jere, Abhay

    2016-01-01

    Introduction Skin sensitization forms a major toxicological endpoint for dermatology and cosmetic products. Recent ban on animal testing for cosmetics demands for alternative methods. We developed an integrated computational solution (SkinSense) that offers a robust solution and addresses the limitations of existing computational tools i.e. high false positive rate and/or limited coverage. Results The key components of our solution include: QSAR models selected from a combinatorial set, similarity information and literature-derived sub-structure patterns of known skin protein reactive groups. Its prediction performance on a challenge set of molecules showed accuracy = 75.32%, CCR = 74.36%, sensitivity = 70.00% and specificity = 78.72%, which is better than several existing tools including VEGA (accuracy = 45.00% and CCR = 54.17% with ‘High’ reliability scoring), DEREK (accuracy = 72.73% and CCR = 71.44%) and TOPKAT (accuracy = 60.00% and CCR = 61.67%). Although, TIMES-SS showed higher predictive power (accuracy = 90.00% and CCR = 92.86%), the coverage was very low (only 10 out of 77 molecules were predicted reliably). Conclusions Owing to improved prediction performance and coverage, our solution can serve as a useful expert system towards Integrated Approaches to Testing and Assessment for skin sensitization. It would be invaluable to cosmetic/ dermatology industry for pre-screening their molecules, and reducing time, cost and animal testing. PMID:27271321

  12. Verification of MICNOISE computer program for the prediction of highway noise

    DOT National Transportation Integrated Search

    1974-01-01

    The objectives of this study were to verify the computer program used by the Virginia Department of Highways to predict highway sound pressure levels, to determine whether the accuracy and usefulness of the program could be improved, and to make reco...

  13. Efficient use of unlabeled data for protein sequence classification: a comparative study.

    PubMed

    Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir

    2009-04-29

    Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags-the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably.

  14. A time series based sequence prediction algorithm to detect activities of daily living in smart home.

    PubMed

    Marufuzzaman, M; Reaz, M B I; Ali, M A M; Rahman, L F

    2015-01-01

    The goal of smart homes is to create an intelligent environment adapting the inhabitants need and assisting the person who needs special care and safety in their daily life. This can be reached by collecting the ADL (activities of daily living) data and further analysis within existing computing elements. In this research, a very recent algorithm named sequence prediction via enhanced episode discovery (SPEED) is modified and in order to improve accuracy time component is included. The modified SPEED or M-SPEED is a sequence prediction algorithm, which modified the previous SPEED algorithm by using time duration of appliance's ON-OFF states to decide the next state. M-SPEED discovered periodic episodes of inhabitant behavior, trained it with learned episodes, and made decisions based on the obtained knowledge. The results showed that M-SPEED achieves 96.8% prediction accuracy, which is better than other time prediction algorithms like PUBS, ALZ with temporal rules and the previous SPEED. Since human behavior shows natural temporal patterns, duration times can be used to predict future events more accurately. This inhabitant activity prediction system will certainly improve the smart homes by ensuring safety and better care for elderly and handicapped people.

  15. Taxi Time Prediction at Charlotte Airport Using Fast-Time Simulation and Machine Learning Techniques

    NASA Technical Reports Server (NTRS)

    Lee, Hanbong

    2016-01-01

    Accurate taxi time prediction is required for enabling efficient runway scheduling that can increase runway throughput and reduce taxi times and fuel consumptions on the airport surface. Currently NASA and American Airlines are jointly developing a decision-support tool called Spot and Runway Departure Advisor (SARDA) that assists airport ramp controllers to make gate pushback decisions and improve the overall efficiency of airport surface traffic. In this presentation, we propose to use Linear Optimized Sequencing (LINOS), a discrete-event fast-time simulation tool, to predict taxi times and provide the estimates to the runway scheduler in real-time airport operations. To assess its prediction accuracy, we also introduce a data-driven analytical method using machine learning techniques. These two taxi time prediction methods are evaluated with actual taxi time data obtained from the SARDA human-in-the-loop (HITL) simulation for Charlotte Douglas International Airport (CLT) using various performance measurement metrics. Based on the taxi time prediction results, we also discuss how the prediction accuracy can be affected by the operational complexity at this airport and how we can improve the fast time simulation model before implementing it with an airport scheduling algorithm in a real-time environment.

  16. Calibration between Undergraduate Students' Prediction of and Actual Performance: The Role of Gender and Performance Attributions

    ERIC Educational Resources Information Center

    Gutierrez, Antonio P.; Price, Addison F.

    2017-01-01

    This study investigated changes in male and female students' prediction and postdiction calibration accuracy and bias scores, and the predictive effects of explanatory styles on these variables beyond gender. Seventy undergraduate students rated their confidence in performance before and after a 40-item exam. There was an improvement in students'…

  17. Wind Prediction Accuracy for Air Traffic Management Decision Support Tools

    NASA Technical Reports Server (NTRS)

    Cole, Rod; Green, Steve; Jardin, Matt; Schwartz, Barry; Benjamin, Stan

    2000-01-01

    The performance of Air Traffic Management and flight deck decision support tools depends in large part on the accuracy of the supporting 4D trajectory predictions. This is particularly relevant to conflict prediction and active advisories for the resolution of conflicts and the conformance with of traffic-flow management flow-rate constraints (e.g., arrival metering / required time of arrival). Flight test results have indicated that wind prediction errors may represent the largest source of trajectory prediction error. The tests also discovered relatively large errors (e.g., greater than 20 knots), existing in pockets of space and time critical to ATM DST performance (one or more sectors, greater than 20 minutes), are inadequately represented by the classic RMS aggregate prediction-accuracy studies of the past. To facilitate the identification and reduction of DST-critical wind-prediction errors, NASA has lead a collaborative research and development activity with MIT Lincoln Laboratories and the Forecast Systems Lab of the National Oceanographic and Atmospheric Administration (NOAA). This activity, begun in 1996, has focussed on the development of key metrics for ATM DST performance, assessment of wind-prediction skill for state of the art systems, and development/validation of system enhancements to improve skill. A 13 month study was conducted for the Denver Center airspace in 1997. Two complementary wind-prediction systems were analyzed and compared to the forecast performance of the then standard 60 km Rapid Update Cycle - version 1 (RUC-1). One system, developed by NOAA, was the prototype 40-km RUC-2 that became operational at NCEP in 1999. RUC-2 introduced a faster cycle (1 hr vs. 3 hr) and improved mesoscale physics. The second system, Augmented Winds (AW), is a prototype en route wind application developed by MITLL based on the Integrated Terminal Wind System (ITWS). AW is run at a local facility (Center) level, and updates RUC predictions based on an optimal interpolation of the latest ACARS reports since the RUC run. This paper presents an overview of the study's results including the identification and use of new large mor wind-prediction accuracy metrics that are key to ATM DST performance.

  18. Weather models as virtual sensors to data-driven rainfall predictions in urban watersheds

    NASA Astrophysics Data System (ADS)

    Cozzi, Lorenzo; Galelli, Stefano; Pascal, Samuel Jolivet De Marc; Castelletti, Andrea

    2013-04-01

    Weather and climate predictions are a key element of urban hydrology where they are used to inform water management and assist in flood warning delivering. Indeed, the modelling of the very fast dynamics of urbanized catchments can be substantially improved by the use of weather/rainfall predictions. For example, in Singapore Marina Reservoir catchment runoff processes have a very short time of concentration (roughly one hour) and observational data are thus nearly useless for runoff predictions and weather prediction are required. Unfortunately, radar nowcasting methods do not allow to carrying out long - term weather predictions, whereas numerical models are limited by their coarse spatial scale. Moreover, numerical models are usually poorly reliable because of the fast motion and limited spatial extension of rainfall events. In this study we investigate the combined use of data-driven modelling techniques and weather variables observed/simulated with a numerical model as a way to improve rainfall prediction accuracy and lead time in the Singapore metropolitan area. To explore the feasibility of the approach, we use a Weather Research and Forecast (WRF) model as a virtual sensor network for the input variables (the states of the WRF model) to a machine learning rainfall prediction model. More precisely, we combine an input variable selection method and a non-parametric tree-based model to characterize the empirical relation between the rainfall measured at the catchment level and all possible weather input variables provided by WRF model. We explore different lead time to evaluate the model reliability for different long - term predictions, as well as different time lags to see how past information could improve results. Results show that the proposed approach allow a significant improvement of the prediction accuracy of the WRF model on the Singapore urban area.

  19. Surrogate Modeling of High-Fidelity Fracture Simulations for Real-Time Residual Strength Predictions

    NASA Technical Reports Server (NTRS)

    Spear, Ashley D.; Priest, Amanda R.; Veilleux, Michael G.; Ingraffea, Anthony R.; Hochhalter, Jacob D.

    2011-01-01

    A surrogate model methodology is described for predicting in real time the residual strength of flight structures with discrete-source damage. Starting with design of experiment, an artificial neural network is developed that takes as input discrete-source damage parameters and outputs a prediction of the structural residual strength. Target residual strength values used to train the artificial neural network are derived from 3D finite element-based fracture simulations. A residual strength test of a metallic, integrally-stiffened panel is simulated to show that crack growth and residual strength are determined more accurately in discrete-source damage cases by using an elastic-plastic fracture framework rather than a linear-elastic fracture mechanics-based method. Improving accuracy of the residual strength training data would, in turn, improve accuracy of the surrogate model. When combined, the surrogate model methodology and high-fidelity fracture simulation framework provide useful tools for adaptive flight technology.

  20. Surrogate Modeling of High-Fidelity Fracture Simulations for Real-Time Residual Strength Predictions

    NASA Technical Reports Server (NTRS)

    Spear, Ashley D.; Priest, Amanda R.; Veilleux, Michael G.; Ingraffea, Anthony R.; Hochhalter, Jacob D.

    2011-01-01

    A surrogate model methodology is described for predicting, during flight, the residual strength of aircraft structures that sustain discrete-source damage. Starting with design of experiment, an artificial neural network is developed that takes as input discrete-source damage parameters and outputs a prediction of the structural residual strength. Target residual strength values used to train the artificial neural network are derived from 3D finite element-based fracture simulations. Two ductile fracture simulations are presented to show that crack growth and residual strength are determined more accurately in discrete-source damage cases by using an elastic-plastic fracture framework rather than a linear-elastic fracture mechanics-based method. Improving accuracy of the residual strength training data does, in turn, improve accuracy of the surrogate model. When combined, the surrogate model methodology and high fidelity fracture simulation framework provide useful tools for adaptive flight technology.

  1. Link prediction boosted psychiatry disorder classification for functional connectivity network

    NASA Astrophysics Data System (ADS)

    Li, Weiwei; Mei, Xue; Wang, Hao; Zhou, Yu; Huang, Jiashuang

    2017-02-01

    Functional connectivity network (FCN) is an effective tool in psychiatry disorders classification, and represents cross-correlation of the regional blood oxygenation level dependent signal. However, FCN is often incomplete for suffering from missing and spurious edges. To accurate classify psychiatry disorders and health control with the incomplete FCN, we first `repair' the FCN with link prediction, and then exact the clustering coefficients as features to build a weak classifier for every FCN. Finally, we apply a boosting algorithm to combine these weak classifiers for improving classification accuracy. Our method tested by three datasets of psychiatry disorder, including Alzheimer's Disease, Schizophrenia and Attention Deficit Hyperactivity Disorder. The experimental results show our method not only significantly improves the classification accuracy, but also efficiently reconstructs the incomplete FCN.

  2. The influence of a wall function on turbine blade heat transfer prediction

    NASA Technical Reports Server (NTRS)

    Whitaker, Kevin W.

    1989-01-01

    The second phase of a continuing investigation to improve the prediction of turbine blade heat transfer coefficients was completed. The present study specifically investigated how a numeric wall function in the turbulence model of a two-dimensional boundary layer code, STAN5, affected heat transfer prediction capabilities. Several sources of inaccuracy in the wall function were identified and then corrected or improved. Heat transfer coefficient predictions were then obtained using each one of the modifications to determine its effect. Results indicated that the modifications made to the wall function can significantly affect the prediction of heat transfer coefficients on turbine blades. The improvement in accuracy due the modifications is still inconclusive and is still being investigated.

  3. Analysis of the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE) in Assessing Rounding Model

    NASA Astrophysics Data System (ADS)

    Wang, Weijie; Lu, Yanmin

    2018-03-01

    Most existing Collaborative Filtering (CF) algorithms predict a rating as the preference of an active user toward a given item, which is always a decimal fraction. Meanwhile, the actual ratings in most data sets are integers. In this paper, we discuss and demonstrate why rounding can bring different influences to these two metrics; prove that rounding is necessary in post-processing of the predicted ratings, eliminate of model prediction bias, improving the accuracy of the prediction. In addition, we also propose two new rounding approaches based on the predicted rating probability distribution, which can be used to round the predicted rating to an optimal integer rating, and get better prediction accuracy compared to the Basic Rounding approach. Extensive experiments on different data sets validate the correctness of our analysis and the effectiveness of our proposed rounding approaches.

  4. Toward Harnessing User Feedback For Machine Learning

    DTIC Science & Technology

    2006-10-02

    machine learning systems. If this resource-the users themselves-could somehow work hand-in-hand with machine learning systems, the accuracy of learning systems could be improved and the users? understanding and trust of the system could improve as well. We conducted a think-aloud study to see how willing users were to provide feedback and to understand what kinds of feedback users could give. Users were shown explanations of machine learning predictions and asked to provide feedback to improve the predictions. We found that users

  5. Localized Density/Drag Prediction for Improved Onboard Orbit Propagation

    DTIC Science & Technology

    2009-09-01

    Localized Density/Drag Prediction for Improved Onboard Orbit Propagation Nathan B. Stastny, Frank R. Chavez, Chin Lin, T. Alan Lovell , Robert A...Terrestrial Physics, Vol. 70, 774-793, 2008 3. Storz, M.F, Bowman, B.R., Branson, J.I., High Accuracy Satellite Drag Model (HASDM), AIAA/ AAS ...Geomagnetic Indices, AIAA/ AAS Astrodynamics Specialist Conference, Honolulu, HI, Aug. 2008 5. Bruinsma, S., Biancale, R., Total Densities Derived from

  6. AGDRIFT: A MODEL FOR ESTIMATING NEAR-FIELD SPRAY DRIFT FROM AERIAL APPLICATIONS

    EPA Science Inventory

    The aerial spray prediction model AgDRIFT(R) embodies the computational engine found in the near-wake Lagrangian model AGricultural DISPersal (AGDISP) but with several important features added that improve the speed and accuracy of its predictions. This article summarizes those c...

  7. Infrared Imagery of Shuttle (IRIS). Task 2, summary report

    NASA Technical Reports Server (NTRS)

    Chocol, C. J.

    1978-01-01

    End-to-end tests of a 16 element indium antimonide sensor array and 10 channels of associated electronic signal processing were completed. Quantitative data were gathered on system responsivity, frequency response, noise, stray capacitance effects, and sensor paralleling. These tests verify that the temperature accuracies, predicted in the Task 1 study, can be obtained with a very carefully designed electro-optical flight system. Pre-flight and inflight calibration of a high quality are mandatory to obtain these accuracies. Also, optical crosstalk in the array-dewar assembly must be carefully eliminated by its design. Tests of the scaled up tracking system reticle also demonstrate that the predicted tracking system accuracies can be met in the flight system. In addition, improvements in the reticle pattern and electronics are possible, which will reduce the complexity of the flight system and increase tracking accuracy.

  8. High accuracy satellite drag model (HASDM)

    NASA Astrophysics Data System (ADS)

    Storz, M.; Bowman, B.; Branson, J.

    The dominant error source in the force models used to predict low perigee satellite trajectories is atmospheric drag. Errors in operational thermospheric density models cause significant errors in predicted satellite positions, since these models do not account for dynamic changes in atmospheric drag for orbit predictions. The Air Force Space Battlelab's High Accuracy Satellite Drag Model (HASDM) estimates and predicts (out three days) a dynamically varying high-resolution density field. HASDM includes the Dynamic Calibration Atmosphere (DCA) algorithm that solves for the phases and amplitudes of the diurnal, semidiurnal and terdiurnal variations of thermospheric density near real-time from the observed drag effects on a set of Low Earth Orbit (LEO) calibration satellites. The density correction is expressed as a function of latitude, local solar time and altitude. In HASDM, a time series prediction filter relates the extreme ultraviolet (EUV) energy index E10.7 and the geomagnetic storm index a p to the DCA density correction parameters. The E10.7 index is generated by the SOLAR2000 model, the first full spectrum model of solar irradiance. The estimated and predicted density fields will be used operationally to significantly improve the accuracy of predicted trajectories for all low perigee satellites.

  9. High accuracy satellite drag model (HASDM)

    NASA Astrophysics Data System (ADS)

    Storz, Mark F.; Bowman, Bruce R.; Branson, Major James I.; Casali, Stephen J.; Tobiska, W. Kent

    The dominant error source in force models used to predict low-perigee satellite trajectories is atmospheric drag. Errors in operational thermospheric density models cause significant errors in predicted satellite positions, since these models do not account for dynamic changes in atmospheric drag for orbit predictions. The Air Force Space Battlelab's High Accuracy Satellite Drag Model (HASDM) estimates and predicts (out three days) a dynamically varying global density field. HASDM includes the Dynamic Calibration Atmosphere (DCA) algorithm that solves for the phases and amplitudes of the diurnal and semidiurnal variations of thermospheric density near real-time from the observed drag effects on a set of Low Earth Orbit (LEO) calibration satellites. The density correction is expressed as a function of latitude, local solar time and altitude. In HASDM, a time series prediction filter relates the extreme ultraviolet (EUV) energy index E10.7 and the geomagnetic storm index ap, to the DCA density correction parameters. The E10.7 index is generated by the SOLAR2000 model, the first full spectrum model of solar irradiance. The estimated and predicted density fields will be used operationally to significantly improve the accuracy of predicted trajectories for all low-perigee satellites.

  10. Numerical simulation of turbulence flow in a Kaplan turbine -Evaluation on turbine performance prediction accuracy-

    NASA Astrophysics Data System (ADS)

    Ko, P.; Kurosawa, S.

    2014-03-01

    The understanding and accurate prediction of the flow behaviour related to cavitation and pressure fluctuation in a Kaplan turbine are important to the design work enhancing the turbine performance including the elongation of the operation life span and the improvement of turbine efficiency. In this paper, high accuracy turbine and cavitation performance prediction method based on entire flow passage for a Kaplan turbine is presented and evaluated. Two-phase flow field is predicted by solving Reynolds-Averaged Navier-Stokes equations expressed by volume of fluid method tracking the free surface and combined with Reynolds Stress model. The growth and collapse of cavitation bubbles are modelled by the modified Rayleigh-Plesset equation. The prediction accuracy is evaluated by comparing with the model test results of Ns 400 Kaplan model turbine. As a result that the experimentally measured data including turbine efficiency, cavitation performance, and pressure fluctuation are accurately predicted. Furthermore, the cavitation occurrence on the runner blade surface and the influence to the hydraulic loss of the flow passage are discussed. Evaluated prediction method for the turbine flow and performance is introduced to facilitate the future design and research works on Kaplan type turbine.

  11. Spatially distributed modeling of soil organic carbon across China with improved accuracy

    NASA Astrophysics Data System (ADS)

    Li, Qi-quan; Zhang, Hao; Jiang, Xin-ye; Luo, Youlin; Wang, Chang-quan; Yue, Tian-xiang; Li, Bing; Gao, Xue-song

    2017-06-01

    There is a need for more detailed spatial information on soil organic carbon (SOC) for the accurate estimation of SOC stock and earth system models. As it is effective to use environmental factors as auxiliary variables to improve the prediction accuracy of spatially distributed modeling, a combined method (HASM_EF) was developed to predict the spatial pattern of SOC across China using high accuracy surface modeling (HASM), artificial neural network (ANN), and principal component analysis (PCA) to introduce land uses, soil types, climatic factors, topographic attributes, and vegetation cover as predictors. The performance of HASM_EF was compared with ordinary kriging (OK), OK, and HASM combined, respectively, with land uses and soil types (OK_LS and HASM_LS), and regression kriging combined with land uses and soil types (RK_LS). Results showed that HASM_EF obtained the lowest prediction errors and the ratio of performance to deviation (RPD) presented the relative improvements of 89.91%, 63.77%, 55.86%, and 42.14%, respectively, compared to the other four methods. Furthermore, HASM_EF generated more details and more realistic spatial information on SOC. The improved performance of HASM_EF can be attributed to the introduction of more environmental factors, to explicit consideration of the multicollinearity of selected factors and the spatial nonstationarity and nonlinearity of relationships between SOC and selected factors, and to the performance of HASM and ANN. This method may play a useful tool in providing more precise spatial information on soil parameters for global modeling across large areas.

  12. Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies

    PubMed Central

    2010-01-01

    Background All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. Results The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. Conclusions This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general. PMID:20144194

  13. Analysis of near infrared spectra for age-grading of wild populations of Anopheles gambiae.

    PubMed

    Krajacich, Benjamin J; Meyers, Jacob I; Alout, Haoues; Dabiré, Roch K; Dowell, Floyd E; Foy, Brian D

    2017-11-07

    Understanding the age-structure of mosquito populations, especially malaria vectors such as Anopheles gambiae, is important for assessing the risk of infectious mosquitoes, and how vector control interventions may impact this risk. The use of near-infrared spectroscopy (NIRS) for age-grading has been demonstrated previously on laboratory and semi-field mosquitoes, but to date has not been utilized on wild-caught mosquitoes whose age is externally validated via parity status or parasite infection stage. In this study, we developed regression and classification models using NIRS on datasets of wild An. gambiae (s.l.) reared from larvae collected from the field in Burkina Faso, and two laboratory strains. We compared the accuracy of these models for predicting the ages of wild-caught mosquitoes that had been scored for their parity status as well as for positivity for Plasmodium sporozoites. Regression models utilizing variable selection increased predictive accuracy over the more common full-spectrum partial least squares (PLS) approach for cross-validation of the datasets, validation, and independent test sets. Models produced from datasets that included the greatest range of mosquito samples (i.e. different sampling locations and times) had the highest predictive accuracy on independent testing sets, though overall accuracy on these samples was low. For classification, we found that intramodel accuracy ranged between 73.5-97.0% for grouping of mosquitoes into "early" and "late" age classes, with the highest prediction accuracy found in laboratory colonized mosquitoes. However, this accuracy was decreased on test sets, with the highest classification of an independent set of wild-caught larvae reared to set ages being 69.6%. Variation in NIRS data, likely from dietary, genetic, and other factors limits the accuracy of this technique with wild-caught mosquitoes. Alternative algorithms may help improve prediction accuracy, but care should be taken to either maximize variety in models or minimize confounders.

  14. Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies.

    PubMed

    David, Maria Pamela C; Concepcion, Gisela P; Padlan, Eduardo A

    2010-02-08

    All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general.

  15. Regional Seismic Travel-Time Prediction, Uncertainty, and Location Improvement in Western Eurasia

    NASA Astrophysics Data System (ADS)

    Flanagan, M. P.; Myers, S. C.

    2004-12-01

    We investigate our ability to improve regional travel-time prediction and seismic event location using an a priori, three-dimensional velocity model of Western Eurasia and North Africa: WENA1.0 [Pasyanos et al., 2004]. Our objective is to improve the accuracy of seismic location estimates and calculate representative location uncertainty estimates. As we focus on the geographic region of Western Eurasia, the Middle East, and North Africa, we develop, test, and validate 3D model-based travel-time prediction models for 30 stations in the study region. Three principal results are presented. First, the 3D WENA1.0 velocity model improves travel-time prediction over the iasp91 model, as measured by variance reduction, for regional Pg, Pn, and P phases recorded at the 30 stations. Second, a distance-dependent uncertainty model is developed and tested for the WENA1.0 model. Third, an end-to-end validation test based on 500 event relocations demonstrates improved location performance over the 1-dimensional iasp91 model. Validation of the 3D model is based on a comparison of approximately 11,000 Pg, Pn, and P travel-time predictions and empirical observations from ground truth (GT) events. Ray coverage for the validation dataset is chosen to provide representative, regional-distance sampling across Eurasia and North Africa. The WENA1.0 model markedly improves travel-time predictions for most stations with an average variance reduction of 25% for all ray paths. We find that improvement is station dependent, with some stations benefiting greatly from WENA1.0 predictions (52% at APA, 33% at BKR, and 32% at NIL), some stations showing moderate improvement (12% at KEV, 14% at BOM, and 12% at TAM), some benefiting only slightly (6% at MOX, and 4% at SVE), and some are degraded (-6% at MLR and -18% at QUE). We further test WENA1.0 by comparing location accuracy with results obtained using the iasp91 model. Again, relocation of these events is dependent on ray paths that evenly sample WENA1.0 and therefore provide an unbiased assessment of location performance. A statistically significant sample is achieved by generating 500 location realizations based on 5 events with location accuracy between 1 km and 5 km. Each realization is a randomly selected event with location determined by randomly selecting 5 stations from the available network. In 340 cases (68% of the instances), locations are improved, and average mislocation is reduced from 31 km to 26 km. Preliminary test of uncertainty estimates suggest that our uncertainty model produces location uncertainty ellipses that are representative of location accuracy. These results highlight the importance of accurate GT datasets in assessing regional travel-time models and demonstrate that an a priori 3D model can markedly improve our ability to locate small magnitude events in a regional monitoring context. This work was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48, Contribution UCRL-CONF-206386.

  16. Improving prediction accuracy of cooling load using EMD, PSR and RBFNN

    NASA Astrophysics Data System (ADS)

    Shen, Limin; Wen, Yuanmei; Li, Xiaohong

    2017-08-01

    To increase the accuracy for the prediction of cooling load demand, this work presents an EMD (empirical mode decomposition)-PSR (phase space reconstruction) based RBFNN (radial basis function neural networks) method. Firstly, analyzed the chaotic nature of the real cooling load demand, transformed the non-stationary cooling load historical data into several stationary intrinsic mode functions (IMFs) by using EMD. Secondly, compared the RBFNN prediction accuracies of each IMFs and proposed an IMF combining scheme that is combine the lower-frequency components (called IMF4-IMF6 combined) while keep the higher frequency component (IMF1, IMF2, IMF3) and the residual unchanged. Thirdly, reconstruct phase space for each combined components separately, process the highest frequency component (IMF1) by differential method and predict with RBFNN in the reconstructed phase spaces. Real cooling load data of a centralized ice storage cooling systems in Guangzhou are used for simulation. The results show that the proposed hybrid method outperforms the traditional methods.

  17. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  18. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  19. Using Deep Learning for Compound Selectivity Prediction.

    PubMed

    Zhang, Ruisheng; Li, Juan; Lu, Jingjing; Hu, Rongjing; Yuan, Yongna; Zhao, Zhili

    2016-01-01

    Compound selectivity prediction plays an important role in identifying potential compounds that bind to the target of interest with high affinity. However, there is still short of efficient and accurate computational approaches to analyze and predict compound selectivity. In this paper, we propose two methods to improve the compound selectivity prediction. We employ an improved multitask learning method in Neural Networks (NNs), which not only incorporates both activity and selectivity for other targets, but also uses a probabilistic classifier with a logistic regression. We further improve the compound selectivity prediction by using the multitask learning method in Deep Belief Networks (DBNs) which can build a distributed representation model and improve the generalization of the shared tasks. In addition, we assign different weights to the auxiliary tasks that are related to the primary selectivity prediction task. In contrast to other related work, our methods greatly improve the accuracy of the compound selectivity prediction, in particular, using the multitask learning in DBNs with modified weights obtains the best performance.

  20. Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm.

    PubMed

    Lee, Jae-Hong; Kim, Do-Hyung; Jeong, Seong-Nyum; Choi, Seong-Ho

    2018-04-01

    The aim of the current study was to develop a computer-assisted detection system based on a deep convolutional neural network (CNN) algorithm and to evaluate the potential usefulness and accuracy of this system for the diagnosis and prediction of periodontally compromised teeth (PCT). Combining pretrained deep CNN architecture and a self-trained network, periapical radiographic images were used to determine the optimal CNN algorithm and weights. The diagnostic and predictive accuracy, sensitivity, specificity, positive predictive value, negative predictive value, receiver operating characteristic (ROC) curve, area under the ROC curve, confusion matrix, and 95% confidence intervals (CIs) were calculated using our deep CNN algorithm, based on a Keras framework in Python. The periapical radiographic dataset was split into training (n=1,044), validation (n=348), and test (n=348) datasets. With the deep learning algorithm, the diagnostic accuracy for PCT was 81.0% for premolars and 76.7% for molars. Using 64 premolars and 64 molars that were clinically diagnosed as severe PCT, the accuracy of predicting extraction was 82.8% (95% CI, 70.1%-91.2%) for premolars and 73.4% (95% CI, 59.9%-84.0%) for molars. We demonstrated that the deep CNN algorithm was useful for assessing the diagnosis and predictability of PCT. Therefore, with further optimization of the PCT dataset and improvements in the algorithm, a computer-aided detection system can be expected to become an effective and efficient method of diagnosing and predicting PCT.

  1. Predictive display design for the vehicles with time delay in dynamic response

    NASA Astrophysics Data System (ADS)

    Efremov, A. V.; Tiaglik, M. S.; Irgaleev, I. H.; Efremov, E. V.

    2018-02-01

    The two ways for the improvement of flying qualities are considered: the predictive display (PD) and the predictive display integrated with the flight control system (FCS). The both ways allow to transforming the controlled element dynamics in the crossover frequency range, to improve the accuracy of tracking and to suppress the effect of time delay in the vehicle response too. The technique for optimization of the predictive law is applied to the landing task. The results of the mathematical modeling and experimental investigations carried out for this task are considered in the paper.

  2. ShinyGPAS: interactive genomic prediction accuracy simulator based on deterministic formulas.

    PubMed

    Morota, Gota

    2017-12-20

    Deterministic formulas for the accuracy of genomic predictions highlight the relationships among prediction accuracy and potential factors influencing prediction accuracy prior to performing computationally intensive cross-validation. Visualizing such deterministic formulas in an interactive manner may lead to a better understanding of how genetic factors control prediction accuracy. The software to simulate deterministic formulas for genomic prediction accuracy was implemented in R and encapsulated as a web-based Shiny application. Shiny genomic prediction accuracy simulator (ShinyGPAS) simulates various deterministic formulas and delivers dynamic scatter plots of prediction accuracy versus genetic factors impacting prediction accuracy, while requiring only mouse navigation in a web browser. ShinyGPAS is available at: https://chikudaisei.shinyapps.io/shinygpas/ . ShinyGPAS is a shiny-based interactive genomic prediction accuracy simulator using deterministic formulas. It can be used for interactively exploring potential factors that influence prediction accuracy in genome-enabled prediction, simulating achievable prediction accuracy prior to genotyping individuals, or supporting in-class teaching. ShinyGPAS is open source software and it is hosted online as a freely available web-based resource with an intuitive graphical user interface.

  3. Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yan, Shiju; Qian, Wei; Guan, Yubao

    2016-06-15

    Purpose: This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLS patients by integrating oversampling, feature selection, and score fusion techniques and develop an optimal prediction model. Methods: A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained DFS and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initiallymore » computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built based on QI features and CB markers separately. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers and also tested fusion methods to combine QI and CB based prediction results. Results: Using a leave-one-case-out cross-validation (K-fold cross-validation) method, the computed areas under a receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061, when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%. Conclusions: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled RBFN based classifier to yield improved prediction accuracy.« less

  4. Tendon-driven continuum robot for neuroendoscopy: validation of extended kinematic mapping for hysteresis operation.

    PubMed

    Kato, Takahisa; Okumura, Ichiro; Kose, Hidekazu; Takagi, Kiyoshi; Hata, Nobuhiko

    2016-04-01

    The hysteresis operation is an outstanding issue in tendon-driven actuation--which is used in robot-assisted surgery--as it is incompatible with kinematic mapping for control and trajectory planning. Here, a new tendon-driven continuum robot, designed to fit existing neuroendoscopes, is presented with kinematic mapping for hysteresis operation. With attention to tension in tendons as a salient factor of the hysteresis operation, extended forward kinematic mapping (FKM) has been developed. In the experiment, the significance of every component in the robot for the hysteresis operation has been investigated. Moreover, the prediction accuracy of postures by the extended FKM has been determined experimentally and compared with piecewise constant curvature assumption. The tendons were the most predominant factor affecting the hysteresis operation of the robot. The extended FKM including friction in tendons predicted the postures in the hysteresis operation with improved accuracy (2.89 and 3.87 mm for the single and the antagonistic-tendons layouts, respectively). The measured accuracy was within the target value of 5 mm for planning of neuroendoscopic resection of intraventricle tumors. The friction in tendons was the most predominant factor for the hysteresis operation in the robot. The extended FKM including this factor can improve prediction accuracy of the postures in the hysteresis operation. The trajectory of the new robot can be planned within target value for the neuroendoscopic procedure by using the extended FKM.

  5. On-line analysis of algae in water by discrete three-dimensional fluorescence spectroscopy.

    PubMed

    Zhao, Nanjing; Zhang, Xiaoling; Yin, Gaofang; Yang, Ruifang; Hu, Li; Chen, Shuang; Liu, Jianguo; Liu, Wenqing

    2018-03-19

    In view of the problem of the on-line measurement of algae classification, a method of algae classification and concentration determination based on the discrete three-dimensional fluorescence spectra was studied in this work. The discrete three-dimensional fluorescence spectra of twelve common species of algae belonging to five categories were analyzed, the discrete three-dimensional standard spectra of five categories were built, and the recognition, classification and concentration prediction of algae categories were realized by the discrete three-dimensional fluorescence spectra coupled with non-negative weighted least squares linear regression analysis. The results show that similarities between discrete three-dimensional standard spectra of different categories were reduced and the accuracies of recognition, classification and concentration prediction of the algae categories were significantly improved. By comparing with that of the chlorophyll a fluorescence excitation spectra method, the recognition accuracy rate in pure samples by discrete three-dimensional fluorescence spectra is improved 1.38%, and the recovery rate and classification accuracy in pure diatom samples 34.1% and 46.8%, respectively; the recognition accuracy rate of mixed samples by discrete-three dimensional fluorescence spectra is enhanced by 26.1%, the recovery rate of mixed samples with Chlorophyta 37.8%, and the classification accuracy of mixed samples with diatoms 54.6%.

  6. Kalman/Map filtering-aided fast normalized cross correlation-based Wi-Fi fingerprinting location sensing.

    PubMed

    Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin

    2013-11-13

    A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results.

  7. Kalman/Map Filtering-Aided Fast Normalized Cross Correlation-Based Wi-Fi Fingerprinting Location Sensing

    PubMed Central

    Sun, Yongliang; Xu, Yubin; Li, Cheng; Ma, Lin

    2013-01-01

    A Kalman/map filtering (KMF)-aided fast normalized cross correlation (FNCC)-based Wi-Fi fingerprinting location sensing system is proposed in this paper. Compared with conventional neighbor selection algorithms that calculate localization results with received signal strength (RSS) mean samples, the proposed FNCC algorithm makes use of all the on-line RSS samples and reference point RSS variations to achieve higher fingerprinting accuracy. The FNCC computes efficiently while maintaining the same accuracy as the basic normalized cross correlation. Additionally, a KMF is also proposed to process fingerprinting localization results. It employs a new map matching algorithm to nonlinearize the linear location prediction process of Kalman filtering (KF) that takes advantage of spatial proximities of consecutive localization results. With a calibration model integrated into an indoor map, the map matching algorithm corrects unreasonable prediction locations of the KF according to the building interior structure. Thus, more accurate prediction locations are obtained. Using these locations, the KMF considerably improves fingerprinting algorithm performance. Experimental results demonstrate that the FNCC algorithm with reduced computational complexity outperforms other neighbor selection algorithms and the KMF effectively improves location sensing accuracy by using indoor map information and spatial proximities of consecutive localization results. PMID:24233027

  8. Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data

    DOE PAGES

    Hsu, David

    2015-09-27

    Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression,more » also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.« less

  9. Aircraft noise prediction program validation

    NASA Technical Reports Server (NTRS)

    Shivashankara, B. N.

    1980-01-01

    A modular computer program (ANOPP) for predicting aircraft flyover and sideline noise was developed. A high quality flyover noise data base for aircraft that are representative of the U.S. commercial fleet was assembled. The accuracy of ANOPP with respect to the data base was determined. The data for source and propagation effects were analyzed and suggestions for improvements to the prediction methodology are given.

  10. A comprehensive comparison of network similarities for link prediction and spurious link elimination

    NASA Astrophysics Data System (ADS)

    Zhang, Peng; Qiu, Dan; Zeng, An; Xiao, Jinghua

    2018-06-01

    Identifying missing interactions in complex networks, known as link prediction, is realized by estimating the likelihood of the existence of a link between two nodes according to the observed links and nodes' attributes. Similar approaches have also been employed to identify and remove spurious links in networks which is crucial for improving the reliability of network data. In network science, the likelihood for two nodes having a connection strongly depends on their structural similarity. The key to address these two problems thus becomes how to objectively measure the similarity between nodes in networks. In the literature, numerous network similarity metrics have been proposed and their accuracy has been discussed independently in previous works. In this paper, we systematically compare the accuracy of 18 similarity metrics in both link prediction and spurious link elimination when the observed networks are very sparse or consist of inaccurate linking information. Interestingly, some methods have high prediction accuracy, they tend to perform low accuracy in identification spurious interaction. We further find that methods can be classified into several cluster according to their behaviors. This work is useful for guiding future use of these similarity metrics for different purposes.

  11. Adjustment of regional regression models of urban-runoff quality using data for Chattanooga, Knoxville, and Nashville, Tennessee

    USGS Publications Warehouse

    Hoos, Anne B.; Patel, Anant R.

    1996-01-01

    Model-adjustment procedures were applied to the combined data bases of storm-runoff quality for Chattanooga, Knoxville, and Nashville, Tennessee, to improve predictive accuracy for storm-runoff quality for urban watersheds in these three cities and throughout Middle and East Tennessee. Data for 45 storms at 15 different sites (five sites in each city) constitute the data base. Comparison of observed values of storm-runoff load and event-mean concentration to the predicted values from the regional regression models for 10 constituents shows prediction errors, as large as 806,000 percent. Model-adjustment procedures, which combine the regional model predictions with local data, are applied to improve predictive accuracy. Standard error of estimate after model adjustment ranges from 67 to 322 percent. Calibration results may be biased due to sampling error in the Tennessee data base. The relatively large values of standard error of estimate for some of the constituent models, although representing significant reduction (at least 50 percent) in prediction error compared to estimation with unadjusted regional models, may be unacceptable for some applications. The user may wish to collect additional local data for these constituents and repeat the analysis, or calibrate an independent local regression model.

  12. Prediction of pesticide acute toxicity using two-dimensional chemical descriptors and target species classification

    EPA Science Inventory

    Previous modelling of the median lethal dose (oral rat LD50) has indicated that local class-based models yield better correlations than global models. We evaluated the hypothesis that dividing the dataset by pesticidal mechanisms would improve prediction accuracy. A linear discri...

  13. Predictive models for Escherichia coli concentrations at inland lake beaches and relationship of model variables to pathogen detection

    EPA Science Inventory

    Methods are needed improve the timeliness and accuracy of recreational water‐quality assessments. Traditional culture methods require 18–24 h to obtain results and may not reflect current conditions. Predictive models, based on environmental and water quality variables, have been...

  14. Accounting for discovery bias in genomic prediction

    USDA-ARS?s Scientific Manuscript database

    Our objective was to evaluate an approach to mitigating discovery bias in genomic prediction. Accuracy may be improved by placing greater emphasis on regions of the genome expected to be more influential on a trait. Methods emphasizing regions result in a phenomenon known as “discovery bias” if info...

  15. Improving accuracy of genomic prediction in Brangus cattle by adding animals with imputed low-density SNP genotypes.

    PubMed

    Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M

    2018-02-01

    Reliable genomic prediction of breeding values for quantitative traits requires the availability of sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that the pooling of animals with both original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies on the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) were from 12.60% to 31.27%, and the relative gain in genomic prediction accuracies on de-regressed EBV was slightly small (i.e. 0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals at the stage of training for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and they were slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.

  16. Google Goes Cancer: Improving Outcome Prediction for Cancer Patients by Network-Based Ranking of Marker Genes

    PubMed Central

    Roy, Janine; Aust, Daniela; Knösel, Thomas; Rümmele, Petra; Jahnke, Beatrix; Hentrich, Vera; Rückert, Felix; Niedergethmann, Marco; Weichert, Wilko; Bahra, Marcus; Schlitt, Hans J.; Settmacher, Utz; Friess, Helmut; Büchler, Markus; Saeger, Hans-Detlev; Schroeder, Michael; Pilarsky, Christian; Grützmann, Robert

    2012-01-01

    Predicting the clinical outcome of cancer patients based on the expression of marker genes in their tumors has received increasing interest in the past decade. Accurate predictors of outcome and response to therapy could be used to personalize and thereby improve therapy. However, state of the art methods used so far often found marker genes with limited prediction accuracy, limited reproducibility, and unclear biological relevance. To address this problem, we developed a novel computational approach to identify genes prognostic for outcome that couples gene expression measurements from primary tumor samples with a network of known relationships between the genes. Our approach ranks genes according to their prognostic relevance using both expression and network information in a manner similar to Google's PageRank. We applied this method to gene expression profiles which we obtained from 30 patients with pancreatic cancer, and identified seven candidate marker genes prognostic for outcome. Compared to genes found with state of the art methods, such as Pearson correlation of gene expression with survival time, we improve the prediction accuracy by up to 7%. Accuracies were assessed using support vector machine classifiers and Monte Carlo cross-validation. We then validated the prognostic value of our seven candidate markers using immunohistochemistry on an independent set of 412 pancreatic cancer samples. Notably, signatures derived from our candidate markers were independently predictive of outcome and superior to established clinical prognostic factors such as grade, tumor size, and nodal status. As the amount of genomic data of individual tumors grows rapidly, our algorithm meets the need for powerful computational approaches that are key to exploit these data for personalized cancer therapies in clinical practice. PMID:22615549

  17. Response of data-driven artificial neural network-based TEC models to neutral wind for different locations, seasons, and solar activity levels from the Indian longitude sector

    NASA Astrophysics Data System (ADS)

    Sur, D.; Haldar, S.; Ray, S.; Paul, A.

    2017-07-01

    The perturbations imposed on transionospheric signals by the ionosphere are a major concern for navigation. The dynamic nature of the ionosphere in the low-latitude equatorial region and the Indian longitude sector has some specific characteristics such as sharp temporal and latitudinal variation of total electron content (TEC). TEC in the Indian longitude sector also undergoes seasonal variations. The large magnitude and sharp variation of TEC cause large and variable range errors for satellite-based navigation system such as Global Positioning System (GPS) throughout the day. For accurate navigation using satellite-based augmentation systems, proper prediction of TEC under certain geophysical conditions is necessary in the equatorial region. It has been reported in the literature that prediction accuracy of TEC has been improved using measured data-driven artificial neural network (ANN)-based vertical TEC (VTEC) models, compared to standard ionospheric models. A set of observations carried out in the Indian longitude sector have been reported in this paper in order to find the amount of improvement in performance accuracy of an ANN-based VTEC model after incorporation of neutral wind as model input. The variations of this improvement in prediction accuracy with respect to latitude, longitude, season, and solar activity have also been reported in this paper.

  18. The RAPIDD ebola forecasting challenge: Synthesis and lessons learnt.

    PubMed

    Viboud, Cécile; Sun, Kaiyuan; Gaffey, Robert; Ajelli, Marco; Fumanelli, Laura; Merler, Stefano; Zhang, Qian; Chowell, Gerardo; Simonsen, Lone; Vespignani, Alessandro

    2018-03-01

    Infectious disease forecasting is gaining traction in the public health community; however, limited systematic comparisons of model performance exist. Here we present the results of a synthetic forecasting challenge inspired by the West African Ebola crisis in 2014-2015 and involving 16 international academic teams and US government agencies, and compare the predictive performance of 8 independent modeling approaches. Challenge participants were invited to predict 140 epidemiological targets across 5 different time points of 4 synthetic Ebola outbreaks, each involving different levels of interventions and "fog of war" in outbreak data made available for predictions. Prediction targets included 1-4 week-ahead case incidences, outbreak size, peak timing, and several natural history parameters. With respect to weekly case incidence targets, ensemble predictions based on a Bayesian average of the 8 participating models outperformed any individual model and did substantially better than a null auto-regressive model. There was no relationship between model complexity and prediction accuracy; however, the top performing models for short-term weekly incidence were reactive models with few parameters, fitted to a short and recent part of the outbreak. Individual model outputs and ensemble predictions improved with data accuracy and availability; by the second time point, just before the peak of the epidemic, estimates of final size were within 20% of the target. The 4th challenge scenario - mirroring an uncontrolled Ebola outbreak with substantial data reporting noise - was poorly predicted by all modeling teams. Overall, this synthetic forecasting challenge provided a deep understanding of model performance under controlled data and epidemiological conditions. We recommend such "peace time" forecasting challenges as key elements to improve coordination and inspire collaboration between modeling groups ahead of the next pandemic threat, and to assess model forecasting accuracy for a variety of known and hypothetical pathogens. Published by Elsevier B.V.

  19. Systematic Calibration for Ultra-High Accuracy Inertial Measurement Units.

    PubMed

    Cai, Qingzhong; Yang, Gongliu; Song, Ningfang; Liu, Yiliang

    2016-06-22

    An inertial navigation system (INS) has been widely used in challenging GPS environments. With the rapid development of modern physics, an atomic gyroscope will come into use in the near future with a predicted accuracy of 5 × 10(-6)°/h or better. However, existing calibration methods and devices can not satisfy the accuracy requirements of future ultra-high accuracy inertial sensors. In this paper, an improved calibration model is established by introducing gyro g-sensitivity errors, accelerometer cross-coupling errors and lever arm errors. A systematic calibration method is proposed based on a 51-state Kalman filter and smoother. Simulation results show that the proposed calibration method can realize the estimation of all the parameters using a common dual-axis turntable. Laboratory and sailing tests prove that the position accuracy in a five-day inertial navigation can be improved about 8% by the proposed calibration method. The accuracy can be improved at least 20% when the position accuracy of the atomic gyro INS can reach a level of 0.1 nautical miles/5 d. Compared with the existing calibration methods, the proposed method, with more error sources and high order small error parameters calibrated for ultra-high accuracy inertial measurement units (IMUs) using common turntables, has a great application potential in future atomic gyro INSs.

  20. Clinical time series prediction: towards a hierarchical dynamical system framework

    PubMed Central

    Liu, Zitao; Hauskrecht, Milos

    2014-01-01

    Objective Developing machine learning and data mining algorithms for building temporal models of clinical time series is important for understanding of the patient condition, the dynamics of a disease, effect of various patient management interventions and clinical decision making. In this work, we propose and develop a novel hierarchical framework for modeling clinical time series data of varied length and with irregularly sampled observations. Materials and methods Our hierarchical dynamical system framework for modeling clinical time series combines advantages of the two temporal modeling approaches: the linear dynamical system and the Gaussian process. We model the irregularly sampled clinical time series by using multiple Gaussian process sequences in the lower level of our hierarchical framework and capture the transitions between Gaussian processes by utilizing the linear dynamical system. The experiments are conducted on the complete blood count (CBC) panel data of 1000 post-surgical cardiac patients during their hospitalization. Our framework is evaluated and compared to multiple baseline approaches in terms of the mean absolute prediction error and the absolute percentage error. Results We tested our framework by first learning the time series model from data for the patient in the training set, and then applying the model in order to predict future time series values on the patients in the test set. We show that our model outperforms multiple existing models in terms of its predictive accuracy. Our method achieved a 3.13% average prediction accuracy improvement on ten CBC lab time series when it was compared against the best performing baseline. A 5.25% average accuracy improvement was observed when only short-term predictions were considered. Conclusion A new hierarchical dynamical system framework that lets us model irregularly sampled time series data is a promising new direction for modeling clinical time series and for improving their predictive performance. PMID:25534671

  1. Failure prediction using machine learning and time series in optical network.

    PubMed

    Wang, Zhilong; Zhang, Min; Wang, Danshi; Song, Chuang; Liu, Min; Li, Jin; Lou, Liqi; Liu, Zhuo

    2017-08-07

    In this paper, we propose a performance monitoring and failure prediction method in optical networks based on machine learning. The primary algorithms of this method are the support vector machine (SVM) and double exponential smoothing (DES). With a focus on risk-aware models in optical networks, the proposed protection plan primarily investigates how to predict the risk of an equipment failure. To the best of our knowledge, this important problem has not yet been fully considered. Experimental results showed that the average prediction accuracy of our method was 95% when predicting the optical equipment failure state. This finding means that our method can forecast an equipment failure risk with high accuracy. Therefore, our proposed DES-SVM method can effectively improve traditional risk-aware models to protect services from possible failures and enhance the optical network stability.

  2. Prediction of stock markets by the evolutionary mix-game model

    NASA Astrophysics Data System (ADS)

    Chen, Fang; Gou, Chengling; Guo, Xiaoqian; Gao, Jieping

    2008-06-01

    This paper presents the efforts of using the evolutionary mix-game model, which is a modified form of the agent-based mix-game model, to predict financial time series. Here, we have carried out three methods to improve the original mix-game model by adding the abilities of strategy evolution to agents, and then applying the new model referred to as the evolutionary mix-game model to forecast the Shanghai Stock Exchange Composite Index. The results show that these modifications can improve the accuracy of prediction greatly when proper parameters are chosen.

  3. Single-Step BLUP with Varying Genotyping Effort in Open-Pollinated Picea glauca.

    PubMed

    Ratcliffe, Blaise; El-Dien, Omnia Gamal; Cappa, Eduardo P; Porth, Ilga; Klápště, Jaroslav; Chen, Charles; El-Kassaby, Yousry A

    2017-03-10

    Maximization of genetic gain in forest tree breeding programs is contingent on the accuracy of the predicted breeding values and precision of the estimated genetic parameters. We investigated the effect of the combined use of contemporary pedigree information and genomic relatedness estimates on the accuracy of predicted breeding values and precision of estimated genetic parameters, as well as rankings of selection candidates, using single-step genomic evaluation (HBLUP). In this study, two traits with diverse heritabilities [tree height (HT) and wood density (WD)] were assessed at various levels of family genotyping efforts (0, 25, 50, 75, and 100%) from a population of white spruce ( Picea glauca ) consisting of 1694 trees from 214 open-pollinated families, representing 43 provenances in Québec, Canada. The results revealed that HBLUP bivariate analysis is effective in reducing the known bias in heritability estimates of open-pollinated populations, as it exposes hidden relatedness, potential pedigree errors, and inbreeding. The addition of genomic information in the analysis considerably improved the accuracy in breeding value estimates by accounting for both Mendelian sampling and historical coancestry that were not captured by the contemporary pedigree alone. Increasing family genotyping efforts were associated with continuous improvement in model fit, precision of genetic parameters, and breeding value accuracy. Yet, improvements were observed even at minimal genotyping effort, indicating that even modest genotyping effort is effective in improving genetic evaluation. The combined utilization of both pedigree and genomic information may be a cost-effective approach to increase the accuracy of breeding values in forest tree breeding programs where shallow pedigrees and large testing populations are the norm. Copyright © 2017 Ratcliffe et al.

  4. One-dimensional soil temperature assimilation experiment based on unscented particle filter and Common Land Model

    NASA Astrophysics Data System (ADS)

    Fu, Xiao Lei; Jin, Bao Ming; Jiang, Xiao Lei; Chen, Cheng

    2018-06-01

    Data assimilation is an efficient way to improve the simulation/prediction accuracy in many fields of geosciences especially in meteorological and hydrological applications. This study takes unscented particle filter (UPF) as an example to test its performance at different two probability distribution, Gaussian and Uniform distributions with two different assimilation frequencies experiments (1) assimilating hourly in situ soil surface temperature, (2) assimilating the original Moderate Resolution Imaging Spectroradiometer (MODIS) Land Surface Temperature (LST) once per day. The numerical experiment results show that the filter performs better when increasing the assimilation frequency. In addition, UPF is efficient for improving the soil variables (e.g., soil temperature) simulation/prediction accuracy, though it is not sensitive to the probability distribution for observation error in soil temperature assimilation.

  5. Assessing the accuracy of improved force-matched water models derived from Ab initio molecular dynamics simulations.

    PubMed

    Köster, Andreas; Spura, Thomas; Rutkai, Gábor; Kessler, Jan; Wiebeler, Hendrik; Vrabec, Jadran; Kühne, Thomas D

    2016-07-15

    The accuracy of water models derived from ab initio molecular dynamics simulations by means on an improved force-matching scheme is assessed for various thermodynamic, transport, and structural properties. It is found that although the resulting force-matched water models are typically less accurate than fully empirical force fields in predicting thermodynamic properties, they are nevertheless much more accurate than generally appreciated in reproducing the structure of liquid water and in fact superseding most of the commonly used empirical water models. This development demonstrates the feasibility to routinely parametrize computationally efficient yet predictive potential energy functions based on accurate ab initio molecular dynamics simulations for a large variety of different systems. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  6. Applicability of linear regression equation for prediction of chlorophyll content in rice leaves

    NASA Astrophysics Data System (ADS)

    Li, Yunmei

    2005-09-01

    A modeling approach is used to assess the applicability of the derived equations which are capable to predict chlorophyll content of rice leaves at a given view direction. Two radiative transfer models, including PROSPECT model operated at leaf level and FCR model operated at canopy level, are used in the study. The study is consisted of three steps: (1) Simulation of bidirectional reflectance from canopy with different leaf chlorophyll contents, leaf-area-index (LAI) and under storey configurations; (2) Establishment of prediction relations of chlorophyll content by stepwise regression; and (3) Assessment of the applicability of these relations. The result shows that the accuracy of prediction is affected by different under storey configurations and, however, the accuracy tends to be greatly improved with increase of LAI.

  7. Neural networks for dimensionality reduction of fluorescence spectra and prediction of drinking water disinfection by-products.

    PubMed

    Peleato, Nicolas M; Legge, Raymond L; Andrews, Robert C

    2018-06-01

    The use of fluorescence data coupled with neural networks for improved predictability of drinking water disinfection by-products (DBPs) was investigated. Novel application of autoencoders to process high-dimensional fluorescence data was related to common dimensionality reduction techniques of parallel factors analysis (PARAFAC) and principal component analysis (PCA). The proposed method was assessed based on component interpretability as well as for prediction of organic matter reactivity to formation of DBPs. Optimal prediction accuracies on a validation dataset were observed with an autoencoder-neural network approach or by utilizing the full spectrum without pre-processing. Latent representation by an autoencoder appeared to mitigate overfitting when compared to other methods. Although DBP prediction error was minimized by other pre-processing techniques, PARAFAC yielded interpretable components which resemble fluorescence expected from individual organic fluorophores. Through analysis of the network weights, fluorescence regions associated with DBP formation can be identified, representing a potential method to distinguish reactivity between fluorophore groupings. However, distinct results due to the applied dimensionality reduction approaches were observed, dictating a need for considering the role of data pre-processing in the interpretability of the results. In comparison to common organic measures currently used for DBP formation prediction, fluorescence was shown to improve prediction accuracies, with improvements to DBP prediction best realized when appropriate pre-processing and regression techniques were applied. The results of this study show promise for the potential application of neural networks to best utilize fluorescence EEM data for prediction of organic matter reactivity. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Predicting Individuals' Learning Success from Patterns of Pre-Learning MRI Activity

    PubMed Central

    Vo, Loan T. K.; Walther, Dirk B.; Kramer, Arthur F.; Erickson, Kirk I.; Boot, Walter R.; Voss, Michelle W.; Prakash, Ruchika S.; Lee, Hyunkyu; Fabiani, Monica; Gratton, Gabriele; Simons, Daniel J.; Sutton, Bradley P.; Wang, Michelle Y.

    2011-01-01

    Performance in most complex cognitive and psychomotor tasks improves with training, yet the extent of improvement varies among individuals. Is it possible to forecast the benefit that a person might reap from training? Several behavioral measures have been used to predict individual differences in task improvement, but their predictive power is limited. Here we show that individual differences in patterns of time-averaged T2*-weighted MRI images in the dorsal striatum recorded at the initial stage of training predict subsequent learning success in a complex video game with high accuracy. These predictions explained more than half of the variance in learning success among individuals, suggesting that individual differences in neuroanatomy or persistent physiology predict whether and to what extent people will benefit from training in a complex task. Surprisingly, predictions from white matter were highly accurate, while voxels in the gray matter of the dorsal striatum did not contain any information about future training success. Prediction accuracy was higher in the anterior than the posterior half of the dorsal striatum. The link between trainability and the time-averaged T2*-weighted signal in the dorsal striatum reaffirms the role of this part of the basal ganglia in learning and executive functions, such as task-switching and task coordination processes. The ability to predict who will benefit from training by using neuroimaging data collected in the early training phase may have far-reaching implications for the assessment of candidates for specific training programs as well as the study of populations that show deficiencies in learning new skills. PMID:21264257

  9. Does NASA SMAP Improve the Accuracy of Power Outage Models?

    NASA Astrophysics Data System (ADS)

    Quiring, S. M.; McRoberts, D. B.; Toy, B.; Alvarado, B.

    2016-12-01

    Electric power utilities make critical decisions in the days prior to hurricane landfall that are primarily based on the estimated impact to their service area. For example, utilities must determine how many repair crews to request from other utilities, the amount of material and equipment they will need to make repairs, and where in their geographically expansive service area to station crews and materials. Accurate forecasts of the impact of an approaching hurricane within their service area are critical for utilities in balancing the costs and benefits of different levels of resources. The Hurricane Outage Prediction Model (HOPM) are a family of statistical models that utilize predictions of tropical cyclone windspeed and duration of strong winds, along with power system and environmental variables (e.g., soil moisture, long-term precipitation), to forecast the number and location of power outages. This project assesses whether using NASA SMAP soil moisture improves the accuracy of power outage forecasts as compared to using model-derived soil moisture from NLDAS-2. A sensitivity analysis is employed since there have been very few tropical cyclones making landfall in the United States since SMAP was launched. The HOPM is used to predict power outages for 13 historical tropical cyclones and the model is run using twice, once with NLDAS soil moisture and once with SMAP soil moisture. Our results demonstrate that using SMAP soil moisture can have a significant impact on power outage predictions. SMAP has the potential to enhance the accuracy of power outage forecasts. Improved outage forecasts reduce the duration of power outages which reduces economic losses and accelerates recovery.

  10. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.

  11. Improving the accuracy of protein stability predictions with multistate design using a variety of backbone ensembles.

    PubMed

    Davey, James A; Chica, Roberto A

    2014-05-01

    Multistate computational protein design (MSD) with backbone ensembles approximating conformational flexibility can predict higher quality sequences than single-state design with a single fixed backbone. However, it is currently unclear what characteristics of backbone ensembles are required for the accurate prediction of protein sequence stability. In this study, we aimed to improve the accuracy of protein stability predictions made with MSD by using a variety of backbone ensembles to recapitulate the experimentally measured stability of 85 Streptococcal protein G domain β1 sequences. Ensembles tested here include an NMR ensemble as well as those generated by molecular dynamics (MD) simulations, by Backrub motions, and by PertMin, a new method that we developed involving the perturbation of atomic coordinates followed by energy minimization. MSD with the PertMin ensembles resulted in the most accurate predictions by providing the highest number of stable sequences in the top 25, and by correctly binning sequences as stable or unstable with the highest success rate (≈90%) and the lowest number of false positives. The performance of PertMin ensembles is due to the fact that their members closely resemble the input crystal structure and have low potential energy. Conversely, the NMR ensemble as well as those generated by MD simulations at 500 or 1000 K reduced prediction accuracy due to their low structural similarity to the crystal structure. The ensembles tested herein thus represent on- or off-target models of the native protein fold and could be used in future studies to design for desired properties other than stability. Copyright © 2013 Wiley Periodicals, Inc.

  12. Accuracy of the Surgeons' Clinical Prediction of Postoperative Major Complications Using a Visual Analog Scale.

    PubMed

    Woodfield, John C; Sagar, Peter M; Thekkinkattil, Dinesh K; Gogu, Praveen; Plank, Lindsay D; Burke, Dermot

    2017-01-01

    Although the risk factors that contribute to postoperative complications are well recognized, prediction in the context of a particular patient is more difficult. We were interested in using a visual analog scale (VAS) to capture surgeons' prediction of the risk of a major complication and to examine whether this could be improved. The study was performed in 3 stages. In phase I, the surgeon assessed the risk of a major complication on a 100-mm VAS immediately before and after surgery. A quality control questionnaire was designed to check if the VAS was being scored as a linear scale. In phase II, a VAS with 6 subscales for different areas of clinical risk was introduced. In phase III, predictions were completed following the presentation of detailed feedback on the accuracy of prediction of complications. In total, 1295 predictions were made by 58 surgeons in 859 patients. Eight surgeons did not use a linear scale (6 logarithmic, 2 used 4 categories of risk). Surgeons made a meaningful prediction of major complications (preoperative median score 40 mm for complications v. 22 mm for no complication, P < 0.001; postoperative 46 mm v. 21 mm, P < 0.001). In phase I, the discrimination of prediction for preoperative (0.778), postoperative (0.810), and POSSUM (Physiological and Operative Severity Score for the Enumeration of Mortality and Morbidity) morbidity (0.750) prediction was similar. Although there was no improvement in prediction with a multidimensional VAS, there was a significant improvement in the discrimination of prediction after feedback (preoperative, 0.895; postoperative, 0.918). Awareness of different ways a VAS is scored is important when designing and interpreting studies. Clinical assessment of major complications by the surgeon was initially comparable to the prediction of the POSSUM morbidity score and improved significantly following the presentation of clinically relevant feedback. © The Author(s) 2016.

  13. Efficient use of unlabeled data for protein sequence classification: a comparative study

    PubMed Central

    Kuksa, Pavel; Huang, Pai-Hsi; Pavlovic, Vladimir

    2009-01-01

    Background Recent studies in computational primary protein sequence analysis have leveraged the power of unlabeled data. For example, predictive models based on string kernels trained on sequences known to belong to particular folds or superfamilies, the so-called labeled data set, can attain significantly improved accuracy if this data is supplemented with protein sequences that lack any class tags–the unlabeled data. In this study, we present a principled and biologically motivated computational framework that more effectively exploits the unlabeled data by only using the sequence regions that are more likely to be biologically relevant for better prediction accuracy. As overly-represented sequences in large uncurated databases may bias the estimation of computational models that rely on unlabeled data, we also propose a method to remove this bias and improve performance of the resulting classifiers. Results Combined with state-of-the-art string kernels, our proposed computational framework achieves very accurate semi-supervised protein remote fold and homology detection on three large unlabeled databases. It outperforms current state-of-the-art methods and exhibits significant reduction in running time. Conclusion The unlabeled sequences used under the semi-supervised setting resemble the unpolished gemstones; when used as-is, they may carry unnecessary features and hence compromise the classification accuracy but once cut and polished, they improve the accuracy of the classifiers considerably. PMID:19426450

  14. Feature Extraction of Electronic Nose Signals Using QPSO-Based Multiple KFDA Signal Processing

    PubMed Central

    Wen, Tailai; Huang, Daoyu; Lu, Kun; Deng, Changjian; Zeng, Tanyue; Yu, Song; He, Zhiyi

    2018-01-01

    The aim of this research was to enhance the classification accuracy of an electronic nose (E-nose) in different detecting applications. During the learning process of the E-nose to predict the types of different odors, the prediction accuracy was not quite satisfying because the raw features extracted from sensors’ responses were regarded as the input of a classifier without any feature extraction processing. Therefore, in order to obtain more useful information and improve the E-nose’s classification accuracy, in this paper, a Weighted Kernels Fisher Discriminant Analysis (WKFDA) combined with Quantum-behaved Particle Swarm Optimization (QPSO), i.e., QWKFDA, was presented to reprocess the original feature matrix. In addition, we have also compared the proposed method with quite a few previously existing ones including Principal Component Analysis (PCA), Locality Preserving Projections (LPP), Fisher Discriminant Analysis (FDA) and Kernels Fisher Discriminant Analysis (KFDA). Experimental results proved that QWKFDA is an effective feature extraction method for E-nose in predicting the types of wound infection and inflammable gases, which shared much higher classification accuracy than those of the contrast methods. PMID:29382146

  15. Feature Extraction of Electronic Nose Signals Using QPSO-Based Multiple KFDA Signal Processing.

    PubMed

    Wen, Tailai; Yan, Jia; Huang, Daoyu; Lu, Kun; Deng, Changjian; Zeng, Tanyue; Yu, Song; He, Zhiyi

    2018-01-29

    The aim of this research was to enhance the classification accuracy of an electronic nose (E-nose) in different detecting applications. During the learning process of the E-nose to predict the types of different odors, the prediction accuracy was not quite satisfying because the raw features extracted from sensors' responses were regarded as the input of a classifier without any feature extraction processing. Therefore, in order to obtain more useful information and improve the E-nose's classification accuracy, in this paper, a Weighted Kernels Fisher Discriminant Analysis (WKFDA) combined with Quantum-behaved Particle Swarm Optimization (QPSO), i.e., QWKFDA, was presented to reprocess the original feature matrix. In addition, we have also compared the proposed method with quite a few previously existing ones including Principal Component Analysis (PCA), Locality Preserving Projections (LPP), Fisher Discriminant Analysis (FDA) and Kernels Fisher Discriminant Analysis (KFDA). Experimental results proved that QWKFDA is an effective feature extraction method for E-nose in predicting the types of wound infection and inflammable gases, which shared much higher classification accuracy than those of the contrast methods.

  16. Reduced Fragment Diversity for Alpha and Alpha-Beta Protein Structure Prediction using Rosetta.

    PubMed

    Abbass, Jad; Nebel, Jean-Christophe

    2017-01-01

    Protein structure prediction is considered a main challenge in computational biology. The biannual international competition, Critical Assessment of protein Structure Prediction (CASP), has shown in its eleventh experiment that free modelling target predictions are still beyond reliable accuracy, therefore, much effort should be made to improve ab initio methods. Arguably, Rosetta is considered as the most competitive method when it comes to targets with no homologues. Relying on fragments of length 9 and 3 from known structures, Rosetta creates putative structures by assembling candidate fragments. Generally, the structure with the lowest energy score, also known as first model, is chosen to be the "predicted one". A thorough study has been conducted on the role and diversity of 3-mers involved in Rosetta's model "refinement" phase. Usage of the standard number of 3-mers - i.e. 200 - has been shown to degrade alpha and alpha-beta protein conformations initially achieved by assembling 9-mers. Therefore, a new prediction pipeline is proposed for Rosetta where the "refinement" phase is customised according to a target's structural class prediction. Over 8% improvement in terms of first model structure accuracy is reported for alpha and alpha-beta classes when decreasing the number of 3- mers. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  17. Accurate and dynamic predictive model for better prediction in medicine and healthcare.

    PubMed

    Alanazi, H O; Abdullah, A H; Qureshi, K N; Ismail, A S

    2018-05-01

    Information and communication technologies (ICTs) have changed the trend into new integrated operations and methods in all fields of life. The health sector has also adopted new technologies to improve the systems and provide better services to customers. Predictive models in health care are also influenced from new technologies to predict the different disease outcomes. However, still, existing predictive models have suffered from some limitations in terms of predictive outcomes performance. In order to improve predictive model performance, this paper proposed a predictive model by classifying the disease predictions into different categories. To achieve this model performance, this paper uses traumatic brain injury (TBI) datasets. TBI is one of the serious diseases worldwide and needs more attention due to its seriousness and serious impacts on human life. The proposed predictive model improves the predictive performance of TBI. The TBI data set is developed and approved by neurologists to set its features. The experiment results show that the proposed model has achieved significant results including accuracy, sensitivity, and specificity.

  18. Even after Thirteen Class Exams, Students Are Still Overconfident: The Role of Memory for Past Exam Performance in Student Predictions

    ERIC Educational Resources Information Center

    Foster, Nathaniel L.; Was, Christopher A.; Dunlosky, John; Isaacson, Randall M.

    2017-01-01

    Students often are overconfident when they predict their performance on classroom examinations, and their accuracy often does not improve across exams. One contributor to overconfidence may be that students did not have enough experience, and another is that students may under-use their knowledge of prior exam performance to predict performance on…

  19. Flight assessment of the onboard propulsion system model for the Performance Seeking Control algorithm on an F-15 aircraft

    NASA Technical Reports Server (NTRS)

    Orme, John S.; Schkolnik, Gerard S.

    1995-01-01

    Performance Seeking Control (PSC), an onboard, adaptive, real-time optimization algorithm, relies upon an onboard propulsion system model. Flight results illustrated propulsion system performance improvements as calculated by the model. These improvements were subject to uncertainty arising from modeling error. Thus to quantify uncertainty in the PSC performance improvements, modeling accuracy must be assessed. A flight test approach to verify PSC-predicted increases in thrust (FNP) and absolute levels of fan stall margin is developed and applied to flight test data. Application of the excess thrust technique shows that increases of FNP agree to within 3 percent of full-scale measurements for most conditions. Accuracy to these levels is significant because uncertainty bands may now be applied to the performance improvements provided by PSC. Assessment of PSC fan stall margin modeling accuracy was completed with analysis of in-flight stall tests. Results indicate that the model overestimates the stall margin by between 5 to 10 percent. Because PSC achieves performance gains by using available stall margin, this overestimation may represent performance improvements to be recovered with increased modeling accuracy. Assessment of thrust and stall margin modeling accuracy provides a critical piece for a comprehensive understanding of PSC's capabilities and limitations.

  20. Predicting oropharyngeal tumor volume throughout the course of radiation therapy from pretreatment computed tomography data using general linear models.

    PubMed

    Yock, Adam D; Rao, Arvind; Dong, Lei; Beadle, Beth M; Garden, Adam S; Kudchadker, Rajat J; Court, Laurence E

    2014-05-01

    The purpose of this work was to develop and evaluate the accuracy of several predictive models of variation in tumor volume throughout the course of radiation therapy. Nineteen patients with oropharyngeal cancers were imaged daily with CT-on-rails for image-guided alignment per an institutional protocol. The daily volumes of 35 tumors in these 19 patients were determined and used to generate (1) a linear model in which tumor volume changed at a constant rate, (2) a general linear model that utilized the power fit relationship between the daily and initial tumor volumes, and (3) a functional general linear model that identified and exploited the primary modes of variation between time series describing the changing tumor volumes. Primary and nodal tumor volumes were examined separately. The accuracy of these models in predicting daily tumor volumes were compared with those of static and linear reference models using leave-one-out cross-validation. In predicting the daily volume of primary tumors, the general linear model and the functional general linear model were more accurate than the static reference model by 9.9% (range: -11.6%-23.8%) and 14.6% (range: -7.3%-27.5%), respectively, and were more accurate than the linear reference model by 14.2% (range: -6.8%-40.3%) and 13.1% (range: -1.5%-52.5%), respectively. In predicting the daily volume of nodal tumors, only the 14.4% (range: -11.1%-20.5%) improvement in accuracy of the functional general linear model compared to the static reference model was statistically significant. A general linear model and a functional general linear model trained on data from a small population of patients can predict the primary tumor volume throughout the course of radiation therapy with greater accuracy than standard reference models. These more accurate models may increase the prognostic value of information about the tumor garnered from pretreatment computed tomography images and facilitate improved treatment management.

  1. When high working memory capacity is and is not beneficial for predicting nonlinear processes.

    PubMed

    Fischer, Helen; Holt, Daniel V

    2017-04-01

    Predicting the development of dynamic processes is vital in many areas of life. Previous findings are inconclusive as to whether higher working memory capacity (WMC) is always associated with using more accurate prediction strategies, or whether higher WMC can also be associated with using overly complex strategies that do not improve accuracy. In this study, participants predicted a range of systematically varied nonlinear processes based on exponential functions where prediction accuracy could or could not be enhanced using well-calibrated rules. Results indicate that higher WMC participants seem to rely more on well-calibrated strategies, leading to more accurate predictions for processes with highly nonlinear trajectories in the prediction region. Predictions of lower WMC participants, in contrast, point toward an increased use of simple exemplar-based prediction strategies, which perform just as well as more complex strategies when the prediction region is approximately linear. These results imply that with respect to predicting dynamic processes, working memory capacity limits are not generally a strength or a weakness, but that this depends on the process to be predicted.

  2. Accuracy of prediction of genomic breeding values for residual feed intake and carcass and meat quality traits in Bos taurus, Bos indicus, and composite beef cattle.

    PubMed

    Bolormaa, S; Pryce, J E; Kemper, K; Savin, K; Hayes, B J; Barendse, W; Zhang, Y; Reich, C M; Mason, B A; Bunch, R J; Harrison, B E; Reverter, A; Herd, R M; Tier, B; Graser, H-U; Goddard, M E

    2013-07-01

    The aim of this study was to assess the accuracy of genomic predictions for 19 traits including feed efficiency, growth, and carcass and meat quality traits in beef cattle. The 10,181 cattle in our study had real or imputed genotypes for 729,068 SNP although not all cattle were measured for all traits. Animals included Bos taurus, Brahman, composite, and crossbred animals. Genomic EBV (GEBV) were calculated using 2 methods of genomic prediction [BayesR and genomic BLUP (GBLUP)] either using a common training dataset for all breeds or using a training dataset comprising only animals of the same breed. Accuracies of GEBV were assessed using 5-fold cross-validation. The accuracy of genomic prediction varied by trait and by method. Traits with a large number of recorded and genotyped animals and with high heritability gave the greatest accuracy of GEBV. Using GBLUP, the average accuracy was 0.27 across traits and breeds, but the accuracies between breeds and between traits varied widely. When the training population was restricted to animals from the same breed as the validation population, GBLUP accuracies declined by an average of 0.04. The greatest decline in accuracy was found for the 4 composite breeds. The BayesR accuracies were greater by an average of 0.03 than GBLUP accuracies, particularly for traits with known genes of moderate to large effect mutations segregating. The accuracies of 0.43 to 0.48 for IGF-I traits were among the greatest in the study. Although accuracies are low compared with those observed in dairy cattle, genomic selection would still be beneficial for traits that are hard to improve by conventional selection, such as tenderness and residual feed intake. BayesR identified many of the same quantitative trait loci as a genomewide association study but appeared to map them more precisely. All traits appear to be highly polygenic with thousands of SNP independently associated with each trait.

  3. Schrödinger equation solved for the hydrogen molecule with unprecedented accuracy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pachucki, Krzysztof, E-mail: krp@fuw.edu.pl; Komasa, Jacek, E-mail: komasa@man.poznan.pl

    2016-04-28

    The hydrogen molecule can be used for determination of physical constants, including the proton charge radius, and for improved tests of the hypothetical long range force between hadrons, which require a sufficiently accurate knowledge of the molecular levels. In this work, we perform the first step toward a significant improvement in theoretical predictions of H{sub 2} and solve the nonrelativistic Schrödinger equation to the unprecedented accuracy of 10{sup −12}. We hope that it will inspire a parallel progress in the spectroscopy of the molecular hydrogen.

  4. Toward accurate prediction of pKa values for internal protein residues: the importance of conformational relaxation and desolvation energy.

    PubMed

    Wallace, Jason A; Wang, Yuhang; Shi, Chuanyin; Pastoor, Kevin J; Nguyen, Bao-Linh; Xia, Kai; Shen, Jana K

    2011-12-01

    Proton uptake or release controls many important biological processes, such as energy transduction, virus replication, and catalysis. Accurate pK(a) prediction informs about proton pathways, thereby revealing detailed acid-base mechanisms. Physics-based methods in the framework of molecular dynamics simulations not only offer pK(a) predictions but also inform about the physical origins of pK(a) shifts and provide details of ionization-induced conformational relaxation and large-scale transitions. One such method is the recently developed continuous constant pH molecular dynamics (CPHMD) method, which has been shown to be an accurate and robust pK(a) prediction tool for naturally occurring titratable residues. To further examine the accuracy and limitations of CPHMD, we blindly predicted the pK(a) values for 87 titratable residues introduced in various hydrophobic regions of staphylococcal nuclease and variants. The predictions gave a root-mean-square deviation of 1.69 pK units from experiment, and there were only two pK(a)'s with errors greater than 3.5 pK units. Analysis of the conformational fluctuation of titrating side-chains in the context of the errors of calculated pK(a) values indicate that explicit treatment of conformational flexibility and the associated dielectric relaxation gives CPHMD a distinct advantage. Analysis of the sources of errors suggests that more accurate pK(a) predictions can be obtained for the most deeply buried residues by improving the accuracy in calculating desolvation energies. Furthermore, it is found that the generalized Born implicit-solvent model underlying the current CPHMD implementation slightly distorts the local conformational environment such that the inclusion of an explicit-solvent representation may offer improvement of accuracy. Copyright © 2011 Wiley-Liss, Inc.

  5. Can multi-subpopulation reference sets improve the genomic predictive ability for pigs?

    PubMed

    Fangmann, A; Bergfelder-Drüing, S; Tholen, E; Simianer, H; Erbe, M

    2015-12-01

    In most countries and for most livestock species, genomic evaluations are obtained from within-breed analyses. To achieve reliable breeding values, however, a sufficient reference sample size is essential. To increase this size, the use of multibreed reference populations for small populations is considered a suitable option in other species. Over decades, the separate breeding work of different pig breeding organizations in Germany has led to stratified subpopulations in the breed German Large White. Due to this fact and the limited number of Large White animals available in each organization, there was a pressing need for ascertaining if multi-subpopulation genomic prediction is superior compared with within-subpopulation prediction in pigs. Direct genomic breeding values were estimated with genomic BLUP for the trait "number of piglets born alive" using genotype data (Illumina Porcine 60K SNP BeadChip) from 2,053 German Large White animals from five different commercial pig breeding companies. To assess the prediction accuracy of within- and multi-subpopulation reference sets, a random 5-fold cross-validation with 20 replications was performed. The five subpopulations considered were only slightly differentiated from each other. However, the prediction accuracy of the multi-subpopulations approach was not better than that of the within-subpopulation evaluation, for which the predictive ability was already high. Reference sets composed of closely related multi-subpopulation sets performed better than sets of distantly related subpopulations but not better than the within-subpopulation approach. Despite the low differentiation of the five subpopulations, the genetic connectedness between these different subpopulations seems to be too small to improve the prediction accuracy by applying multi-subpopulation reference sets. Consequently, resources should be used for enlarging the reference population within subpopulation, for example, by adding genotyped females.

  6. Genomic assisted selection for enhancing line breeding: merging genomic and phenotypic selection in winter wheat breeding programs with preliminary yield trials.

    PubMed

    Michel, Sebastian; Ametz, Christian; Gungor, Huseyin; Akgöl, Batuhan; Epure, Doru; Grausgruber, Heinrich; Löschenberger, Franziska; Buerstmayr, Hermann

    2017-02-01

    Early generation genomic selection is superior to conventional phenotypic selection in line breeding and can be strongly improved by including additional information from preliminary yield trials. The selection of lines that enter resource-demanding multi-environment trials is a crucial decision in every line breeding program as a large amount of resources are allocated for thoroughly testing these potential varietal candidates. We compared conventional phenotypic selection with various genomic selection approaches across multiple years as well as the merit of integrating phenotypic information from preliminary yield trials into the genomic selection framework. The prediction accuracy using only phenotypic data was rather low (r = 0.21) for grain yield but could be improved by modeling genetic relationships in unreplicated preliminary yield trials (r = 0.33). Genomic selection models were nevertheless found to be superior to conventional phenotypic selection for predicting grain yield performance of lines across years (r = 0.39). We subsequently simplified the problem of predicting untested lines in untested years to predicting tested lines in untested years by combining breeding values from preliminary yield trials and predictions from genomic selection models by a heritability index. This genomic assisted selection led to a 20% increase in prediction accuracy, which could be further enhanced by an appropriate marker selection for both grain yield (r = 0.48) and protein content (r = 0.63). The easy to implement and robust genomic assisted selection gave thus a higher prediction accuracy than either conventional phenotypic or genomic selection alone. The proposed method took the complex inheritance of both low and high heritable traits into account and appears capable to support breeders in their selection decisions to develop enhanced varieties more efficiently.

  7. Development of variable pathlength UV-vis spectroscopy combined with partial-least-squares regression for wastewater chemical oxygen demand (COD) monitoring.

    PubMed

    Chen, Baisheng; Wu, Huanan; Li, Sam Fong Yau

    2014-03-01

    To overcome the challenging task to select an appropriate pathlength for wastewater chemical oxygen demand (COD) monitoring with high accuracy by UV-vis spectroscopy in wastewater treatment process, a variable pathlength approach combined with partial-least squares regression (PLSR) was developed in this study. Two new strategies were proposed to extract relevant information of UV-vis spectral data from variable pathlength measurements. The first strategy was by data fusion with two data fusion levels: low-level data fusion (LLDF) and mid-level data fusion (MLDF). Predictive accuracy was found to improve, indicated by the lower root-mean-square errors of prediction (RMSEP) compared with those obtained for single pathlength measurements. Both fusion levels were found to deliver very robust PLSR models with residual predictive deviations (RPD) greater than 3 (i.e. 3.22 and 3.29, respectively). The second strategy involved calculating the slopes of absorbance against pathlength at each wavelength to generate slope-derived spectra. Without the requirement to select the optimal pathlength, the predictive accuracy (RMSEP) was improved by 20-43% as compared to single pathlength spectroscopy. Comparing to nine-factor models from fusion strategy, the PLSR model from slope-derived spectroscopy was found to be more parsimonious with only five factors and more robust with residual predictive deviation (RPD) of 3.72. It also offered excellent correlation of predicted and measured COD values with R(2) of 0.936. In sum, variable pathlength spectroscopy with the two proposed data analysis strategies proved to be successful in enhancing prediction performance of COD in wastewater and showed high potential to be applied in on-line water quality monitoring. Copyright © 2013 Elsevier B.V. All rights reserved.

  8. Better Forecasting for Better Planning: A Systems Approach.

    ERIC Educational Resources Information Center

    Austin, W. Burnet

    Predictions and forecasts are the most critical features of rational planning as well as the most vulnerable to inaccuracy. Because plans are only as good as their forecasts, current planning procedures could be improved by greater forecasting accuracy. Economic factors explain and predict more than any other set of factors, making economic…

  9. Paraphrasing and Prediction with Self-Explanation as Generative Strategies for Learning Science Principles in a Simulation

    ERIC Educational Resources Information Center

    Morrison, Jennifer R.; Bol, Linda; Ross, Steven M.; Watson, Ginger S.

    2015-01-01

    This study examined the incorporation of generative strategies for the guided discovery of physics principles in a simulation. Participants who either paraphrased or predicted and self-explained guided discovery assignments exhibited improved performance on an achievement test as compared to a control group. Calibration accuracy (the…

  10. Modelling for Prediction vs. Modelling for Understanding: Commentary on Musso et al. (2013)

    ERIC Educational Resources Information Center

    Edelsbrunner, Peter; Schneider, Michael

    2013-01-01

    Musso et al. (2013) predict students' academic achievement with high accuracy one year in advance from cognitive and demographic variables, using artificial neural networks (ANNs). They conclude that ANNs have high potential for theoretical and practical improvements in learning sciences. ANNs are powerful statistical modelling tools but they can…

  11. Comparing large-scale hydrological model predictions with observed streamflow in the Pacific Northwest: effects of climate and groundwater

    Treesearch

    Mohammad Safeeq; Guillaume S. Mauger; Gordon E. Grant; Ivan Arismendi; Alan F. Hamlet; Se-Yeun Lee

    2014-01-01

    Assessing uncertainties in hydrologic models can improve accuracy in predicting future streamflow. Here, simulated streamflows using the Variable Infiltration Capacity (VIC) model at coarse (1/16°) and fine (1/120°) spatial resolutions were evaluated against observed streamflows from 217 watersheds. In...

  12. Semi-supervised learning for genomic prediction of novel traits with small reference populations: an application to residual feed intake in dairy cattle.

    PubMed

    Yao, Chen; Zhu, Xiaojin; Weigel, Kent A

    2016-11-07

    Genomic prediction for novel traits, which can be costly and labor-intensive to measure, is often hampered by low accuracy due to the limited size of the reference population. As an option to improve prediction accuracy, we introduced a semi-supervised learning strategy known as the self-training model, and applied this method to genomic prediction of residual feed intake (RFI) in dairy cattle. We describe a self-training model that is wrapped around a support vector machine (SVM) algorithm, which enables it to use data from animals with and without measured phenotypes. Initially, a SVM model was trained using data from 792 animals with measured RFI phenotypes. Then, the resulting SVM was used to generate self-trained phenotypes for 3000 animals for which RFI measurements were not available. Finally, the SVM model was re-trained using data from up to 3792 animals, including those with measured and self-trained RFI phenotypes. Incorporation of additional animals with self-trained phenotypes enhanced the accuracy of genomic predictions compared to that of predictions that were derived from the subset of animals with measured phenotypes. The optimal ratio of animals with self-trained phenotypes to animals with measured phenotypes (2.5, 2.0, and 1.8) and the maximum increase achieved in prediction accuracy measured as the correlation between predicted and actual RFI phenotypes (5.9, 4.1, and 2.4%) decreased as the size of the initial training set (300, 400, and 500 animals with measured phenotypes) increased. The optimal number of animals with self-trained phenotypes may be smaller when prediction accuracy is measured as the mean squared error rather than the correlation between predicted and actual RFI phenotypes. Our results demonstrate that semi-supervised learning models that incorporate self-trained phenotypes can achieve genomic prediction accuracies that are comparable to those obtained with models using larger training sets that include only animals with measured phenotypes. Semi-supervised learning can be helpful for genomic prediction of novel traits, such as RFI, for which the size of reference population is limited, in particular, when the animals to be predicted and the animals in the reference population originate from the same herd-environment.

  13. Meta-analytical prognostic accuracy of the Comprehensive Assessment of at Risk Mental States (CAARMS): The need for refined prediction.

    PubMed

    Oliver, D; Kotlicka-Antczak, M; Minichino, A; Spada, G; McGuire, P; Fusar-Poli, P

    2018-03-01

    Primary indicated prevention is reliant on accurate tools to predict the onset of psychosis. The gold standard assessment for detecting individuals at clinical high risk (CHR-P) for psychosis in the UK and many other countries is the Comprehensive Assessment for At Risk Mental States (CAARMS). While the prognostic accuracy of CHR-P instruments has been assessed in general, this is the first study to specifically analyse that of the CAARMS. As such, the CAARMS was used as the index test, with the reference index being psychosis onset within 2 years. Six independent studies were analysed using MIDAS (STATA 14), with a total of 1876 help-seeking subjects referred to high risk services (CHR-P+: n=892; CHR-P-: n=984). Area under the curve (AUC), summary receiver operating characteristic curves (SROC), quality assessment, likelihood ratios, and probability modified plots were computed, along with sensitivity analyses and meta-regressions. The current meta-analysis confirmed that the 2-year prognostic accuracy of the CAARMS is only acceptable (AUC=0.79 95% CI: 0.75-0.83) and not outstanding as previously reported. In particular, specificity was poor. Sensitivity of the CAARMS is inferior compared to the SIPS, while specificity is comparably low. However, due to the difficulties in performing these types of studies, power in this meta-analysis was low. These results indicate that refining and improving the prognostic accuracy of the CAARMS should be the mainstream area of research for the next era. Avenues of prediction improvement are critically discussed and presented to better benefit patients and improve outcomes of first episode psychosis. Copyright © 2017 The Authors. Published by Elsevier Masson SAS.. All rights reserved.

  14. Prediction of arterial oxygen partial pressure after changes in FIO₂: validation and clinical application of a novel formula.

    PubMed

    Al-Otaibi, H M; Hardman, J G

    2011-11-01

    Existing methods allow prediction of Pa(O₂) during adjustment of Fi(O₂). However, these are cumbersome and lack sufficient accuracy for use in the clinical setting. The present studies aim to extend the validity of a novel formula designed to predict Pa(O₂) during adjustment of Fi(O₂) and to compare it with the current methods. Sixty-seven new data sets were collected from 46 randomly selected, mechanically ventilated patients. Each data set consisted of two subsets (before and 20 min after Fi(O₂) adjustment) and contained ventilator settings, pH, and arterial blood gas values. We compared the accuracy of Pa(O₂) prediction using a new formula (which utilizes only the pre-adjustment Pa(O₂) and pre- and post-adjustment Fi(O₂) with prediction using assumptions of constant Pa(O₂)/Fi(O₂) or constant Pa(O₂)/Pa(O₂). Subsequently, 20 clinicians predicted Pa(O₂) using the new formula and using Nunn's isoshunt diagram. The accuracy of the clinician's predictions was examined. The 95% limits of agreement (LA(95%)) between predicted and measured Pa(O₂) in the patient group were: new formula 0.11 (2.0) kPa, Pa(O₂)/Fi(O₂) -1.9 (4.4) kPa, and Pa(O₂)/Pa(O₂) -1.0 (3.6) kPa. The LA(95%) of clinicians' predictions of Pa(O₂) were 0.56 (3.6) kPa (new formula) and -2.7 (6.4) kPa (isoshunt diagram). The new formula's prediction of changes in Pa(O₂) is acceptably accurate and reliable and better than any other existing method. Its use by clinicians appears to improve accuracy over the most popular existing method. The simplicity of the new method may allow its regular use in the critical care setting.

  15. Maximizing lipocalin prediction through balanced and diversified training set and decision fusion.

    PubMed

    Nath, Abhigyan; Subbiah, Karthikeyan

    2015-12-01

    Lipocalins are short in sequence length and perform several important biological functions. These proteins are having less than 20% sequence similarity among paralogs. Experimentally identifying them is an expensive and time consuming process. The computational methods based on the sequence similarity for allocating putative members to this family are also far elusive due to the low sequence similarity existing among the members of this family. Consequently, the machine learning methods become a viable alternative for their prediction by using the underlying sequence/structurally derived features as the input. Ideally, any machine learning based prediction method must be trained with all possible variations in the input feature vector (all the sub-class input patterns) to achieve perfect learning. A near perfect learning can be achieved by training the model with diverse types of input instances belonging to the different regions of the entire input space. Furthermore, the prediction performance can be improved through balancing the training set as the imbalanced data sets will tend to produce the prediction bias towards majority class and its sub-classes. This paper is aimed to achieve (i) the high generalization ability without any classification bias through the diversified and balanced training sets as well as (ii) enhanced the prediction accuracy by combining the results of individual classifiers with an appropriate fusion scheme. Instead of creating the training set randomly, we have first used the unsupervised Kmeans clustering algorithm to create diversified clusters of input patterns and created the diversified and balanced training set by selecting an equal number of patterns from each of these clusters. Finally, probability based classifier fusion scheme was applied on boosted random forest algorithm (which produced greater sensitivity) and K nearest neighbour algorithm (which produced greater specificity) to achieve the enhanced predictive performance than that of individual base classifiers. The performance of the learned models trained on Kmeans preprocessed training set is far better than the randomly generated training sets. The proposed method achieved a sensitivity of 90.6%, specificity of 91.4% and accuracy of 91.0% on the first test set and sensitivity of 92.9%, specificity of 96.2% and accuracy of 94.7% on the second blind test set. These results have established that diversifying training set improves the performance of predictive models through superior generalization ability and balancing the training set improves prediction accuracy. For smaller data sets, unsupervised Kmeans based sampling can be an effective technique to increase generalization than that of the usual random splitting method. Copyright © 2015 Elsevier Ltd. All rights reserved.

  16. Sub-Model Partial Least Squares for Improved Accuracy in Quantitative Laser Induced Breakdown Spectroscopy

    NASA Astrophysics Data System (ADS)

    Anderson, R. B.; Clegg, S. M.; Frydenvang, J.

    2015-12-01

    One of the primary challenges faced by the ChemCam instrument on the Curiosity Mars rover is developing a regression model that can accurately predict the composition of the wide range of target types encountered (basalts, calcium sulfate, feldspar, oxides, etc.). The original calibration used 69 rock standards to train a partial least squares (PLS) model for each major element. By expanding the suite of calibration samples to >400 targets spanning a wider range of compositions, the accuracy of the model was improved, but some targets with "extreme" compositions (e.g. pure minerals) were still poorly predicted. We have therefore developed a simple method, referred to as "submodel PLS", to improve the performance of PLS across a wide range of target compositions. In addition to generating a "full" (0-100 wt.%) PLS model for the element of interest, we also generate several overlapping submodels (e.g. for SiO2, we generate "low" (0-50 wt.%), "mid" (30-70 wt.%), and "high" (60-100 wt.%) models). The submodels are generally more accurate than the "full" model for samples within their range because they are able to adjust for matrix effects that are specific to that range. To predict the composition of an unknown target, we first predict the composition with the submodels and the "full" model. Then, based on the predicted composition from the "full" model, the appropriate submodel prediction can be used (e.g. if the full model predicts a low composition, use the "low" model result, which is likely to be more accurate). For samples with "full" predictions that occur in a region of overlap between submodels, the submodel predictions are "blended" using a simple linear weighted sum. The submodel PLS method shows improvements in most of the major elements predicted by ChemCam and reduces the occurrence of negative predictions for low wt.% targets. Submodel PLS is currently being used in conjunction with ICA regression for the major element compositions of ChemCam data.

  17. A Hybrid Short-Term Traffic Flow Prediction Model Based on Singular Spectrum Analysis and Kernel Extreme Learning Machine.

    PubMed

    Shang, Qiang; Lin, Ciyun; Yang, Zhaosheng; Bing, Qichun; Zhou, Xiyang

    2016-01-01

    Short-term traffic flow prediction is one of the most important issues in the field of intelligent transport system (ITS). Because of the uncertainty and nonlinearity, short-term traffic flow prediction is a challenging task. In order to improve the accuracy of short-time traffic flow prediction, a hybrid model (SSA-KELM) is proposed based on singular spectrum analysis (SSA) and kernel extreme learning machine (KELM). SSA is used to filter out the noise of traffic flow time series. Then, the filtered traffic flow data is used to train KELM model, the optimal input form of the proposed model is determined by phase space reconstruction, and parameters of the model are optimized by gravitational search algorithm (GSA). Finally, case validation is carried out using the measured data of an expressway in Xiamen, China. And the SSA-KELM model is compared with several well-known prediction models, including support vector machine, extreme learning machine, and single KLEM model. The experimental results demonstrate that performance of the proposed model is superior to that of the comparison models. Apart from accuracy improvement, the proposed model is more robust.

  18. A Hybrid Short-Term Traffic Flow Prediction Model Based on Singular Spectrum Analysis and Kernel Extreme Learning Machine

    PubMed Central

    Lin, Ciyun; Yang, Zhaosheng; Bing, Qichun; Zhou, Xiyang

    2016-01-01

    Short-term traffic flow prediction is one of the most important issues in the field of intelligent transport system (ITS). Because of the uncertainty and nonlinearity, short-term traffic flow prediction is a challenging task. In order to improve the accuracy of short-time traffic flow prediction, a hybrid model (SSA-KELM) is proposed based on singular spectrum analysis (SSA) and kernel extreme learning machine (KELM). SSA is used to filter out the noise of traffic flow time series. Then, the filtered traffic flow data is used to train KELM model, the optimal input form of the proposed model is determined by phase space reconstruction, and parameters of the model are optimized by gravitational search algorithm (GSA). Finally, case validation is carried out using the measured data of an expressway in Xiamen, China. And the SSA-KELM model is compared with several well-known prediction models, including support vector machine, extreme learning machine, and single KLEM model. The experimental results demonstrate that performance of the proposed model is superior to that of the comparison models. Apart from accuracy improvement, the proposed model is more robust. PMID:27551829

  19. The Theory and Practice of Estimating the Accuracy of Dynamic Flight-Determined Coefficients

    NASA Technical Reports Server (NTRS)

    Maine, R. E.; Iliff, K. W.

    1981-01-01

    Means of assessing the accuracy of maximum likelihood parameter estimates obtained from dynamic flight data are discussed. The most commonly used analytical predictors of accuracy are derived and compared from both statistical and simplified geometrics standpoints. The accuracy predictions are evaluated with real and simulated data, with an emphasis on practical considerations, such as modeling error. Improved computations of the Cramer-Rao bound to correct large discrepancies due to colored noise and modeling error are presented. The corrected Cramer-Rao bound is shown to be the best available analytical predictor of accuracy, and several practical examples of the use of the Cramer-Rao bound are given. Engineering judgement, aided by such analytical tools, is the final arbiter of accuracy estimation.

  20. AERONET Version 3 Release: Providing Significant Improvements for Multi-Decadal Global Aerosol Database and Near Real-Time Validation

    NASA Technical Reports Server (NTRS)

    Holben, Brent; Slutsker, Ilya; Giles, David; Eck, Thomas; Smirnov, Alexander; Sinyuk, Aliaksandr; Schafer, Joel; Sorokin, Mikhail; Rodriguez, Jon; Kraft, Jason; hide

    2016-01-01

    Aerosols are highly variable in space, time and properties. Global assessment from satellite platforms and model predictions rely on validation from AERONET, a highly accurate ground-based network. Ver. 3 represents a significant improvement in accuracy and quality.

  1. Prediction of Adulthood Obesity Using Genetic and Childhood Clinical Risk Factors in the Cardiovascular Risk in Young Finns Study.

    PubMed

    Seyednasrollah, Fatemeh; Mäkelä, Johanna; Pitkänen, Niina; Juonala, Markus; Hutri-Kähönen, Nina; Lehtimäki, Terho; Viikari, Jorma; Kelly, Tanika; Li, Changwei; Bazzano, Lydia; Elo, Laura L; Raitakari, Olli T

    2017-06-01

    Obesity is a known risk factor for cardiovascular disease. Early prediction of obesity is essential for prevention. The aim of this study is to assess the use of childhood clinical factors and the genetic risk factors in predicting adulthood obesity using machine learning methods. A total of 2262 participants from the Cardiovascular Risk in YFS (Young Finns Study) were followed up from childhood (age 3-18 years) to adulthood for 31 years. The data were divided into training (n=1625) and validation (n=637) set. The effect of known genetic risk factors (97 single-nucleotide polymorphisms) was investigated as a weighted genetic risk score of all 97 single-nucleotide polymorphisms (WGRS97) or a subset of 19 most significant single-nucleotide polymorphisms (WGRS19) using boosting machine learning technique. WGRS97 and WGRS19 were validated using external data (n=369) from BHS (Bogalusa Heart Study). WGRS19 improved the accuracy of predicting adulthood obesity in training (area under the curve [AUC=0.787 versus AUC=0.744, P <0.0001) and validation data (AUC=0.769 versus AUC=0.747, P =0.026). WGRS97 improved the accuracy in training (AUC=0.782 versus AUC=0.744, P <0.0001) but not in validation data (AUC=0.749 versus AUC=0.747, P =0.785). Higher WGRS19 associated with higher body mass index at 9 years and WGRS97 at 6 years. Replication in BHS confirmed our findings that WGRS19 and WGRS97 are associated with body mass index. WGRS19 improves prediction of adulthood obesity. Predictive accuracy is highest among young children (3-6 years), whereas among older children (9-18 years) the risk can be identified using childhood clinical factors. The model is helpful in screening children with high risk of developing obesity. © 2017 American Heart Association, Inc.

  2. High precision predictions for exclusive VH production at the LHC

    DOE PAGES

    Li, Ye; Liu, Xiaohui

    2014-06-04

    We present a resummation-improved prediction for pp → VH + 0 jets at the Large Hadron Collider. We focus on highly-boosted final states in the presence of jet veto to suppress the tt¯ background. In this case, conventional fixed-order calculations are plagued by the existence of large Sudakov logarithms α n slog m(p veto T/Q) for Q ~ m V + m H which lead to unreliable predictions as well as large theoretical uncertainties, and thus limit the accuracy when comparing experimental measurements to the Standard Model. In this work, we show that the resummation of Sudakov logarithms beyond themore » next-to-next-to-leading-log accuracy, combined with the next-to-next-to-leading order calculation, reduces the scale uncertainty and stabilizes the perturbative expansion in the region where the vector bosons carry large transverse momentum. Thus, our result improves the precision with which Higgs properties can be determined from LHC measurements using boosted Higgs techniques.« less

  3. Results from the centers for disease control and prevention's predict the 2013-2014 Influenza Season Challenge.

    PubMed

    Biggerstaff, Matthew; Alper, David; Dredze, Mark; Fox, Spencer; Fung, Isaac Chun-Hai; Hickmann, Kyle S; Lewis, Bryan; Rosenfeld, Roni; Shaman, Jeffrey; Tsou, Ming-Hsiang; Velardi, Paola; Vespignani, Alessandro; Finelli, Lyn

    2016-07-22

    Early insights into the timing of the start, peak, and intensity of the influenza season could be useful in planning influenza prevention and control activities. To encourage development and innovation in influenza forecasting, the Centers for Disease Control and Prevention (CDC) organized a challenge to predict the 2013-14 Unites States influenza season. Challenge contestants were asked to forecast the start, peak, and intensity of the 2013-2014 influenza season at the national level and at any or all Health and Human Services (HHS) region level(s). The challenge ran from December 1, 2013-March 27, 2014; contestants were required to submit 9 biweekly forecasts at the national level to be eligible. The selection of the winner was based on expert evaluation of the methodology used to make the prediction and the accuracy of the prediction as judged against the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). Nine teams submitted 13 forecasts for all required milestones. The first forecast was due on December 2, 2013; 3/13 forecasts received correctly predicted the start of the influenza season within one week, 1/13 predicted the peak within 1 week, 3/13 predicted the peak ILINet percentage within 1 %, and 4/13 predicted the season duration within 1 week. For the prediction due on December 19, 2013, the number of forecasts that correctly forecasted the peak week increased to 2/13, the peak percentage to 6/13, and the duration of the season to 6/13. As the season progressed, the forecasts became more stable and were closer to the season milestones. Forecasting has become technically feasible, but further efforts are needed to improve forecast accuracy so that policy makers can reliably use these predictions. CDC and challenge contestants plan to build upon the methods developed during this contest to improve the accuracy of influenza forecasts.

  4. Tendon-Driven Continuum Robot for Neuroendoscopy: Validation of Extended Kinematic Mapping for Hysteresis Operation

    PubMed Central

    Takahisa, Kato; Okumura, Ichiro; Kose, Hidekazu; Takagi, Kiyoshi; Hata, Nobuhiko

    2016-01-01

    Purpose The hysteresis operation is an outstanding issue in tendon-driven actuation—which is used in robot-assisted surgery—as it is incompatible with kinematic mapping for control and trajectory planning. Here, a new tendon-driven continuum robot, designed to fit existing neuroendoscopes, is presented with kinematic mapping for hysteresis operation. Methods With attention to tension in tendons as a salient factor of the hysteresis operation, extended forward kinematic mapping (FKM) has been developed. In the experiment, the significance of every component in the robot for the hysteresis operation has been investigated. Moreover, the prediction accuracy of postures by the extended FKM has been determined experimentally and compared with piecewise constant curvature assumption (PCCA). Results The tendons were the most predominant factor affecting the hysteresis operation of the robot. The extended FKM including friction in tendons predicted the postures in the hysteresis operation with improved accuracy (2.89 mm and 3.87 mm for the single and the antagonistic tendons layouts, respectively). The measured accuracy was within the target value of 5 mm for planning of neuroendoscopic resection of intraventricle tumors. Conclusion The friction in tendons was the most predominant factor for the hysteresis operation in the robot. The extended FKM including this factor can improve prediction accuracy of the postures in the hysteresis operation. The trajectory of the new robot can be planned within target value for the neuroendoscopic procedure by using the extended FKM. PMID:26476639

  5. NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data.

    PubMed

    Mao, Wusong; Cong, Peisheng; Wang, Zhiheng; Lu, Longjian; Zhu, Zhongliang; Li, Tonghua

    2013-01-01

    Shape string is structural sequence and is an extremely important structure representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts give a strong correlation with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight states accuracy) and 87.8% for S3 (the three states accuracy). This is higher than only using chemical shifts or sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.

  6. Genomic relationships based on X chromosome markers and accuracy of genomic predictions with and without X chromosome markers

    PubMed Central

    2014-01-01

    Background Although the X chromosome is the second largest bovine chromosome, markers on the X chromosome are not used for genomic prediction in some countries and populations. In this study, we presented a method for computing genomic relationships using X chromosome markers, investigated the accuracy of imputation from a low density (7K) to the 54K SNP (single nucleotide polymorphism) panel, and compared the accuracy of genomic prediction with and without using X chromosome markers. Methods The impact of considering X chromosome markers on prediction accuracy was assessed using data from Nordic Holstein bulls and different sets of SNPs: (a) the 54K SNPs for reference and test animals, (b) SNPs imputed from the 7K to the 54K SNP panel for test animals, (c) SNPs imputed from the 7K to the 54K panel for half of the reference animals, and (d) the 7K SNP panel for all animals. Beagle and Findhap were used for imputation. GBLUP (genomic best linear unbiased prediction) models with or without X chromosome markers and with or without a residual polygenic effect were used to predict genomic breeding values for 15 traits. Results Averaged over the two imputation datasets, correlation coefficients between imputed and true genotypes for autosomal markers, pseudo-autosomal markers, and X-specific markers were 0.971, 0.831 and 0.935 when using Findhap, and 0.983, 0.856 and 0.937 when using Beagle. Estimated reliabilities of genomic predictions based on the imputed datasets using Findhap or Beagle were very close to those using the real 54K data. Genomic prediction using all markers gave slightly higher reliabilities than predictions without X chromosome markers. Based on our data which included only bulls, using a G matrix that accounted for sex-linked relationships did not improve prediction, compared with a G matrix that did not account for sex-linked relationships. A model that included a polygenic effect did not recover the loss of prediction accuracy from exclusion of X chromosome markers. Conclusions The results from this study suggest that markers on the X chromosome contribute to accuracy of genomic predictions and should be used for routine genomic evaluation. PMID:25080199

  7. Predicting coronary artery disease using different artificial neural network models.

    PubMed

    Colak, M Cengiz; Colak, Cemil; Kocatürk, Hasan; Sağiroğlu, Seref; Barutçu, Irfan

    2008-08-01

    Eight different learning algorithms used for creating artificial neural network (ANN) models and the different ANN models in the prediction of coronary artery disease (CAD) are introduced. This work was carried out as a retrospective case-control study. Overall, 124 consecutive patients who had been diagnosed with CAD by coronary angiography (at least 1 coronary stenosis > 50% in major epicardial arteries) were enrolled in the work. Angiographically, the 113 people (group 2) with normal coronary arteries were taken as control subjects. Multi-layered perceptrons ANN architecture were applied. The ANN models trained with different learning algorithms were performed in 237 records, divided into training (n=171) and testing (n=66) data sets. The performance of prediction was evaluated by sensitivity, specificity and accuracy values based on standard definitions. The results have demonstrated that ANN models trained with eight different learning algorithms are promising because of high (greater than 71%) sensitivity, specificity and accuracy values in the prediction of CAD. Accuracy, sensitivity and specificity values varied between 83.63%-100%, 86.46%-100% and 74.67%-100% for training, respectively. For testing, the values were more than 71% for sensitivity, 76% for specificity and 81% for accuracy. It may be proposed that the use of different learning algorithms other than backpropagation and larger sample sizes can improve the performance of prediction. The proposed ANN models trained with these learning algorithms could be used a promising approach for predicting CAD without the need for invasive diagnostic methods and could help in the prognostic clinical decision.

  8. Bayesian Optimization for Neuroimaging Pre-processing in Brain Age Classification and Prediction

    PubMed Central

    Lancaster, Jenessa; Lorenz, Romy; Leech, Rob; Cole, James H.

    2018-01-01

    Neuroimaging-based age prediction using machine learning is proposed as a biomarker of brain aging, relating to cognitive performance, health outcomes and progression of neurodegenerative disease. However, even leading age-prediction algorithms contain measurement error, motivating efforts to improve experimental pipelines. T1-weighted MRI is commonly used for age prediction, and the pre-processing of these scans involves normalization to a common template and resampling to a common voxel size, followed by spatial smoothing. Resampling parameters are often selected arbitrarily. Here, we sought to improve brain-age prediction accuracy by optimizing resampling parameters using Bayesian optimization. Using data on N = 2003 healthy individuals (aged 16–90 years) we trained support vector machines to (i) distinguish between young (<22 years) and old (>50 years) brains (classification) and (ii) predict chronological age (regression). We also evaluated generalisability of the age-regression model to an independent dataset (CamCAN, N = 648, aged 18–88 years). Bayesian optimization was used to identify optimal voxel size and smoothing kernel size for each task. This procedure adaptively samples the parameter space to evaluate accuracy across a range of possible parameters, using independent sub-samples to iteratively assess different parameter combinations to arrive at optimal values. When distinguishing between young and old brains a classification accuracy of 88.1% was achieved, (optimal voxel size = 11.5 mm3, smoothing kernel = 2.3 mm). For predicting chronological age, a mean absolute error (MAE) of 5.08 years was achieved, (optimal voxel size = 3.73 mm3, smoothing kernel = 3.68 mm). This was compared to performance using default values of 1.5 mm3 and 4mm respectively, resulting in MAE = 5.48 years, though this 7.3% improvement was not statistically significant. When assessing generalisability, best performance was achieved when applying the entire Bayesian optimization framework to the new dataset, out-performing the parameters optimized for the initial training dataset. Our study outlines the proof-of-principle that neuroimaging models for brain-age prediction can use Bayesian optimization to derive case-specific pre-processing parameters. Our results suggest that different pre-processing parameters are selected when optimization is conducted in specific contexts. This potentially motivates use of optimization techniques at many different points during the experimental process, which may improve statistical sensitivity and reduce opportunities for experimenter-led bias. PMID:29483870

  9. MicroRNA based Pan-Cancer Diagnosis and Treatment Recommendation.

    PubMed

    Cheerla, Nikhil; Gevaert, Olivier

    2017-01-13

    The current state-of-the-art in cancer diagnosis and treatment is not ideal; diagnostic tests are accurate but invasive, and treatments are "one-size fits-all" instead of being personalized. Recently, miRNA's have garnered significant attention as cancer biomarkers, owing to their ease of access (circulating miRNA in the blood) and stability. There have been many studies showing the effectiveness of miRNA data in diagnosing specific cancer types, but few studies explore the role of miRNA in predicting treatment outcome. Here we go a step further, using tissue miRNA and clinical data across 21 cancers from the 'The Cancer Genome Atlas' (TCGA) database. We use machine learning techniques to create an accurate pan-cancer diagnosis system, and a prediction model for treatment outcomes. Finally, using these models, we create a web-based tool that diagnoses cancer and recommends the best treatment options. We achieved 97.2% accuracy for classification using a support vector machine classifier with radial basis. The accuracies improved to 99.9-100% when climbing up the embryonic tree and classifying cancers at different stages. We define the accuracy as the ratio of the total number of instances correctly classified to the total instances. The classifier also performed well, achieving greater than 80% sensitivity for many cancer types on independent validation datasets. Many miRNAs selected by our feature selection algorithm had strong previous associations to various cancers and tumor progression. Then, using miRNA, clinical and treatment data and encoding it in a machine-learning readable format, we built a prognosis predictor model to predict the outcome of treatment with 85% accuracy. We used this model to create a tool that recommends personalized treatment regimens. Both the diagnosis and prognosis model, incorporating semi-supervised learning techniques to improve their accuracies with repeated use, were uploaded online for easy access. Our research is a step towards the final goal of diagnosing cancer and predicting treatment recommendations using non-invasive blood tests.

  10. Prediction model of dissolved oxygen in ponds based on ELM neural network

    NASA Astrophysics Data System (ADS)

    Li, Xinfei; Ai, Jiaoyan; Lin, Chunhuan; Guan, Haibin

    2018-02-01

    Dissolved oxygen in ponds is affected by many factors, and its distribution is unbalanced. In this paper, in order to improve the imbalance of dissolved oxygen distribution more effectively, the dissolved oxygen prediction model of Extreme Learning Machine (ELM) intelligent algorithm is established, based on the method of improving dissolved oxygen distribution by artificial push flow. Select the Lake Jing of Guangxi University as the experimental area. Using the model to predict the dissolved oxygen concentration of different voltage pumps, the results show that the ELM prediction accuracy is higher than the BP algorithm, and its mean square error is MSEELM=0.0394, the correlation coefficient RELM=0.9823. The prediction results of the 24V voltage pump push flow show that the discrete prediction curve can approximate the measured values well. The model can provide the basis for the artificial improvement of the dissolved oxygen distribution decision.

  11. Predicting Gene Structures from Multiple RT-PCR Tests

    NASA Astrophysics Data System (ADS)

    Kováč, Jakub; Vinař, Tomáš; Brejová, Broňa

    It has been demonstrated that the use of additional information such as ESTs and protein homology can significantly improve accuracy of gene prediction. However, many sources of external information are still being omitted from consideration. Here, we investigate the use of product lengths from RT-PCR experiments in gene finding. We present hardness results and practical algorithms for several variants of the problem and apply our methods to a real RT-PCR data set in the Drosophila genome. We conclude that the use of RT-PCR data can improve the sensitivity of gene prediction and locate novel splicing variants.

  12. Predicting online ratings based on the opinion spreading process

    NASA Astrophysics Data System (ADS)

    He, Xing-Sheng; Zhou, Ming-Yang; Zhuo, Zhao; Fu, Zhong-Qian; Liu, Jian-Guo

    2015-10-01

    Predicting users' online ratings is always a challenge issue and has drawn lots of attention. In this paper, we present a rating prediction method by combining the user opinion spreading process with the collaborative filtering algorithm, where user similarity is defined by measuring the amount of opinion a user transfers to another based on the primitive user-item rating matrix. The proposed method could produce a more precise rating prediction for each unrated user-item pair. In addition, we introduce a tunable parameter λ to regulate the preferential diffusion relevant to the degree of both opinion sender and receiver. The numerical results for Movielens and Netflix data sets show that this algorithm has a better accuracy than the standard user-based collaborative filtering algorithm using Cosine and Pearson correlation without increasing computational complexity. By tuning λ, our method could further boost the prediction accuracy when using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as measurements. In the optimal cases, on Movielens and Netflix data sets, the corresponding algorithmic accuracy (MAE and RMSE) are improved 11.26% and 8.84%, 13.49% and 10.52% compared to the item average method, respectively.

  13. Improving the accuracy of energy baseline models for commercial buildings with occupancy data

    DOE PAGES

    Liang, Xin; Hong, Tianzhen; Shen, Geoffrey Qiping

    2016-07-07

    More than 80% of energy is consumed during operation phase of a building's life cycle, so energy efficiency retrofit for existing buildings is considered a promising way to reduce energy use in buildings. The investment strategies of retrofit depend on the ability to quantify energy savings by “measurement and verification” (M&V), which compares actual energy consumption to how much energy would have been used without retrofit (called the “baseline” of energy use). Although numerous models exist for predicting baseline of energy use, a critical limitation is that occupancy has not been included as a variable. However, occupancy rate is essentialmore » for energy consumption and was emphasized by previous studies. This study develops a new baseline model which is built upon the Lawrence Berkeley National Laboratory (LBNL) model but includes the use of building occupancy data. The study also proposes metrics to quantify the accuracy of prediction and the impacts of variables. However, the results show that including occupancy data does not significantly improve the accuracy of the baseline model, especially for HVAC load. The reasons are discussed further. In addition, sensitivity analysis is conducted to show the influence of parameters in baseline models. To conclude, the results from this study can help us understand the influence of occupancy on energy use, improve energy baseline prediction by including the occupancy factor, reduce risks of M&V and facilitate investment strategies of energy efficiency retrofit.« less

  14. A drift line bias estimator: ARMA-based filter or calibration method, and its application in BDS/GPS-based attitude determination

    NASA Astrophysics Data System (ADS)

    Liang, Zhang; Yanqing, Hou; Jie, Wu

    2016-12-01

    The multi-antenna synchronized receiver (using a common clock) is widely applied in GNSS-based attitude determination (AD) or terrain deformations monitoring, and many other applications, since the high-accuracy single-differenced carrier phase can be used to improve the positioning or AD accuracy. Thus, the line bias (LB) parameter (fractional bias isolating) should be calibrated in the single-differenced phase equations. In the past decades, all researchers estimated the LB as a constant parameter in advance and compensated it in real time. However, the constant LB assumption is inappropriate in practical applications because of the physical length and permittivity changes of the cables, caused by the environmental temperature variation and the instability of receiver-self inner circuit transmitting delay. Considering the LB drift (or colored LB) in practical circumstances, this paper initiates a real-time estimator using auto regressive moving average-based (ARMA) prediction/whitening filter model or Moving average-based (MA) constant calibration model. In the ARMA-based filter model, four cases namely AR(1), ARMA(1, 1), AR(2) and ARMA(2, 1) are applied for the LB prediction. The real-time relative positioning model using the ARMA-based predicting LB is derived and it is theoretically proved that the positioning accuracy is better than the traditional double difference carrier phase (DDCP) model. The drifting LB is defined with a phase temperature changing rate integral function, which is a random walk process if the phase temperature changing rate is white noise, and is validated by the analysis of the AR model coefficient. The auto covariance function shows that the LB is indeed varying in time and estimating it as a constant is not safe, which is also demonstrated by the analysis on LB variation of each visible satellite during a zero and short baseline BDS/GPS experiment. Compared to the DDCP approach, in the zero-baseline experiment, the LB constant calibration (LBCC) and MA approaches improved the positioning accuracy of the vertical component, while slightly degrading the accuracy of the horizontal components. The ARMA(1, 0) model, however, improved the positioning accuracy of all three components, with 40 and 50 % improvement of the vertical component for BDS and GPS, respectively. In the short baseline experiment, compared to the DDCP approach, the LBCC approach yielded bad positioning solutions and degraded the AD accuracy; both MA and ARMA-based filter approaches improved the AD accuracy. Moreover, the ARMA(1, 0) and ARMA(1, 1) models have relatively better performance, improving to 55 % and 48 % the elevation angle in ARMA(1, 1) and MA model for GPS, respectively. Furthermore, the drifting LB variation is found to be continuous and slowly cumulative; the variation magnitudes in the unit of length are almost identical on different frequency carrier phases, so the LB variation does not show obvious correlation between different frequencies. Consequently, the wide-lane LB in the unit of cycle is very stable, while the narrow-lane LB varies largely in time. This reasoning probably also explains the phenomenon that the wide-lane LB originating in the satellites is stable, while the narrow-lane LB varies. The results of ARMA-based filters are better than the MA model, which probably implies that the modeling for drifting LB can further improve the precise point positioning accuracy.

  15. ir-HSP: Improved Recognition of Heat Shock Proteins, Their Families and Sub-types Based On g-Spaced Di-peptide Features and Support Vector Machine

    PubMed Central

    Meher, Prabina K.; Sahu, Tanmaya K.; Gahoi, Shachi; Rao, Atmakuri R.

    2018-01-01

    Heat shock proteins (HSPs) play a pivotal role in cell growth and variability. Since conventional approaches are expensive and voluminous protein sequence information is available in the post-genomic era, development of an automated and accurate computational tool is highly desirable for prediction of HSPs, their families and sub-types. Thus, we propose a computational approach for reliable prediction of all these components in a single framework and with higher accuracy as well. The proposed approach achieved an overall accuracy of ~84% in predicting HSPs, ~97% in predicting six different families of HSPs, and ~94% in predicting four types of DnaJ proteins, with bench mark datasets. The developed approach also achieved higher accuracy as compared to most of the existing approaches. For easy prediction of HSPs by experimental scientists, a user friendly web server ir-HSP is made freely accessible at http://cabgrid.res.in:8080/ir-hsp. The ir-HSP was further evaluated for proteome-wide identification of HSPs by using proteome datasets of eight different species, and ~50% of the predicted HSPs in each species were found to be annotated with InterPro HSP families/domains. Thus, the developed computational method is expected to supplement the currently available approaches for prediction of HSPs, to the extent of their families and sub-types. PMID:29379521

  16. Development of equations for predicting Puerto Rican subtropical dry forest biomass and volume

    Treesearch

    Thomas J. Brandeis; Matthew Delaney; Bernard R. Parresol; Larry Royer

    2006-01-01

    Carbon accounting, forest health monitoring and sustainable management of the subtropical dry forests of Puerto Rico and other Caribbean Islands require an accurate assessment of forest aboveground biomass (AGB) and stem volume. One means of improving assessment accuracy is the development of predictive equations derived from locally collected data. Forest inventory...

  17. Development of equations for predicting Puerto Rican subtropical dry forest biomass and volume.

    Treesearch

    Thomas J. Brandeis; Matthew Delaney; Bernard R. Parresol; Larry Royer

    2006-01-01

    Carbon accounting, forest health monitoring and sustainable management of the subtropical dry forests of Puerto Rico and other Caribbean Islands require an accurate assessment of forest aboveground biomass (AGB) and stem volume. One means of improving assessment accuracy is the development of predictive equations derived from locally collected data. Forest inventory...

  18. Effects of Word Prediction on Writing Fluency for Students with Physical Disabilities

    ERIC Educational Resources Information Center

    Mezei, Peter J.; Heller, Kathryn Wolff

    2012-01-01

    Many students with physical disabilities have difficulty with writing fluency due to motor limitations. One type of assistive technology that has been developed to improve writing speed and accuracy is word prediction software, although there is a paucity of research supporting its use for individuals with physical disabilities. This study used an…

  19. Genomic prediction of continuous and binary fertility traits of females in a composite beef cattle breed

    USDA-ARS?s Scientific Manuscript database

    Reproduction efficiency is a major factor in the profitability of the beef cattle industry. Genomic selection (GS) is a promising tool that may improve the predictive accuracy and genetic gain of fertility traits. There is a wide range of traits used to measure fertility in dairy and beef cattle inc...

  20. Protein subcellular localization prediction using artificial intelligence technology.

    PubMed

    Nair, Rajesh; Rost, Burkhard

    2008-01-01

    Proteins perform many important tasks in living organisms, such as catalysis of biochemical reactions, transport of nutrients, and recognition and transmission of signals. The plethora of aspects of the role of any particular protein is referred to as its "function." One aspect of protein function that has been the target of intensive research by computational biologists is its subcellular localization. Proteins must be localized in the same subcellular compartment to cooperate toward a common physiological function. Aberrant subcellular localization of proteins can result in several diseases, including kidney stones, cancer, and Alzheimer's disease. To date, sequence homology remains the most widely used method for inferring the function of a protein. However, the application of advanced artificial intelligence (AI)-based techniques in recent years has resulted in significant improvements in our ability to predict the subcellular localization of a protein. The prediction accuracy has risen steadily over the years, in large part due to the application of AI-based methods such as hidden Markov models (HMMs), neural networks (NNs), and support vector machines (SVMs), although the availability of larger experimental datasets has also played a role. Automatic methods that mine textual information from the biological literature and molecular biology databases have considerably sped up the process of annotation for proteins for which some information regarding function is available in the literature. State-of-the-art methods based on NNs and HMMs can predict the presence of N-terminal sorting signals extremely accurately. Ab initio methods that predict subcellular localization for any protein sequence using only the native amino acid sequence and features predicted from the native sequence have shown the most remarkable improvements. The prediction accuracy of these methods has increased by over 30% in the past decade. The accuracy of these methods is now on par with high-throughput methods for predicting localization, and they are beginning to play an important role in directing experimental research. In this chapter, we review some of the most important methods for the prediction of subcellular localization.

  1. The drivers of wildfire enlargement do not exhibit scale thresholds in southeastern Australian forests.

    PubMed

    Price, Owen F; Penman, Trent; Bradstock, Ross; Borah, Rittick

    2016-10-01

    Wildfires are complex adaptive systems, and have been hypothesized to exhibit scale-dependent transitions in the drivers of fire spread. Among other things, this makes the prediction of final fire size from conditions at the ignition difficult. We test this hypothesis by conducting a multi-scale statistical modelling of the factors determining whether fires reached 10 ha, then 100 ha then 1000 ha and the final size of fires >1000 ha. At each stage, the predictors were measures of weather, fuels, topography and fire suppression. The objectives were to identify differences among the models indicative of scale transitions, assess the accuracy of the multi-step method for predicting fire size (compared to predicting final size from initial conditions) and to quantify the importance of the predictors. The data were 1116 fires that occurred in the eucalypt forests of New South Wales between 1985 and 2010. The models were similar at the different scales, though there were subtle differences. For example, the presence of roads affected whether fires reached 10 ha but not larger scales. Weather was the most important predictor overall, though fuel load, topography and ease of suppression all showed effects. Overall, there was no evidence that fires have scale-dependent transitions in behaviour. The models had a predictive accuracy of 73%, 66%, 72% and 53% accuracy at 10 ha, 100 ha, 1000 ha and final size scales. When these steps were combined, the overall accuracy for predicting the size of fires was 62%, while the accuracy of the one step model was only 20%. Thus, the multi-scale approach was an improvement on the single scale approach, even though the predictive accuracy was probably insufficient for use as an operational tool. The analysis has also provided further evidence of the important role of weather, compared to fuel, suppression and topography in driving fire behaviour. Copyright © 2016. Published by Elsevier Ltd.

  2. Global and Regional 3D Tomography for Improved Seismic Event Location and Uncertainty in Explosion Monitoring

    NASA Astrophysics Data System (ADS)

    Downey, N.; Begnaud, M. L.; Hipp, J. R.; Ballard, S.; Young, C. S.; Encarnacao, A. V.

    2017-12-01

    The SALSA3D global 3D velocity model of the Earth was developed to improve the accuracy and precision of seismic travel time predictions for a wide suite of regional and teleseismic phases. Recently, the global SALSA3D model was updated to include additional body wave phases including mantle phases, core phases, reflections off the core-mantle boundary and underside reflections off the surface of the Earth. We show that this update improves travel time predictions and leads directly to significant improvements in the accuracy and precision of seismic event locations as compared to locations computed using standard 1D velocity models like ak135, or 2½D models like RSTT. A key feature of our inversions is that path-specific model uncertainty of travel time predictions are calculated using the full 3D model covariance matrix computed during tomography, which results in more realistic uncertainty ellipses that directly reflect tomographic data coverage. Application of this method can also be done at a regional scale: we present a velocity model with uncertainty obtained using data obtained from the University of Utah Seismograph Stations. These results show a reduction in travel-time residuals for re-located events compared with those obtained using previously published models.

  3. Evaluation of anthropogenic influence in probabilistic forecasting of coastal change

    NASA Astrophysics Data System (ADS)

    Hapke, C. J.; Wilson, K.; Adams, P. N.

    2014-12-01

    Prediction of large scale coastal behavior is especially challenging in areas of pervasive human activity. Many coastal zones on the Gulf and Atlantic coasts are moderately to highly modified through the use of soft sediment and hard stabilization techniques. These practices have the potential to alter sediment transport and availability, as well as reshape the beach profile, ultimately transforming the natural evolution of the coastal system. We present the results of a series of probabilistic models, designed to predict the observed geomorphic response to high wave events at Fire Island, New York. The island comprises a variety of land use types, including inhabited communities with modified beaches, where beach nourishment and artificial dune construction (scraping) occur, unmodified zones, and protected national seashore. This variation in land use presents an opportunity for comparison of model accuracy across highly modified and rarely modified stretches of coastline. Eight models with basic and expanded structures were developed, resulting in sixteen models, informed with observational data from Fire Island. The basic model type does not include anthropogenic modification. The expanded model includes records of nourishment and scraping, designed to quantify the improved accuracy when anthropogenic activity is represented. Modification was included as frequency of occurrence divided by the time since the most recent event, to distinguish between recent and historic events. All but one model reported improved predictive accuracy from the basic to expanded form. The addition of nourishment and scraping parameters resulted in a maximum reduction in predictive error of 36%. The seven improved models reported an average 23% reduction in error. These results indicate that it is advantageous to incorporate the human forcing into a coastal hazards probability model framework.

  4. PconsFold: improved contact predictions improve protein models.

    PubMed

    Michel, Mirco; Hayat, Sikander; Skwark, Marcin J; Sander, Chris; Marks, Debora S; Elofsson, Arne

    2014-09-01

    Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, since the first studies contact prediction methods have improved. Here, we ask how much the final models are improved if improved contact predictions are used. In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models are improved by on average 33% using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that the quality is improved with 15-30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved. PconsFold is a fully automated pipeline for ab initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. Due to its modularity, the contact prediction tool can be easily exchanged. The source code of PconsFold is available on GitHub at https://www.github.com/ElofssonLab/pcons-fold under the MIT license. PconsC is available from http://c.pcons.net/. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  5. A new computational strategy for predicting essential genes.

    PubMed

    Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng

    2013-12-21

    Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.

  6. Wind Power Curve Modeling in Simple and Complex Terrain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bulaevskaya, V.; Wharton, S.; Irons, Z.

    2015-02-09

    Our previous work on wind power curve modeling using statistical models focused on a location with a moderately complex terrain in the Altamont Pass region in northern California (CA). The work described here is the follow-up to that work, but at a location with a simple terrain in northern Oklahoma (OK). The goal of the present analysis was to determine the gain in predictive ability afforded by adding information beyond the hub-height wind speed, such as wind speeds at other heights, as well as other atmospheric variables, to the power prediction model at this new location and compare the resultsmore » to those obtained at the CA site in the previous study. While we reach some of the same conclusions at both sites, many results reported for the CA site do not hold at the OK site. In particular, using the entire vertical profile of wind speeds improves the accuracy of wind power prediction relative to using the hub-height wind speed alone at both sites. However, in contrast to the CA site, the rotor equivalent wind speed (REWS) performs almost as well as the entire profile at the OK site. Another difference is that at the CA site, adding wind veer as a predictor significantly improved the power prediction accuracy. The same was true for that site when air density was added to the model separately instead of using the standard air density adjustment. At the OK site, these additional variables result in no significant benefit for the prediction accuracy.« less

  7. [Ways to improve measurement accuracy of blood glucose sensing by mid-infrared spectroscopy].

    PubMed

    Wang, Yan; Li, Ning; Xu, Kexin

    2006-06-01

    Mid-infrared (MIR) spectroscopy is applicable to blood glucose sensing without using any reagent, however, due to a result of inadequate accuracy, till now this method has not been used in clinical detection. The principle and key technologies of blood glucose sensing by MIR spectroscopy are presented in this paper. Along with our experimental results, the paper analyzes ways to enhance measurement accuracy and prediction accuracy by the following four methods: selection of optimized spectral region; application of spectra data processing method; elimination of the interference with other components in the blood, and promotion in system hardware. According to these four improving methods, we designed four experiments, i.e., strict determination of the region where glucose concentration changes most sensitively in MIR, application of genetic algorithm for wavelength selection, normalization of spectra for the purpose of enhancing measuring reproduction, and utilization of CO2 laser as light source. The results show that the measurement accuracy of blood glucose concentration is enhanced almost to a clinical detection level.

  8. Genomic prediction using different estimation methodology, blending and cross-validation techniques for growth traits and visual scores in Hereford and Braford cattle.

    PubMed

    Campos, G S; Reimann, F A; Cardoso, L L; Ferreira, C E R; Junqueira, V S; Schmidt, P I; Braccini Neto, J; Yokoo, M J I; Sollero, B P; Boligon, A A; Cardoso, F F

    2018-05-07

    The objective of the present study was to evaluate the accuracy and bias of direct and blended genomic predictions using different methods and cross-validation techniques for growth traits (weight and weight gains) and visual scores (conformation, precocity, muscling and size) obtained at weaning and at yearling in Hereford and Braford breeds. Phenotypic data contained 126,290 animals belonging to the Delta G Connection genetic improvement program, and a set of 3,545 animals genotyped with the 50K chip and 131 sires with the 777K. After quality control, 41,045 markers remained for all animals. An animal model was used to estimate (co)variances components and to predict breeding values, which were later used to calculate the deregressed estimated breeding values (DEBV). Animals with genotype and phenotype for the traits studied were divided into four or five groups by random and k-means clustering cross-validation strategies. The values of accuracy of the direct genomic values (DGV) were moderate to high magnitude for at weaning and at yearling traits, ranging from 0.19 to 0.45 for the k-means and 0.23 to 0.78 for random clustering among all traits. The greatest gain in relation to the pedigree BLUP (PBLUP) was 9.5% with the BayesB method with both the k-means and the random clustering. Blended genomic value accuracies ranged from 0.19 to 0.56 for k-means and from 0.21 to 0.82 for random clustering. The analyzes using the historical pedigree and phenotypes contributed additional information to calculate the GEBV and in general, the largest gains were for the single-step (ssGBLUP) method in bivariate analyses with a mean increase of 43.00% among all traits measured at weaning and of 46.27% for those evaluated at yearling. The accuracy values for the marker effects estimation methods were lower for k-means clustering, indicating that the training set relationship to the selection candidates is a major factor affecting accuracy of genomic predictions. The gains in accuracy obtained with genomic blending methods, mainly ssGBLUP in bivariate analyses, indicate that genomic predictions should be used as a tool to improve genetic gains in relation to the traditional PBLUP selection.

  9. Improving consensus contact prediction via server correlation reduction.

    PubMed

    Gao, Xin; Bu, Dongbo; Xu, Jinbo; Li, Ming

    2009-05-06

    Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find out that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them. In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method assuming that all the individual servers are equally important and independent, the newly developed method evaluates their correlation by using maximum likelihood estimation and extracts independent latent servers from them by using principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, which is much better than that of the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06. These methods demonstrate an average accuracy of 13.0%, 10.8%, 25.8% and 21.2%, respectively. Reducing server correlation and optimally combining independent latent servers show a significant improvement over the traditional consensus methods. This approach can hopefully provide a powerful tool for protein structure refinement and prediction use.

  10. A sparse autoencoder-based deep neural network for protein solvent accessibility and contact number prediction.

    PubMed

    Deng, Lei; Fan, Chao; Zeng, Zhiwen

    2017-12-28

    Direct prediction of the three-dimensional (3D) structures of proteins from one-dimensional (1D) sequences is a challenging problem. Significant structural characteristics such as solvent accessibility and contact number are essential for deriving restrains in modeling protein folding and protein 3D structure. Thus, accurately predicting these features is a critical step for 3D protein structure building. In this study, we present DeepSacon, a computational method that can effectively predict protein solvent accessibility and contact number by using a deep neural network, which is built based on stacked autoencoder and a dropout method. The results demonstrate that our proposed DeepSacon achieves a significant improvement in the prediction quality compared with the state-of-the-art methods. We obtain 0.70 three-state accuracy for solvent accessibility, 0.33 15-state accuracy and 0.74 Pearson Correlation Coefficient (PCC) for the contact number on the 5729 monomeric soluble globular protein dataset. We also evaluate the performance on the CASP11 benchmark dataset, DeepSacon achieves 0.68 three-state accuracy and 0.69 PCC for solvent accessibility and contact number, respectively. We have shown that DeepSacon can reliably predict solvent accessibility and contact number with stacked sparse autoencoder and a dropout approach.

  11. Threshold models for genome-enabled prediction of ordinal categorical traits in plant breeding.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Pérez-Rodríguez, Paulino; de Los Campos, Gustavo; Eskridge, Kent; Crossa, José

    2014-12-23

    Categorical scores for disease susceptibility or resistance often are recorded in plant breeding. The aim of this study was to introduce genomic models for analyzing ordinal characters and to assess the predictive ability of genomic predictions for ordered categorical phenotypes using a threshold model counterpart of the Genomic Best Linear Unbiased Predictor (i.e., TGBLUP). The threshold model was used to relate a hypothetical underlying scale to the outward categorical response. We present an empirical application where a total of nine models, five without interaction and four with genomic × environment interaction (G×E) and genomic additive × additive × environment interaction (G×G×E), were used. We assessed the proposed models using data consisting of 278 maize lines genotyped with 46,347 single-nucleotide polymorphisms and evaluated for disease resistance [with ordinal scores from 1 (no disease) to 5 (complete infection)] in three environments (Colombia, Zimbabwe, and Mexico). Models with G×E captured a sizeable proportion of the total variability, which indicates the importance of introducing interaction to improve prediction accuracy. Relative to models based on main effects only, the models that included G×E achieved 9-14% gains in prediction accuracy; adding additive × additive interactions did not increase prediction accuracy consistently across locations. Copyright © 2015 Montesinos-López et al.

  12. Operational improvements of long-term predicted ephemerides of the Tracking and Data Relay Satellites (TDRSs)

    NASA Technical Reports Server (NTRS)

    Kostoff, J. L.; Ward, D. T.; Cuevas, O. O.; Beckman, R. M.

    1995-01-01

    Tracking and Data Relay Satellite (TDRS) orbit determination and prediction are supported by the Flight Dynamics Facility (FDF) of the Goddard Space Flight Center (GSFC) Flight Dynamics Division (FDD). TDRS System (TDRSS)-user satellites require predicted TDRS ephemerides that are up to 10 weeks in length. Previously, long-term ephemerides generated by the FDF included predictions from the White Sands Complex (WSC), which plans and executes TDRS maneuvers. TDRSs typically have monthly stationkeeping maneuvers, and predicted postmaneuver state vectors are received from WSC up to a month in advance. This paper presents the results of an analysis performed in the FDF to investigate more accurate and economical long-term ephemerides for the TDRSs. As a result of this analysis, two new methods for generating long-term TDRS ephemeris predictions have been implemented by the FDF. The Center-of-Box (COB) method models a TDRS as fixed at the center of its stationkeeping box. Using this method, long-term ephemeris updates are made semiannually instead of weekly. The impulse method is used to model more maneuvers. The impulse method yields better short-term accuracy than the COB method, especially for larger stationkeeping boxes. The accuracy of the impulse method depends primarily on the accuracy of maneuver date forecasting.

  13. Axisymmetric inlet minimum weight design method

    NASA Technical Reports Server (NTRS)

    Nadell, Shari-Beth

    1995-01-01

    An analytical method for determining the minimum weight design of an axisymmetric supersonic inlet has been developed. The goal of this method development project was to improve the ability to predict the weight of high-speed inlets in conceptual and preliminary design. The initial model was developed using information that was available from inlet conceptual design tools (e.g., the inlet internal and external geometries and pressure distributions). Stiffened shell construction was assumed. Mass properties were computed by analyzing a parametric cubic curve representation of the inlet geometry. Design loads and stresses were developed at analysis stations along the length of the inlet. The equivalent minimum structural thicknesses for both shell and frame structures required to support the maximum loads produced by various load conditions were then determined. Preliminary results indicated that inlet hammershock pressures produced the critical design load condition for a significant portion of the inlet. By improving the accuracy of inlet weight predictions, the method will improve the fidelity of propulsion and vehicle design studies and increase the accuracy of weight versus cost studies.

  14. A universal deep learning approach for modeling the flow of patients under different severities.

    PubMed

    Jiang, Shancheng; Chin, Kwai-Sang; Tsui, Kwok L

    2018-02-01

    The Accident and Emergency Department (A&ED) is the frontline for providing emergency care in hospitals. Unfortunately, relative A&ED resources have failed to keep up with continuously increasing demand in recent years, which leads to overcrowding in A&ED. Knowing the fluctuation of patient arrival volume in advance is a significant premise to relieve this pressure. Based on this motivation, the objective of this study is to explore an integrated framework with high accuracy for predicting A&ED patient flow under different triage levels, by combining a novel feature selection process with deep neural networks. Administrative data is collected from an actual A&ED and categorized into five groups based on different triage levels. A genetic algorithm (GA)-based feature selection algorithm is improved and implemented as a pre-processing step for this time-series prediction problem, in order to explore key features affecting patient flow. In our improved GA, a fitness-based crossover is proposed to maintain the joint information of multiple features during iterative process, instead of traditional point-based crossover. Deep neural networks (DNN) is employed as the prediction model to utilize their universal adaptability and high flexibility. In the model-training process, the learning algorithm is well-configured based on a parallel stochastic gradient descent algorithm. Two effective regularization strategies are integrated in one DNN framework to avoid overfitting. All introduced hyper-parameters are optimized efficiently by grid-search in one pass. As for feature selection, our improved GA-based feature selection algorithm has outperformed a typical GA and four state-of-the-art feature selection algorithms (mRMR, SAFS, VIFR, and CFR). As for the prediction accuracy of proposed integrated framework, compared with other frequently used statistical models (GLM, seasonal-ARIMA, ARIMAX, and ANN) and modern machine models (SVM-RBF, SVM-linear, RF, and R-LASSO), the proposed integrated "DNN-I-GA" framework achieves higher prediction accuracy on both MAPE and RMSE metrics in pairwise comparisons. The contribution of our study is two-fold. Theoretically, the traditional GA-based feature selection process is improved to have less hyper-parameters and higher efficiency, and the joint information of multiple features is maintained by fitness-based crossover operator. The universal property of DNN is further enhanced by merging different regularization strategies. Practically, features selected by our improved GA can be used to acquire an underlying relationship between patient flows and input features. Predictive values are significant indicators of patients' demand and can be used by A&ED managers to make resource planning and allocation. High accuracy achieved by the present framework in different cases enhances the reliability of downstream decision makings. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. Improved nonlinear prediction method

    NASA Astrophysics Data System (ADS)

    Adenan, Nur Hamiza; Md Noorani, Mohd Salmi

    2014-06-01

    The analysis and prediction of time series data have been addressed by researchers. Many techniques have been developed to be applied in various areas, such as weather forecasting, financial markets and hydrological phenomena involving data that are contaminated by noise. Therefore, various techniques to improve the method have been introduced to analyze and predict time series data. In respect of the importance of analysis and the accuracy of the prediction result, a study was undertaken to test the effectiveness of the improved nonlinear prediction method for data that contain noise. The improved nonlinear prediction method involves the formation of composite serial data based on the successive differences of the time series. Then, the phase space reconstruction was performed on the composite data (one-dimensional) to reconstruct a number of space dimensions. Finally the local linear approximation method was employed to make a prediction based on the phase space. This improved method was tested with data series Logistics that contain 0%, 5%, 10%, 20% and 30% of noise. The results show that by using the improved method, the predictions were found to be in close agreement with the observed ones. The correlation coefficient was close to one when the improved method was applied on data with up to 10% noise. Thus, an improvement to analyze data with noise without involving any noise reduction method was introduced to predict the time series data.

  16. Improving default risk prediction using Bayesian model uncertainty techniques.

    PubMed

    Kazemi, Reza; Mosleh, Ali

    2012-11-01

    Credit risk is the potential exposure of a creditor to an obligor's failure or refusal to repay the debt in principal or interest. The potential of exposure is measured in terms of probability of default. Many models have been developed to estimate credit risk, with rating agencies dating back to the 19th century. They provide their assessment of probability of default and transition probabilities of various firms in their annual reports. Regulatory capital requirements for credit risk outlined by the Basel Committee on Banking Supervision have made it essential for banks and financial institutions to develop sophisticated models in an attempt to measure credit risk with higher accuracy. The Bayesian framework proposed in this article uses the techniques developed in physical sciences and engineering for dealing with model uncertainty and expert accuracy to obtain improved estimates of credit risk and associated uncertainties. The approach uses estimates from one or more rating agencies and incorporates their historical accuracy (past performance data) in estimating future default risk and transition probabilities. Several examples demonstrate that the proposed methodology can assess default probability with accuracy exceeding the estimations of all the individual models. Moreover, the methodology accounts for potentially significant departures from "nominal predictions" due to "upsetting events" such as the 2008 global banking crisis. © 2012 Society for Risk Analysis.

  17. Modified linear predictive coding approach for moving target tracking by Doppler radar

    NASA Astrophysics Data System (ADS)

    Ding, Yipeng; Lin, Xiaoyi; Sun, Ke-Hui; Xu, Xue-Mei; Liu, Xi-Yao

    2016-07-01

    Doppler radar is a cost-effective tool for moving target tracking, which can support a large range of civilian and military applications. A modified linear predictive coding (LPC) approach is proposed to increase the target localization accuracy of the Doppler radar. Based on the time-frequency analysis of the received echo, the proposed approach first real-time estimates the noise statistical parameters and constructs an adaptive filter to intelligently suppress the noise interference. Then, a linear predictive model is applied to extend the available data, which can help improve the resolution of the target localization result. Compared with the traditional LPC method, which empirically decides the extension data length, the proposed approach develops an error array to evaluate the prediction accuracy and thus, adjust the optimum extension data length intelligently. Finally, the prediction error array is superimposed with the predictor output to correct the prediction error. A series of experiments are conducted to illustrate the validity and performance of the proposed techniques.

  18. How long will my mouse live? Machine learning approaches for prediction of mouse life span.

    PubMed

    Swindell, William R; Harper, James M; Miller, Richard A

    2008-09-01

    Prediction of individual life span based on characteristics evaluated at middle-age represents a challenging objective for aging research. In this study, we used machine learning algorithms to construct models that predict life span in a stock of genetically heterogeneous mice. Life-span prediction accuracy of 22 algorithms was evaluated using a cross-validation approach, in which models were trained and tested with distinct subsets of data. Using a combination of body weight and T-cell subset measures evaluated before 2 years of age, we show that the life-span quartile to which an individual mouse belongs can be predicted with an accuracy of 35.3% (+/-0.10%). This result provides a new benchmark for the development of life-span-predictive models, but improvement can be expected through identification of new predictor variables and development of computational approaches. Future work in this direction can provide tools for aging research and will shed light on associations between phenotypic traits and longevity.

  19. An Improved Method of AGM for High Precision Geolocation of SAR Images

    NASA Astrophysics Data System (ADS)

    Zhou, G.; He, C.; Yue, T.; Huang, W.; Huang, Y.; Li, X.; Chen, Y.

    2018-05-01

    In order to take full advantage of SAR images, it is necessary to obtain the high precision location of the image. During the geometric correction process of images, to ensure the accuracy of image geometric correction and extract the effective mapping information from the images, precise image geolocation is important. This paper presents an improved analytical geolocation method (IAGM) that determine the high precision geolocation of each pixel in a digital SAR image. This method is based on analytical geolocation method (AGM) proposed by X. K. Yuan aiming at realizing the solution of RD model. Tests will be conducted using RADARSAT-2 SAR image. Comparing the predicted feature geolocation with the position as determined by high precision orthophoto, results indicate an accuracy of 50m is attainable with this method. Error sources will be analyzed and some recommendations about improving image location accuracy in future spaceborne SAR's will be given.

  20. Alcohol-related hot-spot analysis and prediction : final report.

    DOT National Transportation Integrated Search

    2017-05-01

    This project developed methods to more accurately identify alcohol-related crash hot spots, ultimately allowing for more effective and efficient enforcement and safety campaigns. Advancements in accuracy came from improving the calculation of spatial...

  1. Improving Alcohol Screening for College Students: Screening for Alcohol Misuse amongst College Students with a Simple Modification to the CAGE Questionnaire

    ERIC Educational Resources Information Center

    Taylor, Purcell; El-Sabawi, Taleed; Cangin, Causenge

    2016-01-01

    Objective: To improve the CAGE (Cut down, Annoyed, Guilty, Eye opener) questionnaire's predictive accuracy in screening college students. Participants: The sample consisted of 219 midwestern university students who self-administered a confidential survey. Methods: Exploratory factor analysis, confirmatory factor analysis, receiver operating…

  2. Improving High-Resolution Weather Forecasts Using the Weather Research and Forecasting (WRF) Model with an Updated Kain–Fritsch Scheme

    EPA Science Inventory

    Efforts to improve the prediction accuracy of high-resolution (1–10 km) surface precipitation distribution and variability are of vital importance to local aspects of air pollution, wet deposition, and regional climate. However, precipitation biases and errors can occur at ...

  3. Predicting the biological condition of streams: Use of geospatial indicators of natural and anthropogenic characteristics of watersheds

    USGS Publications Warehouse

    Carlisle, D.M.; Falcone, J.; Meador, M.R.

    2009-01-01

    We developed and evaluated empirical models to predict biological condition of wadeable streams in a large portion of the eastern USA, with the ultimate goal of prediction for unsampled basins. Previous work had classified (i.e., altered vs. unaltered) the biological condition of 920 streams based on a biological assessment of macroinvertebrate assemblages. Predictor variables were limited to widely available geospatial data, which included land cover, topography, climate, soils, societal infrastructure, and potential hydrologic modification. We compared the accuracy of predictions of biological condition class based on models with continuous and binary responses. We also evaluated the relative importance of specific groups and individual predictor variables, as well as the relationships between the most important predictors and biological condition. Prediction accuracy and the relative importance of predictor variables were different for two subregions for which models were created. Predictive accuracy in the highlands region improved by including predictors that represented both natural and human activities. Riparian land cover and road-stream intersections were the most important predictors. In contrast, predictive accuracy in the lowlands region was best for models limited to predictors representing natural factors, including basin topography and soil properties. Partial dependence plots revealed complex and nonlinear relationships between specific predictors and the probability of biological alteration. We demonstrate a potential application of the model by predicting biological condition in 552 unsampled basins across an ecoregion in southeastern Wisconsin (USA). Estimates of the likelihood of biological condition of unsampled streams could be a valuable tool for screening large numbers of basins to focus targeted monitoring of potentially unaltered or altered stream segments. ?? Springer Science+Business Media B.V. 2008.

  4. A time series modeling approach in risk appraisal of violent and sexual recidivism.

    PubMed

    Bani-Yaghoub, Majid; Fedoroff, J Paul; Curry, Susan; Amundsen, David E

    2010-10-01

    For over half a century, various clinical and actuarial methods have been employed to assess the likelihood of violent recidivism. Yet there is a need for new methods that can improve the accuracy of recidivism predictions. This study proposes a new time series modeling approach that generates high levels of predictive accuracy over short and long periods of time. The proposed approach outperformed two widely used actuarial instruments (i.e., the Violence Risk Appraisal Guide and the Sex Offender Risk Appraisal Guide). Furthermore, analysis of temporal risk variations based on specific time series models can add valuable information into risk assessment and management of violent offenders.

  5. iCN718, an Updated and Improved Genome-Scale Metabolic Network Reconstruction of Acinetobacter baumannii AYE.

    PubMed

    Norsigian, Charles J; Kavvas, Erol; Seif, Yara; Palsson, Bernhard O; Monk, Jonathan M

    2018-01-01

    Acinetobacter baumannii has become an urgent clinical threat due to the recent emergence of multi-drug resistant strains. There is thus a significant need to discover new therapeutic targets in this organism. One means for doing so is through the use of high-quality genome-scale reconstructions. Well-curated and accurate genome-scale models (GEMs) of A. baumannii would be useful for improving treatment options. We present an updated and improved genome-scale reconstruction of A. baumannii AYE, named iCN718, that improves and standardizes previous A. baumannii AYE reconstructions. iCN718 has 80% accuracy for predicting gene essentiality data and additionally can predict large-scale phenotypic data with as much as 89% accuracy, a new capability for an A. baumannii reconstruction. We further demonstrate that iCN718 can be used to analyze conserved metabolic functions in the A. baumannii core genome and to build strain-specific GEMs of 74 other A. baumannii strains from genome sequence alone. iCN718 will serve as a resource to integrate and synthesize new experimental data being generated for this urgent threat pathogen.

  6. Decision curve analysis assessing the clinical benefit of NMP22 in the detection of bladder cancer: secondary analysis of a prospective trial.

    PubMed

    Barbieri, Christopher E; Cha, Eugene K; Chromecki, Thomas F; Dunning, Allison; Lotan, Yair; Svatek, Robert S; Scherr, Douglas S; Karakiewicz, Pierre I; Sun, Maxine; Mazumdar, Madhu; Shariat, Shahrokh F

    2012-03-01

    • To employ decision curve analysis to determine the impact of nuclear matrix protein 22 (NMP22) on clinical decision making in the detection of bladder cancer using data from a prospective trial. • The study included 1303 patients at risk for bladder cancer who underwent cystoscopy, urine cytology and measurement of urinary NMP22 levels. • We constructed several prediction models to estimate risk of bladder cancer. The base model was generated using patient characteristics (age, gender, race, smoking and haematuria); cytology and NMP22 were added to the base model to determine effects on predictive accuracy. • Clinical net benefit was calculated by summing the benefits and subtracting the harms and weighting these by the threshold probability at which a patient or clinician would opt for cystoscopy. • In all, 72 patients were found to have bladder cancer (5.5%). In univariate analyses, NMP22 was the strongest predictor of bladder cancer presence (predictive accuracy 71.3%), followed by age (67.5%) and cytology (64.3%). • In multivariable prediction models, NMP22 improved the predictive accuracy of the base model by 8.2% (area under the curve 70.2-78.4%) and of the base model plus cytology by 4.2% (area under the curve 75.9-80.1%). • Decision curve analysis revealed that adding NMP22 to other models increased clinical benefit, particularly at higher threshold probabilities. • NMP22 is a strong, independent predictor of bladder cancer. • Addition of NMP22 improves the accuracy of standard predictors by a statistically and clinically significant margin. • Decision curve analysis suggests that integration of NMP22 into clinical decision making helps avoid unnecessary cystoscopies, with minimal increased risk of missing a cancer. © 2011 THE AUTHORS. BJU INTERNATIONAL © 2011 BJU INTERNATIONAL.

  7. Video image analysis in the Australian meat industry - precision and accuracy of predicting lean meat yield in lamb carcasses.

    PubMed

    Hopkins, D L; Safari, E; Thompson, J M; Smith, C R

    2004-06-01

    A wide selection of lamb types of mixed sex (ewes and wethers) were slaughtered at a commercial abattoir and during this process images of 360 carcasses were obtained online using the VIAScan® system developed by Meat and Livestock Australia. Soft tissue depth at the GR site (thickness of tissue over the 12th rib 110 mm from the midline) was measured by an abattoir employee using the AUS-MEAT sheep probe (PGR). Another measure of this thickness was taken in the chiller using a GR knife (NGR). Each carcass was subsequently broken down to a range of trimmed boneless retail cuts and the lean meat yield determined. The current industry model for predicting meat yield uses hot carcass weight (HCW) and tissue depth at the GR site. A low level of accuracy and precision was found when HCW and PGR were used to predict lean meat yield (R(2)=0.19, r.s.d.=2.80%), which could be improved markedly when PGR was replaced by NGR (R(2)=0.41, r.s.d.=2.39%). If the GR measures were replaced by 8 VIAScan® measures then greater prediction accuracy could be achieved (R(2)=0.52, r.s.d.=2.17%). A similar result was achieved when the model was based on principal components (PCs) computed from the 8 VIAScan® measures (R(2)=0.52, r.s.d.=2.17%). The use of PCs also improved the stability of the model compared to a regression model based on HCW and NGR. The transportability of the models was tested by randomly dividing the data set and comparing coefficients and the level of accuracy and precision. Those models based on PCs were superior to those based on regression. It is demonstrated that with the appropriate modeling the VIAScan® system offers a workable method for predicting lean meat yield automatically.

  8. Thermal cut-off response modelling of universal motors

    NASA Astrophysics Data System (ADS)

    Thangaveloo, Kashveen; Chin, Yung Shin

    2017-04-01

    This paper presents a model to predict the thermal cut-off (TCO) response behaviour in universal motors. The mathematical model includes the calculations of heat loss in the universal motor and the flow characteristics around the TCO component which together are the main parameters for TCO response prediction. In order to accurately predict the TCO component temperature, factors like the TCO component resistance, the effect of ambient, and the flow conditions through the motor are taken into account to improve the prediction accuracy of the model.

  9. Improvement of attention with amphetamine in low- and high-performing rats.

    PubMed

    Turner, Karly M; Burne, Thomas H J

    2016-09-01

    Attentional deficits occur in a range of neuropsychiatric disorders, such as schizophrenia and attention deficit hyperactivity disorder. Psychostimulants are one of the main treatments for attentional deficits, yet there are limited reports of procognitive effects of amphetamine in preclinical studies. Therefore, task development may be needed to improve predictive validity when measuring attention in rodents. This study aimed to use a modified signal detection task (SDT) to determine if and at what doses amphetamine could improve attention in rats. Sprague-Dawley rats were trained on the SDT prior to amphetamine challenge (0.1, 0.25, 0.75 and 1.25 mg/kg). This dose range was predicted to enhance and disrupt cognition with the effect differing between individuals depending on baseline performance. Acute low dose amphetamine (0.1 and 0.25 mg/kg) improved accuracy, while the highest dose (1.25 mg/kg) significantly disrupted performance. The effects differed for low- and high-performing groups across these doses. The effect of amphetamine on accuracy was found to significantly correlate with baseline performance in rats. This study demonstrates that improvement in attentional performance with systemic amphetamine is dependent on baseline accuracy in rats. Indicative of the inverted U-shaped relationship between dopamine and cognition, there was a baseline-dependent shift in performance with increasing doses of amphetamine. The SDT may be a useful tool for investigating individual differences in attention and response to psychostimulants in rodents.

  10. Evaluation of new techniques for the calculation of internal recirculating flows

    NASA Technical Reports Server (NTRS)

    Van Doormaal, J. P.; Turan, A.; Raithby, G. D.

    1987-01-01

    The performance of discrete methods for the prediction of fluid flows can be enhanced by improving the convergence rate of solvers and by increasing the accuracy of the discrete representation of the equations of motion. This paper evaluates the gains in solver performance that are available when various acceleration methods are applied. Various discretizations are also examined and two are recommended because of their accuracy and robustness. Insertion of the improved discretization and solver accelerator into a TEACH code, that has been widely applied to combustor flows, illustrates the substantial gains that can be achieved.

  11. Development and validation of a preoperative prediction model for colorectal cancer T-staging based on MDCT images and clinical information.

    PubMed

    Sa, Sha; Li, Jing; Li, Xiaodong; Li, Yongrui; Liu, Xiaoming; Wang, Defeng; Zhang, Huimao; Fu, Yu

    2017-08-15

    This study aimed to establish and evaluate the efficacy of a prediction model for colorectal cancer T-staging. T-staging was positively correlated with the level of carcinoembryonic antigen (CEA), expression of carbohydrate antigen 19-9 (CA19-9), wall deformity, blurred outer edges, fat infiltration, infiltration into the surrounding tissue, tumor size and wall thickness. Age, location, enhancement rate and enhancement homogeneity were negatively correlated with T-staging. The predictive results of the model were consistent with the pathological gold standard, and the kappa value was 0.805. The total accuracy of staging improved from 51.04% to 86.98% with the proposed model. The clinical, imaging and pathological data of 611 patients with colorectal cancer (419 patients in the training group and 192 patients in the validation group) were collected. A spearman correlation analysis was used to validate the relationship among these factors and pathological T-staging. A prediction model was trained with the random forest algorithm. T staging of the patients in the validation group was predicted by both prediction model and traditional method. The consistency, accuracy, sensitivity, specificity and area under the curve (AUC) were used to compare the efficacy of the two methods. The newly established comprehensive model can improve the predictive efficiency of preoperative colorectal cancer T-staging.

  12. Pedaling rate is an important determinant of human oxygen uptake during exercise on the cycle ergometer

    PubMed Central

    Formenti, Federico; Minetti, Alberto E; Borrani, Fabio

    2015-01-01

    Estimation of human oxygen uptake () during exercise is often used as an alternative when its direct measurement is not feasible. The American College of Sports Medicine (ACSM) suggests estimating human during exercise on a cycle ergometer through an equation that considers individual's body mass and external work rate, but not pedaling rate (PR). We hypothesized that including PR in the ACSM equation would improve its prediction accuracy. Ten healthy male participants’ (age 19–48 years) were recruited and their steady-state was recorded on a cycle ergometer for 16 combinations of external work rates (0, 50, 100, and 150 W) and PR (50, 70, 90, and 110 revolutions per minute). was calculated by means of a new equation, and by the ACSM equation for comparison. Kinematic data were collected by means of an infrared 3-D motion analysis system in order to explore the mechanical determinants of . Including PR in the ACSM equation improved the accuracy for prediction of sub-maximal during exercise (mean bias 1.9 vs. 3.3 mL O2 kg−1 min−1) but it did not affect the accuracy for prediction of maximal (P > 0.05). Confirming the validity of this new equation, the results were replicated for data reported in the literature in 51 participants. We conclude that PR is an important determinant of human during cycling exercise, and it should be considered when predicting oxygen consumption. PMID:26371230

  13. MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.

    PubMed

    Liu, Ke; Peng, Shengwen; Wu, Junqiu; Zhai, Chengxiang; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2015-06-15

    Medical Subject Headings (MeSHs) are used by National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates the applications of biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), for assisting MeSH annotation, which uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as prediction by MeSH classifiers (trained separately), can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple evidence for MeSH annotation. We propose a novel framework, MeSHLabeler, to integrate multiple evidence for accurate MeSH annotation by using 'learning to rank'. Evidence includes numerous predictions from MeSH classifiers, KNN, pattern matching, MTI and the correlation between different MeSH terms, etc. Each MeSH classifier is trained independently, and thus prediction scores from different classifiers are incomparable. To address this issue, we have developed an effective score normalization procedure to improve the prediction accuracy. MeSHLabeler won the first place in Task 2A of 2014 BioASQ challenge, achieving the Micro F-measure of 0.6248 for 9,040 citations provided by the BioASQ challenge. Note that this accuracy is around 9.15% higher than 0.5724, obtained by MTI. The software is available upon request. © The Author 2015. Published by Oxford University Press.

  14. MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence

    PubMed Central

    Liu, Ke; Peng, Shengwen; Wu, Junqiu; Zhai, Chengxiang; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2015-01-01

    Motivation: Medical Subject Headings (MeSHs) are used by National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates the applications of biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), for assisting MeSH annotation, which uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as prediction by MeSH classifiers (trained separately), can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple evidence for MeSH annotation. Methods: We propose a novel framework, MeSHLabeler, to integrate multiple evidence for accurate MeSH annotation by using ‘learning to rank’. Evidence includes numerous predictions from MeSH classifiers, KNN, pattern matching, MTI and the correlation between different MeSH terms, etc. Each MeSH classifier is trained independently, and thus prediction scores from different classifiers are incomparable. To address this issue, we have developed an effective score normalization procedure to improve the prediction accuracy. Results: MeSHLabeler won the first place in Task 2A of 2014 BioASQ challenge, achieving the Micro F-measure of 0.6248 for 9,040 citations provided by the BioASQ challenge. Note that this accuracy is around 9.15% higher than 0.5724, obtained by MTI. Availability and implementation: The software is available upon request. Contact: zhusf@fudan.edu.cn PMID:26072501

  15. Community detection in complex networks using link prediction

    NASA Astrophysics Data System (ADS)

    Cheng, Hui-Min; Ning, Yi-Zi; Yin, Zhao; Yan, Chao; Liu, Xin; Zhang, Zhong-Yuan

    2018-01-01

    Community detection and link prediction are both of great significance in network analysis, which provide very valuable insights into topological structures of the network from different perspectives. In this paper, we propose a novel community detection algorithm with inclusion of link prediction, motivated by the question whether link prediction can be devoted to improving the accuracy of community partition. For link prediction, we propose two novel indices to compute the similarity between each pair of nodes, one of which aims to add missing links, and the other tries to remove spurious edges. Extensive experiments are conducted on benchmark data sets, and the results of our proposed algorithm are compared with two classes of baselines. In conclusion, our proposed algorithm is competitive, revealing that link prediction does improve the precision of community detection.

  16. Increased prognostic accuracy of TBI when a brain electrical activity biomarker is added to loss of consciousness (LOC).

    PubMed

    Hack, Dallas; Huff, J Stephen; Curley, Kenneth; Naunheim, Roseanne; Ghosh Dastidar, Samanwoy; Prichep, Leslie S

    2017-07-01

    Extremely high accuracy for predicting CT+ traumatic brain injury (TBI) using a quantitative EEG (QEEG) based multivariate classification algorithm was demonstrated in an independent validation trial, in Emergency Department (ED) patients, using an easy to use handheld device. This study compares the predictive power using that algorithm (which includes LOC and amnesia), to the predictive power of LOC alone or LOC plus traumatic amnesia. ED patients 18-85years presenting within 72h of closed head injury, with GSC 12-15, were study candidates. 680 patients with known absence or presence of LOC were enrolled (145 CT+ and 535 CT- patients). 5-10min of eyes closed EEG was acquired using the Ahead 300 handheld device, from frontal and frontotemporal regions. The same classification algorithm methodology was used for both the EEG based and the LOC based algorithms. Predictive power was evaluated using area under the ROC curve (AUC) and odds ratios. The QEEG based classification algorithm demonstrated significant improvement in predictive power compared with LOC alone, both in improved AUC (83% improvement) and odds ratio (increase from 4.65 to 16.22). Adding RGA and/or PTA to LOC was not improved over LOC alone. Rapid triage of TBI relies on strong initial predictors. Addition of an electrophysiological based marker was shown to outperform report of LOC alone or LOC plus amnesia, in determining risk of an intracranial bleed. In addition, ease of use at point-of-care, non-invasive, and rapid result using such technology suggests significant value added to standard clinical prediction. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Improving a two-equation eddy-viscosity turbulence model to predict the aerodynamic performance of thick wind turbine airfoils

    NASA Astrophysics Data System (ADS)

    Bangga, Galih; Kusumadewi, Tri; Hutomo, Go; Sabila, Ahmad; Syawitri, Taurista; Setiadi, Herlambang; Faisal, Muhamad; Wiranegara, Raditya; Hendranata, Yongki; Lastomo, Dwi; Putra, Louis; Kristiadi, Stefanus

    2018-03-01

    Numerical simulations for relatively thick airfoils are carried out in the present studies. An attempt to improve the accuracy of the numerical predictions is done by adjusting the turbulent viscosity of the eddy-viscosity Menter Shear-Stress-Transport (SST) model. The modification involves the addition of a damping factor on the wall-bounded flows incorporating the ratio of the turbulent kinetic energy to its specific dissipation rate for separation detection. The results are compared with available experimental data and CFD simulations using the original Menter SST model. The present model improves the lift polar prediction even though the stall angle is still overestimated. The improvement is caused by the better prediction of separated flow under a strong adverse pressure gradient. The results show that the Reynolds stresses are damped near the wall causing variation of the logarithmic velocity profiles.

  18. Femtosecond laser micromachining of compound parabolic concentrator fiber tipped glucose sensors.

    PubMed

    Hassan, Hafeez Ul; Lacraz, Amédée; Kalli, Kyriacos; Bang, Ole

    2017-03-01

    We report on highly accurate femtosecond (fs) laser micromachining of a compound parabolic concentrator (CPC) fiber tip on a polymer optical fiber (POF). The accuracy is reflected in an unprecedented correspondence between the numerically predicted and experimentally found improvement in fluorescence pickup efficiency of a Förster resonance energy transfer-based POF glucose sensor. A Zemax model of the CPC-tipped sensor predicts an optimal improvement of a factor of 3.96 compared to the sensor with a plane-cut fiber tip. The fs laser micromachined CPC tip showed an increase of a factor of 3.5, which is only 11.6% from the predicted value. Earlier state-of-the-art fabrication of the CPC-shaped tip by fiber tapering was of so poor quality that the actual improvement was 43% lower than the predicted improvement of the ideal CPC shape.

  19. Femtosecond laser micromachining of compound parabolic concentrator fiber tipped glucose sensors

    NASA Astrophysics Data System (ADS)

    Hassan, Hafeez Ul; Lacraz, Amédée; Kalli, Kyriacos; Bang, Ole

    2017-03-01

    We report on highly accurate femtosecond (fs) laser micromachining of a compound parabolic concentrator (CPC) fiber tip on a polymer optical fiber (POF). The accuracy is reflected in an unprecedented correspondence between the numerically predicted and experimentally found improvement in fluorescence pickup efficiency of a Förster resonance energy transfer-based POF glucose sensor. A Zemax model of the CPC-tipped sensor predicts an optimal improvement of a factor of 3.96 compared to the sensor with a plane-cut fiber tip. The fs laser micromachined CPC tip showed an increase of a factor of 3.5, which is only 11.6% from the predicted value. Earlier state-of-the-art fabrication of the CPC-shaped tip by fiber tapering was of so poor quality that the actual improvement was 43% lower than the predicted improvement of the ideal CPC shape.

  20. Predicting the accuracy of ligand overlay methods with Random Forest models.

    PubMed

    Nandigam, Ravi K; Evans, David A; Erickson, Jon A; Kim, Sangtae; Sutherland, Jeffrey J

    2008-12-01

    The accuracy of binding mode prediction using standard molecular overlay methods (ROCS, FlexS, Phase, and FieldCompare) is studied. Previous work has shown that simple decision tree modeling can be used to improve accuracy by selection of the best overlay template. This concept is extended to the use of Random Forest (RF) modeling for template and algorithm selection. An extensive data set of 815 ligand-bound X-ray structures representing 5 gene families was used for generating ca. 70,000 overlays using four programs. RF models, trained using standard measures of ligand and protein similarity and Lipinski-related descriptors, are used for automatically selecting the reference ligand and overlay method maximizing the probability of reproducing the overlay deduced from X-ray structures (i.e., using rmsd < or = 2 A as the criteria for success). RF model scores are highly predictive of overlay accuracy, and their use in template and method selection produces correct overlays in 57% of cases for 349 overlay ligands not used for training RF models. The inclusion in the models of protein sequence similarity enables the use of templates bound to related protein structures, yielding useful results even for proteins having no available X-ray structures.

  1. Prediction accuracies for growth and wood attributes of interior spruce in space using genotyping-by-sequencing.

    PubMed

    Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Chen, Charles; Porth, Ilga; El-Kassaby, Yousry A

    2015-05-09

    Genomic selection (GS) in forestry can substantially reduce the length of breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies made it possible to genotype large numbers of trees at a reasonable cost. Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared (mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor (kNN-Fam)). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and the Generalized Ridge Regression (GRR) to test different assumption about trait architecture. Finally, using PCA, multi-trait GS prediction models were developed. The EM and kNN-Fam imputation methods were superior for 30 and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than the GRR indicating that the genetic architecture for these traits is complex. GS prediction accuracies for multi-site were high and better than those of single-sites while multi-site predictability produced the lowest accuracies reflecting type-b genetic correlations and deemed unreliable. The incorporation of genomic information in quantitative genetics analyses produced more realistic heritability estimates as half-sib pedigree tended to inflate the additive genetic variance and subsequently both heritability and gain estimates. Principle component scores as representatives of multi-trait GS prediction models produced surprising results where negatively correlated traits could be concurrently selected for using PCA2 and PCA3. The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation methods, was proven to be effective. Prediction accuracies obtained for all traits greatly support the integration of GS in tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that single-site GS models ability to predict other sites are unreliable supporting the utilization of multi-site approach. Principle component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima.

  2. Limitations and potentials of current motif discovery algorithms

    PubMed Central

    Hu, Jianjun; Li, Bin; Kihara, Daisuke

    2005-01-01

    Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved to be useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms using large datasets generated from Escherichia coli RegulonDB. Factors that affect the prediction accuracy, scalability and reliability are characterized. It is revealed that the nucleotide and the binding site level accuracy are very low, while the motif level accuracy is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit diverse predictions from multiple runs of one or more algorithms, a consensus ensemble algorithm has been developed, which achieved 6–45% improvement over the base algorithms by increasing both the sensitivity and specificity. Our study illustrates limitations and potentials of existing sequence-based motif discovery algorithms. Taking advantage of the revealed potentials, several promising directions for further improvements are discussed. Since the sequence-based algorithms are the baseline of most of the modern motif discovery algorithms, this paper suggests substantial improvements would be possible for them. PMID:16284194

  3. Case studies on forecasting for innovative technologies: frequent revisions improve accuracy.

    PubMed

    Lerner, Jeffrey C; Robertson, Diane C; Goldstein, Sara M

    2015-02-01

    Health technology forecasting is designed to provide reliable predictions about costs, utilization, diffusion, and other market realities before the technologies enter routine clinical use. In this article we address three questions central to forecasting's usefulness: Are early forecasts sufficiently accurate to help providers acquire the most promising technology and payers to set effective coverage policies? What variables contribute to inaccurate forecasts? How can forecasters manage the variables to improve accuracy? We analyzed forecasts published between 2007 and 2010 by the ECRI Institute on four technologies: single-room proton beam radiation therapy for various cancers; digital breast tomosynthesis imaging technology for breast cancer screening; transcatheter aortic valve replacement for serious heart valve disease; and minimally invasive robot-assisted surgery for various cancers. We then examined revised ECRI forecasts published in 2013 (digital breast tomosynthesis) and 2014 (the other three topics) to identify inaccuracies in the earlier forecasts and explore why they occurred. We found that five of twenty early predictions were inaccurate when compared with the updated forecasts. The inaccuracies pertained to two technologies that had more time-sensitive variables to consider. The case studies suggest that frequent revision of forecasts could improve accuracy, especially for complex technologies whose eventual use is governed by multiple interactive factors. Project HOPE—The People-to-People Health Foundation, Inc.

  4. A genome-scale metabolic flux model of Escherichia coli K–12 derived from the EcoCyc database

    PubMed Central

    2014-01-01

    Background Constraint-based models of Escherichia coli metabolic flux have played a key role in computational studies of cellular metabolism at the genome scale. We sought to develop a next-generation constraint-based E. coli model that achieved improved phenotypic prediction accuracy while being frequently updated and easy to use. We also sought to compare model predictions with experimental data to highlight open questions in E. coli biology. Results We present EcoCyc–18.0–GEM, a genome-scale model of the E. coli K–12 MG1655 metabolic network. The model is automatically generated from the current state of EcoCyc using the MetaFlux software, enabling the release of multiple model updates per year. EcoCyc–18.0–GEM encompasses 1445 genes, 2286 unique metabolic reactions, and 1453 unique metabolites. We demonstrate a three-part validation of the model that breaks new ground in breadth and accuracy: (i) Comparison of simulated growth in aerobic and anaerobic glucose culture with experimental results from chemostat culture and simulation results from the E. coli modeling literature. (ii) Essentiality prediction for the 1445 genes represented in the model, in which EcoCyc–18.0–GEM achieves an improved accuracy of 95.2% in predicting the growth phenotype of experimental gene knockouts. (iii) Nutrient utilization predictions under 431 different media conditions, for which the model achieves an overall accuracy of 80.7%. The model’s derivation from EcoCyc enables query and visualization via the EcoCyc website, facilitating model reuse and validation by inspection. We present an extensive investigation of disagreements between EcoCyc–18.0–GEM predictions and experimental data to highlight areas of interest to E. coli modelers and experimentalists, including 70 incorrect predictions of gene essentiality on glucose, 80 incorrect predictions of gene essentiality on glycerol, and 83 incorrect predictions of nutrient utilization. Conclusion Significant advantages can be derived from the combination of model organism databases and flux balance modeling represented by MetaFlux. Interpretation of the EcoCyc database as a flux balance model results in a highly accurate metabolic model and provides a rigorous consistency check for information stored in the database. PMID:24974895

  5. Effect of citizen engagement levels in flood forecasting by assimilating crowdsourced observations in hydrological models

    NASA Astrophysics Data System (ADS)

    Mazzoleni, Maurizio; Cortes Arevalo, Juliette; Alfonso, Leonardo; Wehn, Uta; Norbiato, Daniele; Monego, Martina; Ferri, Michele; Solomatine, Dimitri

    2017-04-01

    In the past years, a number of methods have been proposed to reduce uncertainty in flood prediction by means of model updating techniques. Traditional physical observations are usually integrated into hydrological and hydraulic models to improve model performances and consequent flood predictions. Nowadays, low-cost sensors can be used for crowdsourced observations. Different type of social sensors can measure, in a more distributed way, physical variables such as precipitation and water level. However, these crowdsourced observations are not integrated into a real-time fashion into water-system models due to their varying accuracy and random spatial-temporal coverage. We assess the effect in model performance due to the assimilation of crowdsourced observations of water level. Our method consists in (1) implementing a Kalman filter into a cascade of hydrological and hydraulic models. (2) defining observation errors depending on the type of sensor either physical or social. Randomly distributed errors are based on accuracy ranges that slightly improve according to the citizens' expertise level. (3) Using a simplified social model to realistically represent citizen engagement levels based on population density and citizens' motivation scenarios. To test our method, we synthetically derive crowdsourced observations for different citizen engagement levels from a distributed network of physical and social sensors. The observations are assimilated during a particular flood event occurred in the Bacchiglione catchment, Italy. The results of this study demonstrate that sharing crowdsourced water level observations (often motivated by a feeling of belonging to a community of friends) can help in improving flood prediction. On the other hand, a growing participation of individual citizens or weather enthusiasts sharing hydrological observations in cities can help to improve model performance. This study is a first step to assess the effects of crowdsourced observations in flood model predictions. Effective communication and feedback about the quality of observations from water authorities to engaged citizens are further required to minimize their intrinsic low-variable accuracy.

  6. Prediction of β-turns in proteins from multiple alignment using neural network

    PubMed Central

    Kaur, Harpreet; Raghava, Gajendra Pal Singh

    2003-01-01

    A neural network-based method has been developed for the prediction of β-turns in proteins by using multiple sequence alignment. Two feed-forward back-propagation networks with a single hidden layer are used where the first-sequence structure network is trained with the multiple sequence alignment in the form of PSI-BLAST–generated position-specific scoring matrices. The initial predictions from the first network and PSIPRED-predicted secondary structure are used as input to the second structure-structure network to refine the predictions obtained from the first net. A significant improvement in prediction accuracy has been achieved by using evolutionary information contained in the multiple sequence alignment. The final network yields an overall prediction accuracy of 75.5% when tested by sevenfold cross-validation on a set of 426 nonhomologous protein chains. The corresponding Qpred, Qobs, and Matthews correlation coefficient values are 49.8%, 72.3%, and 0.43, respectively, and are the best among all the previously published β-turn prediction methods. The Web server BetaTPred2 (http://www.imtech.res.in/raghava/betatpred2/) has been developed based on this approach. PMID:12592033

  7. Combining Physicochemical and Evolutionary Information for Protein Contact Prediction

    PubMed Central

    Schneider, Michael; Brock, Oliver

    2014-01-01

    We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple-sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in those structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train an support vector machine classifier to identify most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. As a result of combining two sources of information—evolutionary and physicochemical—we maintain prediction accuracy even when only few sequence homologs are present. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/. PMID:25338092

  8. Applying data mining techniques to improve diagnosis in neonatal jaundice.

    PubMed

    Ferreira, Duarte; Oliveira, Abílio; Freitas, Alberto

    2012-12-07

    Hyperbilirubinemia is emerging as an increasingly common problem in newborns due to a decreasing hospital length of stay after birth. Jaundice is the most common disease of the newborn and although being benign in most cases it can lead to severe neurological consequences if poorly evaluated. In different areas of medicine, data mining has contributed to improve the results obtained with other methodologies.Hence, the aim of this study was to improve the diagnosis of neonatal jaundice with the application of data mining techniques. This study followed the different phases of the Cross Industry Standard Process for Data Mining model as its methodology.This observational study was performed at the Obstetrics Department of a central hospital (Centro Hospitalar Tâmega e Sousa--EPE), from February to March of 2011. A total of 227 healthy newborn infants with 35 or more weeks of gestation were enrolled in the study. Over 70 variables were collected and analyzed. Also, transcutaneous bilirubin levels were measured from birth to hospital discharge with maximum time intervals of 8 hours between measurements, using a noninvasive bilirubinometer.Different attribute subsets were used to train and test classification models using algorithms included in Weka data mining software, such as decision trees (J48) and neural networks (multilayer perceptron). The accuracy results were compared with the traditional methods for prediction of hyperbilirubinemia. The application of different classification algorithms to the collected data allowed predicting subsequent hyperbilirubinemia with high accuracy. In particular, at 24 hours of life of newborns, the accuracy for the prediction of hyperbilirubinemia was 89%. The best results were obtained using the following algorithms: naive Bayes, multilayer perceptron and simple logistic. The findings of our study sustain that, new approaches, such as data mining, may support medical decision, contributing to improve diagnosis in neonatal jaundice.

  9. A high order accurate finite element algorithm for high Reynolds number flow prediction

    NASA Technical Reports Server (NTRS)

    Baker, A. J.

    1978-01-01

    A Galerkin-weighted residuals formulation is employed to establish an implicit finite element solution algorithm for generally nonlinear initial-boundary value problems. Solution accuracy, and convergence rate with discretization refinement, are quantized in several error norms, by a systematic study of numerical solutions to several nonlinear parabolic and a hyperbolic partial differential equation characteristic of the equations governing fluid flows. Solutions are generated using selective linear, quadratic and cubic basis functions. Richardson extrapolation is employed to generate a higher-order accurate solution to facilitate isolation of truncation error in all norms. Extension of the mathematical theory underlying accuracy and convergence concepts for linear elliptic equations is predicted for equations characteristic of laminar and turbulent fluid flows at nonmodest Reynolds number. The nondiagonal initial-value matrix structure introduced by the finite element theory is determined intrinsic to improved solution accuracy and convergence. A factored Jacobian iteration algorithm is derived and evaluated to yield a consequential reduction in both computer storage and execution CPU requirements while retaining solution accuracy.

  10. Assessment of traffic noise levels in urban areas using different soft computing techniques.

    PubMed

    Tomić, J; Bogojević, N; Pljakić, M; Šumarac-Pavlović, D

    2016-10-01

    Available traffic noise prediction models are usually based on regression analysis of experimental data, and this paper presents the application of soft computing techniques in traffic noise prediction. Two mathematical models are proposed and their predictions are compared to data collected by traffic noise monitoring in urban areas, as well as to predictions of commonly used traffic noise models. The results show that application of evolutionary algorithms and neural networks may improve process of development, as well as accuracy of traffic noise prediction.

  11. Additional measures do not improve the diagnostic accuracy of the Hospital Admission Risk Profile for detecting downstream quality of life in community-dwelling older people presenting to a hospital emergency department.

    PubMed

    Grimmer, K; Milanese, S; Beaton, K; Atlas, A

    2014-01-01

    The Hospital Admission Risk Profile (HARP) instrument is commonly used to assess risk of functional decline when older people are admitted to hospital. HARP has moderate diagnostic accuracy (65%) for downstream decreased scores in activities of daily living. This paper reports the diagnostic accuracy of HARP for downstream quality of life. It also tests whether adding other measures to HARP improves its diagnostic accuracy. One hundred and forty-eight independent community dwelling individuals aged 65 years or older were recruited in the emergency department of one large Australian hospital with a medical problem for which they were discharged without a hospital ward admission. Data, including age, sex, primary language, highest level of education, postcode, living status, requiring care for daily activities, using a gait aid, receiving formal community supports, instrumental activities of daily living in the last week, hospitalization and falls in the last 12 months, and mental state were collected at recruitment. HARP scores were derived from a formula that summed scores assigned to age, activities of daily living, and mental state categories. Physical and mental component scores of a quality of life measure were captured by telephone interview at 1 and 3 months after recruitment. HARP scores are moderately accurate at predicting downstream decline in physical quality of life, but did not predict downstream decline in mental quality of life. The addition of other variables to HARP did not improve its diagnostic accuracy for either measure of quality of life. HARP is a poor predictor of quality of life.

  12. Breeding Jatropha curcas by genomic selection: A pilot assessment of the accuracy of predictive models.

    PubMed

    Azevedo Peixoto, Leonardo de; Laviola, Bruno Galvêas; Alves, Alexandre Alonso; Rosado, Tatiana Barbosa; Bhering, Leonardo Lopes

    2017-01-01

    Genomic wide selection is a promising approach for improving the selection accuracy in plant breeding, particularly in species with long life cycles, such as Jatropha. Therefore, the objectives of this study were to estimate the genetic parameters for grain yield (GY) and the weight of 100 seeds (W100S) using restricted maximum likelihood (REML); to compare the performance of GWS methods to predict GY and W100S; and to estimate how many markers are needed to train the GWS model to obtain the maximum accuracy. Eight GWS models were compared in terms of predictive ability. The impact that the marker density had on the predictive ability was investigated using a varying number of markers, from 2 to 1,248. Because the genetic variance between evaluated genotypes was significant, it was possible to obtain selection gain. All of the GWS methods tested in this study can be used to predict GY and W100S in Jatropha. A training model fitted using 1,000 and 800 markers is sufficient to capture the maximum genetic variance and, consequently, maximum prediction ability of GY and W100S, respectively. This study demonstrated the applicability of genome-wide prediction to identify useful genetic sources of GY and W100S for Jatropha breeding. Further research is needed to confirm the applicability of the proposed approach to other complex traits.

  13. Ensemble Streamflow Prediction in Korea: Past and Future 5 Years

    NASA Astrophysics Data System (ADS)

    Jeong, D.; Kim, Y.; Lee, J.

    2005-05-01

    The Ensemble Streamflow Prediction (ESP) approach was first introduced in 2000 by the Hydrology Research Group (HRG) at Seoul National University as an alternative probabilistic forecasting technique for improving the 'Water Supply Outlook' That is issued every month by the Ministry of Construction and Transportation in Korea. That study motivated the Korea Water Resources Corporation (KOWACO) to establish their seasonal probabilistic forecasting system for the 5 major river basins using the ESP approach. In cooperation with the HRG, the KOWACO developed monthly optimal multi-reservoir operating systems for the Geum river basin in 2004, which coupled the ESP forecasts with an optimization model using sampling stochastic dynamic programming. The user interfaces for both ESP and SSDP have also been designed for the developed computer systems to become more practical. More projects for developing ESP systems to the other 3 major river basins (i.e. the Nakdong, Han and Seomjin river basins) was also completed by the HRG and KOWACO at the end of December 2004. Therefore, the ESP system has become the most important mid- and long-term streamflow forecast technique in Korea. In addition to the practical aspects, resent research experience on ESP has raised some concerns into ways of improving the accuracy of ESP in Korea. Jeong and Kim (2002) performed an error analysis on its resulting probabilistic forecasts and found that the modeling error is dominant in the dry season, while the meteorological error is dominant in the flood season. To address the first issue, Kim et al. (2004) tested various combinations and/or combining techniques and showed that the ESP probabilistic accuracy could be improved considerably during the dry season when the hydrologic models were combined and/or corrected. In addition, an attempt was also made to improve the ESP accuracy for the flood season using climate forecast information. This ongoing project handles three types of climate forecast information: (1) the Monthly Industrial Meteorology Information Magazine (MIMIM) of the Korea Meteorological Administration (2) the Global Data Assimilation Prediction System (GDAPS), and (3) the US National Centers for Environmental Prediction (NCEP). Each of these forecasts is issued in a unique format: (1) MIMIM is a most-probable-event forecast, (2) GDAPS is a single series of deterministic forecasts, and (3) NCEP is an ensemble of deterministic forecasts. Other minor issues include how long the initial conditions influences the ESP accuracy, and how many ESP scenarios are needed to obtain the best accuracy. This presentation also addresses some future research that is needed for ESP in Korea.

  14. Accuracy of predicted haemoglobin concentration on cardiopulmonary bypass in paediatric cardiac surgery: effect of different formulae for estimating patient blood volume.

    PubMed

    Redlin, Matthias; Boettcher, Wolfgang; Dehmel, Frank; Cho, Mi-Young; Kukucka, Marian; Habazettl, Helmut

    2017-11-01

    When applying a blood-conserving approach in paediatric cardiac surgery with the aim of reducing the transfusion of homologous blood products, the decision to use blood or blood-free priming of the cardiopulmonary bypass (CPB) circuit is often based on the predicted haemoglobin concentration (Hb) as derived from the pre-CPB Hb, the prime volume and the estimated blood volume. We assessed the accuracy of this approach and whether it may be improved by using more sophisticated methods of estimating the blood volume. Data from 522 paediatric cardiac surgery patients treated with CPB with blood-free priming in a 2-year period from May 2013 to May 2015 were collected. Inclusion criteria were body weight <15 kg and available Hb data immediately prior to and after the onset of CPB. The Hb on CPB was predicted according to Fick's principle from the pre-CPB Hb, the prime volume and the patient blood volume. Linear regression analyses and Bland-Altman plots were used to assess the accuracy of the Hb prediction. Different methods to estimate the blood volume were assessed and compared. The initial Hb on CPB correlated well with the predicted Hb (R 2 =0.87, p<0.001). A Bland-Altman plot revealed little bias at 0.07 g/dL and an area of agreement from -1.35 to 1.48 g/dL. More sophisticated methods of estimating blood volume from lean body mass did not improve the Hb prediction, but rather increased bias. Hb prediction is reasonably accurate, with the best result obtained with the simplest method of estimating the blood volume at 80 mL/kg body weight. When deciding for or against blood-free priming, caution is necessary when the predicted Hb lies in a range of ± 2 g/dL around the transfusion trigger.

  15. Analysis of Mining-Induced Subsidence Prediction by Exponent Knothe Model Combined with Insar and Leveling

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Zhang, Liguo; Tang, Yixian; Zhang, Hong

    2018-04-01

    The principle of exponent Knothe model was introduced in detail and the variation process of mining subsidence with time was analysed based on the formulas of subsidence, subsidence velocity and subsidence acceleration in the paper. Five scenes of radar images and six levelling measurements were collected to extract ground deformation characteristics in one coal mining area in this study. Then the unknown parameters of exponent Knothe model were estimated by combined levelling data with deformation information along the line of sight obtained by InSAR technique. By compared the fitting and prediction results obtained by InSAR and levelling with that obtained only by levelling, it was shown that the accuracy of fitting and prediction combined with InSAR and levelling was obviously better than the other that. Therefore, the InSAR measurements can significantly improve the fitting and prediction accuracy of exponent Knothe model.

  16. Multimarker assessment for the prediction of renal function improvement after percutaneous revascularization for renal artery stenosis.

    PubMed

    Staub, Daniel; Partovi, Sasan; Zeller, Thomas; Breidthardt, Tobias; Kaech, Max; Boeddinghaus, Jasper; Puelacher, Christian; Nestelberger, Thomas; Aschwanden, Markus; Mueller, Christian

    2016-06-01

    Identifying patients likely to have improved renal function after percutaneous transluminal renal angioplasty and stenting (PTRA) for renal artery stenosis (RAS) is challenging. The purpose of this study was to use a comprehensive multimarker assessment to identify those patients who would benefit most from correction of RAS. In 127 patients with RAS and decreased renal function and/or hypertension referred for PTRA, quantification of hemodynamic cardiac stress using B-type natriuretic peptide (BNP), renal function using estimated glomerular filtration rate (eGFR), parenchymal renal damage using resistance index (RI), and systemic inflammation using C-reactive protein (CRP) were performed before intervention. Predefined renal function improvement (increase in eGFR ≥10%) at 6 months occurred in 37% of patients. Prognostic accuracy as quantified by the area under the receiver-operating characteristics curve for the ability of BNP, eGFR, RI and CRP to predict renal function improvement were 0.59 (95% CI, 0.48-0.70), 0.71 (95% CI, 0.61-0.81), 0.52 (95% CI, 0.41-0.65), and 0.56 (95% CI, 0.44-0.68), respectively. None of the possible combinations increased the accuracy provided by eGFR (lower eGFR indicated a higher likelihood for eGFR improvement after PTRA, P=ns for all). In the subgroup of 56 patients with pre-interventional eGFR <60 mL/min/1.73 m(2), similar findings were obtained. Quantification of renal function, but not any other pathophysiologic signal, provides at least moderate accuracy in the identification of patients with RAS in whom PTRA will improve renal function.

  17. Forecasting of particulate matter time series using wavelet analysis and wavelet-ARMA/ARIMA model in Taiyuan, China.

    PubMed

    Zhang, Hong; Zhang, Sheng; Wang, Ping; Qin, Yuzhe; Wang, Huifeng

    2017-07-01

    Particulate matter with aerodynamic diameter below 10 μm (PM 10 ) forecasting is difficult because of the uncertainties in describing the emission and meteorological fields. This paper proposed a wavelet-ARMA/ARIMA model to forecast the short-term series of the PM 10 concentrations. It was evaluated by experiments using a 10-year data set of daily PM 10 concentrations from 4 stations located in Taiyuan, China. The results indicated the following: (1) PM 10 concentrations of Taiyuan had a decreasing trend during 2005 to 2012 but increased in 2013. PM 10 concentrations had an obvious seasonal fluctuation related to coal-fired heating in winter and early spring. (2) Spatial differences among the four stations showed that the PM 10 concentrations in industrial and heavily trafficked areas were higher than those in residential and suburb areas. (3) Wavelet analysis revealed that the trend variation and the changes of the PM 10 concentration of Taiyuan were complicated. (4) The proposed wavelet-ARIMA model could be efficiently and successfully applied to the PM 10 forecasting field. Compared with the traditional ARMA/ARIMA methods, this wavelet-ARMA/ARIMA method could effectively reduce the forecasting error, improve the prediction accuracy, and realize multiple-time-scale prediction. Wavelet analysis can filter noisy signals and identify the variation trend and the fluctuation of the PM 10 time-series data. Wavelet decomposition and reconstruction reduce the nonstationarity of the PM 10 time-series data, and thus improve the accuracy of the prediction. This paper proposed a wavelet-ARMA/ARIMA model to forecast the PM 10 time series. Compared with the traditional ARMA/ARIMA method, this wavelet-ARMA/ARIMA method could effectively reduce the forecasting error, improve the prediction accuracy, and realize multiple-time-scale prediction. The proposed model could be efficiently and successfully applied to the PM 10 forecasting field.

  18. Strategies for Near Real Time Estimates of Precipitable Water Vapor from GPS Ground Receivers

    NASA Technical Reports Server (NTRS)

    Y., Bar-Sever; Runge, T.; Kroger, P.

    1995-01-01

    GPS-based estimates of precipitable water vapor (PWV) may be useful in numerical weather models to improve short-term weather predictions. To be effective in numerical weather prediction models, GPS PWV estimates must be produced with sufficient accuracy in near real time. Several estimation strategies for the near real time processing of GPS data are investigated.

  19. Improving accuracies of genomic predictions for drought tolerance in maize by joint modeling of additive and dominance effects in multi-environment trials.

    PubMed

    Dias, Kaio Olímpio Das Graças; Gezan, Salvador Alejandro; Guimarães, Claudia Teixeira; Nazarian, Alireza; da Costa E Silva, Luciano; Parentoni, Sidney Netto; de Oliveira Guimarães, Paulo Evaristo; de Oliveira Anoni, Carina; Pádua, José Maria Villela; de Oliveira Pinto, Marcos; Noda, Roberto Willians; Ribeiro, Carlos Alexandre Gomes; de Magalhães, Jurandir Vieira; Garcia, Antonio Augusto Franco; de Souza, João Cândido; Guimarães, Lauro José Moreira; Pastina, Maria Marta

    2018-07-01

    Breeding for drought tolerance is a challenging task that requires costly, extensive, and precise phenotyping. Genomic selection (GS) can be used to maximize selection efficiency and the genetic gains in maize (Zea mays L.) breeding programs for drought tolerance. Here, we evaluated the accuracy of genomic selection (GS) using additive (A) and additive + dominance (AD) models to predict the performance of untested maize single-cross hybrids for drought tolerance in multi-environment trials. Phenotypic data of five drought tolerance traits were measured in 308 hybrids along eight trials under water-stressed (WS) and well-watered (WW) conditions over two years and two locations in Brazil. Hybrids' genotypes were inferred based on their parents' genotypes (inbred lines) using single-nucleotide polymorphism markers obtained via genotyping-by-sequencing. GS analyses were performed using genomic best linear unbiased prediction by fitting a factor analytic (FA) multiplicative mixed model. Two cross-validation (CV) schemes were tested: CV1 and CV2. The FA framework allowed for investigating the stability of additive and dominance effects across environments, as well as the additive-by-environment and the dominance-by-environment interactions, with interesting applications for parental and hybrid selection. Results showed differences in the predictive accuracy between A and AD models, using both CV1 and CV2, for the five traits in both water conditions. For grain yield (GY) under WS and using CV1, the AD model doubled the predictive accuracy in comparison to the A model. Through CV2, GS models benefit from borrowing information of correlated trials, resulting in an increase of 40% and 9% in the predictive accuracy of GY under WS for A and AD models, respectively. These results highlight the importance of multi-environment trial analyses using GS models that incorporate additive and dominance effects for genomic predictions of GY under drought in maize single-cross hybrids.

  20. Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; de Los Campos, Gustavo; Alvarado, Gregorio; Suchismita, Mondal; Rutkoski, Jessica; González-Pérez, Lorena; Burgueño, Juan

    2017-01-01

    Modern agriculture uses hyperspectral cameras to obtain hundreds of reflectance data measured at discrete narrow bands to cover the whole visible light spectrum and part of the infrared and ultraviolet light spectra, depending on the camera. This information is used to construct vegetation indices (VI) (e.g., green normalized difference vegetation index or GNDVI, simple ratio or SRa, etc.) which are used for the prediction of primary traits (e.g., biomass). However, these indices only use some bands and are cultivar-specific; therefore they lose considerable information and are not robust for all cultivars. This study proposes models that use all available bands as predictors to increase prediction accuracy; we compared these approaches with eight conventional vegetation indexes (VIs) constructed using only some bands. The data set we used comes from CIMMYT's global wheat program and comprises 1170 genotypes evaluated for grain yield (ton/ha) in five environments (Drought, Irrigated, EarlyHeat, Melgas and Reduced Irrigated); the reflectance data were measured in 250 discrete narrow bands ranging between 392 and 851 nm. The proposed models for the simultaneous analysis of all the bands were ordinal least square (OLS), Bayes B, principal components with Bayes B, functional B-spline, functional Fourier and functional partial least square. The results of these models were compared with the OLS performed using as predictors each of the eight VIs individually and combined. We found that using all bands simultaneously increased prediction accuracy more than using VI alone. The Splines and Fourier models had the best prediction accuracy for each of the nine time-points under study. Combining image data collected at different time-points led to a small increase in prediction accuracy relative to models that use data from a single time-point. Also, using bands with heritabilities larger than 0.5 only in Drought as predictor variables showed improvements in prediction accuracy.

  1. Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction.

    PubMed

    Zhou, Yao; Vales, M Isabel; Wang, Aoxue; Zhang, Zhiwu

    2017-09-01

    Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates correlations for each fold and reports prediction accuracy as the mean of correlations across fold. The other approach, Hold accuracy, predicts all phenotypes in all fold and calculates correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more fold are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  2. Joint L2,1 Norm and Fisher Discrimination Constrained Feature Selection for Rational Synthesis of Microporous Aluminophosphates.

    PubMed

    Qi, Miao; Wang, Ting; Yi, Yugen; Gao, Na; Kong, Jun; Wang, Jianzhong

    2017-04-01

    Feature selection has been regarded as an effective tool to help researchers understand the generating process of data. For mining the synthesis mechanism of microporous AlPOs, this paper proposes a novel feature selection method by joint l 2,1 norm and Fisher discrimination constraints (JNFDC). In order to obtain more effective feature subset, the proposed method can be achieved in two steps. The first step is to rank the features according to sparse and discriminative constraints. The second step is to establish predictive model with the ranked features, and select the most significant features in the light of the contribution of improving the predictive accuracy. To the best of our knowledge, JNFDC is the first work which employs the sparse representation theory to explore the synthesis mechanism of six kinds of pore rings. Numerical simulations demonstrate that our proposed method can select significant features affecting the specified structural property and improve the predictive accuracy. Moreover, comparison results show that JNFDC can obtain better predictive performances than some other state-of-the-art feature selection methods. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Intermolecular shielding contributions studied by modeling the 13C chemical-shift tensors of organic single crystals with plane waves

    PubMed Central

    Johnston, Jessica C.; Iuliucci, Robbie J.; Facelli, Julio C.; Fitzgerald, George; Mueller, Karl T.

    2009-01-01

    In order to predict accurately the chemical shift of NMR-active nuclei in solid phase systems, magnetic shielding calculations must be capable of considering the complete lattice structure. Here we assess the accuracy of the density functional theory gauge-including projector augmented wave method, which uses pseudopotentials to approximate the nodal structure of the core electrons, to determine the magnetic properties of crystals by predicting the full chemical-shift tensors of all 13C nuclides in 14 organic single crystals from which experimental tensors have previously been reported. Plane-wave methods use periodic boundary conditions to incorporate the lattice structure, providing a substantial improvement for modeling the chemical shifts in hydrogen-bonded systems. Principal tensor components can now be predicted to an accuracy that approaches the typical experimental uncertainty. Moreover, methods that include the full solid-phase structure enable geometry optimizations to be performed on the input structures prior to calculation of the shielding. Improvement after optimization is noted here even when neutron diffraction data are used for determining the initial structures. After geometry optimization, the isotropic shift can be predicted to within 1 ppm. PMID:19831448

  4. A hierarchical spatial model for well yield in complex aquifers

    NASA Astrophysics Data System (ADS)

    Montgomery, J.; O'sullivan, F.

    2017-12-01

    Efficiently siting and managing groundwater wells requires reliable estimates of the amount of water that can be produced, or the well yield. This can be challenging to predict in highly complex, heterogeneous fractured aquifers due to the uncertainty around local hydraulic properties. Promising statistical approaches have been advanced in recent years. For instance, kriging and multivariate regression analysis have been applied to well test data with limited but encouraging levels of prediction accuracy. Additionally, some analytical solutions to diffusion in homogeneous porous media have been used to infer "effective" properties consistent with observed flow rates or drawdown. However, this is an under-specified inverse problem with substantial and irreducible uncertainty. We describe a flexible machine learning approach capable of combining diverse datasets with constraining physical and geostatistical models for improved well yield prediction accuracy and uncertainty quantification. Our approach can be implemented within a hierarchical Bayesian framework using Markov Chain Monte Carlo, which allows for additional sources of information to be incorporated in priors to further constrain and improve predictions and reduce the model order. We demonstrate the usefulness of this approach using data from over 7,000 wells in a fractured bedrock aquifer.

  5. Genomic estimation of additive and dominance effects and impact of accounting for dominance on accuracy of genomic evaluation in sheep populations.

    PubMed

    Moghaddar, N; van der Werf, J H J

    2017-12-01

    The objectives of this study were to estimate the additive and dominance variance component of several weight and ultrasound scanned body composition traits in purebred and combined cross-bred sheep populations based on single nucleotide polymorphism (SNP) marker genotypes and then to investigate the effect of fitting additive and dominance effects on accuracy of genomic evaluation. Additive and dominance variance components were estimated in a mixed model equation based on "average information restricted maximum likelihood" using additive and dominance (co)variances between animals calculated from 48,599 SNP marker genotypes. Genomic prediction was based on genomic best linear unbiased prediction (GBLUP), and the accuracy of prediction was assessed based on a random 10-fold cross-validation. Across different weight and scanned body composition traits, dominance variance ranged from 0.0% to 7.3% of the phenotypic variance in the purebred population and from 7.1% to 19.2% in the combined cross-bred population. In the combined cross-bred population, the range of dominance variance decreased to 3.1% and 9.9% after accounting for heterosis effects. Accounting for dominance effects significantly improved the likelihood of the fitting model in the combined cross-bred population. This study showed a substantial dominance genetic variance for weight and ultrasound scanned body composition traits particularly in cross-bred population; however, improvement in the accuracy of genomic breeding values was small and statistically not significant. Dominance variance estimates in combined cross-bred population could be overestimated if heterosis is not fitted in the model. © 2017 Blackwell Verlag GmbH.

  6. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures.

    PubMed

    Scheid, Anika; Nebel, Markus E

    2012-07-09

    Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case - without sacrificing much of the accuracy of the results. Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms.

  7. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    PubMed Central

    2012-01-01

    Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case – without sacrificing much of the accuracy of the results. Conclusions Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms. PMID:22776037

  8. Design of a breath analysis system for diabetes screening and blood glucose level prediction.

    PubMed

    Yan, Ke; Zhang, David; Wu, Darong; Wei, Hua; Lu, Guangming

    2014-11-01

    It has been reported that concentrations of several biomarkers in diabetics' breath show significant difference from those in healthy people's breath. Concentrations of some biomarkers are also correlated with the blood glucose levels (BGLs) of diabetics. Therefore, it is possible to screen for diabetes and predict BGLs by analyzing one's breath. In this paper, we describe the design of a novel breath analysis system for this purpose. The system uses carefully selected chemical sensors to detect biomarkers in breath. Common interferential factors, including humidity and the ratio of alveolar air in breath, are compensated or handled in the algorithm. Considering the intersubject variance of the components in breath, we build subject-specific prediction models to improve the accuracy of BGL prediction. A total of 295 breath samples from healthy subjects and 279 samples from diabetic subjects were collected to evaluate the performance of the system. The sensitivity and specificity of diabetes screening are 91.51% and 90.77%, respectively. The mean relative absolute error for BGL prediction is 21.7%. Experiments show that the system is effective and that the strategies adopted in the system can improve its accuracy. The system potentially provides a noninvasive and convenient method for diabetes screening and BGL monitoring as an adjunct to the standard criteria.

  9. Innate biology versus lifestyle behaviour in the aetiology of obesity and type 2 diabetes: the GLACIER Study.

    PubMed

    Poveda, Alaitz; Koivula, Robert W; Ahmad, Shafqat; Barroso, Inês; Hallmans, Göran; Johansson, Ingegerd; Renström, Frida; Franks, Paul W

    2016-03-01

    We compared the ability of genetic (established type 2 diabetes, fasting glucose, 2 h glucose and obesity variants) and modifiable lifestyle (diet, physical activity, smoking, alcohol and education) risk factors to predict incident type 2 diabetes and obesity in a population-based prospective cohort of 3,444 Swedish adults studied sequentially at baseline and 10 years later. Multivariable logistic regression analyses were used to assess the predictive ability of genetic and lifestyle risk factors on incident obesity and type 2 diabetes by calculating the AUC. The predictive accuracy of lifestyle risk factors was similar to that yielded by genetic information for incident type 2 diabetes (AUC 75% and 74%, respectively) and obesity (AUC 68% and 73%, respectively) in models adjusted for age, age(2) and sex. The addition of genetic information to the lifestyle model significantly improved the prediction of type 2 diabetes (AUC 80%; p = 0.0003) and obesity (AUC 79%; p < 0.0001) and resulted in a net reclassification improvement of 58% for type 2 diabetes and 64% for obesity. These findings illustrate that lifestyle and genetic information separately provide a similarly high degree of long-range predictive accuracy for obesity and type 2 diabetes.

  10. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction

    PubMed Central

    Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian

    2017-01-01

    Abstract Motivation: Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. Results: We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Availability and Implementation: Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Contact: deane@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28453681

  11. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction.

    PubMed

    Marks, Claire; Nowak, Jaroslaw; Klostermann, Stefan; Georges, Guy; Dunbar, James; Shi, Jiye; Kelm, Sebastian; Deane, Charlotte M

    2017-05-01

    Loops are often vital for protein function, however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. deane@stats.ox.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  12. Optical glucose monitoring using vertical cavity surface emitting lasers (VCSELs)

    NASA Astrophysics Data System (ADS)

    Talebi Fard, Sahba; Hofmann, Werner; Talebi Fard, Pouria; Kwok, Ezra; Amann, Markus-Christian; Chrostowski, Lukas

    2009-08-01

    Diabetes Mellitus is a common chronic disease that has become a public health issue. Continuous glucose monitoring improves patient health by stabilizing the glucose levels. Optical methods are one of the painless and promising methods that can be used for blood glucose predictions. However, having accuracies lower than what is acceptable clinically has been a major concern. Using lasers along with multivariate techniques such as Partial Least Square (PLS) can improve glucose predictions. This research involves investigations for developing a novel optical system for accurate glucose predictions, which leads to the development of a small, low power, implantable optical sensor for diabetes patients.

  13. Protein homology model refinement by large-scale energy optimization.

    PubMed

    Park, Hahnbeom; Ovchinnikov, Sergey; Kim, David E; DiMaio, Frank; Baker, David

    2018-03-20

    Proteins fold to their lowest free-energy structures, and hence the most straightforward way to increase the accuracy of a partially incorrect protein structure model is to search for the lowest-energy nearby structure. This direct approach has met with little success for two reasons: first, energy function inaccuracies can lead to false energy minima, resulting in model degradation rather than improvement; and second, even with an accurate energy function, the search problem is formidable because the energy only drops considerably in the immediate vicinity of the global minimum, and there are a very large number of degrees of freedom. Here we describe a large-scale energy optimization-based refinement method that incorporates advances in both search and energy function accuracy that can substantially improve the accuracy of low-resolution homology models. The method refined low-resolution homology models into correct folds for 50 of 84 diverse protein families and generated improved models in recent blind structure prediction experiments. Analyses of the basis for these improvements reveal contributions from both the improvements in conformational sampling techniques and the energy function.

  14. Combined ECG, Echocardiographic, and Biomarker Criteria for Diagnosing Acute Myocardial Infarction in Out-of-Hospital Cardiac Arrest Patients.

    PubMed

    Lee, Sang-Eun; Uhm, Jae-Sun; Kim, Jong-Youn; Pak, Hui-Nam; Lee, Moon-Hyoung; Joung, Boyoung

    2015-07-01

    Acute coronary lesions commonly trigger out-of-hospital cardiac arrest (OHCA). However, the prevalence of coronary artery disease (CAD) in Asian patients with OHCA and whether electrocardiogram (ECG) and other findings might predict acute myocardial infarction (AMI) have not been fully elucidated. Of 284 consecutive resuscitated OHCA patients seen between January 2006 and July 2013, we enrolled 135 patients who had undergone coronary evaluation. ECGs, echocardiography, and biomarkers were compared between patients with or without CAD. We included 135 consecutive patients aged 54 years (interquartile range 45-65) with sustained return of spontaneous circulation after OHCA between 2006 and 2012. Sixty six (45%) patients had CAD. The initial rhythm was shockable and non-shockable in 110 (81%) and 25 (19%) patients, respectively. ST-segment elevation predicted CAD with 42% sensitivity, 87% specificity, and 65% accuracy. ST elevation and/or regional wall motion abnormality (RWMA) showed 68% sensitivity, 52% specificity, and 70% accuracy in the prediction of CAD. Finally, a combination of ST elevation and/or RWMA and/or troponin T elevation predicted CAD with 94% sensitivity, 17% specificity, and 55% accuracy. In patients with OHCA without obvious non-cardiac causes, selection for coronary angiogram based on the combined criterion could detect 94% of CADs. However, compared with ECG only criteria, the combined criterion failed to improve diagnostic accuracy with a lower specificity.

  15. Calculating an optimal box size for ligand docking and virtual screening against experimental and predicted binding pockets.

    PubMed

    Feinstein, Wei P; Brylinski, Michal

    2015-01-01

    Computational approaches have emerged as an instrumental methodology in modern research. For example, virtual screening by molecular docking is routinely used in computer-aided drug discovery. One of the critical parameters for ligand docking is the size of a search space used to identify low-energy binding poses of drug candidates. Currently available docking packages often come with a default protocol for calculating the box size, however, many of these procedures have not been systematically evaluated. In this study, we investigate how the docking accuracy of AutoDock Vina is affected by the selection of a search space. We propose a new procedure for calculating the optimal docking box size that maximizes the accuracy of binding pose prediction against a non-redundant and representative dataset of 3,659 protein-ligand complexes selected from the Protein Data Bank. Subsequently, we use the Directory of Useful Decoys, Enhanced to demonstrate that the optimized docking box size also yields an improved ranking in virtual screening. Binding pockets in both datasets are derived from the experimental complex structures and, additionally, predicted by eFindSite. A systematic analysis of ligand binding poses generated by AutoDock Vina shows that the highest accuracy is achieved when the dimensions of the search space are 2.9 times larger than the radius of gyration of a docking compound. Subsequent virtual screening benchmarks demonstrate that this optimized docking box size also improves compound ranking. For instance, using predicted ligand binding sites, the average enrichment factor calculated for the top 1 % (10 %) of the screening library is 8.20 (3.28) for the optimized protocol, compared to 7.67 (3.19) for the default procedure. Depending on the evaluation metric, the optimal docking box size gives better ranking in virtual screening for about two-thirds of target proteins. This fully automated procedure can be used to optimize docking protocols in order to improve the ranking accuracy in production virtual screening simulations. Importantly, the optimized search space systematically yields better results than the default method not only for experimental pockets, but also for those predicted from protein structures. A script for calculating the optimal docking box size is freely available at www.brylinski.org/content/docking-box-size. Graphical AbstractWe developed a procedure to optimize the box size in molecular docking calculations. Left panel shows the predicted binding pose of NADP (green sticks) compared to the experimental complex structure of human aldose reductase (blue sticks) using a default protocol. Right panel shows the docking accuracy using an optimized box size.

  16. Effects of Land Use Land Cover (LULC) and Climate on Simulation of Phosphorus loading in the Southeast United States Region

    NASA Astrophysics Data System (ADS)

    Jima, T. G.; Roberts, A.

    2013-12-01

    Quality of coastal and freshwater resources in the Southeastern United States is threatened due to Eutrophication as a result of excessive nutrients, and phosphorus is acknowledged as one of the major limiting nutrients. In areas with much non-point source (NPS) pollution, land use land cover and climate have been found to have significant impact on water quality. Landscape metrics applied in catchment and riparian stream based nutrient export models are known to significantly improve nutrient prediction. The regional SPARROW (Spatially Referenced Regression On Watershed attributes), which predicts Total Phosphorus has been developed by the Southeastern United States regions USGS, as part of the National Water Quality Assessment (NAWQA) program and the model accuracy was found to be 67%. However, landscape composition and configuration metrics which play a significant role in the source, transport and delivery of the nutrient have not been incorporated in the model. Including these matrices in the models parameterization will improve the models accuracy and improve decision making process for mitigating and managing NPS phosphorus in the region. The National Land Cover Data 2001 raster data will be used (since the base line is 2002) for the region (with 8321 watersheds ) with fragstats 4.1 and ArcGIS Desktop 10.1 for the analysis of landscape matrices, buffers and creating map layers. The result will be imported to the Southeast SPARROW model and will be analyzed. Resulting statistical significance and model accuracy will be assessed and predictions for those areas with no water quality monitoring station will be made.

  17. Optical Passive Sensor Calibration for Satellite Remote Sensing and the Legacy of NOAA and NIST Cooperation

    PubMed Central

    Datla, Raju; Weinreb, Michael; Rice, Joseph; Johnson, B. Carol; Shirley, Eric; Cao, Changyong

    2014-01-01

    This paper traces the cooperative efforts of scientists at the National Oceanic and Atmospheric Administration (NOAA) and the National Institute of Standards and Technology (NIST) to improve the calibration of operational satellite sensors for remote sensing of the Earth’s land, atmosphere and oceans. It gives a chronological perspective of the NOAA satellite program and the interactions between the two agencies’ scientists to address pre-launch calibration and issues of sensor performance on orbit. The drive to improve accuracy of measurements has had a new impetus in recent years because of the need for improved weather prediction and climate monitoring. The highlights of this cooperation and strategies to achieve SI-traceability and improve accuracy for optical satellite sensor data are summarized1. PMID:26601030

  18. Optical Passive Sensor Calibration for Satellite Remote Sensing and the Legacy of NOAA and NIST Cooperation.

    PubMed

    Datla, Raju; Weinreb, Michael; Rice, Joseph; Johnson, B Carol; Shirley, Eric; Cao, Changyong

    2014-01-01

    This paper traces the cooperative efforts of scientists at the National Oceanic and Atmospheric Administration (NOAA) and the National Institute of Standards and Technology (NIST) to improve the calibration of operational satellite sensors for remote sensing of the Earth's land, atmosphere and oceans. It gives a chronological perspective of the NOAA satellite program and the interactions between the two agencies' scientists to address pre-launch calibration and issues of sensor performance on orbit. The drive to improve accuracy of measurements has had a new impetus in recent years because of the need for improved weather prediction and climate monitoring. The highlights of this cooperation and strategies to achieve SI-traceability and improve accuracy for optical satellite sensor data are summarized.

  19. Comparative study to assess whether high sensitive C-reactive protein and carotid intima media thickness improve the predictive accuracy of exercise stress testing for coronary artery disease in perimenopausal women with typical angina.

    PubMed

    Sinha, Dhurjati Prasad; Das, Munna; Banerjee, Amal Kumar; Ahmed, Shageer; Majumdar, Sonali

    2008-02-01

    Anginal symptoms are less predictive of abnormal coronary anatomy in women. The diagnostic accuracy of exercise treadmill test for obstructive coronary artery disease is less in young and middle aged women. High sensitive C-reactive protein has shown a strong and consistent relationship to the risk of incident cardiovascular events. Carotid intima media thickness is a non-invasive marker of atherosclerosis burden and also predicts prognosis in patients with coronary artery disease. We investigated whether incorporation of high sensitive C-reactive protein and carotid intima media thickness along with exercise stress results improved the predictive accuracy in perimenopausal non-diabetic women subset. Fifty perimenopausal non-diabetic patients (age 45 +/- 7 years) presenting with typical angina were subjected to treadmill test (Bruce protocol). Also carotid artery images at both sides of neck were acquired by B-mode ultrasound and carotid intima media thickness were measured. High sensitive C-reactive protein was measured. Of 50 patients, 22 had a positive exercise stress result. Coronary angiography done in all 50 patients revealed coronary artery disease in 10 patients with positive exercise stress result and in 4 patients with negative exercise stress result. Treadmill exercise stress test had a sensitivity of 71.4%, specificity of 66.7% and a negative predictive accuracy of 85.7% in this study group. High sensitive C-reactive protein in patients with documented coronary artery disease was not significantly different from those without coronary artery disease (4.8 +/- 0.9 mg/l versus 3.9 +/- 1.7 mg/l, p=NS). Also carotid intima media thickness was not significantly different between either of the groups with coronary artery disease positivity and negativity respectively (left: 1.25 +/- 0.55 versus 1.20 +/- 0.51 mm, p=NS; right:1.18 +/- 0.54 versus 1.15 +/- 0.41 mm, p=NS). High sensitive C-reactive protein and carotid intima media thickness were not helpful in further adding to the predictability of coronary artery disease in perimenopausal patients with typical angina as assessed by treadmill exercise stress test.

  20. Improved Formula for the Stress Intensity Factor of Semi-Elliptical Surface Cracks in Welded Joints under Bending Stress

    PubMed Central

    Peng, Yang; Wu, Chao; Zheng, Yifu; Dong, Jun

    2017-01-01

    Welded joints are prone to fatigue cracking with the existence of welding defects and bending stress. Fracture mechanics is a useful approach in which the fatigue life of the welded joint can be predicted. The key challenge of such predictions using fracture mechanics is how to accurately calculate the stress intensity factor (SIF). An empirical formula for calculating the SIF of welded joints under bending stress was developed by Baik, Yamada and Ishikawa based on the hybrid method. However, when calculating the SIF of a semi-elliptical crack, this study found that the accuracy of the Baik-Yamada formula was poor when comparing the benchmark results, experimental data and numerical results. The reasons for the reduced accuracy of the Baik-Yamada formula were identified and discussed in this paper. Furthermore, a new correction factor was developed and added to the Baik-Yamada formula by using theoretical analysis and numerical regression. Finally, the predictions using the modified Baik-Yamada formula were compared with the benchmark results, experimental data and numerical results. It was found that the accuracy of the modified Baik-Yamada formula was greatly improved. Therefore, it is proposed that this modified formula is used to conveniently and accurately calculate the SIF of semi-elliptical cracks in welded joints under bending stress. PMID:28772527

  1. Forecasting Influenza Outbreaks in Boroughs and Neighborhoods of New York City.

    PubMed

    Yang, Wan; Olson, Donald R; Shaman, Jeffrey

    2016-11-01

    The ideal spatial scale, or granularity, at which infectious disease incidence should be monitored and forecast has been little explored. By identifying the optimal granularity for a given disease and host population, and matching surveillance and prediction efforts to this scale, response to emergent and recurrent outbreaks can be improved. Here we explore how granularity and representation of spatial structure affect influenza forecast accuracy within New York City. We develop network models at the borough and neighborhood levels, and use them in conjunction with surveillance data and a data assimilation method to forecast influenza activity. These forecasts are compared to an alternate system that predicts influenza for each borough or neighborhood in isolation. At the borough scale, influenza epidemics are highly synchronous despite substantial differences in intensity, and inclusion of network connectivity among boroughs generally improves forecast accuracy. At the neighborhood scale, we observe much greater spatial heterogeneity among influenza outbreaks including substantial differences in local outbreak timing and structure; however, inclusion of the network model structure generally degrades forecast accuracy. One notable exception is that local outbreak onset, particularly when signal is modest, is better predicted with the network model. These findings suggest that observation and forecast at sub-municipal scales within New York City provides richer, more discriminant information on influenza incidence, particularly at the neighborhood scale where greater heterogeneity exists, and that the spatial spread of influenza among localities can be forecast.

  2. Improving density functional tight binding predictions of free energy surfaces for peptide condensation reactions in solution

    NASA Astrophysics Data System (ADS)

    Kroonblawd, Matthew; Goldman, Nir

    First principles molecular dynamics using highly accurate density functional theory (DFT) is a common tool for predicting chemistry, but the accessible time and space scales are often orders of magnitude beyond the resolution of experiments. Semi-empirical methods such as density functional tight binding (DFTB) offer up to a thousand-fold reduction in required CPU hours and can approach experimental scales. However, standard DFTB parameter sets lack good transferability and calibration for a particular system is usually necessary. Force matching the pairwise repulsive energy term in DFTB to short DFT trajectories can improve the former's accuracy for chemistry that is fast relative to DFT simulation times (<10 ps), but the effects on slow chemistry and the free energy surface are not well-known. We present a force matching approach to increase the accuracy of DFTB predictions for free energy surfaces. Accelerated sampling techniques are combined with path collective variables to generate the reference DFT data set and validate fitted DFTB potentials without a priori knowledge of transition states. Accuracy of force-matched DFTB free energy surfaces is assessed for slow peptide-forming reactions by direct comparison to DFT results for particular paths. Extensions to model prebiotic chemistry under shock conditions are discussed. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  3. Mathematical-Artificial Neural Network Hybrid Model to Predict Roll Force during Hot Rolling of Steel

    NASA Astrophysics Data System (ADS)

    Rath, S.; Sengupta, P. P.; Singh, A. P.; Marik, A. K.; Talukdar, P.

    2013-07-01

    Accurate prediction of roll force during hot strip rolling is essential for model based operation of hot strip mills. Traditionally, mathematical models based on theory of plastic deformation have been used for prediction of roll force. In the last decade, data driven models like artificial neural network have been tried for prediction of roll force. Pure mathematical models have accuracy limitations whereas data driven models have difficulty in convergence when applied to industrial conditions. Hybrid models by integrating the traditional mathematical formulations and data driven methods are being developed in different parts of world. This paper discusses the methodology of development of an innovative hybrid mathematical-artificial neural network model. In mathematical model, the most important factor influencing accuracy is flow stress of steel. Coefficients of standard flow stress equation, calculated by parameter estimation technique, have been used in the model. The hybrid model has been trained and validated with input and output data collected from finishing stands of Hot Strip Mill, Bokaro Steel Plant, India. It has been found that the model accuracy has been improved with use of hybrid model, over the traditional mathematical model.

  4. Personality, Cognitive Style, Motivation, and Aptitude Predict Systematic Trends in Analytic Forecasting Behavior.

    PubMed

    Poore, Joshua C; Forlines, Clifton L; Miller, Sarah M; Regan, John R; Irvine, John M

    2014-12-01

    The decision sciences are increasingly challenged to advance methods for modeling analysts, accounting for both analytic strengths and weaknesses, to improve inferences taken from increasingly large and complex sources of data. We examine whether psychometric measures-personality, cognitive style, motivated cognition-predict analytic performance and whether psychometric measures are competitive with aptitude measures (i.e., SAT scores) as analyst sample selection criteria. A heterogeneous, national sample of 927 participants completed an extensive battery of psychometric measures and aptitude tests and was asked 129 geopolitical forecasting questions over the course of 1 year. Factor analysis reveals four dimensions among psychometric measures; dimensions characterized by differently motivated "top-down" cognitive styles predicted distinctive patterns in aptitude and forecasting behavior. These dimensions were not better predictors of forecasting accuracy than aptitude measures. However, multiple regression and mediation analysis reveals that these dimensions influenced forecasting accuracy primarily through bias in forecasting confidence. We also found that these facets were competitive with aptitude tests as forecast sampling criteria designed to mitigate biases in forecasting confidence while maximizing accuracy. These findings inform the understanding of individual difference dimensions at the intersection of analytic aptitude and demonstrate that they wield predictive power in applied, analytic domains.

  5. Personality, Cognitive Style, Motivation, and Aptitude Predict Systematic Trends in Analytic Forecasting Behavior

    PubMed Central

    Forlines, Clifton L.; Miller, Sarah M.; Regan, John R.; Irvine, John M.

    2014-01-01

    The decision sciences are increasingly challenged to advance methods for modeling analysts, accounting for both analytic strengths and weaknesses, to improve inferences taken from increasingly large and complex sources of data. We examine whether psychometric measures—personality, cognitive style, motivated cognition—predict analytic performance and whether psychometric measures are competitive with aptitude measures (i.e., SAT scores) as analyst sample selection criteria. A heterogeneous, national sample of 927 participants completed an extensive battery of psychometric measures and aptitude tests and was asked 129 geopolitical forecasting questions over the course of 1 year. Factor analysis reveals four dimensions among psychometric measures; dimensions characterized by differently motivated “top-down” cognitive styles predicted distinctive patterns in aptitude and forecasting behavior. These dimensions were not better predictors of forecasting accuracy than aptitude measures. However, multiple regression and mediation analysis reveals that these dimensions influenced forecasting accuracy primarily through bias in forecasting confidence. We also found that these facets were competitive with aptitude tests as forecast sampling criteria designed to mitigate biases in forecasting confidence while maximizing accuracy. These findings inform the understanding of individual difference dimensions at the intersection of analytic aptitude and demonstrate that they wield predictive power in applied, analytic domains. PMID:25983670

  6. Noncontact blood species identification method based on spatially resolved near-infrared transmission spectroscopy

    NASA Astrophysics Data System (ADS)

    Zhang, Linna; Sun, Meixiu; Wang, Zhennan; Li, Hongxiao; Li, Yingxin; Li, Gang; Lin, Ling

    2017-09-01

    The inspection and identification of whole blood are crucially significant for import-export ports and inspection and quarantine departments. In our previous research, we proved Near-Infrared diffuse transmitted spectroscopy method was potential for noninvasively identifying three blood species, including macaque, human and mouse, with samples measured in the cuvettes. However, in open sampling cases, inspectors may be endangered by virulence factors in blood samples. In this paper, we explored the noncontact measurement for classification, with blood samples measured in the vacuum blood vessels. Spatially resolved near-infrared spectroscopy was used to improve the prediction accuracy. Results showed that the prediction accuracy of the model built with nine detection points was more than 90% in identification between all five species, including chicken, goat, macaque, pig and rat, far better than the performance of the model built with single-point spectra. The results fully supported the idea that spatially resolved near-infrared spectroscopy method can improve the prediction ability, and demonstrated the feasibility of this method for noncontact blood species identification in practical applications.

  7. Predictive accuracy of particle filtering in dynamic models supporting outbreak projections.

    PubMed

    Safarishahrbijari, Anahita; Teyhouee, Aydin; Waldner, Cheryl; Liu, Juxin; Osgood, Nathaniel D

    2017-09-26

    While a new generation of computational statistics algorithms and availability of data streams raises the potential for recurrently regrounding dynamic models with incoming observations, the effectiveness of such arrangements can be highly subject to specifics of the configuration (e.g., frequency of sampling and representation of behaviour change), and there has been little attempt to identify effective configurations. Combining dynamic models with particle filtering, we explored a solution focusing on creating quickly formulated models regrounded automatically and recurrently as new data becomes available. Given a latent underlying case count, we assumed that observed incident case counts followed a negative binomial distribution. In accordance with the condensation algorithm, each such observation led to updating of particle weights. We evaluated the effectiveness of various particle filtering configurations against each other and against an approach without particle filtering according to the accuracy of the model in predicting future prevalence, given data to a certain point and a norm-based discrepancy metric. We examined the effectiveness of particle filtering under varying times between observations, negative binomial dispersion parameters, and rates with which the contact rate could evolve. We observed that more frequent observations of empirical data yielded super-linearly improved accuracy in model predictions. We further found that for the data studied here, the most favourable assumptions to make regarding the parameters associated with the negative binomial distribution and changes in contact rate were robust across observation frequency and the observation point in the outbreak. Combining dynamic models with particle filtering can perform well in projecting future evolution of an outbreak. Most importantly, the remarkable improvements in predictive accuracy resulting from more frequent sampling suggest that investments to achieve efficient reporting mechanisms may be more than paid back by improved planning capacity. The robustness of the results on particle filter configuration in this case study suggests that it may be possible to formulate effective standard guidelines and regularized approaches for such techniques in particular epidemiological contexts. Most importantly, the work tentatively suggests potential for health decision makers to secure strong guidance when anticipating outbreak evolution for emerging infectious diseases by combining even very rough models with particle filtering method.

  8. Improving the accuracy of camber predictions for precast pretensioned concrete beams.

    DOT National Transportation Integrated Search

    2015-07-01

    The discrepancies between the designed and measured camber of precast pretensioned concrete beams (PPCBs) observed by the : Iowa DOT have created challenges in the field during bridge construction, causing construction delays and additional costs. Th...

  9. Group-regularized individual prediction: theory and application to pain.

    PubMed

    Lindquist, Martin A; Krishnan, Anjali; López-Solà, Marina; Jepma, Marieke; Woo, Choong-Wan; Koban, Leonie; Roy, Mathieu; Atlas, Lauren Y; Schmidt, Liane; Chang, Luke J; Reynolds Losin, Elizabeth A; Eisenbarth, Hedwig; Ashar, Yoni K; Delk, Elizabeth; Wager, Tor D

    2017-01-15

    Multivariate pattern analysis (MVPA) has become an important tool for identifying brain representations of psychological processes and clinical outcomes using fMRI and related methods. Such methods can be used to predict or 'decode' psychological states in individual subjects. Single-subject MVPA approaches, however, are limited by the amount and quality of individual-subject data. In spite of higher spatial resolution, predictive accuracy from single-subject data often does not exceed what can be accomplished using coarser, group-level maps, because single-subject patterns are trained on limited amounts of often-noisy data. Here, we present a method that combines population-level priors, in the form of biomarker patterns developed on prior samples, with single-subject MVPA maps to improve single-subject prediction. Theoretical results and simulations motivate a weighting based on the relative variances of biomarker-based prediction-based on population-level predictive maps from prior groups-and individual-subject, cross-validated prediction. Empirical results predicting pain using brain activity on a trial-by-trial basis (single-trial prediction) across 6 studies (N=180 participants) confirm the theoretical predictions. Regularization based on a population-level biomarker-in this case, the Neurologic Pain Signature (NPS)-improved single-subject prediction accuracy compared with idiographic maps based on the individuals' data alone. The regularization scheme that we propose, which we term group-regularized individual prediction (GRIP), can be applied broadly to within-person MVPA-based prediction. We also show how GRIP can be used to evaluate data quality and provide benchmarks for the appropriateness of population-level maps like the NPS for a given individual or study. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. A Predictive Risk Model for A(H7N9) Human Infections Based on Spatial-Temporal Autocorrelation and Risk Factors: China, 2013–2014

    PubMed Central

    Dong, Wen; Yang, Kun; Xu, Quan-Li; Yang, Yu-Lian

    2015-01-01

    This study investigated the spatial distribution, spatial autocorrelation, temporal cluster, spatial-temporal autocorrelation and probable risk factors of H7N9 outbreaks in humans from March 2013 to December 2014 in China. The results showed that the epidemic spread with significant spatial-temporal autocorrelation. In order to describe the spatial-temporal autocorrelation of H7N9, an improved model was developed by introducing a spatial-temporal factor in this paper. Logistic regression analyses were utilized to investigate the risk factors associated with their distribution, and nine risk factors were significantly associated with the occurrence of A(H7N9) human infections: the spatial-temporal factor φ (OR = 2546669.382, p < 0.001), migration route (OR = 0.993, p < 0.01), river (OR = 0.861, p < 0.001), lake(OR = 0.992, p < 0.001), road (OR = 0.906, p < 0.001), railway (OR = 0.980, p < 0.001), temperature (OR = 1.170, p < 0.01), precipitation (OR = 0.615, p < 0.001) and relative humidity (OR = 1.337, p < 0.001). The improved model obtained a better prediction performance and a higher fitting accuracy than the traditional model: in the improved model 90.1% (91/101) of the cases during February 2014 occurred in the high risk areas (the predictive risk > 0.70) of the predictive risk map, whereas 44.6% (45/101) of which overlaid on the high risk areas (the predictive risk > 0.70) for the traditional model, and the fitting accuracy of the improved model was 91.6% which was superior to the traditional model (86.1%). The predictive risk map generated based on the improved model revealed that the east and southeast of China were the high risk areas of A(H7N9) human infections in February 2014. These results provided baseline data for the control and prevention of future human infections. PMID:26633446

  11. A testing strategy to predict risk for drug-induced liver injury in humans using high-content screen assays and the 'rule-of-two' model.

    PubMed

    Chen, Minjun; Tung, Chun-Wei; Shi, Qiang; Guo, Lei; Shi, Leming; Fang, Hong; Borlak, Jürgen; Tong, Weida

    2014-07-01

    Drug-induced liver injury (DILI) is a major cause of drug failures in both the preclinical and clinical phase. Consequently, improving prediction of DILI at an early stage of drug discovery will reduce the potential failures in the subsequent drug development program. In this regard, high-content screening (HCS) assays are considered as a promising strategy for the study of DILI; however, the predictive performance of HCS assays is frequently insufficient. In the present study, a new testing strategy was developed to improve DILI prediction by employing in vitro assays that was combined with the RO2 model (i.e., 'rule-of-two' defined by daily dose ≥100 mg/day & logP ≥3). The RO2 model was derived from the observation that high daily doses and lipophilicity of an oral medication were associated with significant DILI risk in humans. In the developed testing strategy, the RO2 model was used for the rational selection of candidates for HCS assays, and only the negatives predicted by the RO2 model were further investigated by HCS. Subsequently, the effects of drug treatment on cell loss, nuclear size, DNA damage/fragmentation, apoptosis, lysosomal mass, mitochondrial membrane potential, and steatosis were studied in cultures of primary rat hepatocytes. Using a set of 70 drugs with clear evidence of clinically relevant DILI, the testing strategy improved the accuracies by 10 % and reduced the number of drugs requiring experimental assessment by approximately 20 %, as compared to the HCS assay alone. Moreover, the testing strategy was further validated by including published data (Cosgrove et al. in Toxicol Appl Pharmacol 237:317-330, 2009) on drug-cytokine-induced hepatotoxicity, which improved the accuracies by 7 %. Taken collectively, the proposed testing strategy can significantly improve the prediction of in vitro assays for detecting DILI liability in an early drug discovery phase.

  12. FlexStem: improving predictions of RNA secondary structures with pseudoknots by reducing the search space.

    PubMed

    Chen, Xiang; He, Si-Min; Bu, Dongbo; Zhang, Fa; Wang, Zhiyong; Chen, Runsheng; Gao, Wen

    2008-09-15

    RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is proved to be NP-hard. Due to kinetic reasons the real RNA secondary structure often has local instead of global minimum free energy. This implies that we may improve the performance of RNA secondary structure prediction by taking kinetics into account and minimize free energy in a local area. we propose a novel algorithm named FlexStem to predict RNA secondary structures with pseudoknots. Still based on MFE criterion, FlexStem adopts comprehensive energy models that allow complex pseudoknots. Unlike classical thermodynamic methods, our approach aims to simulate the RNA folding process by successive addition of maximal stems, reducing the search space while maintaining or even improving the prediction accuracy. This reduced space is constructed by our maximal stem strategy and stem-adding rule induced from elaborate statistical experiments on real RNA secondary structures. The strategy and the rule also reflect the folding characteristic of RNA from a new angle and help compensate for the deficiency of merely relying on MFE in RNA structure prediction. We validate FlexStem by applying it to tRNAs, 5SrRNAs and a large number of pseudoknotted structures and compare it with the well-known algorithms such as RNAfold, PKNOTS, PknotsRG, HotKnots and ILM according to their overall sensitivities and specificities, as well as positive and negative controls on pseudoknots. The results show that FlexStem significantly increases the prediction accuracy through its local search strategy. Software is available at http://pfind.ict.ac.cn/FlexStem/. Supplementary data are available at Bioinformatics online.

  13. Conflation and aggregation of spatial data improve predictive models for species with limited habitats: a case of the threatened yellow-billed cuckoo in Arizona, USA

    USGS Publications Warehouse

    Villarreal, Miguel L.; van Riper, Charles; Petrakis, Roy E.

    2013-01-01

    Riparian vegetation provides important wildlife habitat in the Southwestern United States, but limited distributions and spatial complexity often leads to inaccurate representation in maps used to guide conservation. We test the use of data conflation and aggregation on multiple vegetation/land-cover maps to improve the accuracy of habitat models for the threatened western yellow-billed cuckoo (Coccyzus americanus occidentalis). We used species observations (n = 479) from a state-wide survey to develop habitat models from 1) three vegetation/land-cover maps produced at different geographic scales ranging from state to national, and 2) new aggregate maps defined by the spatial agreement of cover types, which were defined as high (agreement = all data sets), moderate (agreement ≥ 2), and low (no agreement required). Model accuracies, predicted habitat locations, and total area of predicted habitat varied considerably, illustrating the effects of input data quality on habitat predictions and resulting potential impacts on conservation planning. Habitat models based on aggregated and conflated data were more accurate and had higher model sensitivity than original vegetation/land-cover, but this accuracy came at the cost of reduced geographic extent of predicted habitat. Using the highest performing models, we assessed cuckoo habitat preference and distribution in Arizona and found that major watersheds containing high-probably habitat are fragmented by a wide swath of low-probability habitat. Focus on riparian restoration in these areas could provide more breeding habitat for the threatened cuckoo, offset potential future habitat losses in adjacent watershed, and increase regional connectivity for other threatened vertebrates that also use riparian corridors.

  14. Short-term load forecasting of power system

    NASA Astrophysics Data System (ADS)

    Xu, Xiaobin

    2017-05-01

    In order to ensure the scientific nature of optimization about power system, it is necessary to improve the load forecasting accuracy. Power system load forecasting is based on accurate statistical data and survey data, starting from the history and current situation of electricity consumption, with a scientific method to predict the future development trend of power load and change the law of science. Short-term load forecasting is the basis of power system operation and analysis, which is of great significance to unit combination, economic dispatch and safety check. Therefore, the load forecasting of the power system is explained in detail in this paper. First, we use the data from 2012 to 2014 to establish the partial least squares model to regression analysis the relationship between daily maximum load, daily minimum load, daily average load and each meteorological factor, and select the highest peak by observing the regression coefficient histogram Day maximum temperature, daily minimum temperature and daily average temperature as the meteorological factors to improve the accuracy of load forecasting indicators. Secondly, in the case of uncertain climate impact, we use the time series model to predict the load data for 2015, respectively, the 2009-2014 load data were sorted out, through the previous six years of the data to forecast the data for this time in 2015. The criterion for the accuracy of the prediction is the average of the standard deviations for the prediction results and average load for the previous six years. Finally, considering the climate effect, we use the BP neural network model to predict the data in 2015, and optimize the forecast results on the basis of the time series model.

  15. Investigation of metabolites for estimating blood deposition time.

    PubMed

    Lech, Karolina; Liu, Fan; Davies, Sarah K; Ackermann, Katrin; Ang, Joo Ern; Middleton, Benita; Revell, Victoria L; Raynaud, Florence J; Hoveijn, Igor; Hut, Roelof A; Skene, Debra J; Kayser, Manfred

    2018-01-01

    Trace deposition timing reflects a novel concept in forensic molecular biology involving the use of rhythmic biomarkers for estimating the time within a 24-h day/night cycle a human biological sample was left at the crime scene, which in principle allows verifying a sample donor's alibi. Previously, we introduced two circadian hormones for trace deposition timing and recently demonstrated that messenger RNA (mRNA) biomarkers significantly improve time prediction accuracy. Here, we investigate the suitability of metabolites measured using a targeted metabolomics approach, for trace deposition timing. Analysis of 171 plasma metabolites collected around the clock at 2-h intervals for 36 h from 12 male participants under controlled laboratory conditions identified 56 metabolites showing statistically significant oscillations, with peak times falling into three day/night time categories: morning/noon, afternoon/evening and night/early morning. Time prediction modelling identified 10 independently contributing metabolite biomarkers, which together achieved prediction accuracies expressed as AUC of 0.81, 0.86 and 0.90 for these three time categories respectively. Combining metabolites with previously established hormone and mRNA biomarkers in time prediction modelling resulted in an improved prediction accuracy reaching AUCs of 0.85, 0.89 and 0.96 respectively. The additional impact of metabolite biomarkers, however, was rather minor as the previously established model with melatonin, cortisol and three mRNA biomarkers achieved AUC values of 0.88, 0.88 and 0.95 for the same three time categories respectively. Nevertheless, the selected metabolites could become practically useful in scenarios where RNA marker information is unavailable such as due to RNA degradation. This is the first metabolomics study investigating circulating metabolites for trace deposition timing, and more work is needed to fully establish their usefulness for this forensic purpose.

  16. Predicting the Naturalistic Course of Major Depressive Disorder Using Clinical and Multimodal Neuroimaging Information: A Multivariate Pattern Recognition Study.

    PubMed

    Schmaal, Lianne; Marquand, Andre F; Rhebergen, Didi; van Tol, Marie-José; Ruhé, Henricus G; van der Wee, Nic J A; Veltman, Dick J; Penninx, Brenda W J H

    2015-08-15

    A chronic course of major depressive disorder (MDD) is associated with profound alterations in brain volumes and emotional and cognitive processing. However, no neurobiological markers have been identified that prospectively predict MDD course trajectories. This study evaluated the prognostic value of different neuroimaging modalities, clinical characteristics, and their combination to classify MDD course trajectories. One hundred eighteen MDD patients underwent structural and functional magnetic resonance imaging (MRI) (emotional facial expressions and executive functioning) and were clinically followed-up at 2 years. Three MDD trajectories (chronic n = 23, gradual improving n = 36, and fast remission n = 59) were identified based on Life Chart Interview measuring the presence of symptoms each month. Gaussian process classifiers were employed to evaluate prognostic value of neuroimaging data and clinical characteristics (including baseline severity, duration, and comorbidity). Chronic patients could be discriminated from patients with more favorable trajectories from neural responses to various emotional faces (up to 73% accuracy) but not from structural MRI and functional MRI related to executive functioning. Chronic patients could also be discriminated from remitted patients based on clinical characteristics (accuracy 69%) but not when age differences between the groups were taken into account. Combining different task contrasts or data sources increased prediction accuracies in some but not all cases. Our findings provide evidence that the prediction of naturalistic course of depression over 2 years is improved by considering neuroimaging data especially derived from neural responses to emotional facial expressions. Neural responses to emotional salient faces more accurately predicted outcome than clinical data. Copyright © 2015 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.

  17. Combination of Pre-Treatment DWI-Signal Intensity and S-1 Treatment: A Predictor of Survival in Patients with Locally Advanced Pancreatic Cancer Receiving Stereotactic Body Radiation Therapy and Sequential S-1.

    PubMed

    Zhang, Yu; Zhu, Xiaofei; Liu, Ri; Wang, Xianglian; Sun, Gaofeng; Song, Jiaqi; Lu, Jianping; Zhang, Huojun

    2018-04-01

    To identify whether the combination of pre-treatment radiological and clinical factors can predict the overall survival (OS) in patients with locally advanced pancreatic cancer (LAPC) treated with stereotactic body radiation and sequential S-1 (a prodrug of 5-FU combined with two modulators) therapy with improved accuracy compared with that of established clinical and radiologic risk models. Patients admitted with LAPC underwent diffusion weighted imaging (DWI) scan at 3.0-T (b = 600 s/mm 2 ). The mean signal intensity (SI b = 600) of region-of-interest (ROI) was measured. The Log-rank test was done for tumor location, biliary stent, S-1, and other treatments and the Cox regression analysis was done to identify independent prognostic factors for OS. Prediction error curves (PEC) were used to assess potential errors in prediction of survival. The accuracy of prediction was evaluated by Integrated Brier Score (IBS) and C index. 41 patients were included in this study. The median OS was 11.7 months (2.8-23.23 months). The 1-year OS was 46%. Multivariate analysis showed that pre-treatment SI b = 600 value and administration of S-1 were independent predictors for OS. The performance of pre-treatment SI b = 600 and S-1 treatment in combination was better than that of SI b = 600 or S-1 treatment alone. The combination of pre-treatment SI b = 600 and S-1 treatment could predict the OS in patients with LAPC undergoing SBRT and sequential S-1 therapy with improved accuracy compared with that of established clinical and radiologic risk models. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  18. Remote sensing-based predictors improve distribution models of rare, early successional and broadleaf tree species in Utah

    USGS Publications Warehouse

    Zimmermann, N.E.; Edwards, T.C.; Moisen, Gretchen G.; Frescino, T.S.; Blackard, J.A.

    2007-01-01

    1. Compared to bioclimatic variables, remote sensing predictors are rarely used for predictive species modelling. When used, the predictors represent typically habitat classifications or filters rather than gradual spectral, surface or biophysical properties. Consequently, the full potential of remotely sensed predictors for modelling the spatial distribution of species remains unexplored. Here we analysed the partial contributions of remotely sensed and climatic predictor sets to explain and predict the distribution of 19 tree species in Utah. We also tested how these partial contributions were related to characteristics such as successional types or species traits. 2. We developed two spatial predictor sets of remotely sensed and topo-climatic variables to explain the distribution of tree species. We used variation partitioning techniques applied to generalized linear models to explore the combined and partial predictive powers of the two predictor sets. Non-parametric tests were used to explore the relationships between the partial model contributions of both predictor sets and species characteristics. 3. More than 60% of the variation explained by the models represented contributions by one of the two partial predictor sets alone, with topo-climatic variables outperforming the remotely sensed predictors. However, the partial models derived from only remotely sensed predictors still provided high model accuracies, indicating a significant correlation between climate and remote sensing variables. The overall accuracy of the models was high, but small sample sizes had a strong effect on cross-validated accuracies for rare species. 4. Models of early successional and broadleaf species benefited significantly more from adding remotely sensed predictors than did late seral and needleleaf species. The core-satellite species types differed significantly with respect to overall model accuracies. Models of satellite and urban species, both with low prevalence, benefited more from use of remotely sensed predictors than did the more frequent core species. 5. Synthesis and applications. If carefully prepared, remotely sensed variables are useful additional predictors for the spatial distribution of trees. Major improvements resulted for deciduous, early successional, satellite and rare species. The ability to improve model accuracy for species having markedly different life history strategies is a crucial step for assessing effects of global change. ?? 2007 The Authors.

  19. Remote sensing-based predictors improve distribution models of rare, early successional and broadleaf tree species in Utah

    PubMed Central

    ZIMMERMANN, N E; EDWARDS, T C; MOISEN, G G; FRESCINO, T S; BLACKARD, J A

    2007-01-01

    Compared to bioclimatic variables, remote sensing predictors are rarely used for predictive species modelling. When used, the predictors represent typically habitat classifications or filters rather than gradual spectral, surface or biophysical properties. Consequently, the full potential of remotely sensed predictors for modelling the spatial distribution of species remains unexplored. Here we analysed the partial contributions of remotely sensed and climatic predictor sets to explain and predict the distribution of 19 tree species in Utah. We also tested how these partial contributions were related to characteristics such as successional types or species traits. We developed two spatial predictor sets of remotely sensed and topo-climatic variables to explain the distribution of tree species. We used variation partitioning techniques applied to generalized linear models to explore the combined and partial predictive powers of the two predictor sets. Non-parametric tests were used to explore the relationships between the partial model contributions of both predictor sets and species characteristics. More than 60% of the variation explained by the models represented contributions by one of the two partial predictor sets alone, with topo-climatic variables outperforming the remotely sensed predictors. However, the partial models derived from only remotely sensed predictors still provided high model accuracies, indicating a significant correlation between climate and remote sensing variables. The overall accuracy of the models was high, but small sample sizes had a strong effect on cross-validated accuracies for rare species. Models of early successional and broadleaf species benefited significantly more from adding remotely sensed predictors than did late seral and needleleaf species. The core-satellite species types differed significantly with respect to overall model accuracies. Models of satellite and urban species, both with low prevalence, benefited more from use of remotely sensed predictors than did the more frequent core species. Synthesis and applications. If carefully prepared, remotely sensed variables are useful additional predictors for the spatial distribution of trees. Major improvements resulted for deciduous, early successional, satellite and rare species. The ability to improve model accuracy for species having markedly different life history strategies is a crucial step for assessing effects of global change. PMID:18642470

  20. Convective Weather Forecast Accuracy Analysis at Center and Sector Levels

    NASA Technical Reports Server (NTRS)

    Wang, Yao; Sridhar, Banavar

    2010-01-01

    This paper presents a detailed convective forecast accuracy analysis at center and sector levels. The study is aimed to provide more meaningful forecast verification measures to aviation community, as well as to obtain useful information leading to the improvements in the weather translation capacity models. In general, the vast majority of forecast verification efforts over past decades have been on the calculation of traditional standard verification measure scores over forecast and observation data analyses onto grids. These verification measures based on the binary classification have been applied in quality assurance of weather forecast products at the national level for many years. Our research focuses on the forecast at the center and sector levels. We calculate the standard forecast verification measure scores for en-route air traffic centers and sectors first, followed by conducting the forecast validation analysis and related verification measures for weather intensities and locations at centers and sectors levels. An approach to improve the prediction of sector weather coverage by multiple sector forecasts is then developed. The weather severe intensity assessment was carried out by using the correlations between forecast and actual weather observation airspace coverage. The weather forecast accuracy on horizontal location was assessed by examining the forecast errors. The improvement in prediction of weather coverage was determined by the correlation between actual sector weather coverage and prediction. observed and forecasted Convective Weather Avoidance Model (CWAM) data collected from June to September in 2007. CWAM zero-minute forecast data with aircraft avoidance probability of 60% and 80% are used as the actual weather observation. All forecast measurements are based on 30-minute, 60- minute, 90-minute, and 120-minute forecasts with the same avoidance probabilities. The forecast accuracy analysis for times under one-hour showed that the errors in intensity and location for center forecast are relatively low. For example, 1-hour forecast intensity and horizontal location errors for ZDC center were about 0.12 and 0.13. However, the correlation between sector 1-hour forecast and actual weather coverage was weak, for sector ZDC32, about 32% of the total variation of observation weather intensity was unexplained by forecast; the sector horizontal location error was about 0.10. The paper also introduces an approach to estimate the sector three-dimensional actual weather coverage by using multiple sector forecasts, which turned out to produce better predictions. Using Multiple Linear Regression (MLR) model for this approach, the correlations between actual observation and the multiple sector forecast model prediction improved by several percents at 95% confidence level in comparison with single sector forecast.

  1. Prediction of recovery of motor function after stroke.

    PubMed

    Stinear, Cathy

    2010-12-01

    Stroke is a leading cause of disability. The ability to live independently after stroke depends largely on the reduction of motor impairment and the recovery of motor function. Accurate prediction of motor recovery assists rehabilitation planning and supports realistic goal setting by clinicians and patients. Initial impairment is negatively related to degree of recovery, but inter-individual variability makes accurate prediction difficult. Neuroimaging and neurophysiological assessments can be used to measure the extent of stroke damage to the motor system and predict subsequent recovery of function, but these techniques are not yet used routinely. The use of motor impairment scores and neuroimaging has been refined by two recent studies in which these investigations were used at multiple time points early after stroke. Voluntary finger extension and shoulder abduction within 5 days of stroke predicted subsequent recovery of upper-limb function. Diffusion-weighted imaging within 7 days detected the effects of stroke on caudal motor pathways and was predictive of lasting motor impairment. Thus, investigations done soon after stroke had good prognostic value. The potential prognostic value of cortical activation and neural plasticity has been explored for the first time by two recent studies. Functional MRI detected a pattern of cortical activation at the acute stage that was related to subsequent reduction in motor impairment. Transcranial magnetic stimulation enabled measurement of neural plasticity in the primary motor cortex, which was related to subsequent disability. These studies open interesting new lines of enquiry. WHERE NEXT?: The accuracy of prediction might be increased by taking into account the motor system's capacity for functional reorganisation in response to therapy, in addition to the extent of stroke-related damage. Improved prognostic accuracy could also be gained by combining simple tests of motor impairment with neuroimaging, genotyping, and neurophysiological assessment of neural plasticity. The development of algorithms to guide the sequential combinations of these assessments could also further increase accuracy, in addition to improving rehabilitation planning and outcomes. Copyright © 2010 Elsevier Ltd. All rights reserved.

  2. R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter.

    PubMed

    Yang, Jing; Jin, Qi-Yu; Zhang, Biao; Shen, Hong-Bin

    2016-08-15

    Inter-residue contacts in proteins dictate the topology of protein structures. They are crucial for protein folding and structural stability. Accurate prediction of residue contacts especially for long-range contacts is important to the quality of ab inito structure modeling since they can enforce strong restraints to structure assembly. In this paper, we present a new Residue-Residue Contact predictor called R2C that combines machine learning-based and correlated mutation analysis-based methods, together with a two-dimensional Gaussian noise filter to enhance the long-range residue contact prediction. Our results show that the outputs from the machine learning-based method are concentrated with better performance on short-range contacts; while for correlated mutation analysis-based approach, the predictions are widespread with higher accuracy on long-range contacts. An effective query-driven dynamic fusion strategy proposed here takes full advantages of the two different methods, resulting in an impressive overall accuracy improvement. We also show that the contact map directly from the prediction model contains the interesting Gaussian noise, which has not been discovered before. Different from recent studies that tried to further enhance the quality of contact map by removing its transitive noise, we designed a new two-dimensional Gaussian noise filter, which was especially helpful for reinforcing the long-range residue contact prediction. Tested on recent CASP10/11 datasets, the overall top L/5 accuracy of our final R2C predictor is 17.6%/15.5% higher than the pure machine learning-based method and 7.8%/8.3% higher than the correlated mutation analysis-based approach for the long-range residue contact prediction. http://www.csbio.sjtu.edu.cn/bioinf/R2C/Contact:hbshen@sjtu.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Improve the prediction of RNA-binding residues using structural neighbours.

    PubMed

    Li, Quan; Cao, Zanxia; Liu, Haiyan

    2010-03-01

    The interactions between RNA-binding proteins (RBPs) with RNA play key roles in managing some of the cell's basic functions. The identification and prediction of RNA binding sites is important for understanding the RNA-binding mechanism. Computational approaches are being developed to predict RNA-binding residues based on the sequence- or structure-derived features. To achieve higher prediction accuracy, improvements on current prediction methods are necessary. We identified that the structural neighbors of RNA-binding and non-RNA-binding residues have different amino acid compositions. Combining this structure-derived feature with evolutionary (PSSM) and other structural information (secondary structure and solvent accessibility) significantly improves the predictions over existing methods. Using a multiple linear regression approach and 6-fold cross validation, our best model can achieve an overall correct rate of 87.8% and MCC of 0.47, with a specificity of 93.4%, correctly predict 52.4% of the RNA-binding residues for a dataset containing 107 non-homologous RNA-binding proteins. Compared with existing methods, including the amino acid compositions of structure neighbors lead to clearly improvement. A web server was developed for predicting RNA binding residues in a protein sequence (or structure),which is available at http://mcgill.3322.org/RNA/.

  4. ENSO Predictions in an Intermediate Coupled Model Influenced by Removing Initial Condition Errors in Sensitive Areas: A Target Observation Perspective

    NASA Astrophysics Data System (ADS)

    Tao, Ling-Jiang; Gao, Chuan; Zhang, Rong-Hua

    2018-07-01

    Previous studies indicate that ENSO predictions are particularly sensitive to the initial conditions in some key areas (socalled "sensitive areas"). And yet, few studies have quantified improvements in prediction skill in the context of an optimal observing system. In this study, the impact on prediction skill is explored using an intermediate coupled model in which errors in initial conditions formed to make ENSO predictions are removed in certain areas. Based on ideal observing system simulation experiments, the importance of various observational networks on improvement of El Niño prediction skill is examined. The results indicate that the initial states in the central and eastern equatorial Pacific are important to improve El Ni˜no prediction skill effectively. When removing the initial condition errors in the central equatorial Pacific, ENSO prediction errors can be reduced by 25%. Furthermore, combinations of various subregions are considered to demonstrate the efficiency on ENSO prediction skill. Particularly, seasonally varying observational networks are suggested to improve the prediction skill more effectively. For example, in addition to observing in the central equatorial Pacific and its north throughout the year, increasing observations in the eastern equatorial Pacific during April to October is crucially important, which can improve the prediction accuracy by 62%. These results also demonstrate the effectiveness of the conditional nonlinear optimal perturbation approach on detecting sensitive areas for target observations.

  5. An Improved Version of the NASA-Lockheed Multielement Airfoil Analysis Computer Program

    NASA Technical Reports Server (NTRS)

    Brune, G. W.; Manke, J. W.

    1978-01-01

    An improved version of the NASA-Lockheed computer program for the analysis of multielement airfoils is described. The predictions of the program are evaluated by comparison with recent experimental high lift data including lift, pitching moment, profile drag, and detailed distributions of surface pressures and boundary layer parameters. The results of the evaluation show that the contract objectives of improving program reliability and accuracy have been met.

  6. Sequence fingerprints distinguish erroneous from correct predictions of intrinsically disordered protein regions.

    PubMed

    Saravanan, Konda Mani; Dunker, A Keith; Krishnaswamy, Sankaran

    2017-12-27

    More than 60 prediction methods for intrinsically disordered proteins (IDPs) have been developed over the years, many of which are accessible on the World Wide Web. Nearly, all of these predictors give balanced accuracies in the ~65%-~80% range. Since predictors are not perfect, further studies are required to uncover the role of amino acid residues in native IDP as compared to predicted IDP regions. In the present work, we make use of sequences of 100% predicted IDP regions, false positive disorder predictions, and experimentally determined IDP regions to distinguish the characteristics of native versus predicted IDP regions. A higher occurrence of asparagine is observed in sequences of native IDP regions but not in sequences of false positive predictions of IDP regions. The occurrences of certain combinations of amino acids at the pentapeptide level provide a distinguishing feature in the IDPs with respect to globular proteins. The distinguishing features presented in this paper provide insights into the sequence fingerprints of amino acid residues in experimentally determined as compared to predicted IDP regions. These observations and additional work along these lines should enable the development of improvements in the accuracy of disorder prediction algorithm.

  7. Small-time Scale Network Traffic Prediction Based on Complex-valued Neural Network

    NASA Astrophysics Data System (ADS)

    Yang, Bin

    2017-07-01

    Accurate models play an important role in capturing the significant characteristics of the network traffic, analyzing the network dynamic, and improving the forecasting accuracy for system dynamics. In this study, complex-valued neural network (CVNN) model is proposed to further improve the accuracy of small-time scale network traffic forecasting. Artificial bee colony (ABC) algorithm is proposed to optimize the complex-valued and real-valued parameters of CVNN model. Small-scale traffic measurements data namely the TCP traffic data is used to test the performance of CVNN model. Experimental results reveal that CVNN model forecasts the small-time scale network traffic measurement data very accurately

  8. Improved classification accuracy of powdery mildew infection levels of wine grapes by spatial-spectral analysis of hyperspectral images.

    PubMed

    Knauer, Uwe; Matros, Andrea; Petrovic, Tijana; Zanker, Timothy; Scott, Eileen S; Seiffert, Udo

    2017-01-01

    Hyperspectral imaging is an emerging means of assessing plant vitality, stress parameters, nutrition status, and diseases. Extraction of target values from the high-dimensional datasets either relies on pixel-wise processing of the full spectral information, appropriate selection of individual bands, or calculation of spectral indices. Limitations of such approaches are reduced classification accuracy, reduced robustness due to spatial variation of the spectral information across the surface of the objects measured as well as a loss of information intrinsic to band selection and use of spectral indices. In this paper we present an improved spatial-spectral segmentation approach for the analysis of hyperspectral imaging data and its application for the prediction of powdery mildew infection levels (disease severity) of intact Chardonnay grape bunches shortly before veraison. Instead of calculating texture features (spatial features) for the huge number of spectral bands independently, dimensionality reduction by means of Linear Discriminant Analysis (LDA) was applied first to derive a few descriptive image bands. Subsequent classification was based on modified Random Forest classifiers and selective extraction of texture parameters from the integral image representation of the image bands generated. Dimensionality reduction, integral images, and the selective feature extraction led to improved classification accuracies of up to [Formula: see text] for detached berries used as a reference sample (training dataset). Our approach was validated by predicting infection levels for a sample of 30 intact bunches. Classification accuracy improved with the number of decision trees of the Random Forest classifier. These results corresponded with qPCR results. An accuracy of 0.87 was achieved in classification of healthy, infected, and severely diseased bunches. However, discrimination between visually healthy and infected bunches proved to be challenging for a few samples, perhaps due to colonized berries or sparse mycelia hidden within the bunch or airborne conidia on the berries that were detected by qPCR. An advanced approach to hyperspectral image classification based on combined spatial and spectral image features, potentially applicable to many available hyperspectral sensor technologies, has been developed and validated to improve the detection of powdery mildew infection levels of Chardonnay grape bunches. The spatial-spectral approach improved especially the detection of light infection levels compared with pixel-wise spectral data analysis. This approach is expected to improve the speed and accuracy of disease detection once the thresholds for fungal biomass detected by hyperspectral imaging are established; it can also facilitate monitoring in plant phenotyping of grapevine and additional crops.

  9. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding

    PubMed Central

    2013-01-01

    Background In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. Results The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best. Conclusions The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies. PMID:24314298

  10. Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding.

    PubMed

    Ould Estaghvirou, Sidi Boubacar; Ogutu, Joseph O; Schulz-Streeck, Torben; Knaak, Carsten; Ouzunova, Milena; Gordillo, Andres; Piepho, Hans-Peter

    2013-12-06

    In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best. The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies.

  11. Development of a near-infrared spectroscopic system for monitoring urine glucose level for the use of long-term home healthcare

    NASA Astrophysics Data System (ADS)

    Tanaka, Shinobu; Hayakawa, Yuuto; Ogawa, Mitsuhiro; Yamakoshi, Ken-ichi

    2010-08-01

    We have been developing a new technique for measuring urine glucose concentration using near infrared spectroscopy (NIRS) in conjunction with the Partial Least Square (PLS) method. In the previous study, we reported some results of preliminary experiments for assessing feasibility of this method using a FT-IR spectrometer. In this study, considering practicability of the system, a flow-through cell with the optical path length of 10 mm was newly introduced. Accuracy of the system was verified by the preliminary experiments using urine samples. From the results obtained, it was clearly demonstrated that the present method had a capability of predicting individual urine glucose level with reasonable accuracy (the minimum value of standard error of prediction: SEP = 22.3 mg/dl) and appeared to be a useful means for long-term home health care. However, mean value of SEP obtained by the urine samples from ten subjects was not satisfactorily low (53.7 mg/dl). For improving the accuracy, (1) mechanical stability of the optical system should be improved, (2) the method for normalizing the spectrum should be reconsidered, and (3) the number of subject should be increased.

  12. The prediction of zenith range refraction from surface measurements of meteorological parameters. [mathematical models of atmospheric refraction used to improve spacecraft tracking space navigation

    NASA Technical Reports Server (NTRS)

    Berman, A. L.

    1976-01-01

    In the last two decades, increasingly sophisticated deep space missions have placed correspondingly stringent requirements on navigational accuracy. As part of the effort to increase navigational accuracy, and hence the quality of radiometric data, much effort has been expended in an attempt to understand and compute the tropospheric effect on range (and hence range rate) data. The general approach adopted has been that of computing a zenith range refraction, and then mapping this refraction to any arbitrary elevation angle via an empirically derived function of elevation. The prediction of zenith range refraction derived from surface measurements of meteorological parameters is presented. Refractivity is separated into wet (water vapor pressure) and dry (atmospheric pressure) components. The integration of dry refractivity is shown to be exact. Attempts to integrate wet refractivity directly prove ineffective; however, several empirical models developed by the author and other researchers at JPL are discussed. The best current wet refraction model is here considered to be a separate day/night model, which is proportional to surface water vapor pressure and inversely proportional to surface temperature. Methods are suggested that might improve the accuracy of the wet range refraction model.

  13. Towards Cooperative Predictive Data Mining in Competitive Environments

    NASA Astrophysics Data System (ADS)

    Lisý, Viliam; Jakob, Michal; Benda, Petr; Urban, Štěpán; Pěchouček, Michal

    We study the problem of predictive data mining in a competitive multi-agent setting, in which each agent is assumed to have some partial knowledge required for correctly classifying a set of unlabelled examples. The agents are self-interested and therefore need to reason about the trade-offs between increasing their classification accuracy by collaborating with other agents and disclosing their private classification knowledge to other agents through such collaboration. We analyze the problem and propose a set of components which can enable cooperation in this otherwise competitive task. These components include measures for quantifying private knowledge disclosure, data-mining models suitable for multi-agent predictive data mining, and a set of strategies by which agents can improve their classification accuracy through collaboration. The overall framework and its individual components are validated on a synthetic experimental domain.

  14. Developing EHR-driven heart failure risk prediction models using CPXR(Log) with the probabilistic loss function.

    PubMed

    Taslimitehrani, Vahid; Dong, Guozhu; Pereira, Naveen L; Panahiazar, Maryam; Pathak, Jyotishman

    2016-04-01

    Computerized survival prediction in healthcare identifying the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) with the probabilistic loss function, to develop and validate prognostic risk models to predict 1, 2, and 5year survival in heart failure (HF) using data from electronic health records (EHRs) at Mayo Clinic. The CPXR(Log) constructs a pattern aided logistic regression model defined by several patterns and corresponding local logistic regression models. One of the models generated by CPXR(Log) achieved an AUC and accuracy of 0.94 and 0.91, respectively, and significantly outperformed prognostic models reported in prior studies. Data extracted from EHRs allowed incorporation of patient co-morbidities into our models which helped improve the performance of the CPXR(Log) models (15.9% AUC improvement), although did not improve the accuracy of the models built by other classifiers. We also propose a probabilistic loss function to determine the large error and small error instances. The new loss function used in the algorithm outperforms other functions used in the previous studies by 1% improvement in the AUC. This study revealed that using EHR data to build prediction models can be very challenging using existing classification methods due to the high dimensionality and complexity of EHR data. The risk models developed by CPXR(Log) also reveal that HF is a highly heterogeneous disease, i.e., different subgroups of HF patients require different types of considerations with their diagnosis and treatment. Our risk models provided two valuable insights for application of predictive modeling techniques in biomedicine: Logistic risk models often make systematic prediction errors, and it is prudent to use subgroup based prediction models such as those given by CPXR(Log) when investigating heterogeneous diseases. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Evaluation of CROES Nephrolithometry Nomogram as a Preoperative Predictive System for Percutaneous Nephrolithotomy Outcomes.

    PubMed

    Kumar, Sumit; Sreenivas, Jayaram; Karthikeyan, Vilvapathy Senguttuvan; Mallya, Ashwin; Keshavamurthy, Ramaiah

    2016-10-01

    Scoring systems have been devised to predict outcomes of percutaneous nephrolithotomy (PCNL). CROES nephrolithometry nomogram (CNN) is the latest tool devised to predict stone-free rate (SFR). We aim to compare predictive accuracy of CNN against Guy stone score (GSS) for SFR and postoperative outcomes. Between January 2013 and December 2015, 313 patients undergoing PCNL were analyzed for predictive accuracy of GSS, CNN, and stone burden (SB) for SFR, complications, operation time (OT), and length of hospitalization (LOH). We further stratified patients into risk groups based on CNN and GSS. Mean ± standard deviation (SD) SB was 298.8 ± 235.75 mm 2 . SB, GSS, and CNN (area under curve [AUC]: 0.662, 0.660, 0.673) were found to be predictors of SFR. However, predictability for complications was not as good (AUC: SB 0.583, GSS 0.554, CNN 0.580). Single implicated calix (Adj. OR 3.644; p = 0.027), absence of staghorn calculus (Adj. OR 3.091; p = 0.044), single stone (Adj. OR 3.855; p = 0.002), and single puncture (Adj. OR 2.309; p = 0.048) significantly predicted SFR on multivariate analysis. Charlson comorbidity index (CCI; p = 0.020) and staghorn calculus (p = 0.002) were independent predictors for complications on linear regression. SB and GSS independently predicted OT on multivariate analysis. SB and complications significantly predicted LOH, while GSS and CNN did not predict LOH. CNN offered better risk stratification for residual stones than GSS. CNN and GSS have good preoperative predictive accuracy for SFR. Number of implicated calices may affect SFR, and CCI affects complications. Studies should incorporate these factors in scoring systems and assess if predictability of PCNL outcomes improves.

  16. Improved age determination of blood and teeth samples using a selected set of DNA methylation markers

    PubMed Central

    Kamalandua, Aubeline

    2015-01-01

    Age estimation from DNA methylation markers has seen an exponential growth of interest, not in the least from forensic scientists. The current published assays, however, can still be improved by lowering the number of markers in the assay and by providing more accurate models to predict chronological age. From the published literature we selected 4 age-associated genes (ASPA, PDE4C, ELOVL2, and EDARADD) and determined CpG methylation levels from 206 blood samples of both deceased and living individuals (age range: 0–91 years). This data was subsequently used to compare prediction accuracy with both linear and non-linear regression models. A quadratic regression model in which the methylation levels of ELOVL2 were squared showed the highest accuracy with a Mean Absolute Deviation (MAD) between chronological age and predicted age of 3.75 years and an adjusted R2 of 0.95. No difference in accuracy was observed for samples obtained either from living and deceased individuals or between the 2 genders. In addition, 29 teeth from different individuals (age range: 19–70 years) were analyzed using the same set of markers resulting in a MAD of 4.86 years and an adjusted R2 of 0.74. Cross validation of the results obtained from blood samples demonstrated the robustness and reproducibility of the assay. In conclusion, the set of 4 CpG DNA methylation markers is capable of producing highly accurate age predictions for blood samples from deceased and living individuals PMID:26280308

  17. Automatically identifying health outcome information in MEDLINE records.

    PubMed

    Demner-Fushman, Dina; Few, Barbara; Hauser, Susan E; Thoma, George

    2006-01-01

    Understanding the effect of a given intervention on the patient's health outcome is one of the key elements in providing optimal patient care. This study presents a methodology for automatic identification of outcomes-related information in medical text and evaluates its potential in satisfying clinical information needs related to health care outcomes. An annotation scheme based on an evidence-based medicine model for critical appraisal of evidence was developed and used to annotate 633 MEDLINE citations. Textual, structural, and meta-information features essential to outcome identification were learned from the created collection and used to develop an automatic system. Accuracy of automatic outcome identification was assessed in an intrinsic evaluation and in an extrinsic evaluation, in which ranking of MEDLINE search results obtained using PubMed Clinical Queries relied on identified outcome statements. The accuracy and positive predictive value of outcome identification were calculated. Effectiveness of the outcome-based ranking was measured using mean average precision and precision at rank 10. Automatic outcome identification achieved 88% to 93% accuracy. The positive predictive value of individual sentences identified as outcomes ranged from 30% to 37%. Outcome-based ranking improved retrieval accuracy, tripling mean average precision and achieving 389% improvement in precision at rank 10. Preliminary results in outcome-based document ranking show potential validity of the evidence-based medicine-model approach in timely delivery of information critical to clinical decision support at the point of service.

  18. The accuracy of climate models' simulated season lengths and the effectiveness of grid scale correction factors

    DOE PAGES

    Winterhalter, Wade E.

    2011-09-01

    Global climate change is expected to impact biological populations through a variety of mechanisms including increases in the length of their growing season. Climate models are useful tools for predicting how season length might change in the future. However, the accuracy of these models tends to be rather low at regional geographic scales. Here, I determined the ability of several atmosphere and ocean general circulating models (AOGCMs) to accurately simulate historical season lengths for a temperate ectotherm across the continental United States. I also evaluated the effectiveness of regional-scale correction factors to improve the accuracy of these models. I foundmore » that both the accuracy of simulated season lengths and the effectiveness of the correction factors to improve the model's accuracy varied geographically and across models. These results suggest that regional specific correction factors do not always adequately remove potential discrepancies between simulated and historically observed environmental parameters. As such, an explicit evaluation of the correction factors' effectiveness should be included in future studies of global climate change's impact on biological populations.« less

  19. Validation of Groundwater Models: Meaningful or Meaningless?

    NASA Astrophysics Data System (ADS)

    Konikow, L. F.

    2003-12-01

    Although numerical simulation models are valuable tools for analyzing groundwater systems, their predictive accuracy is limited. People who apply groundwater flow or solute-transport models, as well as those who make decisions based on model results, naturally want assurance that a model is "valid." To many people, model validation implies some authentication of the truth or accuracy of the model. History matching is often presented as the basis for model validation. Although such model calibration is a necessary modeling step, it is simply insufficient for model validation. Because of parameter uncertainty and solution non-uniqueness, declarations of validation (or verification) of a model are not meaningful. Post-audits represent a useful means to assess the predictive accuracy of a site-specific model, but they require the existence of long-term monitoring data. Model testing may yield invalidation, but that is an opportunity to learn and to improve the conceptual and numerical models. Examples of post-audits and of the application of a solute-transport model to a radioactive waste disposal site illustrate deficiencies in model calibration, prediction, and validation.

  20. [Research on partial least squares for determination of impurities in the presence of high concentration of matrix by ICP-AES].

    PubMed

    Wang, Yan-peng; Gong, Qi; Yu, Sheng-rong; Liu, You-yan

    2012-04-01

    A method for detecting trace impurities in high concentration matrix by ICP-AES based on partial least squares (PLS) was established. The research showed that PLS could effectively correct the interference caused by high level of matrix concentration error and could withstand higher concentrations of matrix than multicomponent spectral fitting (MSF). When the mass ratios of matrix to impurities were from 1 000 : 1 to 20 000 : 1, the recoveries of standard addition were between 95% and 105% by PLS. For the system in which interference effect has nonlinear correlation with the matrix concentrations, the prediction accuracy of normal PLS method was poor, but it can be improved greatly by using LIN-PPLS, which was based on matrix transformation of sample concentration. The contents of Co, Pb and Ga in stream sediment (GBW07312) were detected by MSF, PLS and LIN-PPLS respectively. The results showed that the prediction accuracy of LIN-PPLS was better than PLS, and the prediction accuracy of PLS was better than MSF.

  1. Limited diagnostic accuracy of magnetic resonance imaging and clinical tests for detecting partial-thickness tears of the rotator cuff.

    PubMed

    Brockmeyer, Matthias; Schmitt, Cornelia; Haupert, Alexander; Kohn, Dieter; Lorbach, Olaf

    2017-12-01

    The reliable diagnosis of partial-thickness tears of the rotator cuff is still elusive in clinical practise. Therefore, the purpose of the study was to determine the diagnostic accuracy of MR imaging and clinical tests for detecting partial-thickness tears of the rotator cuff as well as the combination of these parameters. 334 consecutive shoulder arthroscopies for rotator cuff pathologies performed during the time period between 2010 and 2012 were analyzed retrospectively for the findings of common clinical signs for rotator cuff lesions and preoperative MR imaging. These were compared with the intraoperative arthroscopic findings as "gold standard". The reports of the MR imaging were evaluated with regard to the integrity of the rotator cuff. The Ellman Classification was used to define partial-thickness tears of the rotator cuff in accordance with the arthroscopic findings. Descriptive statistics, sensitivity, specificity, positive and negative predictive value were calculated. MR imaging showed 80 partial-thickness and 70 full-thickness tears of the rotator cuff. The arthroscopic examination confirmed 64 partial-thickness tears of which 52 needed debridement or refixation of the rotator cuff. Sensitivity for MR imaging to identify partial-thickness tears was 51.6%, specificity 77.2%, positive predictive value 41.3% and negative predictive value 83.7%. For the Jobe-test, sensitivity was 64.1%, specificity 43.2%, positive predictive value 25.9% and negative predictive value 79.5%. Sensitivity for the Impingement-sign was 76.7%, specificity 46.6%, positive predictive value 30.8% and negative predictive value 86.5%. For the combination of MR imaging, Jobe-test and Impingement-sign sensitivity was 46.9%, specificity 85.4%, positive predictive value 50% and negative predictive value 83.8%. The diagnostic accuracy of MR imaging and clinical tests (Jobe-test and Impingement-sign) alone is limited for detecting partial-thickness tears of the rotator cuff. Additionally, the combination of MR imaging and clinical tests does not improve diagnostic accuracy. Level II, Diagnostic study.

  2. Long-Term Trajectories of the Development of Speech Sound Production in Pediatric Cochlear Implant Recipients

    PubMed Central

    Tomblin, J. Bruce; Peng, Shu-Chen; Spencer, Linda J.; Lu, Nelson

    2011-01-01

    Purpose This study characterized the development of speech sound production in prelingually deaf children with a minimum of 8 years of cochlear implant (CI) experience. Method Twenty-seven pediatric CI recipients' spontaneous speech samples from annual evaluation sessions were phonemically transcribed. Accuracy for these speech samples was evaluated in piecewise regression models. Results As a group, pediatric CI recipients showed steady improvement in speech sound production following implantation, but the improvement rate declined after 6 years of device experience. Piecewise regression models indicated that the slope estimating the participants' improvement rate was statistically greater than 0 during the first 6 years postimplantation, but not after 6 years. The group of pediatric CI recipients' accuracy of speech sound production after 4 years of device experience reasonably predicts their speech sound production after 5–10 years of device experience. Conclusions The development of speech sound production in prelingually deaf children stabilizes after 6 years of device experience, and typically approaches a plateau by 8 years of device use. Early growth in speech before 4 years of device experience did not predict later rates of growth or levels of achievement. However, good predictions could be made after 4 years of device use. PMID:18695018

  3. Computing exact bundle compliance control charts via probability generating functions.

    PubMed

    Chen, Binchao; Matis, Timothy; Benneyan, James

    2016-06-01

    Compliance to evidenced-base practices, individually and in 'bundles', remains an important focus of healthcare quality improvement for many clinical conditions. The exact probability distribution of composite bundle compliance measures used to develop corresponding control charts and other statistical tests is based on a fairly large convolution whose direct calculation can be computationally prohibitive. Various series expansions and other approximation approaches have been proposed, each with computational and accuracy tradeoffs, especially in the tails. This same probability distribution also arises in other important healthcare applications, such as for risk-adjusted outcomes and bed demand prediction, with the same computational difficulties. As an alternative, we use probability generating functions to rapidly obtain exact results and illustrate the improved accuracy and detection over other methods. Numerical testing across a wide range of applications demonstrates the computational efficiency and accuracy of this approach.

  4. Effects of modeled tropical sea surface temperature variability on coral reef bleaching predictions

    NASA Astrophysics Data System (ADS)

    van Hooidonk, R.; Huber, M.

    2012-03-01

    Future widespread coral bleaching and subsequent mortality has been projected using sea surface temperature (SST) data derived from global, coupled ocean-atmosphere general circulation models (GCMs). While these models possess fidelity in reproducing many aspects of climate, they vary in their ability to correctly capture such parameters as the tropical ocean seasonal cycle and El Niño Southern Oscillation (ENSO) variability. Such weaknesses most likely reduce the accuracy of predicting coral bleaching, but little attention has been paid to the important issue of understanding potential errors and biases, the interaction of these biases with trends, and their propagation in predictions. To analyze the relative importance of various types of model errors and biases in predicting coral bleaching, various intra- and inter-annual frequency bands of observed SSTs were replaced with those frequencies from 24 GCMs 20th century simulations included in the Intergovernmental Panel on Climate Change (IPCC) 4th assessment report. Subsequent thermal stress was calculated and predictions of bleaching were made. These predictions were compared with observations of coral bleaching in the period 1982-2007 to calculate accuracy using an objective measure of forecast quality, the Peirce skill score (PSS). Major findings are that: (1) predictions are most sensitive to the seasonal cycle and inter-annual variability in the ENSO 24-60 months frequency band and (2) because models tend to understate the seasonal cycle at reef locations, they systematically underestimate future bleaching. The methodology we describe can be used to improve the accuracy of bleaching predictions by characterizing the errors and uncertainties involved in the predictions.

  5. Genomic prediction of the polled and horned phenotypes in Merino sheep.

    PubMed

    Duijvesteijn, Naomi; Bolormaa, Sunduimijid; Daetwyler, Hans D; van der Werf, Julius H J

    2018-05-22

    In horned sheep breeds, breeding for polledness has been of interest for decades. The objective of this study was to improve prediction of the horned and polled phenotypes using horn scores classified as polled, scurs, knobs or horns. Derived phenotypes polled/non-polled (P/NP) and horned/non-horned (H/NH) were used to test four different strategies for prediction in 4001 purebred Merino sheep. These strategies include the use of single 'single nucleotide polymorphism' (SNP) genotypes, multiple-SNP haplotypes, genome-wide and chromosome-wide genomic best linear unbiased prediction and information from imputed sequence variants from the region including the RXFP2 gene. Low-density genotypes of these animals were imputed to the Illumina Ovine high-density (600k) chip and the 1.78-kb insertion polymorphism in RXFP2 was included in the imputation process to whole-genome sequence. We evaluated the mode of inheritance and validated models by a fivefold cross-validation and across- and between-family prediction. The most significant SNPs for prediction of P/NP and H/NH were OAR10_29546872.1 and OAR10_29458450, respectively, located on chromosome 10 close to the 1.78-kb insertion at 29.5 Mb. The mode of inheritance included an additive effect and a sex-dependent effect for dominance for P/NP and a sex-dependent additive and dominance effect for H/NH. Models with the highest prediction accuracies for H/NH used either single SNPs or 3-SNP haplotypes and included a polygenic effect estimated based on traditional pedigree relationships. Prediction accuracies for H/NH were 0.323 for females and 0.725 for males. For predicting P/NP, the best models were the same as for H/NH but included a genomic relationship matrix with accuracies of 0.713 for females and 0.620 for males. Our results show that prediction accuracy is high using a single SNP, but does not reach 1 since the causative mutation is not genotyped. Incomplete penetrance or allelic heterogeneity, which can influence expression of the phenotype, may explain why prediction accuracy did not approach 1 with any of the genetic models tested here. Nevertheless, a breeding program to eradicate horns from Merino sheep can be effective by selecting genotypes GG of SNP OAR10_29458450 or TT of SNP OAR10_29546872.1 since all sheep with these genotypes will be non-horned.

  6. Lessons in molecular recognition. 2. Assessing and improving cross-docking accuracy.

    PubMed

    Sutherland, Jeffrey J; Nandigam, Ravi K; Erickson, Jon A; Vieth, Michal

    2007-01-01

    Docking methods are used to predict the manner in which a ligand binds to a protein receptor. Many studies have assessed the success rate of programs in self-docking tests, whereby a ligand is docked into the protein structure from which it was extracted. Cross-docking, or using a protein structure from a complex containing a different ligand, provides a more realistic assessment of a docking program's ability to reproduce X-ray results. In this work, cross-docking was performed with CDocker, Fred, and Rocs using multiple X-ray structures for eight proteins (two kinases, one nuclear hormone receptor, one serine protease, two metalloproteases, and two phosphodiesterases). While average cross-docking accuracy is not encouraging, it is shown that using the protein structure from the complex that contains the bound ligand most similar to the docked ligand increases docking accuracy for all methods ("similarity selection"). Identifying the most successful protein conformer ("best selection") and similarity selection substantially reduce the difference between self-docking and average cross-docking accuracy. We identify universal predictors of docking accuracy (i.e., showing consistent behavior across most protein-method combinations), and show that models for predicting docking accuracy built using these parameters can be used to select the most appropriate docking method.

  7. Developing an in silico minimum inhibitory concentration panel test for Klebsiella pneumoniae

    DOE PAGES

    Nguyen, Marcus; Brettin, Thomas; Long, S. Wesley; ...

    2018-01-11

    Here, antimicrobial resistant infections are a serious public health threat worldwide. Whole genome sequencing approaches to rapidly identify pathogens and predict antibiotic resistance phenotypes are becoming more feasible and may offer a way to reduce clinical test turnaround times compared to conventional culture-based methods, and in turn, improve patient outcomes. In this study, we use whole genome sequence data from 1668 clinical isolates of Klebsiella pneumoniae to develop a XGBoost-based machine learning model that accurately predicts minimum inhibitory concentrations (MICs) for 20 antibiotics. The overall accuracy of the model, within ± 1 two-fold dilution factor, is 92%. Individual accuracies aremore » >= 90% for 15/20 antibiotics. We show that the MICs predicted by the model correlate with known antimicrobial resistance genes. Importantly, the genome-wide approach described in this study offers a way to predict MICs for isolates without knowledge of the underlying gene content. This study shows that machine learning can be used to build a complete in silico MIC prediction panel for K. pneumoniae and provides a framework for building MIC prediction models for other pathogenic bacteria.« less

  8. Developing an in silico minimum inhibitory concentration panel test for Klebsiella pneumoniae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nguyen, Marcus; Brettin, Thomas; Long, S. Wesley

    Here, antimicrobial resistant infections are a serious public health threat worldwide. Whole genome sequencing approaches to rapidly identify pathogens and predict antibiotic resistance phenotypes are becoming more feasible and may offer a way to reduce clinical test turnaround times compared to conventional culture-based methods, and in turn, improve patient outcomes. In this study, we use whole genome sequence data from 1668 clinical isolates of Klebsiella pneumoniae to develop a XGBoost-based machine learning model that accurately predicts minimum inhibitory concentrations (MICs) for 20 antibiotics. The overall accuracy of the model, within ± 1 two-fold dilution factor, is 92%. Individual accuracies aremore » >= 90% for 15/20 antibiotics. We show that the MICs predicted by the model correlate with known antimicrobial resistance genes. Importantly, the genome-wide approach described in this study offers a way to predict MICs for isolates without knowledge of the underlying gene content. This study shows that machine learning can be used to build a complete in silico MIC prediction panel for K. pneumoniae and provides a framework for building MIC prediction models for other pathogenic bacteria.« less

  9. An automated decision-tree approach to predicting protein interaction hot spots.

    PubMed

    Darnell, Steven J; Page, David; Mitchell, Julie C

    2007-09-01

    Protein-protein interactions can be altered by mutating one or more "hot spots," the subset of residues that account for most of the interface's binding free energy. The identification of hot spots requires a significant experimental effort, highlighting the practical value of hot spot predictions. We present two knowledge-based models that improve the ability to predict hot spots: K-FADE uses shape specificity features calculated by the Fast Atomic Density Evaluation (FADE) program, and K-CON uses biochemical contact features. The combined K-FADE/CON (KFC) model displays better overall predictive accuracy than computational alanine scanning (Robetta-Ala). In addition, because these methods predict different subsets of known hot spots, a large and significant increase in accuracy is achieved by combining KFC and Robetta-Ala. The KFC analysis is applied to the calmodulin (CaM)/smooth muscle myosin light chain kinase (smMLCK) interface, and to the bone morphogenetic protein-2 (BMP-2)/BMP receptor-type I (BMPR-IA) interface. The results indicate a strong correlation between KFC hot spot predictions and mutations that significantly reduce the binding affinity of the interface. 2007 Wiley-Liss, Inc.

  10. Short-term memory predictions across the lifespan: monitoring span before and after conducting a task.

    PubMed

    Bertrand, Julie Marilyne; Moulin, Chris John Anthony; Souchay, Céline

    2017-05-01

    Our objective was to explore metamemory in short-term memory across the lifespan. Five age groups participated in this study: 3 groups of children (4-13 years old), and younger and older adults. We used a three-phase task: prediction-span-postdiction. For prediction and postdiction phases, participants reported with a Yes/No response if they could recall in order a series of images. For the span task, they had to actually recall such series. From 4 years old, children have some ability to monitor their short-term memory and are able to adjust their prediction after experiencing the task. However, accuracy still improves significantly until adolescence. Although the older adults had a lower span, they were as accurate as young adults in their evaluation, suggesting that metamemory is unimpaired for short-term memory tasks in older adults. •We investigate metamemory for short-term memory tasks across the lifespan. •We find younger children cannot accurately predict their span length. •Older adults are accurate in predicting their span length. •People's metamemory accuracy was related to their short-term memory span.

  11. Accuracy of genomic breeding values for meat tenderness in Polled Nellore cattle.

    PubMed

    Magnabosco, C U; Lopes, F B; Fragoso, R C; Eifert, E C; Valente, B D; Rosa, G J M; Sainz, R D

    2016-07-01

    Zebu () cattle, mostly of the Nellore breed, comprise more than 80% of the beef cattle in Brazil, given their tolerance of the tropical climate and high resistance to ectoparasites. Despite their advantages for production in tropical environments, zebu cattle tend to produce tougher meat than Bos taurus breeds. Traditional genetic selection to improve meat tenderness is constrained by the difficulty and cost of phenotypic evaluation for meat quality. Therefore, genomic selection may be the best strategy to improve meat quality traits. This study was performed to compare the accuracies of different Bayesian regression models in predicting molecular breeding values for meat tenderness in Polled Nellore cattle. The data set was composed of Warner-Bratzler shear force (WBSF) of longissimus muscle from 205, 141, and 81 animals slaughtered in 2005, 2010, and 2012, respectively, which were selected and mated so as to create extreme segregation for WBSF. The animals were genotyped with either the Illumina BovineHD (HD; 777,000 from 90 samples) chip or the GeneSeek Genomic Profiler (GGP Indicus HD; 77,000 from 337 samples). The quality controls of SNP were Hard-Weinberg Proportion -value ≥ 0.1%, minor allele frequency > 1%, and call rate > 90%. The FImpute program was used for imputation from the GGP Indicus HD chip to the HD chip. The effect of each SNP was estimated using ridge regression, least absolute shrinkage and selection operator (LASSO), Bayes A, Bayes B, and Bayes Cπ methods. Different numbers of SNP were used, with 1, 2, 3, 4, 5, 7, 10, 20, 40, 60, 80, or 100% of the markers preselected based on their significance test (-value from genomewide association studies [GWAS]) or randomly sampled. The prediction accuracy was assessed by the correlation between genomic breeding value and the observed WBSF phenotype, using a leave-one-out cross-validation methodology. The prediction accuracies using all markers were all very similar for all models, ranging from 0.22 (Bayes Cπ) to 0.25 (Bayes B). When preselecting SNP based on GWAS results, the highest correlation (0.27) between WBSF and the genomic breeding value was achieved using the Bayesian LASSO model with 15,030 (3%) markers. Although this study used relatively few animals, the design of the segregating population ensured wide genetic variability for meat tenderness, which was important to achieve acceptable accuracy of genomic prediction. Although all models showed similar levels of prediction accuracy, some small advantages were observed with the Bayes B approach when higher numbers of markers were preselected based on their -values resulting from a GWAS analysis.

  12. Assessment of a virtual functional prototyping process for the rapid manufacture of passive-dynamic ankle-foot orthoses.

    PubMed

    Schrank, Elisa S; Hitch, Lester; Wallace, Kevin; Moore, Richard; Stanhope, Steven J

    2013-10-01

    Passive-dynamic ankle-foot orthosis (PD-AFO) bending stiffness is a key functional characteristic for achieving enhanced gait function. However, current orthosis customization methods inhibit objective premanufacture tuning of the PD-AFO bending stiffness, making optimization of orthosis function challenging. We have developed a novel virtual functional prototyping (VFP) process, which harnesses the strengths of computer aided design (CAD) model parameterization and finite element analysis, to quantitatively tune and predict the functional characteristics of a PD-AFO, which is rapidly manufactured via fused deposition modeling (FDM). The purpose of this study was to assess the VFP process for PD-AFO bending stiffness. A PD-AFO CAD model was customized for a healthy subject and tuned to four bending stiffness values via VFP. Two sets of each tuned model were fabricated via FDM using medical-grade polycarbonate (PC-ISO). Dimensional accuracy of the fabricated orthoses was excellent (average 0.51 ± 0.39 mm). Manufacturing precision ranged from 0.0 to 0.74 Nm/deg (average 0.30 ± 0.36 Nm/deg). Bending stiffness prediction accuracy was within 1 Nm/deg using the manufacturer provided PC-ISO elastic modulus (average 0.48 ± 0.35 Nm/deg). Using an experimentally derived PC-ISO elastic modulus improved the optimized bending stiffness prediction accuracy (average 0.29 ± 0.57 Nm/deg). Robustness of the derived modulus was tested by carrying out the VFP process for a disparate subject, tuning the PD-AFO model to five bending stiffness values. For this disparate subject, bending stiffness prediction accuracy was strong (average 0.20 ± 0.14 Nm/deg). Overall, the VFP process had excellent dimensional accuracy, good manufacturing precision, and strong prediction accuracy with the derived modulus. Implementing VFP as part of our PD-AFO customization and manufacturing framework, which also includes fit customization, provides a novel and powerful method to predictably tune and precisely manufacture orthoses with objectively customized fit and functional characteristics.

  13. Genomic prediction in a nuclear population of layers using single-step models.

    PubMed

    Yan, Yiyuan; Wu, Guiqin; Liu, Aiqiao; Sun, Congjiao; Han, Wenpeng; Li, Guangqi; Yang, Ning

    2018-02-01

    Single-step genomic prediction method has been proposed to improve the accuracy of genomic prediction by incorporating information of both genotyped and ungenotyped animals. The objective of this study is to compare the prediction performance of single-step model with a 2-step models and the pedigree-based models in a nuclear population of layers. A total of 1,344 chickens across 4 generations were genotyped by a 600 K SNP chip. Four traits were analyzed, i.e., body weight at 28 wk (BW28), egg weight at 28 wk (EW28), laying rate at 38 wk (LR38), and Haugh unit at 36 wk (HU36). In predicting offsprings, individuals from generation 1 to 3 were used as training data and females from generation 4 were used as validation set. The accuracies of predicted breeding values by pedigree BLUP (PBLUP), genomic BLUP (GBLUP), SSGBLUP and single-step blending (SSBlending) were compared for both genotyped and ungenotyped individuals. For genotyped females, GBLUP performed no better than PBLUP because of the small size of training data, while the 2 single-step models predicted more accurately than the PBLUP model. The average predictive ability of SSGBLUP and SSBlending were 16.0% and 10.8% higher than the PBLUP model across traits, respectively. Furthermore, the predictive abilities for ungenotyped individuals were also enhanced. The average improvements of prediction abilities were 5.9% and 1.5% for SSGBLUP and SSBlending model, respectively. It was concluded that single-step models, especially the SSGBLUP model, can yield more accurate prediction of genetic merits and are preferable for practical implementation of genomic selection in layers. © 2017 Poultry Science Association Inc.

  14. Improving Density Functional Tight Binding Predictions of Free Energy Surfaces for Slow Chemical Reactions in Solution

    NASA Astrophysics Data System (ADS)

    Kroonblawd, Matthew; Goldman, Nir

    2017-06-01

    First principles molecular dynamics using highly accurate density functional theory (DFT) is a common tool for predicting chemistry, but the accessible time and space scales are often orders of magnitude beyond the resolution of experiments. Semi-empirical methods such as density functional tight binding (DFTB) offer up to a thousand-fold reduction in required CPU hours and can approach experimental scales. However, standard DFTB parameter sets lack good transferability and calibration for a particular system is usually necessary. Force matching the pairwise repulsive energy term in DFTB to short DFT trajectories can improve the former's accuracy for reactions that are fast relative to DFT simulation times (<10 ps), but the effects on slow reactions and the free energy surface are not well-known. We present a force matching approach to improve the chemical accuracy of DFTB. Accelerated sampling techniques are combined with path collective variables to generate the reference DFT data set and validate fitted DFTB potentials. Accuracy of force-matched DFTB free energy surfaces is assessed for slow peptide-forming reactions by direct comparison to DFT for particular paths. Extensions to model prebiotic chemistry under shock conditions are discussed. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

  15. Diagnostic accuracy and effectiveness of automated electronic sepsis alert systems: A systematic review.

    PubMed

    Makam, Anil N; Nguyen, Oanh K; Auerbach, Andrew D

    2015-06-01

    Although timely treatment of sepsis improves outcomes, delays in administering evidence-based therapies are common. To determine whether automated real-time electronic sepsis alerts can: (1) accurately identify sepsis and (2) improve process measures and outcomes. We systematically searched MEDLINE, Embase, The Cochrane Library, and Cumulative Index to Nursing and Allied Health Literature from database inception through June 27, 2014. Included studies that empirically evaluated 1 or both of the prespecified objectives. Two independent reviewers extracted data and assessed the risk of bias. Diagnostic accuracy of sepsis identification was measured by sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and likelihood ratio (LR). Effectiveness was assessed by changes in sepsis care process measures and outcomes. Of 1293 citations, 8 studies met inclusion criteria, 5 for the identification of sepsis (n = 35,423) and 5 for the effectiveness of sepsis alerts (n = 6894). Though definition of sepsis alert thresholds varied, most included systemic inflammatory response syndrome criteria ± evidence of shock. Diagnostic accuracy varied greatly, with PPV ranging from 20.5% to 53.8%, NPV 76.5% to 99.7%, LR+ 1.2 to 145.8, and LR- 0.06 to 0.86. There was modest evidence for improvement in process measures (ie, antibiotic escalation), but only among patients in non-critical care settings; there were no corresponding improvements in mortality or length of stay. Minimal data were reported on potential harms due to false positive alerts. Automated sepsis alerts derived from electronic health data may improve care processes but tend to have poor PPV and do not improve mortality or length of stay. © 2015 Society of Hospital Medicine.

  16. Multimarker assessment for the prediction of renal function improvement after percutaneous revascularization for renal artery stenosis

    PubMed Central

    Partovi, Sasan; Zeller, Thomas; Breidthardt, Tobias; Kaech, Max; Boeddinghaus, Jasper; Puelacher, Christian; Nestelberger, Thomas; Aschwanden, Markus; Mueller, Christian

    2016-01-01

    Background Identifying patients likely to have improved renal function after percutaneous transluminal renal angioplasty and stenting (PTRA) for renal artery stenosis (RAS) is challenging. The purpose of this study was to use a comprehensive multimarker assessment to identify those patients who would benefit most from correction of RAS. Methods In 127 patients with RAS and decreased renal function and/or hypertension referred for PTRA, quantification of hemodynamic cardiac stress using B-type natriuretic peptide (BNP), renal function using estimated glomerular filtration rate (eGFR), parenchymal renal damage using resistance index (RI), and systemic inflammation using C-reactive protein (CRP) were performed before intervention. Results Predefined renal function improvement (increase in eGFR ≥10%) at 6 months occurred in 37% of patients. Prognostic accuracy as quantified by the area under the receiver-operating characteristics curve for the ability of BNP, eGFR, RI and CRP to predict renal function improvement were 0.59 (95% CI, 0.48–0.70), 0.71 (95% CI, 0.61–0.81), 0.52 (95% CI, 0.41–0.65), and 0.56 (95% CI, 0.44–0.68), respectively. None of the possible combinations increased the accuracy provided by eGFR (lower eGFR indicated a higher likelihood for eGFR improvement after PTRA, P=ns for all). In the subgroup of 56 patients with pre-interventional eGFR <60 mL/min/1.73 m2, similar findings were obtained. Conclusions Quantification of renal function, but not any other pathophysiologic signal, provides at least moderate accuracy in the identification of patients with RAS in whom PTRA will improve renal function. PMID:27280085

  17. Diagnostic Accuracy and Effectiveness of Automated Electronic Sepsis Alert Systems: A Systematic Review

    PubMed Central

    Makam, Anil N.; Nguyen, Oanh K.; Auerbach, Andrew D.

    2015-01-01

    Background Although timely treatment of sepsis improves outcomes, delays in administering evidence-based therapies are common. Purpose To determine whether automated real-time electronic sepsis alerts can: 1) accurately identify sepsis, and 2) improve process measures and outcomes. Data Sources We systematically searched MEDLINE, Embase, The Cochrane Library, and CINAHL from database inception through June 27, 2014. Study Selection Included studies that empirically evaluated one or both of the prespecified objectives. Data Extraction Two independent reviewers extracted data and assessed the risk of bias. Diagnostic accuracy of sepsis identification was measured by sensitivity, specificity, positive (PPV) and negative predictive values (NPV) and likelihood ratios (LR). Effectiveness was assessed by changes in sepsis care process measures and outcomes. Data Synthesis Of 1,293 citations, 8 studies met inclusion criteria, 5 for the identification of sepsis (n=35,423) and 5 for the effectiveness of sepsis alerts (n=6,894). Though definition of sepsis alert thresholds varied, most included systemic inflammatory response syndrome criteria ± evidence of shock. Diagnostic accuracy varied greatly, with PPV ranging from 20.5-53.8%, NPV 76.5-99.7%; LR+ 1.2-145.8; and LR- 0.06-0.86. There was modest evidence for improvement in process measures (i.e., antibiotic escalation), but only among patients in non-critical care settings; there were no corresponding improvements in mortality or length of stay. Minimal data were reported on potential harms due to false positive alerts. Conclusions Automated sepsis alerts derived from electronic health data may improve care processes but tend to have poor positive predictive value and do not improve mortality or length of stay. PMID:25758641

  18. SELF-RATED EXPECTATIONS OF SUICIDAL BEHAVIOR PREDICT FUTURE SUICIDE ATTEMPTS AMONG ADOLESCENT AND YOUNG ADULT PSYCHIATRIC EMERGENCY PATIENTS.

    PubMed

    Czyz, Ewa K; Horwitz, Adam G; King, Cheryl A

    2016-06-01

    This study's purpose was to examine the predictive validity and clinical utility of a brief measure assessing youths' own expectations of their future risk of suicidal behavior, administered in a psychiatric emergency (PE) department; and determine if youths' ratings improve upon a clinician-administered assessment of suicidal ideation severity. The outcome was suicide attempts up to 18 months later. In this medical record review study, 340 consecutively presenting youths (ages 13-24) seeking PE services over a 7-month period were included. Subsequent PE visits and suicide attempts were retrospectively tracked for up to 18 months. The 3-item scale assessing patients' perception of their own suicidal behavior risk and the clinician-administered ideation severity scale were used routinely at the study site. Cox regression results showed that youths' expectations of suicidal behavior were independently associated with increased risk of suicide attempts, even after adjusting for key covariates. Results were not moderated by sex, suicide attempt history, or age. Receiver-operating characteristic (ROC) analyses indicated that self-assessed expectations of risk improved the predictive accuracy of the clinician-administered suicidal ideation measure. Youths' ratings indicative of lower confidence in maintaining safety uniquely predicted follow-up attempts and provided incremental validity over and above the clinician-administered assessment and improved its accuracy, suggesting their potential for augmenting suicide risk formulation. Assessing youths' own perceptions of suicide risk appears to be clinically useful, feasible to implement in PE settings, and, if replicated, promising for improving identification of youth at risk for suicidal behavior. © 2016 Wiley Periodicals, Inc.

  19. Prediction of apoptosis protein locations with genetic algorithms and support vector machines through a new mode of pseudo amino acid composition.

    PubMed

    Kandaswamy, Krishna Kumar; Pugalenthi, Ganesan; Möller, Steffen; Hartmann, Enno; Kalies, Kai-Uwe; Suganthan, P N; Martinetz, Thomas

    2010-12-01

    Apoptosis is an essential process for controlling tissue homeostasis by regulating a physiological balance between cell proliferation and cell death. The subcellular locations of proteins performing the cell death are determined by mostly independent cellular mechanisms. The regular bioinformatics tools to predict the subcellular locations of such apoptotic proteins do often fail. This work proposes a model for the sorting of proteins that are involved in apoptosis, allowing us to both the prediction of their subcellular locations as well as the molecular properties that contributed to it. We report a novel hybrid Genetic Algorithm (GA)/Support Vector Machine (SVM) approach to predict apoptotic protein sequences using 119 sequence derived properties like frequency of amino acid groups, secondary structure, and physicochemical properties. GA is used for selecting a near-optimal subset of informative features that is most relevant for the classification. Jackknife cross-validation is applied to test the predictive capability of the proposed method on 317 apoptosis proteins. Our method achieved 85.80% accuracy using all 119 features and 89.91% accuracy for 25 features selected by GA. Our models were examined by a test dataset of 98 apoptosis proteins and obtained an overall accuracy of 90.34%. The results show that the proposed approach is promising; it is able to select small subsets of features and still improves the classification accuracy. Our model can contribute to the understanding of programmed cell death and drug discovery. The software and dataset are available at http://www.inb.uni-luebeck.de/tools-demos/apoptosis/GASVM.

  20. Classification of drug molecules considering their IC50 values using mixed-integer linear programming based hyper-boxes method.

    PubMed

    Armutlu, Pelin; Ozdemir, Muhittin E; Uney-Yuksektepe, Fadime; Kavakli, I Halil; Turkay, Metin

    2008-10-03

    A priori analysis of the activity of drugs on the target protein by computational approaches can be useful in narrowing down drug candidates for further experimental tests. Currently, there are a large number of computational methods that predict the activity of drugs on proteins. In this study, we approach the activity prediction problem as a classification problem and, we aim to improve the classification accuracy by introducing an algorithm that combines partial least squares regression with mixed-integer programming based hyper-boxes classification method, where drug molecules are classified as low active or high active regarding their binding activity (IC50 values) on target proteins. We also aim to determine the most significant molecular descriptors for the drug molecules. We first apply our approach by analyzing the activities of widely known inhibitor datasets including Acetylcholinesterase (ACHE), Benzodiazepine Receptor (BZR), Dihydrofolate Reductase (DHFR), Cyclooxygenase-2 (COX-2) with known IC50 values. The results at this stage proved that our approach consistently gives better classification accuracies compared to 63 other reported classification methods such as SVM, Naïve Bayes, where we were able to predict the experimentally determined IC50 values with a worst case accuracy of 96%. To further test applicability of this approach we first created dataset for Cytochrome P450 C17 inhibitors and then predicted their activities with 100% accuracy. Our results indicate that this approach can be utilized to predict the inhibitory effects of inhibitors based on their molecular descriptors. This approach will not only enhance drug discovery process, but also save time and resources committed.

  1. A prediction model of compressor with variable-geometry diffuser based on elliptic equation and partial least squares

    PubMed Central

    Yang, Chuanlei; Wang, Yinyan; Wang, Hechun

    2018-01-01

    To achieve a much more extensive intake air flow range of the diesel engine, a variable-geometry compressor (VGC) is introduced into a turbocharged diesel engine. However, due to the variable diffuser vane angle (DVA), the prediction for the performance of the VGC becomes more difficult than for a normal compressor. In the present study, a prediction model comprising an elliptical equation and a PLS (partial least-squares) model was proposed to predict the performance of the VGC. The speed lines of the pressure ratio map and the efficiency map were fitted with the elliptical equation, and the coefficients of the elliptical equation were introduced into the PLS model to build the polynomial relationship between the coefficients and the relative speed, the DVA. Further, the maximal order of the polynomial was investigated in detail to reduce the number of sub-coefficients and achieve acceptable fit accuracy simultaneously. The prediction model was validated with sample data and in order to present the superiority of compressor performance prediction, the prediction results of this model were compared with those of the look-up table and back-propagation neural networks (BPNNs). The validation and comparison results show that the prediction accuracy of the new developed model is acceptable, and this model is much more suitable than the look-up table and the BPNN methods under the same condition in VGC performance prediction. Moreover, the new developed prediction model provides a novel and effective prediction solution for the VGC and can be used to improve the accuracy of the thermodynamic model for turbocharged diesel engines in the future. PMID:29410849

  2. A prediction model of compressor with variable-geometry diffuser based on elliptic equation and partial least squares.

    PubMed

    Li, Xu; Yang, Chuanlei; Wang, Yinyan; Wang, Hechun

    2018-01-01

    To achieve a much more extensive intake air flow range of the diesel engine, a variable-geometry compressor (VGC) is introduced into a turbocharged diesel engine. However, due to the variable diffuser vane angle (DVA), the prediction for the performance of the VGC becomes more difficult than for a normal compressor. In the present study, a prediction model comprising an elliptical equation and a PLS (partial least-squares) model was proposed to predict the performance of the VGC. The speed lines of the pressure ratio map and the efficiency map were fitted with the elliptical equation, and the coefficients of the elliptical equation were introduced into the PLS model to build the polynomial relationship between the coefficients and the relative speed, the DVA. Further, the maximal order of the polynomial was investigated in detail to reduce the number of sub-coefficients and achieve acceptable fit accuracy simultaneously. The prediction model was validated with sample data and in order to present the superiority of compressor performance prediction, the prediction results of this model were compared with those of the look-up table and back-propagation neural networks (BPNNs). The validation and comparison results show that the prediction accuracy of the new developed model is acceptable, and this model is much more suitable than the look-up table and the BPNN methods under the same condition in VGC performance prediction. Moreover, the new developed prediction model provides a novel and effective prediction solution for the VGC and can be used to improve the accuracy of the thermodynamic model for turbocharged diesel engines in the future.

  3. Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements.

    PubMed

    Lan, Hui; Carson, Rachel; Provart, Nicholas J; Bonner, Anthony J

    2007-09-21

    Arabidopsis thaliana is the model species of current plant genomic research with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress. Using in house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl. Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. Our method of combining classifiers can improve the accuracy of such predictions - in this case, predictions of genes involved in stress response in plants - and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.

  4. Word associations contribute to machine learning in automatic scoring of degree of emotional tones in dream reports.

    PubMed

    Amini, Reza; Sabourin, Catherine; De Koninck, Joseph

    2011-12-01

    Scientific study of dreams requires the most objective methods to reliably analyze dream content. In this context, artificial intelligence should prove useful for an automatic and non subjective scoring technique. Past research has utilized word search and emotional affiliation methods, to model and automatically match human judges' scoring of dream report's negative emotional tone. The current study added word associations to improve the model's accuracy. Word associations were established using words' frequency of co-occurrence with their defining words as found in a dictionary and an encyclopedia. It was hypothesized that this addition would facilitate the machine learning model and improve its predictability beyond those of previous models. With a sample of 458 dreams, this model demonstrated an improvement in accuracy from 59% to 63% (kappa=.485) on the negative emotional tone scale, and for the first time reached an accuracy of 77% (kappa=.520) on the positive scale. Copyright © 2011 Elsevier Inc. All rights reserved.

  5. Multicategory reclassification statistics for assessing improvements in diagnostic accuracy

    PubMed Central

    Li, Jialiang; Jiang, Binyan; Fine, Jason P.

    2013-01-01

    In this paper, we extend the definitions of the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI) in the context of multicategory classification. Both measures were proposed in Pencina and others (2008. Evaluating the added predictive ability of a new marker: from area under the receiver operating characteristic (ROC) curve to reclassification and beyond. Statistics in Medicine 27, 157–172) as numeric characterizations of accuracy improvement for binary diagnostic tests and were shown to have certain advantage over analyses based on ROC curves or other regression approaches. Estimation and inference procedures for the multiclass NRI and IDI are provided in this paper along with necessary asymptotic distributional results. Simulations are conducted to study the finite-sample properties of the proposed estimators. Two medical examples are considered to illustrate our methodology. PMID:23197381

  6. An improved thermoregulatory model for cooling garment applications with transient metabolic rates

    NASA Astrophysics Data System (ADS)

    Westin, Johan K.

    Current state-of-the-art thermoregulatory models do not predict body temperatures with the accuracies that are required for the development of automatic cooling control in liquid cooling garment (LCG) systems. Automatic cooling control would be beneficial in a variety of space, aviation, military, and industrial environments for optimizing cooling efficiency, for making LCGs as portable and practical as possible, for alleviating the individual from manual cooling control, and for improving thermal comfort and cognitive performance. In this study, we adopt the Fiala thermoregulatory model, which has previously demonstrated state-of-the-art predictive abilities in air environments, for use in LCG environments. We validate the numerical formulation with analytical solutions to the bioheat equation, and find our model to be accurate and stable with a variety of different grid configurations. We then compare the thermoregulatory model's tissue temperature predictions with experimental data where individuals, equipped with an LCG, exercise according to a 700 W rectangular type activity schedule. The root mean square (RMS) deviation between the model response and the mean experimental group response is 0.16°C for the rectal temperature and 0.70°C for the mean skin temperature, which is within state-of-the-art variations. However, with a mean absolute body heat storage error 3¯ BHS of 9.7 W˙h, the model fails to satisfy the +/-6.5 W˙h accuracy that is required for the automatic LCG cooling control development. In order to improve model predictions, we modify the blood flow dynamics of the thermoregulatory model. Instead of using step responses to changing requirements, we introduce exponential responses to the muscle blood flow and the vasoconstriction command. We find that such modifications have an insignificant effect on temperature predictions. However, a new vasoconstriction dependency, i.e. the rate of change of hypothalamus temperature weighted by the hypothalamus error signal (DeltaThy˙ dThy/dt), proves to be an important signal that governs the thermoregulatory response during conditions of simultaneously increasing core and decreasing skin temperatures, which is a common scenario in LCG environments. With the new ?DeltaThy˙dThy /dt dependency in the vasoconstriction command, the 3¯ BHS for the exercise period is reduced by 59% (from 12.9 W˙h to 5.2 W˙h). Even though the new 3¯ BHS of 5.8 W˙h for the total activity schedule is within the target accuracy of +/-6.5 W˙h, 3¯ BHS fails to stay within the target accuracy during the entire activity schedule. With additional improvements to the central blood pool formulation, the LCG boundary condition, and the agreement between model set-points and actual experimental initial conditions, it seems possible to achieve the strict accuracy that is needed for automatic cooling control development.

  7. Prognostic scores in oesophageal or gastric variceal bleeding.

    PubMed

    Ohmann, C; Stöltzing, H; Wins, L; Busch, E; Thon, K

    1990-05-01

    Numerous scoring systems have been developed for the prediction of outcome of variceal bleeding; however, only a few have been evaluated adequately. The object of this study was to improve the classical Child-Pugh score (CPS) and to test other scores from the literature. Patients (n = 82) with endoscopically confirmed variceal bleeding and long-term sclerotherapy were included in the study. Linear logistic regression (LR) was applied to different sets of prognostic variables with regard to 30-day mortality. In addition, scores from the literature were evaluated on the data set. Performance was measured by the accuracy and receiver-operating characteristic curves. The application of LR to all five CPS variables (accuracy, 80%) was superior to the classical CPS (70%). LR with selection from the CPS variables or from other sets of variables resulted in no improvement. Compared with CPS only three scores from the literature, mainly based on subsets of the CPS variables, showed an improved accuracy. It is concluded that CPS is still a good scoring system; however, it can be improved by statistical analysis using the same variables.

  8. Including non-additive genetic effects in Bayesian methods for the prediction of genetic values based on genome-wide markers

    PubMed Central

    2011-01-01

    Background Molecular marker information is a common source to draw inferences about the relationship between genetic and phenotypic variation. Genetic effects are often modelled as additively acting marker allele effects. The true mode of biological action can, of course, be different from this plain assumption. One possibility to better understand the genetic architecture of complex traits is to include intra-locus (dominance) and inter-locus (epistasis) interaction of alleles as well as the additive genetic effects when fitting a model to a trait. Several Bayesian MCMC approaches exist for the genome-wide estimation of genetic effects with high accuracy of genetic value prediction. Including pairwise interaction for thousands of loci would probably go beyond the scope of such a sampling algorithm because then millions of effects are to be estimated simultaneously leading to months of computation time. Alternative solving strategies are required when epistasis is studied. Methods We extended a fast Bayesian method (fBayesB), which was previously proposed for a purely additive model, to include non-additive effects. The fBayesB approach was used to estimate genetic effects on the basis of simulated datasets. Different scenarios were simulated to study the loss of accuracy of prediction, if epistatic effects were not simulated but modelled and vice versa. Results If 23 QTL were simulated to cause additive and dominance effects, both fBayesB and a conventional MCMC sampler BayesB yielded similar results in terms of accuracy of genetic value prediction and bias of variance component estimation based on a model including additive and dominance effects. Applying fBayesB to data with epistasis, accuracy could be improved by 5% when all pairwise interactions were modelled as well. The accuracy decreased more than 20% if genetic variation was spread over 230 QTL. In this scenario, accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model including epistatic effects. Conclusions This simulation study showed that the fBayesB approach is convenient for genetic value prediction. Jointly estimating additive and non-additive effects (especially dominance) has reasonable impact on the accuracy of prediction and the proportion of genetic variation assigned to the additive genetic source. PMID:21867519

  9. Prediction of hot regions in protein-protein interaction by combining density-based incremental clustering with feature-based classification.

    PubMed

    Hu, Jing; Zhang, Xiaolong; Liu, Xiaoming; Tang, Jinshan

    2015-06-01

    Discovering hot regions in protein-protein interaction is important for drug and protein design, while experimental identification of hot regions is a time-consuming and labor-intensive effort; thus, the development of predictive models can be very helpful. In hot region prediction research, some models are based on structure information, and others are based on a protein interaction network. However, the prediction accuracy of these methods can still be improved. In this paper, a new method is proposed for hot region prediction, which combines density-based incremental clustering with feature-based classification. The method uses density-based incremental clustering to obtain rough hot regions, and uses feature-based classification to remove the non-hot spot residues from the rough hot regions. Experimental results show that the proposed method significantly improves the prediction performance of hot regions. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Improved intact soil-core carbon determination applying regression shrinkage and variable selection techniques to complete spectrum laser-induced breakdown spectroscopy (LIBS).

    PubMed

    Bricklemyer, Ross S; Brown, David J; Turk, Philip J; Clegg, Sam M

    2013-10-01

    Laser-induced breakdown spectroscopy (LIBS) provides a potential method for rapid, in situ soil C measurement. In previous research on the application of LIBS to intact soil cores, we hypothesized that ultraviolet (UV) spectrum LIBS (200-300 nm) might not provide sufficient elemental information to reliably discriminate between soil organic C (SOC) and inorganic C (IC). In this study, using a custom complete spectrum (245-925 nm) core-scanning LIBS instrument, we analyzed 60 intact soil cores from six wheat fields. Predictive multi-response partial least squares (PLS2) models using full and reduced spectrum LIBS were compared for directly determining soil total C (TC), IC, and SOC. Two regression shrinkage and variable selection approaches, the least absolute shrinkage and selection operator (LASSO) and sparse multivariate regression with covariance estimation (MRCE), were tested for soil C predictions and the identification of wavelengths important for soil C prediction. Using complete spectrum LIBS for PLS2 modeling reduced the calibration standard error of prediction (SEP) 15 and 19% for TC and IC, respectively, compared to UV spectrum LIBS. The LASSO and MRCE approaches provided significantly improved calibration accuracy and reduced SEP 32-55% over UV spectrum PLS2 models. We conclude that (1) complete spectrum LIBS is superior to UV spectrum LIBS for predicting soil C for intact soil cores without pretreatment; (2) LASSO and MRCE approaches provide improved calibration prediction accuracy over PLS2 but require additional testing with increased soil and target analyte diversity; and (3) measurement errors associated with analyzing intact cores (e.g., sample density and surface roughness) require further study and quantification.

  11. MS2PIP prediction server: compute and visualize MS2 peak intensity predictions for CID and HCD fragmentation.

    PubMed

    Degroeve, Sven; Maddelein, Davy; Martens, Lennart

    2015-07-01

    We present an MS(2) peak intensity prediction server that computes MS(2) charge 2+ and 3+ spectra from peptide sequences for the most common fragment ions. The server integrates the Unimod public domain post-translational modification database for modified peptides. The prediction model is an improvement of the previously published MS(2)PIP model for Orbitrap-LTQ CID spectra. Predicted MS(2) spectra can be downloaded as a spectrum file and can be visualized in the browser for comparisons with observations. In addition, we added prediction models for HCD fragmentation (Q-Exactive Orbitrap) and show that these models compute accurate intensity predictions on par with CID performance. We also show that training prediction models for CID and HCD separately improves the accuracy for each fragmentation method. The MS(2)PIP prediction server is accessible from http://iomics.ugent.be/ms2pip. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Building Energy Simulation Test for Existing Homes (BESTEST-EX) (Presentation)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Judkoff, R.; Neymark, J.; Polly, B.

    2011-12-01

    This presentation discusses the goals of NREL Analysis Accuracy R&D; BESTEST-EX goals; what BESTEST-EX is; how it works; 'Building Physics' cases; 'Building Physics' reference results; 'utility bill calibration' cases; limitations and potential future work. Goals of NREL Analysis Accuracy R&D are: (1) Provide industry with the tools and technical information needed to improve the accuracy and consistency of analysis methods; (2) Reduce the risks associated with purchasing, financing, and selling energy efficiency upgrades; and (3) Enhance software and input collection methods considering impacts on accuracy, cost, and time of energy assessments. BESTEST-EX Goals are: (1) Test software predictions of retrofitmore » energy savings in existing homes; (2) Ensure building physics calculations and utility bill calibration procedures perform up to a minimum standard; and (3) Quantify impact of uncertainties in input audit data and occupant behavior. BESTEST-EX is a repeatable procedure that tests how well audit software predictions compare to the current state of the art in building energy simulation. There is no direct truth standard. However, reference software have been subjected to validation testing, including comparisons with empirical data.« less

  13. Improved Modeling of Open Waveguide Aperture Radiators for use in Conformal Antenna Arrays

    NASA Astrophysics Data System (ADS)

    Nelson, Gregory James

    Open waveguide apertures have been used as radiating elements in conformal arrays. Individual radiating element model patterns are used in constructing overall array models. The existing models for these aperture radiating elements may not accurately predict the array pattern for TEM waves which are not on boresight for each radiating element. In particular, surrounding structures can affect the far field patterns of these apertures, which ultimately affects the overall array pattern. New models of open waveguide apertures are developed here with the goal of accounting for the surrounding structure effects on the aperture far field patterns such that the new models make accurate pattern predictions. These aperture patterns (both E plane and H plane) are measured in an anechoic chamber and the manner in which they deviate from existing model patterns are studied. Using these measurements as a basis, existing models for both E and H planes are updated with new factors and terms which allow the prediction of far field open waveguide aperture patterns with improved accuracy. These new and improved individual radiator models are then used to predict overall conformal array patterns. Arrays of open waveguide apertures are constructed and measured in a similar fashion to the individual aperture measurements. These measured array patterns are compared with the newly modeled array patterns to verify the improved accuracy of the new models as compared with the performance of existing models in making array far field pattern predictions. The array pattern lobe characteristics are then studied for predicting fully circularly conformal arrays of varying radii. The lobe metrics that are tracked are angular location and magnitude as the radii of the conformal arrays are varied. A constructed, measured array that is close to conforming to a circular surface is compared with a fully circularly conformal modeled array pattern prediction, with the predicted lobe angular locations and magnitudes tracked, plotted and tabulated. The close match between the patterns of the measured array and the modeled circularly conformal array verifies the validity of the modeled circularly conformal array pattern predictions.

  14. Real-time and simultaneous control of artificial limbs based on pattern recognition algorithms.

    PubMed

    Ortiz-Catalan, Max; Håkansson, Bo; Brånemark, Rickard

    2014-07-01

    The prediction of simultaneous limb motions is a highly desirable feature for the control of artificial limbs. In this work, we investigate different classification strategies for individual and simultaneous movements based on pattern recognition of myoelectric signals. Our results suggest that any classifier can be potentially employed in the prediction of simultaneous movements if arranged in a distributed topology. On the other hand, classifiers inherently capable of simultaneous predictions, such as the multi-layer perceptron (MLP), were found to be more cost effective, as they can be successfully employed in their simplest form. In the prediction of individual movements, the one-vs-one (OVO) topology was found to improve classification accuracy across different classifiers and it was therefore used to benchmark the benefits of simultaneous control. As opposed to previous work reporting only offline accuracy, the classification performance and the resulting controllability are evaluated in real time using the motion test and target achievement control (TAC) test, respectively. We propose a simultaneous classification strategy based on MLP that outperformed a top classifier for individual movements (LDA-OVO), thus improving the state-of-the-art classification approach. Furthermore, all the presented classification strategies and data collected in this study are freely available in BioPatRec, an open source platform for the development of advanced prosthetic control strategies.

  15. Predicting treatment outcome of drug-susceptible tuberculosis patients using machine-learning models.

    PubMed

    Hussain, Owais A; Junejo, Khurum N

    2018-02-20

    Tuberculosis (TB) is a deadly contagious disease and a serious global health problem. It is curable but due to its lengthy treatment process, a patient is likely to leave the treatment incomplete, leading to a more lethal, drug resistant form of disease. The World Health Organization (WHO) propagates Directly Observed Therapy Short-course (DOTS) as an effective way to stop the spread of TB in communities with a high burden. But DOTS also adds a significant burden on the financial feasibility of the program. We aim to facilitate TB programs by predicting the outcome of the treatment of a particular patient at the start of treatment so that their health workers can be utilized in a targeted and cost-effective way. The problem was modeled as a classification problem, and the outcome of treatment was predicted using state-of-art implementations of 3 machine learning algorithms. 4213 patients were evaluated, out of which 64.37% completed their treatment. Results were evaluated using 4 performance measures; accuracy, precision, sensitivity, and specificity. The models offer an improvement of more than 12% accuracy over the baseline prediction. Empirical results also revealed some insights to improve TB programs. Overall, our proposed methodology will may help teams running TB programs manage their human resources more effectively, thus saving more lives.

  16. New support vector machine-based method for microRNA target prediction.

    PubMed

    Li, L; Gao, Q; Mao, X; Cao, Y

    2014-06-09

    MicroRNA (miRNA) plays important roles in cell differentiation, proliferation, growth, mobility, and apoptosis. An accurate list of precise target genes is necessary in order to fully understand the importance of miRNAs in animal development and disease. Several computational methods have been proposed for miRNA target-gene identification. However, these methods still have limitations with respect to their sensitivity and accuracy. Thus, we developed a new miRNA target-prediction method based on the support vector machine (SVM) model. The model supplies information of two binding sites (primary and secondary) for a radial basis function kernel as a similarity measure for SVM features. The information is categorized based on structural, thermodynamic, and sequence conservation. Using high-confidence datasets selected from public miRNA target databases, we obtained a human miRNA target SVM classifier model with high performance and provided an efficient tool for human miRNA target gene identification. Experiments have shown that our method is a reliable tool for miRNA target-gene prediction, and a successful application of an SVM classifier. Compared with other methods, the method proposed here improves the sensitivity and accuracy of miRNA prediction. Its performance can be further improved by providing more training examples.

  17. EMUDRA: Ensemble of Multiple Drug Repositioning Approaches to Improve Prediction Accuracy.

    PubMed

    Zhou, Xianxiao; Wang, Minghui; Katsyv, Igor; Irie, Hanna; Zhang, Bin

    2018-04-24

    Availability of large-scale genomic, epigenetic and proteomic data in complex diseases makes it possible to objectively and comprehensively identify therapeutic targets that can lead to new therapies. The Connectivity Map has been widely used to explore novel indications of existing drugs. However, the prediction accuracy of the existing methods, such as Kolmogorov-Smirnov statistic remains low. Here we present a novel high-performance drug repositioning approach that improves over the state-of-the-art methods. We first designed an expression weighted cosine method (EWCos) to minimize the influence of the uninformative expression changes and then developed an ensemble approach termed EMUDRA (Ensemble of Multiple Drug Repositioning Approaches) to integrate EWCos and three existing state-of-the-art methods. EMUDRA significantly outperformed individual drug repositioning methods when applied to simulated and independent evaluation datasets. We predicted using EMUDRA and experimentally validated an antibiotic rifabutin as an inhibitor of cell growth in triple negative breast cancer. EMUDRA can identify drugs that more effectively target disease gene signatures and will thus be a useful tool for identifying novel therapies for complex diseases and predicting new indications for existing drugs. The EMUDRA R package is available at doi:10.7303/syn11510888. bin.zhang@mssm.edu or zhangb@hotmail.com. Supplementary data are available at Bioinformatics online.

  18. A hybrid approach to survival model building using integration of clinical and molecular information in censored data.

    PubMed

    Choi, Ickwon; Kattan, Michael W; Wells, Brian J; Yu, Changhong

    2012-01-01

    In medical society, the prognostic models, which use clinicopathologic features and predict prognosis after a certain treatment, have been externally validated and used in practice. In recent years, most research has focused on high dimensional genomic data and small sample sizes. Since clinically similar but molecularly heterogeneous tumors may produce different clinical outcomes, the combination of clinical and genomic information, which may be complementary, is crucial to improve the quality of prognostic predictions. However, there is a lack of an integrating scheme for clinic-genomic models due to the P ≥ N problem, in particular, for a parsimonious model. We propose a methodology to build a reduced yet accurate integrative model using a hybrid approach based on the Cox regression model, which uses several dimension reduction techniques, L₂ penalized maximum likelihood estimation (PMLE), and resampling methods to tackle the problem. The predictive accuracy of the modeling approach is assessed by several metrics via an independent and thorough scheme to compare competing methods. In breast cancer data studies on a metastasis and death event, we show that the proposed methodology can improve prediction accuracy and build a final model with a hybrid signature that is parsimonious when integrating both types of variables.

  19. Incorporation of satellite remote sensing pan-sharpened imagery into digital soil prediction and mapping models to characterize soil property variability in small agricultural fields

    NASA Astrophysics Data System (ADS)

    Xu, Yiming; Smith, Scot E.; Grunwald, Sabine; Abd-Elrahman, Amr; Wani, Suhas P.

    2017-01-01

    Soil prediction models based on spectral indices from some multispectral images are too coarse to characterize spatial pattern of soil properties in small and heterogeneous agricultural lands. Image pan-sharpening has seldom been utilized in Digital Soil Mapping research before. This research aimed to analyze the effects of pan-sharpened (PAN) remote sensing spectral indices on soil prediction models in smallholder farm settings. This research fused the panchromatic band and multispectral (MS) bands of WorldView-2, GeoEye-1, and Landsat 8 images in a village in Southern India by Brovey, Gram-Schmidt and Intensity-Hue-Saturation methods. Random Forest was utilized to develop soil total nitrogen (TN) and soil exchangeable potassium (Kex) prediction models by incorporating multiple spectral indices from the PAN and MS images. Overall, our results showed that PAN remote sensing spectral indices have similar spectral characteristics with soil TN and Kex as MS remote sensing spectral indices. There is no soil prediction model incorporating the specific type of pan-sharpened spectral indices always had the strongest prediction capability of soil TN and Kex. The incorporation of pan-sharpened remote sensing spectral data not only increased the spatial resolution of the soil prediction maps, but also enhanced the prediction accuracy of soil prediction models. Small farms with limited footprint, fragmented ownership and diverse crop cycle should benefit greatly from the pan-sharpened high spatial resolution imagery for soil property mapping. Our results show that multiple high and medium resolution images can be used to map soil properties suggesting the possibility of an improvement in the maps' update frequency. Additionally, the results should benefit the large agricultural community through the reduction of routine soil sampling cost and improved prediction accuracy.

  20. Genomic Prediction of Testcross Performance in Canola (Brassica napus)

    PubMed Central

    Jan, Habib U.; Abbadi, Amine; Lücke, Sophie; Nichols, Richard A.; Snowdon, Rod J.

    2016-01-01

    Genomic selection (GS) is a modern breeding approach where genome-wide single-nucleotide polymorphism (SNP) marker profiles are simultaneously used to estimate performance of untested genotypes. In this study, the potential of genomic selection methods to predict testcross performance for hybrid canola breeding was applied for various agronomic traits based on genome-wide marker profiles. A total of 475 genetically diverse spring-type canola pollinator lines were genotyped at 24,403 single-copy, genome-wide SNP loci. In parallel, the 950 F1 testcross combinations between the pollinators and two representative testers were evaluated for a number of important agronomic traits including seedling emergence, days to flowering, lodging, oil yield and seed yield along with essential seed quality characters including seed oil content and seed glucosinolate content. A ridge-regression best linear unbiased prediction (RR-BLUP) model was applied in combination with 500 cross-validations for each trait to predict testcross performance, both across the whole population as well as within individual subpopulations or clusters, based solely on SNP profiles. Subpopulations were determined using multidimensional scaling and K-means clustering. Genomic prediction accuracy across the whole population was highest for seed oil content (0.81) followed by oil yield (0.75) and lowest for seedling emergence (0.29). For seed yieId, seed glucosinolate, lodging resistance and days to onset of flowering (DTF), prediction accuracies were 0.45, 0.61, 0.39 and 0.56, respectively. Prediction accuracies could be increased for some traits by treating subpopulations separately; a strategy which only led to moderate improvements for some traits with low heritability, like seedling emergence. No useful or consistent increase in accuracy was obtained by inclusion of a population substructure covariate in the model. Testcross performance prediction using genome-wide SNP markers shows considerable potential for pre-selection of promising hybrid combinations prior to resource-intensive field testing over multiple locations and years. PMID:26824924

Top