Chae, Jin Seok; Park, Jin; So, Wi-Young
2017-07-28
The purpose of this study was to suggest a ranking prediction model using the competition record of the Ladies Professional Golf Association (LPGA) players. The top 100 players on the tour money list from the 2013-2016 US Open were analyzed in this model. Stepwise regression analysis was conducted to examine the effect of performance and independent variables (i.e., driving accuracy, green in regulation, putts per round, driving distance, percentage of sand saves, par-3 average, par-4 average, par-5 average, birdies average, and eagle average) on dependent variables (i.e., scoring average, official money, top-10 finishes, winning percentage, and 60-strokes average). The following prediction model was suggested:Y (Scoring average) = 55.871 - 0.947 (Birdies average) + 4.576 (Par-4 average) - 0.028 (Green in regulation) - 0.012 (Percentage of sand saves) + 2.088 (Par-3 average) - 0.026 (Driving accuracy) - 0.017 (Driving distance) + 0.085 (Putts per round)Y (Official money) = 6628736.723 + 528557.907 (Birdies average) - 1831800.821 (Par-4 average) + 11681.739 (Green in regulation) + 6476.344 (Percentage of sand saves) - 688115.074 (Par-3 average) + 7375.971 (Driving accuracy)Y (Top-10 finish%) = 204.462 + 12.562 (Birdies average) - 47.745 (Par-4 average) + 1.633 (Green in regulation) - 5.151 (Putts per round) + 0.132 (Percentage of sand saves)Y (Winning percentage) = 49.949 + 3.191 (Birdies average) - 15.023 (Par-4 average) + 0.043 (Percentage of sand saves)Y (60-strokes average) = 217.649 + 13.978 (Birdies average) - 44.855 (Par-4 average) - 22.433 (Par-3 average) + 0.16 (Green in regulation)Scoring of the above five prediction models and the prediction of golf ranking in the 2016 Women's Golf Olympic competition in Rio revealed a significant correlation between the predicted and real ranking (r = 0.689, p < 0.001) and between the predicted and the real average score (r = 0.653, p < 0.001). Our ranking prediction model using LPGA data may help coaches and players to identify which players are likely to participate in Olympic and World competitions, based on their performance.
Wang, Xueyi; Davidson, Nicholas J.
2011-01-01
Ensemble methods have been widely used to improve prediction accuracy over individual classifiers. In this paper, we achieve a few results about the prediction accuracies of ensemble methods for binary classification that are missed or misinterpreted in previous literature. First we show the upper and lower bounds of the prediction accuracies (i.e. the best and worst possible prediction accuracies) of ensemble methods. Next we show that an ensemble method can achieve > 0.5 prediction accuracy, while individual classifiers have < 0.5 prediction accuracies. Furthermore, for individual classifiers with different prediction accuracies, the average of the individual accuracies determines the upper and lower bounds. We perform two experiments to verify the results and show that it is hard to achieve the upper and lower bounds accuracies by random individual classifiers and better algorithms need to be developed. PMID:21853162
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies
2010-01-01
Background All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. Results The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. Conclusions This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general. PMID:20144194
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies.
David, Maria Pamela C; Concepcion, Gisela P; Padlan, Eduardo A
2010-02-08
All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general.
Protein contact prediction using patterns of correlation.
Hamilton, Nicholas; Burrage, Kevin; Ragan, Mark A; Huber, Thomas
2004-09-01
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. Copyright 2004 Wiley-Liss, Inc.
Accuracies of univariate and multivariate genomic prediction models in African cassava.
Okeke, Uche Godfrey; Akdemir, Deniz; Rabbi, Ismail; Kulakow, Peter; Jannink, Jean-Luc
2017-12-04
Genomic selection (GS) promises to accelerate genetic gain in plant breeding programs especially for crop species such as cassava that have long breeding cycles. Practically, to implement GS in cassava breeding, it is necessary to evaluate different GS models and to develop suitable models for an optimized breeding pipeline. In this paper, we compared (1) prediction accuracies from a single-trait (uT) and a multi-trait (MT) mixed model for a single-environment genetic evaluation (Scenario 1), and (2) accuracies from a compound symmetric multi-environment model (uE) parameterized as a univariate multi-kernel model to a multivariate (ME) multi-environment mixed model that accounts for genotype-by-environment interaction for multi-environment genetic evaluation (Scenario 2). For these analyses, we used 16 years of public cassava breeding data for six target cassava traits and a fivefold cross-validation scheme with 10-repeat cycles to assess model prediction accuracies. In Scenario 1, the MT models had higher prediction accuracies than the uT models for all traits and locations analyzed, which amounted to on average a 40% improved prediction accuracy. For Scenario 2, we observed that the ME model had on average (across all locations and traits) a 12% improved prediction accuracy compared to the uE model. We recommend the use of multivariate mixed models (MT and ME) for cassava genetic evaluation. These models may be useful for other plant species.
Evaluating the accuracy of SHAPE-directed RNA secondary structure predictions
Sükösd, Zsuzsanna; Swenson, M. Shel; Kjems, Jørgen; Heitsch, Christine E.
2013-01-01
Recent advances in RNA structure determination include using data from high-throughput probing experiments to improve thermodynamic prediction accuracy. We evaluate the extent and nature of improvements in data-directed predictions for a diverse set of 16S/18S ribosomal sequences using a stochastic model of experimental SHAPE data. The average accuracy for 1000 data-directed predictions always improves over the original minimum free energy (MFE) structure. However, the amount of improvement varies with the sequence, exhibiting a correlation with MFE accuracy. Further analysis of this correlation shows that accurate MFE base pairs are typically preserved in a data-directed prediction, whereas inaccurate ones are not. Thus, the positive predictive value of common base pairs is consistently higher than the directed prediction accuracy. Finally, we confirm sequence dependencies in the directability of thermodynamic predictions and investigate the potential for greater accuracy improvements in the worst performing test sequence. PMID:23325843
Schrank, Elisa S; Hitch, Lester; Wallace, Kevin; Moore, Richard; Stanhope, Steven J
2013-10-01
Passive-dynamic ankle-foot orthosis (PD-AFO) bending stiffness is a key functional characteristic for achieving enhanced gait function. However, current orthosis customization methods inhibit objective premanufacture tuning of the PD-AFO bending stiffness, making optimization of orthosis function challenging. We have developed a novel virtual functional prototyping (VFP) process, which harnesses the strengths of computer aided design (CAD) model parameterization and finite element analysis, to quantitatively tune and predict the functional characteristics of a PD-AFO, which is rapidly manufactured via fused deposition modeling (FDM). The purpose of this study was to assess the VFP process for PD-AFO bending stiffness. A PD-AFO CAD model was customized for a healthy subject and tuned to four bending stiffness values via VFP. Two sets of each tuned model were fabricated via FDM using medical-grade polycarbonate (PC-ISO). Dimensional accuracy of the fabricated orthoses was excellent (average 0.51 ± 0.39 mm). Manufacturing precision ranged from 0.0 to 0.74 Nm/deg (average 0.30 ± 0.36 Nm/deg). Bending stiffness prediction accuracy was within 1 Nm/deg using the manufacturer provided PC-ISO elastic modulus (average 0.48 ± 0.35 Nm/deg). Using an experimentally derived PC-ISO elastic modulus improved the optimized bending stiffness prediction accuracy (average 0.29 ± 0.57 Nm/deg). Robustness of the derived modulus was tested by carrying out the VFP process for a disparate subject, tuning the PD-AFO model to five bending stiffness values. For this disparate subject, bending stiffness prediction accuracy was strong (average 0.20 ± 0.14 Nm/deg). Overall, the VFP process had excellent dimensional accuracy, good manufacturing precision, and strong prediction accuracy with the derived modulus. Implementing VFP as part of our PD-AFO customization and manufacturing framework, which also includes fit customization, provides a novel and powerful method to predictably tune and precisely manufacture orthoses with objectively customized fit and functional characteristics.
Prediction of Spirometric Forced Expiratory Volume (FEV1) Data Using Support Vector Regression
NASA Astrophysics Data System (ADS)
Kavitha, A.; Sujatha, C. M.; Ramakrishnan, S.
2010-01-01
In this work, prediction of forced expiratory volume in 1 second (FEV1) in pulmonary function test is carried out using the spirometer and support vector regression analysis. Pulmonary function data are measured with flow volume spirometer from volunteers (N=175) using a standard data acquisition protocol. The acquired data are then used to predict FEV1. Support vector machines with polynomial kernel function with four different orders were employed to predict the values of FEV1. The performance is evaluated by computing the average prediction accuracy for normal and abnormal cases. Results show that support vector machines are capable of predicting FEV1 in both normal and abnormal cases and the average prediction accuracy for normal subjects was higher than that of abnormal subjects. Accuracy in prediction was found to be high for a regularization constant of C=10. Since FEV1 is the most significant parameter in the analysis of spirometric data, it appears that this method of assessment is useful in diagnosing the pulmonary abnormalities with incomplete data and data with poor recording.
Bolormaa, S; Pryce, J E; Kemper, K; Savin, K; Hayes, B J; Barendse, W; Zhang, Y; Reich, C M; Mason, B A; Bunch, R J; Harrison, B E; Reverter, A; Herd, R M; Tier, B; Graser, H-U; Goddard, M E
2013-07-01
The aim of this study was to assess the accuracy of genomic predictions for 19 traits including feed efficiency, growth, and carcass and meat quality traits in beef cattle. The 10,181 cattle in our study had real or imputed genotypes for 729,068 SNP although not all cattle were measured for all traits. Animals included Bos taurus, Brahman, composite, and crossbred animals. Genomic EBV (GEBV) were calculated using 2 methods of genomic prediction [BayesR and genomic BLUP (GBLUP)] either using a common training dataset for all breeds or using a training dataset comprising only animals of the same breed. Accuracies of GEBV were assessed using 5-fold cross-validation. The accuracy of genomic prediction varied by trait and by method. Traits with a large number of recorded and genotyped animals and with high heritability gave the greatest accuracy of GEBV. Using GBLUP, the average accuracy was 0.27 across traits and breeds, but the accuracies between breeds and between traits varied widely. When the training population was restricted to animals from the same breed as the validation population, GBLUP accuracies declined by an average of 0.04. The greatest decline in accuracy was found for the 4 composite breeds. The BayesR accuracies were greater by an average of 0.03 than GBLUP accuracies, particularly for traits with known genes of moderate to large effect mutations segregating. The accuracies of 0.43 to 0.48 for IGF-I traits were among the greatest in the study. Although accuracies are low compared with those observed in dairy cattle, genomic selection would still be beneficial for traits that are hard to improve by conventional selection, such as tenderness and residual feed intake. BayesR identified many of the same quantitative trait loci as a genomewide association study but appeared to map them more precisely. All traits appear to be highly polygenic with thousands of SNP independently associated with each trait.
NASA Astrophysics Data System (ADS)
Rahaman, Md. Mashiur; Islam, Hafizul; Islam, Md. Tariqul; Khondoker, Md. Reaz Hasan
2017-12-01
Maneuverability and resistance prediction with suitable accuracy is essential for optimum ship design and propulsion power prediction. This paper aims at providing some of the maneuverability characteristics of a Japanese bulk carrier model, JBC in calm water using a computational fluid dynamics solver named SHIP Motion and OpenFOAM. The solvers are based on the Reynolds average Navier-Stokes method (RaNS) and solves structured grid using the Finite Volume Method (FVM). This paper comprises the numerical results of calm water test for the JBC model with available experimental results. The calm water test results include the total drag co-efficient, average sinkage, and trim data. Visualization data for pressure distribution on the hull surface and free water surface have also been included. The paper concludes that the presented solvers predict the resistance and maneuverability characteristics of the bulk carrier with reasonable accuracy utilizing minimum computational resources.
Zhang, Chu; Liu, Fei; He, Yong
2018-02-01
Hyperspectral imaging was used to identify and to visualize the coffee bean varieties. Spectral preprocessing of pixel-wise spectra was conducted by different methods, including moving average smoothing (MA), wavelet transform (WT) and empirical mode decomposition (EMD). Meanwhile, spatial preprocessing of the gray-scale image at each wavelength was conducted by median filter (MF). Support vector machine (SVM) models using full sample average spectra and pixel-wise spectra, and the selected optimal wavelengths by second derivative spectra all achieved classification accuracy over 80%. Primarily, the SVM models using pixel-wise spectra were used to predict the sample average spectra, and these models obtained over 80% of the classification accuracy. Secondly, the SVM models using sample average spectra were used to predict pixel-wise spectra, but achieved with lower than 50% of classification accuracy. The results indicated that WT and EMD were suitable for pixel-wise spectra preprocessing. The use of pixel-wise spectra could extend the calibration set, and resulted in the good prediction results for pixel-wise spectra and sample average spectra. The overall results indicated the effectiveness of using spectral preprocessing and the adoption of pixel-wise spectra. The results provided an alternative way of data processing for applications of hyperspectral imaging in food industry.
NASA Astrophysics Data System (ADS)
Magnuson, Brian
A proof-of-concept software-in-the-loop study is performed to assess the accuracy of predicted net and charge-gaining energy consumption for potential effective use in optimizing powertrain management of hybrid vehicles. With promising results of improving fuel efficiency of a thermostatic control strategy for a series, plug-ing, hybrid-electric vehicle by 8.24%, the route and speed prediction machine learning algorithms are redesigned and implemented for real- world testing in a stand-alone C++ code-base to ingest map data, learn and predict driver habits, and store driver data for fast startup and shutdown of the controller or computer used to execute the compiled algorithm. Speed prediction is performed using a multi-layer, multi-input, multi- output neural network using feed-forward prediction and gradient descent through back- propagation training. Route prediction utilizes a Hidden Markov Model with a recurrent forward algorithm for prediction and multi-dimensional hash maps to store state and state distribution constraining associations between atomic road segments and end destinations. Predicted energy is calculated using the predicted time-series speed and elevation profile over the predicted route and the road-load equation. Testing of the code-base is performed over a known road network spanning 24x35 blocks on the south hill of Spokane, Washington. A large set of training routes are traversed once to add randomness to the route prediction algorithm, and a subset of the training routes, testing routes, are traversed to assess the accuracy of the net and charge-gaining predicted energy consumption. Each test route is traveled a random number of times with varying speed conditions from traffic and pedestrians to add randomness to speed prediction. Prediction data is stored and analyzed in a post process Matlab script. The aggregated results and analysis of all traversals of all test routes reflect the performance of the Driver Prediction algorithm. The error of average energy gained through charge-gaining events is 31.3% and the error of average net energy consumed is 27.3%. The average delta and average standard deviation of the delta of predicted energy gained through charge-gaining events is 0.639 and 0.601 Wh respectively for individual time-series calculations. Similarly, the average delta and average standard deviation of the delta of the predicted net energy consumed is 0.567 and 0.580 Wh respectively for individual time-series calculations. The average delta and standard deviation of the delta of the predicted speed is 1.60 and 1.15 respectively also for the individual time-series measurements. The percentage of accuracy of route prediction is 91%. Overall, test routes are traversed 151 times for a total test distance of 276.4 km.
ERIC Educational Resources Information Center
Morris, Darrell; Pennell, Ashley M.; Perney, Jan; Trathen, Woodrow
2018-01-01
This study compared reading rate to reading fluency (as measured by a rating scale). After listening to first graders read short passages, we assigned an overall fluency rating (low, average, or high) to each reading. We then used predictive discriminant analyses to determine which of five measures--accuracy, rate (objective); accuracy, phrasing,…
Improved accuracy of intraocular lens power calculation with the Zeiss IOLMaster.
Olsen, Thomas
2007-02-01
This study aimed to demonstrate how the level of accuracy in intraocular lens (IOL) power calculation can be improved with optical biometry using partial optical coherence interferometry (PCI) (Zeiss IOLMaster) and current anterior chamber depth (ACD) prediction algorithms. Intraocular lens power in 461 consecutive cataract operations was calculated using both PCI and ultrasound and the accuracy of the results of each technique were compared. To illustrate the importance of ACD prediction per se, predictions were calculated using both a recently published 5-variable method and the Haigis 2-variable method and the results compared. All calculations were optimized in retrospect to account for systematic errors, including IOL constants and other off-set errors. The average absolute IOL prediction error (observed minus expected refraction) was 0.65 dioptres with ultrasound and 0.43 D with PCI using the 5-variable ACD prediction method (p < 0.00001). The number of predictions within +/- 0.5 D, +/- 1.0 D and +/- 2.0 D of the expected outcome was 62.5%, 92.4% and 99.9% with PCI, compared with 45.5%, 77.3% and 98.4% with ultrasound, respectively (p < 0.00001). The 2-variable ACD method resulted in an average error in PCI predictions of 0.46 D, which was significantly higher than the error in the 5-variable method (p < 0.001). The accuracy of IOL power calculation can be significantly improved using calibrated axial length readings obtained with PCI and modern IOL power calculation formulas incorporating the latest generation ACD prediction algorithms.
On the accuracy of ERS-1 orbit predictions
NASA Technical Reports Server (NTRS)
Koenig, Rolf; Li, H.; Massmann, Franz-Heinrich; Raimondo, J. C.; Rajasenan, C.; Reigber, C.
1993-01-01
Since the launch of ERS-1, the D-PAF (German Processing and Archiving Facility) provides regularly orbit predictions for the worldwide SLR (Satellite Laser Ranging) tracking network. The weekly distributed orbital elements are so called tuned IRV's and tuned SAO-elements. The tuning procedure, designed to improve the accuracy of the recovery of the orbit at the stations, is discussed based on numerical results. This shows that tuning of elements is essential for ERS-1 with the currently applied tracking procedures. The orbital elements are updated by daily distributed time bias functions. The generation of the time bias function is explained. Problems and numerical results are presented. The time bias function increases the prediction accuracy considerably. Finally, the quality assessment of ERS-1 orbit predictions is described. The accuracy is compiled for about 250 days since launch. The average accuracy lies in the range of 50-100 ms and has considerably improved.
NASA Astrophysics Data System (ADS)
Qian, Xiaoshan
2018-01-01
The traditional model of evaporation process parameters have continuity and cumulative characteristics of the prediction error larger issues, based on the basis of the process proposed an adaptive particle swarm neural network forecasting method parameters established on the autoregressive moving average (ARMA) error correction procedure compensated prediction model to predict the results of the neural network to improve prediction accuracy. Taking a alumina plant evaporation process to analyze production data validation, and compared with the traditional model, the new model prediction accuracy greatly improved, can be used to predict the dynamic process of evaporation of sodium aluminate solution components.
Indirect Validation of Probe Speed Data on Arterial Corridors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eshragh, Sepideh; Young, Stanley E.; Sharifi, Elham
This study aimed to estimate the accuracy of probe speed data on arterial corridors on the basis of roadway geometric attributes and functional classification. It was assumed that functional class (medium and low) along with other road characteristics (such as weighted average of the annual average daily traffic, average signal density, average access point density, and average speed) were available as correlation factors to estimate the accuracy of probe traffic data. This study tested these factors as predictors of the fidelity of probe traffic data by using the results of an extensive validation exercise. This study showed strong correlations betweenmore » these geometric attributes and the accuracy of probe data when they were assessed by using average absolute speed error. Linear models were regressed to existing data to estimate appropriate models for medium- and low-type arterial corridors. The proposed models for medium- and low-type arterials were validated further on the basis of the results of a slowdown analysis. These models can be used to predict the accuracy of probe data indirectly in medium and low types of arterial corridors.« less
Analysis of energy-based algorithms for RNA secondary structure prediction
2012-01-01
Background RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. Results We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Conclusions Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets. PMID:22296803
Analysis of energy-based algorithms for RNA secondary structure prediction.
Hajiaghayi, Monir; Condon, Anne; Hoos, Holger H
2012-02-01
RNA molecules play critical roles in the cells of organisms, including roles in gene regulation, catalysis, and synthesis of proteins. Since RNA function depends in large part on its folded structures, much effort has been invested in developing accurate methods for prediction of RNA secondary structure from the base sequence. Minimum free energy (MFE) predictions are widely used, based on nearest neighbor thermodynamic parameters of Mathews, Turner et al. or those of Andronescu et al. Some recently proposed alternatives that leverage partition function calculations find the structure with maximum expected accuracy (MEA) or pseudo-expected accuracy (pseudo-MEA) methods. Advances in prediction methods are typically benchmarked using sensitivity, positive predictive value and their harmonic mean, namely F-measure, on datasets of known reference structures. Since such benchmarks document progress in improving accuracy of computational prediction methods, it is important to understand how measures of accuracy vary as a function of the reference datasets and whether advances in algorithms or thermodynamic parameters yield statistically significant improvements. Our work advances such understanding for the MFE and (pseudo-)MEA-based methods, with respect to the latest datasets and energy parameters. We present three main findings. First, using the bootstrap percentile method, we show that the average F-measure accuracy of the MFE and (pseudo-)MEA-based algorithms, as measured on our largest datasets with over 2000 RNAs from diverse families, is a reliable estimate (within a 2% range with high confidence) of the accuracy of a population of RNA molecules represented by this set. However, average accuracy on smaller classes of RNAs such as a class of 89 Group I introns used previously in benchmarking algorithm accuracy is not reliable enough to draw meaningful conclusions about the relative merits of the MFE and MEA-based algorithms. Second, on our large datasets, the algorithm with best overall accuracy is a pseudo MEA-based algorithm of Hamada et al. that uses a generalized centroid estimator of base pairs. However, between MFE and other MEA-based methods, there is no clear winner in the sense that the relative accuracy of the MFE versus MEA-based algorithms changes depending on the underlying energy parameters. Third, of the four parameter sets we considered, the best accuracy for the MFE-, MEA-based, and pseudo-MEA-based methods is 0.686, 0.680, and 0.711, respectively (on a scale from 0 to 1 with 1 meaning perfect structure predictions) and is obtained with a thermodynamic parameter set obtained by Andronescu et al. called BL* (named after the Boltzmann likelihood method by which the parameters were derived). Large datasets should be used to obtain reliable measures of the accuracy of RNA structure prediction algorithms, and average accuracies on specific classes (such as Group I introns and Transfer RNAs) should be interpreted with caution, considering the relatively small size of currently available datasets for such classes. The accuracy of the MEA-based methods is significantly higher when using the BL* parameter set of Andronescu et al. than when using the parameters of Mathews and Turner, and there is no significant difference between the accuracy of MEA-based methods and MFE when using the BL* parameters. The pseudo-MEA-based method of Hamada et al. with the BL* parameter set significantly outperforms all other MFE and MEA-based algorithms on our large data sets.
Putz, A M; Tiezzi, F; Maltecca, C; Gray, K A; Knauer, M T
2018-02-01
The objective of this study was to compare and determine the optimal validation method when comparing accuracy from single-step GBLUP (ssGBLUP) to traditional pedigree-based BLUP. Field data included six litter size traits. Simulated data included ten replicates designed to mimic the field data in order to determine the method that was closest to the true accuracy. Data were split into training and validation sets. The methods used were as follows: (i) theoretical accuracy derived from the prediction error variance (PEV) of the direct inverse (iLHS), (ii) approximated accuracies from the accf90(GS) program in the BLUPF90 family of programs (Approx), (iii) correlation between predictions and the single-step GEBVs from the full data set (GEBV Full ), (iv) correlation between predictions and the corrected phenotypes of females from the full data set (Y c ), (v) correlation from method iv divided by the square root of the heritability (Y ch ) and (vi) correlation between sire predictions and the average of their daughters' corrected phenotypes (Y cs ). Accuracies from iLHS increased from 0.27 to 0.37 (37%) in the Large White. Approximation accuracies were very consistent and close in absolute value (0.41 to 0.43). Both iLHS and Approx were much less variable than the corrected phenotype methods (ranging from 0.04 to 0.27). On average, simulated data showed an increase in accuracy from 0.34 to 0.44 (29%) using ssGBLUP. Both iLHS and Y ch approximated the increase well, 0.30 to 0.46 and 0.36 to 0.45, respectively. GEBV Full performed poorly in both data sets and is not recommended. Results suggest that for within-breed selection, theoretical accuracy using PEV was consistent and accurate. When direct inversion is infeasible to get the PEV, correlating predictions to the corrected phenotypes divided by the square root of heritability is adequate given a large enough validation data set. © 2017 Blackwell Verlag GmbH.
Short-term load forecasting of power system
NASA Astrophysics Data System (ADS)
Xu, Xiaobin
2017-05-01
In order to ensure the scientific nature of optimization about power system, it is necessary to improve the load forecasting accuracy. Power system load forecasting is based on accurate statistical data and survey data, starting from the history and current situation of electricity consumption, with a scientific method to predict the future development trend of power load and change the law of science. Short-term load forecasting is the basis of power system operation and analysis, which is of great significance to unit combination, economic dispatch and safety check. Therefore, the load forecasting of the power system is explained in detail in this paper. First, we use the data from 2012 to 2014 to establish the partial least squares model to regression analysis the relationship between daily maximum load, daily minimum load, daily average load and each meteorological factor, and select the highest peak by observing the regression coefficient histogram Day maximum temperature, daily minimum temperature and daily average temperature as the meteorological factors to improve the accuracy of load forecasting indicators. Secondly, in the case of uncertain climate impact, we use the time series model to predict the load data for 2015, respectively, the 2009-2014 load data were sorted out, through the previous six years of the data to forecast the data for this time in 2015. The criterion for the accuracy of the prediction is the average of the standard deviations for the prediction results and average load for the previous six years. Finally, considering the climate effect, we use the BP neural network model to predict the data in 2015, and optimize the forecast results on the basis of the time series model.
Ansari, Mozafar; Othman, Faridah; Abunama, Taher; El-Shafie, Ahmed
2018-04-01
The function of a sewage treatment plant is to treat the sewage to acceptable standards before being discharged into the receiving waters. To design and operate such plants, it is necessary to measure and predict the influent flow rate. In this research, the influent flow rate of a sewage treatment plant (STP) was modelled and predicted by autoregressive integrated moving average (ARIMA), nonlinear autoregressive network (NAR) and support vector machine (SVM) regression time series algorithms. To evaluate the models' accuracy, the root mean square error (RMSE) and coefficient of determination (R 2 ) were calculated as initial assessment measures, while relative error (RE), peak flow criterion (PFC) and low flow criterion (LFC) were calculated as final evaluation measures to demonstrate the detailed accuracy of the selected models. An integrated model was developed based on the individual models' prediction ability for low, average and peak flow. An initial assessment of the results showed that the ARIMA model was the least accurate and the NAR model was the most accurate. The RE results also prove that the SVM model's frequency of errors above 10% or below - 10% was greater than the NAR model's. The influent was also forecasted up to 44 weeks ahead by both models. The graphical results indicate that the NAR model made better predictions than the SVM model. The final evaluation of NAR and SVM demonstrated that SVM made better predictions at peak flow and NAR fit well for low and average inflow ranges. The integrated model developed includes the NAR model for low and average influent and the SVM model for peak inflow.
Using Demographic Subgroup and Dummy Variable Equations to Predict College Freshman Grade Average.
ERIC Educational Resources Information Center
Sawyer, Richard
1986-01-01
This study was designed to determine whether adjustments for the differential prediction observed among sex, racial/ethnic, or age subgroups in one freshman class at a college could be used to improve prediction accuracy for these subgroups in future freshman classes. (Author/LMO)
Measurement and interpretation of skin prick test results.
van der Valk, J P M; Gerth van Wijk, R; Hoorn, E; Groenendijk, L; Groenendijk, I M; de Jong, N W
2015-01-01
There are several methods to read skin prick test results in type-I allergy testing. A commonly used method is to characterize the wheal size by its 'average diameter'. A more accurate method is to scan the area of the wheal to calculate the actual size. In both methods, skin prick test (SPT) results can be corrected for histamine-sensitivity of the skin by dividing the results of the allergic reaction by the histamine control. The objectives of this study are to compare different techniques of quantifying SPT results, to determine a cut-off value for a positive SPT for histamine equivalent prick -index (HEP) area, and to study the accuracy of predicting cashew nut reactions in double-blind placebo-controlled food challenge (DBPCFC) tests with the different SPT methods. Data of 172 children with cashew nut sensitisation were used for the analysis. All patients underwent a DBPCFC with cashew nut. Per patient, the average diameter and scanned area of the wheal size were recorded. In addition, the same data for the histamine-induced wheal were collected for each patient. The accuracy in predicting the outcome of the DBPCFC using four different SPT readings (i.e. average diameter, area, HEP-index diameter, HEP-index area) were compared in a Receiver-Operating Characteristic (ROC) plot. Characterizing the wheal size by the average diameter method is inaccurate compared to scanning method. A wheal average diameter of 3 mm is generally considered as a positive SPT cut-off value and an equivalent HEP-index area cut-off value of 0.4 was calculated. The four SPT methods yielded a comparable area under the curve (AUC) of 0.84, 0.85, 0.83 and 0.83, respectively. The four methods showed comparable accuracy in predicting cashew nut reactions in a DBPCFC. The 'scanned area method' is theoretically more accurate in determining the wheal area than the 'average diameter method' and is recommended in academic research. A HEP-index area of 0.4 is determined as cut-off value for a positive SPT. However, in clinical practice, the 'average diameter method' is also useful, because this method provides similar accuracy in predicting cashew nut allergic reactions in the DBPCFC. Trial number NTR3572.
Koo, Choongwan; Hong, Taehoon; Lee, Minhyun; Park, Hyo Seon
2013-05-07
The photovoltaic (PV) system is considered an unlimited source of clean energy, whose amount of electricity generation changes according to the monthly average daily solar radiation (MADSR). It is revealed that the MADSR distribution in South Korea has very diverse patterns due to the country's climatic and geographical characteristics. This study aimed to develop a MADSR estimation model for the location without the measured MADSR data, using an advanced case based reasoning (CBR) model, which is a hybrid methodology combining CBR with artificial neural network, multiregression analysis, and genetic algorithm. The average prediction accuracy of the advanced CBR model was very high at 95.69%, and the standard deviation of the prediction accuracy was 3.67%, showing a significant improvement in prediction accuracy and consistency. A case study was conducted to verify the proposed model. The proposed model could be useful for owner or construction manager in charge of determining whether or not to introduce the PV system and where to install it. Also, it would benefit contractors in a competitive bidding process to accurately estimate the electricity generation of the PV system in advance and to conduct an economic and environmental feasibility study from the life cycle perspective.
Hoover, Stephen; Jackson, Eric V.; Paul, David; Locke, Robert
2016-01-01
Summary Background Accurate prediction of future patient census in hospital units is essential for patient safety, health outcomes, and resource planning. Forecasting census in the Neonatal Intensive Care Unit (NICU) is particularly challenging due to limited ability to control the census and clinical trajectories. The fixed average census approach, using average census from previous year, is a forecasting alternative used in clinical practice, but has limitations due to census variations. Objective Our objectives are to: (i) analyze the daily NICU census at a single health care facility and develop census forecasting models, (ii) explore models with and without patient data characteristics obtained at the time of admission, and (iii) evaluate accuracy of the models compared with the fixed average census approach. Methods We used five years of retrospective daily NICU census data for model development (January 2008 – December 2012, N=1827 observations) and one year of data for validation (January – December 2013, N=365 observations). Best-fitting models of ARIMA and linear regression were applied to various 7-day prediction periods and compared using error statistics. Results The census showed a slightly increasing linear trend. Best fitting models included a non-seasonal model, ARIMA(1,0,0), seasonal ARIMA models, ARIMA(1,0,0)x(1,1,2)7 and ARIMA(2,1,4)x(1,1,2)14, as well as a seasonal linear regression model. Proposed forecasting models resulted on average in 36.49% improvement in forecasting accuracy compared with the fixed average census approach. Conclusions Time series models provide higher prediction accuracy under different census conditions compared with the fixed average census approach. Presented methodology is easily applicable in clinical practice, can be generalized to other care settings, support short- and long-term census forecasting, and inform staff resource planning. PMID:27437040
Capan, Muge; Hoover, Stephen; Jackson, Eric V; Paul, David; Locke, Robert
2016-01-01
Accurate prediction of future patient census in hospital units is essential for patient safety, health outcomes, and resource planning. Forecasting census in the Neonatal Intensive Care Unit (NICU) is particularly challenging due to limited ability to control the census and clinical trajectories. The fixed average census approach, using average census from previous year, is a forecasting alternative used in clinical practice, but has limitations due to census variations. Our objectives are to: (i) analyze the daily NICU census at a single health care facility and develop census forecasting models, (ii) explore models with and without patient data characteristics obtained at the time of admission, and (iii) evaluate accuracy of the models compared with the fixed average census approach. We used five years of retrospective daily NICU census data for model development (January 2008 - December 2012, N=1827 observations) and one year of data for validation (January - December 2013, N=365 observations). Best-fitting models of ARIMA and linear regression were applied to various 7-day prediction periods and compared using error statistics. The census showed a slightly increasing linear trend. Best fitting models included a non-seasonal model, ARIMA(1,0,0), seasonal ARIMA models, ARIMA(1,0,0)x(1,1,2)7 and ARIMA(2,1,4)x(1,1,2)14, as well as a seasonal linear regression model. Proposed forecasting models resulted on average in 36.49% improvement in forecasting accuracy compared with the fixed average census approach. Time series models provide higher prediction accuracy under different census conditions compared with the fixed average census approach. Presented methodology is easily applicable in clinical practice, can be generalized to other care settings, support short- and long-term census forecasting, and inform staff resource planning.
Rutter, Carolyn M; Knudsen, Amy B; Marsh, Tracey L; Doria-Rose, V Paul; Johnson, Eric; Pabiniak, Chester; Kuntz, Karen M; van Ballegooijen, Marjolein; Zauber, Ann G; Lansdorp-Vogelaar, Iris
2016-07-01
Microsimulation models synthesize evidence about disease processes and interventions, providing a method for predicting long-term benefits and harms of prevention, screening, and treatment strategies. Because models often require assumptions about unobservable processes, assessing a model's predictive accuracy is important. We validated 3 colorectal cancer (CRC) microsimulation models against outcomes from the United Kingdom Flexible Sigmoidoscopy Screening (UKFSS) Trial, a randomized controlled trial that examined the effectiveness of one-time flexible sigmoidoscopy screening to reduce CRC mortality. The models incorporate different assumptions about the time from adenoma initiation to development of preclinical and symptomatic CRC. Analyses compare model predictions to study estimates across a range of outcomes to provide insight into the accuracy of model assumptions. All 3 models accurately predicted the relative reduction in CRC mortality 10 years after screening (predicted hazard ratios, with 95% percentile intervals: 0.56 [0.44, 0.71], 0.63 [0.51, 0.75], 0.68 [0.53, 0.83]; estimated with 95% confidence interval: 0.56 [0.45, 0.69]). Two models with longer average preclinical duration accurately predicted the relative reduction in 10-year CRC incidence. Two models with longer mean sojourn time accurately predicted the number of screen-detected cancers. All 3 models predicted too many proximal adenomas among patients referred to colonoscopy. Model accuracy can only be established through external validation. Analyses such as these are therefore essential for any decision model. Results supported the assumptions that the average time from adenoma initiation to development of preclinical cancer is long (up to 25 years), and mean sojourn time is close to 4 years, suggesting the window for early detection and intervention by screening is relatively long. Variation in dwell time remains uncertain and could have important clinical and policy implications. © The Author(s) 2016.
Performance of genomic prediction within and across generations in maritime pine.
Bartholomé, Jérôme; Van Heerwaarden, Joost; Isik, Fikret; Boury, Christophe; Vidal, Marjorie; Plomion, Christophe; Bouffier, Laurent
2016-08-11
Genomic selection (GS) is a promising approach for decreasing breeding cycle length in forest trees. Assessment of progeny performance and of the prediction accuracy of GS models over generations is therefore a key issue. A reference population of maritime pine (Pinus pinaster) with an estimated effective inbreeding population size (status number) of 25 was first selected with simulated data. This reference population (n = 818) covered three generations (G0, G1 and G2) and was genotyped with 4436 single-nucleotide polymorphism (SNP) markers. We evaluated the effects on prediction accuracy of both the relatedness between the calibration and validation sets and validation on the basis of progeny performance. Pedigree-based (best linear unbiased prediction, ABLUP) and marker-based (genomic BLUP and Bayesian LASSO) models were used to predict breeding values for three different traits: circumference, height and stem straightness. On average, the ABLUP model outperformed genomic prediction models, with a maximum difference in prediction accuracies of 0.12, depending on the trait and the validation method. A mean difference in prediction accuracy of 0.17 was found between validation methods differing in terms of relatedness. Including the progenitors in the calibration set reduced this difference in prediction accuracy to 0.03. When only genotypes from the G0 and G1 generations were used in the calibration set and genotypes from G2 were used in the validation set (progeny validation), prediction accuracies ranged from 0.70 to 0.85. This study suggests that the training of prediction models on parental populations can predict the genetic merit of the progeny with high accuracy: an encouraging result for the implementation of GS in the maritime pine breeding program.
Using Time-Series Regression to Predict Academic Library Circulations.
ERIC Educational Resources Information Center
Brooks, Terrence A.
1984-01-01
Four methods were used to forecast monthly circulation totals in 15 midwestern academic libraries: dummy time-series regression, lagged time-series regression, simple average (straight-line forecasting), monthly average (naive forecasting). In tests of forecasting accuracy, dummy regression method and monthly mean method exhibited smallest average…
Improving consensus contact prediction via server correlation reduction.
Gao, Xin; Bu, Dongbo; Xu, Jinbo; Li, Ming
2009-05-06
Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find out that even for new fold targets, the models generated by threading programs can contain many true contacts. The challenge is how to identify them. In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method assuming that all the individual servers are equally important and independent, the newly developed method evaluates their correlation by using maximum likelihood estimation and extracts independent latent servers from them by using principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, which is much better than that of the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06. These methods demonstrate an average accuracy of 13.0%, 10.8%, 25.8% and 21.2%, respectively. Reducing server correlation and optimally combining independent latent servers show a significant improvement over the traditional consensus methods. This approach can hopefully provide a powerful tool for protein structure refinement and prediction use.
Cardiovascular risk prediction tools for populations in Asia.
Barzi, F; Patel, A; Gu, D; Sritara, P; Lam, T H; Rodgers, A; Woodward, M
2007-02-01
Cardiovascular risk equations are traditionally derived from the Framingham Study. The accuracy of this approach in Asian populations, where resources for risk factor measurement may be limited, is unclear. To compare "low-information" equations (derived using only age, systolic blood pressure, total cholesterol and smoking status) derived from the Framingham Study with those derived from the Asian cohorts, on the accuracy of cardiovascular risk prediction. Separate equations to predict the 8-year risk of a cardiovascular event were derived from Asian and Framingham cohorts. The performance of these equations, and a subsequently "recalibrated" Framingham equation, were evaluated among participants from independent Chinese cohorts. Six cohort studies from Japan, Korea and Singapore (Asian cohorts); six cohort studies from China; the Framingham Study from the US. 172,077 participants from the Asian cohorts; 25,682 participants from Chinese cohorts and 6053 participants from the Framingham Study. In the Chinese cohorts, 542 cardiovascular events occurred during 8 years of follow-up. Both the Asian cohorts and the Framingham equations discriminated cardiovascular risk well in the Chinese cohorts; the area under the receiver-operator characteristic curve was at least 0.75 for men and women. However, the Framingham risk equation systematically overestimated risk in the Chinese cohorts by an average of 276% among men and 102% among women. The corresponding average overestimation using the Asian cohorts equation was 11% and 10%, respectively. Recalibrating the Framingham risk equation using cardiovascular disease incidence from the non-Chinese Asian cohorts led to an overestimation of risk by an average of 4% in women and underestimation of risk by an average of 2% in men. A low-information Framingham cardiovascular risk prediction tool, which, when recalibrated with contemporary data, is likely to estimate future cardiovascular risk with similar accuracy in Asian populations as tools developed from data on local cohorts.
Waide, Emily H; Tuggle, Christopher K; Serão, Nick V L; Schroyen, Martine; Hess, Andrew; Rowland, Raymond R R; Lunney, Joan K; Plastow, Graham; Dekkers, Jack C M
2018-02-01
Genomic prediction of the pig's response to the porcine reproductive and respiratory syndrome (PRRS) virus (PRRSV) would be a useful tool in the swine industry. This study investigated the accuracy of genomic prediction based on porcine SNP60 Beadchip data using training and validation datasets from populations with different genetic backgrounds that were challenged with different PRRSV isolates. Genomic prediction accuracy averaged 0.34 for viral load (VL) and 0.23 for weight gain (WG) following experimental PRRSV challenge, which demonstrates that genomic selection could be used to improve response to PRRSV infection. Training on WG data during infection with a less virulent PRRSV, KS06, resulted in poor accuracy of prediction for WG during infection with a more virulent PRRSV, NVSL. Inclusion of single nucleotide polymorphisms (SNPs) that are in linkage disequilibrium with a major quantitative trait locus (QTL) on chromosome 4 was vital for accurate prediction of VL. Overall, SNPs that were significantly associated with either trait in single SNP genome-wide association analysis were unable to predict the phenotypes with an accuracy as high as that obtained by using all genotyped SNPs across the genome. Inclusion of data from close relatives into the training population increased whole genome prediction accuracy by 33% for VL and by 37% for WG but did not affect the accuracy of prediction when using only SNPs in the major QTL region. Results show that genomic prediction of response to PRRSV infection is moderately accurate and, when using all SNPs on the porcine SNP60 Beadchip, is not very sensitive to differences in virulence of the PRRSV in training and validation populations. Including close relatives in the training population increased prediction accuracy when using the whole genome or SNPs other than those near a major QTL.
Lessons in molecular recognition. 2. Assessing and improving cross-docking accuracy.
Sutherland, Jeffrey J; Nandigam, Ravi K; Erickson, Jon A; Vieth, Michal
2007-01-01
Docking methods are used to predict the manner in which a ligand binds to a protein receptor. Many studies have assessed the success rate of programs in self-docking tests, whereby a ligand is docked into the protein structure from which it was extracted. Cross-docking, or using a protein structure from a complex containing a different ligand, provides a more realistic assessment of a docking program's ability to reproduce X-ray results. In this work, cross-docking was performed with CDocker, Fred, and Rocs using multiple X-ray structures for eight proteins (two kinases, one nuclear hormone receptor, one serine protease, two metalloproteases, and two phosphodiesterases). While average cross-docking accuracy is not encouraging, it is shown that using the protein structure from the complex that contains the bound ligand most similar to the docked ligand increases docking accuracy for all methods ("similarity selection"). Identifying the most successful protein conformer ("best selection") and similarity selection substantially reduce the difference between self-docking and average cross-docking accuracy. We identify universal predictors of docking accuracy (i.e., showing consistent behavior across most protein-method combinations), and show that models for predicting docking accuracy built using these parameters can be used to select the most appropriate docking method.
Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign
2007-01-01
Background Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction. Results The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction: yielding better accuracy, on average, and requiring significantly lesser computational resources. Conclusion Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer length sequences. The revised Dynalign code is freely available for download. PMID:17445273
Genomic selection across multiple breeding cycles in applied bread wheat breeding.
Michel, Sebastian; Ametz, Christian; Gungor, Huseyin; Epure, Doru; Grausgruber, Heinrich; Löschenberger, Franziska; Buerstmayr, Hermann
2016-06-01
We evaluated genomic selection across five breeding cycles of bread wheat breeding. Bias of within-cycle cross-validation and methods for improving the prediction accuracy were assessed. The prospect of genomic selection has been frequently shown by cross-validation studies using the same genetic material across multiple environments, but studies investigating genomic selection across multiple breeding cycles in applied bread wheat breeding are lacking. We estimated the prediction accuracy of grain yield, protein content and protein yield of 659 inbred lines across five independent breeding cycles and assessed the bias of within-cycle cross-validation. We investigated the influence of outliers on the prediction accuracy and predicted protein yield by its components traits. A high average heritability was estimated for protein content, followed by grain yield and protein yield. The bias of the prediction accuracy using populations from individual cycles using fivefold cross-validation was accordingly substantial for protein yield (17-712 %) and less pronounced for protein content (8-86 %). Cross-validation using the cycles as folds aimed to avoid this bias and reached a maximum prediction accuracy of [Formula: see text] = 0.51 for protein content, [Formula: see text] = 0.38 for grain yield and [Formula: see text] = 0.16 for protein yield. Dropping outlier cycles increased the prediction accuracy of grain yield to [Formula: see text] = 0.41 as estimated by cross-validation, while dropping outlier environments did not have a significant effect on the prediction accuracy. Independent validation suggests, on the other hand, that careful consideration is necessary before an outlier correction is undertaken, which removes lines from the training population. Predicting protein yield by multiplying genomic estimated breeding values of grain yield and protein content raised the prediction accuracy to [Formula: see text] = 0.19 for this derived trait.
Kusev, Petko; van Schaik, Paul; Tsaneva-Atanasova, Krasimira; Juliusson, Asgeir; Chater, Nick
2018-01-01
When attempting to predict future events, people commonly rely on historical data. One psychological characteristic of judgmental forecasting of time series, established by research, is that when people make forecasts from series, they tend to underestimate future values for upward trends and overestimate them for downward ones, so-called trend-damping (modeled by anchoring on, and insufficient adjustment from, the average of recent time series values). Events in a time series can be experienced sequentially (dynamic mode), or they can also be retrospectively viewed simultaneously (static mode), not experienced individually in real time. In one experiment, we studied the influence of presentation mode (dynamic and static) on two sorts of judgment: (a) predictions of the next event (forecast) and (b) estimation of the average value of all the events in the presented series (average estimation). Participants' responses in dynamic mode were anchored on more recent events than in static mode for all types of judgment but with different consequences; hence, dynamic presentation improved prediction accuracy, but not estimation. These results are not anticipated by existing theoretical accounts; we develop and present an agent-based model-the adaptive anchoring model (ADAM)-to account for the difference between processing sequences of dynamically and statically presented stimuli (visually presented data). ADAM captures how variation in presentation mode produces variation in responses (and the accuracy of these responses) in both forecasting and judgment tasks. ADAM's model predictions for the forecasting and judgment tasks fit better with the response data than a linear-regression time series model. Moreover, ADAM outperformed autoregressive-integrated-moving-average (ARIMA) and exponential-smoothing models, while neither of these models accounts for people's responses on the average estimation task. Copyright © 2017 The Authors. Cognitive Science published by Wiley Periodicals, Inc. on behalf of Cognitive Science Society.
Genomic prediction of reproduction traits for Merino sheep.
Bolormaa, S; Brown, D J; Swan, A A; van der Werf, J H J; Hayes, B J; Daetwyler, H D
2017-06-01
Economically important reproduction traits in sheep, such as number of lambs weaned and litter size, are expressed only in females and later in life after most selection decisions are made, which makes them ideal candidates for genomic selection. Accurate genomic predictions would lead to greater genetic gain for these traits by enabling accurate selection of young rams with high genetic merit. The aim of this study was to design and evaluate the accuracy of a genomic prediction method for female reproduction in sheep using daughter trait deviations (DTD) for sires and ewe phenotypes (when individual ewes were genotyped) for three reproduction traits: number of lambs born (NLB), litter size (LSIZE) and number of lambs weaned. Genomic best linear unbiased prediction (GBLUP), BayesR and pedigree BLUP analyses of the three reproduction traits measured on 5340 sheep (4503 ewes and 837 sires) with real and imputed genotypes for 510 174 SNPs were performed. The prediction of breeding values using both sire and ewe trait records was validated in Merino sheep. Prediction accuracy was evaluated by across sire family and random cross-validations. Accuracies of genomic estimated breeding values (GEBVs) were assessed as the mean Pearson correlation adjusted by the accuracy of the input phenotypes. The addition of sire DTD into the prediction analysis resulted in higher accuracies compared with using only ewe records in genomic predictions or pedigree BLUP. Using GBLUP, the average accuracy based on the combined records (ewes and sire DTD) was 0.43 across traits, but the accuracies varied by trait and type of cross-validations. The accuracies of GEBVs from random cross-validations (range 0.17-0.61) were higher than were those from sire family cross-validations (range 0.00-0.51). The GEBV accuracies of 0.41-0.54 for NLB and LSIZE based on the combined records were amongst the highest in the study. Although BayesR was not significantly different from GBLUP in prediction accuracy, it identified several candidate genes which are known to be associated with NLB and LSIZE. The approach provides a way to make use of all data available in genomic prediction for traits that have limited recording. © 2017 Stichting International Foundation for Animal Genetics.
Prospects for Genomic Selection in Cassava Breeding.
Wolfe, Marnin D; Del Carpio, Dunia Pino; Alabi, Olumide; Ezenwaka, Lydia C; Ikeogu, Ugochukwu N; Kayondo, Ismail S; Lozano, Roberto; Okeke, Uche G; Ozimati, Alfred A; Williams, Esuma; Egesi, Chiedozie; Kawuki, Robert S; Kulakow, Peter; Rabbi, Ismail Y; Jannink, Jean-Luc
2017-11-01
Cassava ( Crantz) is a clonally propagated staple food crop in the tropics. Genomic selection (GS) has been implemented at three breeding institutions in Africa to reduce cycle times. Initial studies provided promising estimates of predictive abilities. Here, we expand on previous analyses by assessing the accuracy of seven prediction models for seven traits in three prediction scenarios: cross-validation within populations, cross-population prediction and cross-generation prediction. We also evaluated the impact of increasing the training population (TP) size by phenotyping progenies selected either at random or with a genetic algorithm. Cross-validation results were mostly consistent across programs, with nonadditive models predicting of 10% better on average. Cross-population accuracy was generally low (mean = 0.18) but prediction of cassava mosaic disease increased up to 57% in one Nigerian population when data from another related population were combined. Accuracy across generations was poorer than within-generation accuracy, as expected, but accuracy for dry matter content and mosaic disease severity should be sufficient for rapid-cycling GS. Selection of a prediction model made some difference across generations, but increasing TP size was more important. With a genetic algorithm, selection of one-third of progeny could achieve an accuracy equivalent to phenotyping all progeny. We are in the early stages of GS for this crop but the results are promising for some traits. General guidelines that are emerging are that TPs need to continue to grow but phenotyping can be done on a cleverly selected subset of individuals, reducing the overall phenotyping burden. Copyright © 2017 Crop Science Society of America.
Motor ability and inhibitory processes in children with ADHD: a neuroelectric study.
Hung, Chiao-Ling; Chang, Yu-Kai; Chan, Yuan-Shuo; Shih, Chia-Hao; Huang, Chung-Ju; Hung, Tsung-Min
2013-06-01
The purpose of the current study was to examine the relationship between motor ability and response inhibition using behavioral and electrophysiological indices in children with ADHD. A total of 32 participants were recruited and underwent a motor ability assessment by administering the Basic Motor Ability Test-Revised (BMAT) as well as the Go/No-Go task and event-related potential (ERP) measurements at the same time. The results indicated that the BMAT scores were positively associated with the behavioral and ERP measures. Specifically, the BMAT average score was associated with a faster reaction time and higher accuracy, whereas higher BMAT subset scores predicted a shorter P3 latency in the Go condition. Although the association between the BMAT average score and the No-Go accuracy was limited, higher BMAT average and subset scores predicted a shorter N2 and P3 latency and a larger P3 amplitude in the No-Go condition. These findings suggest that motor abilities may play roles that benefit the cognitive performance of ADHD children.
Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences
2018-01-01
Prediction of taxonomy for marker gene sequences such as 16S ribosomal RNA (rRNA) is a fundamental task in microbiology. Most experimentally observed sequences are diverged from reference sequences of authoritatively named organisms, creating a challenge for prediction methods. I assessed the accuracy of several algorithms using cross-validation by identity, a new benchmark strategy which explicitly models the variation in distances between query sequences and the closest entry in a reference database. When the accuracy of genus predictions was averaged over a representative range of identities with the reference database (100%, 99%, 97%, 95% and 90%), all tested methods had ≤50% accuracy on the currently-popular V4 region of 16S rRNA. Accuracy was found to fall rapidly with identity; for example, better methods were found to have V4 genus prediction accuracy of ∼100% at 100% identity but ∼50% at 97% identity. The relationship between identity and taxonomy was quantified as the probability that a rank is the lowest shared by a pair of sequences with a given pair-wise identity. With the V4 region, 95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal. PMID:29682424
Connecting clinical and actuarial prediction with rule-based methods.
Fokkema, Marjolein; Smits, Niels; Kelderman, Henk; Penninx, Brenda W J H
2015-06-01
Meta-analyses comparing the accuracy of clinical versus actuarial prediction have shown actuarial methods to outperform clinical methods, on average. However, actuarial methods are still not widely used in clinical practice, and there has been a call for the development of actuarial prediction methods for clinical practice. We argue that rule-based methods may be more useful than the linear main effect models usually employed in prediction studies, from a data and decision analytic as well as a practical perspective. In addition, decision rules derived with rule-based methods can be represented as fast and frugal trees, which, unlike main effects models, can be used in a sequential fashion, reducing the number of cues that have to be evaluated before making a prediction. We illustrate the usability of rule-based methods by applying RuleFit, an algorithm for deriving decision rules for classification and regression problems, to a dataset on prediction of the course of depressive and anxiety disorders from Penninx et al. (2011). The RuleFit algorithm provided a model consisting of 2 simple decision rules, requiring evaluation of only 2 to 4 cues. Predictive accuracy of the 2-rule model was very similar to that of a logistic regression model incorporating 20 predictor variables, originally applied to the dataset. In addition, the 2-rule model required, on average, evaluation of only 3 cues. Therefore, the RuleFit algorithm appears to be a promising method for creating decision tools that are less time consuming and easier to apply in psychological practice, and with accuracy comparable to traditional actuarial methods. (c) 2015 APA, all rights reserved).
Age-related DNA methylation changes for forensic age-prediction.
Yi, Shao Hua; Jia, Yun Shu; Mei, Kun; Yang, Rong Zhi; Huang, Dai Xin
2015-03-01
There is no available method of age-prediction for biological samples. The accumulating evidences indicate that DNA methylation patterns change with age. Aging resembles a developmentally regulated process that is tightly controlled by specific epigenetic modifications and age-associated methylation changes exist in human genome. In this study, three age-related methylation fragments were isolated and identified in blood of 40 donors. Age-related methylation changes with each fragment was validated and replicated in a general population sample of 65 donors over a wide age range (11-72 years). Methylation of these fragments is linearly correlated with age over a range of six decades (r = 0.80-0.88). Using average methylation of CpG sites of three fragments, a regression model that explained 95 % of the variance in age was built and is able to predict an individual's age with great accuracy (R (2 )= 0.93). The predicted value is highly correlated with the observed age in the sample (r = 0.96) and has great accuracy of average 4 years difference between predicted age and true age. This study implicates that DNA methylation can be an available biological marker of age-prediction. Further measurement of relevant markers in the genome could be a tool in routine screening to predict age of forensic biological samples.
Koskas, M; Chereau, E; Ballester, M; Dubernard, G; Lécuru, F; Heitz, D; Mathevet, P; Marret, H; Querleu, D; Golfier, F; Leblanc, E; Luton, D; Rouzier, R; Daraï, E
2013-01-01
Background: We developed a nomogram based on five clinical and pathological characteristics to predict lymph-node (LN) metastasis with a high concordance probability in endometrial cancer. Sentinel LN (SLN) biopsy has been suggested as a compromise between systematic lymphadenectomy and no dissection in patients with low-risk endometrial cancer. Methods: Patients with stage I–II endometrial cancer had pelvic SLN and systematic pelvic-node dissection. All LNs were histopathologically examined, and the SLNs were examined by immunohistochemistry. We compared the accuracy of the nomogram at predicting LN detected with conventional histopathology (macrometastasis) and ultrastaging procedure using SLN (micrometastasis). Results: Thirty-eight of the 187 patients (20%) had pelvic LN metastases, 20 had macrometastases and 18 had micrometastases. For the prediction of macrometastases, the nomogram showed good discrimination, with an area under the receiver operating characteristic curve (AUC) of 0.76, and was well calibrated (average error =2.1%). For the prediction of micro- and macrometastases, the nomogram showed poorer discrimination, with an AUC of 0.67, and was less well calibrated (average error =10.9%). Conclusion: Our nomogram is accurate at predicting LN macrometastases but less accurate at predicting micrometastases. Our results suggest that micrometastases are an ‘intermediate state' between disease-free LN and macrometastasis. PMID:23481184
Kiel, Elizabeth J; Buss, Kristin A
2011-10-01
Early social withdrawal and protective parenting predict a host of negative outcomes, warranting examination of their development. Mothers' accurate anticipation of their toddlers' fearfulness may facilitate transactional relations between toddler fearful temperament and protective parenting, leading to these outcomes. Currently, we followed 93 toddlers (42 female; on average 24.76 months) and their mothers (9% underrepresented racial/ethnic backgrounds) over 3 years. We gathered laboratory observation of fearful temperament, maternal protective behavior, and maternal accuracy during toddlerhood and a multi-method assessment of children's social withdrawal and mothers' self-reported protective behavior at kindergarten entry. When mothers displayed higher accuracy, toddler fearful temperament significantly related to concurrent maternal protective behavior and indirectly predicted kindergarten social withdrawal and maternal protective behavior. These results highlight the important role of maternal accuracy in linking fearful temperament and protective parenting, which predict further social withdrawal and protection, and point to toddlerhood for efforts of prevention of anxiety-spectrum outcomes.
Kiel, Elizabeth J.; Buss, Kristin A.
2011-01-01
Early social withdrawal and protective parenting predict a host of negative outcomes, warranting examination of their development. Mothers’ accurate anticipation of their toddlers’ fearfulness may facilitate transactional relations between toddler fearful temperament and protective parenting, leading to these outcomes. Currently, we followed 93 toddlers (42 female; on average 24.76 months) and their mothers (9% underrepresented racial/ethnic backgrounds) over 3 years. We gathered laboratory observation of fearful temperament, maternal protective behavior, and maternal accuracy during toddlerhood and a multi-method assessment of children’s social withdrawal and mothers’ self-reported protective behavior at kindergarten entry. When mothers displayed higher accuracy, toddler fearful temperament significantly related to concurrent maternal protective behavior and indirectly predicted kindergarten social withdrawal and maternal protective behavior. These results highlight the important role of maternal accuracy in linking fearful temperament and protective parenting, which predict further social withdrawal and protection, and point to toddlerhood for efforts of prevention of anxiety-spectrum outcomes. PMID:21537895
A Flexible Spatio-Temporal Model for Air Pollution with Spatial and Spatio-Temporal Covariates.
Lindström, Johan; Szpiro, Adam A; Sampson, Paul D; Oron, Assaf P; Richards, Mark; Larson, Tim V; Sheppard, Lianne
2014-09-01
The development of models that provide accurate spatio-temporal predictions of ambient air pollution at small spatial scales is of great importance for the assessment of potential health effects of air pollution. Here we present a spatio-temporal framework that predicts ambient air pollution by combining data from several different monitoring networks and deterministic air pollution model(s) with geographic information system (GIS) covariates. The model presented in this paper has been implemented in an R package, SpatioTemporal, available on CRAN. The model is used by the EPA funded Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) to produce estimates of ambient air pollution; MESA Air uses the estimates to investigate the relationship between chronic exposure to air pollution and cardiovascular disease. In this paper we use the model to predict long-term average concentrations of NO x in the Los Angeles area during a ten year period. Predictions are based on measurements from the EPA Air Quality System, MESA Air specific monitoring, and output from a source dispersion model for traffic related air pollution (Caline3QHCR). Accuracy in predicting long-term average concentrations is evaluated using an elaborate cross-validation setup that accounts for a sparse spatio-temporal sampling pattern in the data, and adjusts for temporal effects. The predictive ability of the model is good with cross-validated R 2 of approximately 0.7 at subject sites. Replacing four geographic covariate indicators of traffic density with the Caline3QHCR dispersion model output resulted in very similar prediction accuracy from a more parsimonious and more interpretable model. Adding traffic-related geographic covariates to the model that included Caline3QHCR did not further improve the prediction accuracy.
2010-01-01
Background The binding of peptide fragments of extracellular peptides to class II MHC is a crucial event in the adaptive immune response. Each MHC allotype generally binds a distinct subset of peptides and the enormous number of possible peptide epitopes prevents their complete experimental characterization. Computational methods can utilize the limited experimental data to predict the binding affinities of peptides to class II MHC. Results We have developed the Regularized Thermodynamic Average, or RTA, method for predicting the affinities of peptides binding to class II MHC. RTA accounts for all possible peptide binding conformations using a thermodynamic average and includes a parameter constraint for regularization to improve accuracy on novel data. RTA was shown to achieve higher accuracy, as measured by AUC, than SMM-align on the same data for all 17 MHC allotypes examined. RTA also gave the highest accuracy on all but three allotypes when compared with results from 9 different prediction methods applied to the same data. In addition, the method correctly predicted the peptide binding register of 17 out of 18 peptide-MHC complexes. Finally, we found that suboptimal peptide binding registers, which are often ignored in other prediction methods, made significant contributions of at least 50% of the total binding energy for approximately 20% of the peptides. Conclusions The RTA method accurately predicts peptide binding affinities to class II MHC and accounts for multiple peptide binding registers while reducing overfitting through regularization. The method has potential applications in vaccine design and in understanding autoimmune disorders. A web server implementing the RTA prediction method is available at http://bordnerlab.org/RTA/. PMID:20089173
Application of the aeroacoustic analogy to a shrouded, subsonic, radial fan
NASA Astrophysics Data System (ADS)
Buccieri, Bryan M.; Richards, Christopher M.
2016-12-01
A study was conducted to investigate the predictive capability of computational aeroacoustics with respect to a shrouded, subsonic, radial fan. A three dimensional unsteady fluid dynamics simulation was conducted to produce aerodynamic data used as the acoustic source for an aeroacoustics simulation. Two acoustic models were developed: one modeling the forces on the rotating fan blades as a set of rotating dipoles located at the center of mass of each fan blade and one modeling the forces on the stationary fan shroud as a field of distributed stationary dipoles. Predicted acoustic response was compared to experimental data measured at two operating speeds using three different outlet restrictions. The blade source model predicted overall far field sound power levels within 5 dB averaged over the six different operating conditions while the shroud model predicted overall far field sound power levels within 7 dB averaged over the same conditions. Doubling the density of the computational fluids mesh and using a scale adaptive simulation turbulence model increased broadband noise accuracy. However, computation time doubled and the accuracy of the overall sound power level prediction improved by only 1 dB.
Analysis of deep learning methods for blind protein contact prediction in CASP12.
Wang, Sheng; Sun, Siqi; Xu, Jinbo
2018-03-01
Here we present the results of protein contact prediction achieved in CASP12 by our RaptorX-Contact server, which is an early implementation of our deep learning method for contact prediction. On a set of 38 free-modeling target domains with a median family size of around 58 effective sequences, our server obtained an average top L/5 long- and medium-range contact accuracy of 47% and 44%, respectively (L = length). A complete implementation has an average accuracy of 59% and 57%, respectively. Our deep learning method formulates contact prediction as a pixel-level image labeling problem and simultaneously predicts all residue pairs of a protein using a combination of two deep residual neural networks, taking as input the residue conservation information, predicted secondary structure and solvent accessibility, contact potential, and coevolution information. Our approach differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationship including high-order residue correlation. This paper discusses the RaptorX-Contact pipeline, both contact prediction and contact-based folding results, and finally the strength and weakness of our method. © 2017 Wiley Periodicals, Inc.
Hydrometeorological model for streamflow prediction
Tangborn, Wendell V.
1979-01-01
The hydrometeorological model described in this manual was developed to predict seasonal streamflow from water in storage in a basin using streamflow and precipitation data. The model, as described, applies specifically to the Skokomish, Nisqually, and Cowlitz Rivers, in Washington State, and more generally to streams in other regions that derive seasonal runoff from melting snow. Thus the techniques demonstrated for these three drainage basins can be used as a guide for applying this method to other streams. Input to the computer program consists of daily averages of gaged runoff of these streams, and daily values of precipitation collected at Longmire, Kid Valley, and Cushman Dam. Predictions are based on estimates of the absolute storage of water, predominately as snow: storage is approximately equal to basin precipitation less observed runoff. A pre-forecast test season is used to revise the storage estimate and improve the prediction accuracy. To obtain maximum prediction accuracy for operational applications with this model , a systematic evaluation of several hydrologic and meteorologic variables is first necessary. Six input options to the computer program that control prediction accuracy are developed and demonstrated. Predictions of streamflow can be made at any time and for any length of season, although accuracy is usually poor for early-season predictions (before December 1) or for short seasons (less than 15 days). The coefficient of prediction (CP), the chief measure of accuracy used in this manual, approaches zero during the late autumn and early winter seasons and reaches a maximum of about 0.85 during the spring snowmelt season. (Kosco-USGS)
Wiese, Steffen; Teutenberg, Thorsten; Schmidt, Torsten C
2012-01-27
In the present work it is shown that the linear elution strength (LES) model which was adapted from temperature-programming gas chromatography (GC) can also be employed for systematic method development in high-temperature liquid chromatography (HT-HPLC). The ability to predict isothermal retention times based on temperature-gradient as well as isothermal input data was investigated. For a small temperature interval of ΔT=40°C, both approaches result in very similar predictions. Average relative errors of predicted retention times of 2.7% and 1.9% were observed for simulations based on isothermal and temperature-gradient measurements, respectively. Concurrently, it was investigated whether the accuracy of retention time predictions of segmented temperature gradients can be further improved by temperature dependent calculation of the parameter S(T) of the LES relationship. It was found that the accuracy of retention time predictions of multi-step temperature gradients can be improved to around 1.5%, if S(T) was also calculated temperature dependent. The adjusted experimental design making use of four temperature-gradient measurements was applied for systematic method development of selected food additives by high-temperature liquid chromatography. Method development was performed within a temperature interval from 40°C to 180°C using water as mobile phase. Two separation methods were established where selected food additives were baseline separated. In addition, a good agreement between simulation and experiment was observed, because an average relative error of predicted retention times of complex segmented temperature gradients less than 5% was observed. Finally, a schedule of recommendations to assist the practitioner during systematic method development in high-temperature liquid chromatography was established. Copyright © 2011 Elsevier B.V. All rights reserved.
Clinical time series prediction: Toward a hierarchical dynamical system framework.
Liu, Zitao; Hauskrecht, Milos
2015-09-01
Developing machine learning and data mining algorithms for building temporal models of clinical time series is important for understanding of the patient condition, the dynamics of a disease, effect of various patient management interventions and clinical decision making. In this work, we propose and develop a novel hierarchical framework for modeling clinical time series data of varied length and with irregularly sampled observations. Our hierarchical dynamical system framework for modeling clinical time series combines advantages of the two temporal modeling approaches: the linear dynamical system and the Gaussian process. We model the irregularly sampled clinical time series by using multiple Gaussian process sequences in the lower level of our hierarchical framework and capture the transitions between Gaussian processes by utilizing the linear dynamical system. The experiments are conducted on the complete blood count (CBC) panel data of 1000 post-surgical cardiac patients during their hospitalization. Our framework is evaluated and compared to multiple baseline approaches in terms of the mean absolute prediction error and the absolute percentage error. We tested our framework by first learning the time series model from data for the patients in the training set, and then using it to predict future time series values for the patients in the test set. We show that our model outperforms multiple existing models in terms of its predictive accuracy. Our method achieved a 3.13% average prediction accuracy improvement on ten CBC lab time series when it was compared against the best performing baseline. A 5.25% average accuracy improvement was observed when only short-term predictions were considered. A new hierarchical dynamical system framework that lets us model irregularly sampled time series data is a promising new direction for modeling clinical time series and for improving their predictive performance. Copyright © 2014 Elsevier B.V. All rights reserved.
Rutkoski, Jessica; Poland, Jesse; Mondal, Suchismita; Autrique, Enrique; Pérez, Lorena González; Crossa, José; Reynolds, Matthew; Singh, Ravi
2016-01-01
Genomic selection can be applied prior to phenotyping, enabling shorter breeding cycles and greater rates of genetic gain relative to phenotypic selection. Traits measured using high-throughput phenotyping based on proximal or remote sensing could be useful for improving pedigree and genomic prediction model accuracies for traits not yet possible to phenotype directly. We tested if using aerial measurements of canopy temperature, and green and red normalized difference vegetation index as secondary traits in pedigree and genomic best linear unbiased prediction models could increase accuracy for grain yield in wheat, Triticum aestivum L., using 557 lines in five environments. Secondary traits on training and test sets, and grain yield on the training set were modeled as multivariate, and compared to univariate models with grain yield on the training set only. Cross validation accuracies were estimated within and across-environment, with and without replication, and with and without correcting for days to heading. We observed that, within environment, with unreplicated secondary trait data, and without correcting for days to heading, secondary traits increased accuracies for grain yield by 56% in pedigree, and 70% in genomic prediction models, on average. Secondary traits increased accuracy slightly more when replicated, and considerably less when models corrected for days to heading. In across-environment prediction, trends were similar but less consistent. These results show that secondary traits measured in high-throughput could be used in pedigree and genomic prediction to improve accuracy. This approach could improve selection in wheat during early stages if validated in early-generation breeding plots. PMID:27402362
Failure prediction using machine learning and time series in optical network.
Wang, Zhilong; Zhang, Min; Wang, Danshi; Song, Chuang; Liu, Min; Li, Jin; Lou, Liqi; Liu, Zhuo
2017-08-07
In this paper, we propose a performance monitoring and failure prediction method in optical networks based on machine learning. The primary algorithms of this method are the support vector machine (SVM) and double exponential smoothing (DES). With a focus on risk-aware models in optical networks, the proposed protection plan primarily investigates how to predict the risk of an equipment failure. To the best of our knowledge, this important problem has not yet been fully considered. Experimental results showed that the average prediction accuracy of our method was 95% when predicting the optical equipment failure state. This finding means that our method can forecast an equipment failure risk with high accuracy. Therefore, our proposed DES-SVM method can effectively improve traditional risk-aware models to protect services from possible failures and enhance the optical network stability.
NASA Technical Reports Server (NTRS)
Armoundas, A. A.; Rosenbaum, D. S.; Ruskin, J. N.; Garan, H.; Cohen, R. J.
1998-01-01
OBJECTIVE: To investigate the accuracy of signal averaged electrocardiography (SAECG) and measurement of microvolt level T wave alternans as predictors of susceptibility to ventricular arrhythmias. DESIGN: Analysis of new data from a previously published prospective investigation. SETTING: Electrophysiology laboratory of a major referral hospital. PATIENTS AND INTERVENTIONS: 43 patients, not on class I or class III antiarrhythmic drug treatment, undergoing invasive electrophysiological testing had SAECG and T wave alternans measurements. The SAECG was considered positive in the presence of one (SAECG-I) or two (SAECG-II) of three standard criteria. T wave alternans was considered positive if the alternans ratio exceeded 3.0. MAIN OUTCOME MEASURES: Inducibility of sustained ventricular tachycardia or fibrillation during electrophysiological testing, and 20 month arrhythmia-free survival. RESULTS: The accuracy of T wave alternans in predicting the outcome of electrophysiological testing was 84% (p < 0.0001). Neither SAECG-I (accuracy 60%; p < 0.29) nor SAECG-II (accuracy 71%; p < 0.10) was a statistically significant predictor of electrophysiological testing. SAECG, T wave alternans, electrophysiological testing, and follow up data were available in 36 patients while not on class I or III antiarrhythmic agents. The accuracy of T wave alternans in predicting the outcome of arrhythmia-free survival was 86% (p < 0.030). Neither SAECG-I (accuracy 65%; p < 0.21) nor SAECG-II (accuracy 71%; p < 0.48) was a statistically significant predictor of arrhythmia-free survival. CONCLUSIONS: T wave alternans was a highly significant predictor of the outcome of electrophysiological testing and arrhythmia-free survival, while SAECG was not a statistically significant predictor. Although these results need to be confirmed in prospective clinical studies, they suggest that T wave alternans may serve as a non-invasive probe for screening high risk populations for malignant ventricular arrhythmias.
NASA Astrophysics Data System (ADS)
Ko, P.; Kurosawa, S.
2014-03-01
The understanding and accurate prediction of the flow behaviour related to cavitation and pressure fluctuation in a Kaplan turbine are important to the design work enhancing the turbine performance including the elongation of the operation life span and the improvement of turbine efficiency. In this paper, high accuracy turbine and cavitation performance prediction method based on entire flow passage for a Kaplan turbine is presented and evaluated. Two-phase flow field is predicted by solving Reynolds-Averaged Navier-Stokes equations expressed by volume of fluid method tracking the free surface and combined with Reynolds Stress model. The growth and collapse of cavitation bubbles are modelled by the modified Rayleigh-Plesset equation. The prediction accuracy is evaluated by comparing with the model test results of Ns 400 Kaplan model turbine. As a result that the experimentally measured data including turbine efficiency, cavitation performance, and pressure fluctuation are accurately predicted. Furthermore, the cavitation occurrence on the runner blade surface and the influence to the hydraulic loss of the flow passage are discussed. Evaluated prediction method for the turbine flow and performance is introduced to facilitate the future design and research works on Kaplan type turbine.
Erbe, Malena; Gredler, Birgit; Seefried, Franz Reinhold; Bapst, Beat; Simianer, Henner
2013-01-01
Prediction of genomic breeding values is of major practical relevance in dairy cattle breeding. Deterministic equations have been suggested to predict the accuracy of genomic breeding values in a given design which are based on training set size, reliability of phenotypes, and the number of independent chromosome segments ([Formula: see text]). The aim of our study was to find a general deterministic equation for the average accuracy of genomic breeding values that also accounts for marker density and can be fitted empirically. Two data sets of 5'698 Holstein Friesian bulls genotyped with 50 K SNPs and 1'332 Brown Swiss bulls genotyped with 50 K SNPs and imputed to ∼600 K SNPs were available. Different k-fold (k = 2-10, 15, 20) cross-validation scenarios (50 replicates, random assignment) were performed using a genomic BLUP approach. A maximum likelihood approach was used to estimate the parameters of different prediction equations. The highest likelihood was obtained when using a modified form of the deterministic equation of Daetwyler et al. (2010), augmented by a weighting factor (w) based on the assumption that the maximum achievable accuracy is [Formula: see text]. The proportion of genetic variance captured by the complete SNP sets ([Formula: see text]) was 0.76 to 0.82 for Holstein Friesian and 0.72 to 0.75 for Brown Swiss. When modifying the number of SNPs, w was found to be proportional to the log of the marker density up to a limit which is population and trait specific and was found to be reached with ∼20'000 SNPs in the Brown Swiss population studied.
NNvPDB: Neural Network based Protein Secondary Structure Prediction with PDB Validation.
Sakthivel, Seethalakshmi; S K M, Habeeb
2015-01-01
The predicted secondary structural states are not cross validated by any of the existing servers. Hence, information on the level of accuracy for every sequence is not reported by the existing servers. This was overcome by NNvPDB, which not only reported greater Q3 but also validates every prediction with the homologous PDB entries. NNvPDB is based on the concept of Neural Network, with a new and different approach of training the network every time with five PDB structures that are similar to query sequence. The average accuracy for helix is 76%, beta sheet is 71% and overall (helix, sheet and coil) is 66%. http://bit.srmuniv.ac.in/cgi-bin/bit/cfpdb/nnsecstruct.pl.
Li, Jin; Tran, Maggie; Siwabessy, Justy
2016-01-01
Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods that are variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF) were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models. PMID:26890307
Li, Jin; Tran, Maggie; Siwabessy, Justy
2016-01-01
Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods that are variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF) were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models.
Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi
2007-02-15
Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
Lee, J; Kachman, S D; Spangler, M L
2017-08-01
Genomic selection (GS) has become an integral part of genetic evaluation methodology and has been applied to all major livestock species, including beef and dairy cattle, pigs, and chickens. Significant contributions in increased accuracy of selection decisions have been clearly illustrated in dairy cattle after practical application of GS. In the majority of U.S. beef cattle breeds, similar efforts have also been made to increase the accuracy of genetic merit estimates through the inclusion of genomic information into routine genetic evaluations using a variety of methods. However, prediction accuracies can vary relative to panel density, the number of folds used for folds cross-validation, and the choice of dependent variables (e.g., EBV, deregressed EBV, adjusted phenotypes). The aim of this study was to evaluate the accuracy of genomic predictors for Red Angus beef cattle with different strategies used in training and evaluation. The reference population consisted of 9,776 Red Angus animals whose genotypes were imputed to 2 medium-density panels consisting of over 50,000 (50K) and approximately 80,000 (80K) SNP. Using the imputed panels, we determined the influence of marker density, exclusion (deregressed EPD adjusting for parental information [DEPD-PA]) or inclusion (deregressed EPD without adjusting for parental information [DEPD]) of parental information in the deregressed EPD used as the dependent variable, and the number of clusters used to partition training animals (3, 5, or 10). A BayesC model with π set to 0.99 was used to predict molecular breeding values (MBV) for 13 traits for which EPD existed. The prediction accuracies were measured as genetic correlations between MBV and weighted deregressed EPD. The average accuracies across all traits were 0.540 and 0.552 when using the 50K and 80K SNP panels, respectively, and 0.538, 0.541, and 0.561 when using 3, 5, and 10 folds, respectively, for cross-validation. Using DEPD-PA as the response variable resulted in higher accuracies of MBV than those obtained by DEPD for growth and carcass traits. When DEPD were used as the response variable, accuracies were greater for threshold traits and those that are sex limited, likely due to the fact that these traits suffer from a lack of information content and excluding animals in training with only parental information substantially decreases the training population size. It is recommended that the contribution of parental average to deregressed EPD should be removed in the construction of genomic prediction equations. The difference in terms of prediction accuracies between the 2 SNP panels or the number of folds compared herein was negligible.
Adaptive Spontaneous Transitions between Two Mechanisms of Numerical Averaging.
Brezis, Noam; Bronfman, Zohar Z; Usher, Marius
2015-06-04
We investigated the mechanism with which humans estimate numerical averages. Participants were presented with 4, 8 or 16 (two-digit) numbers, serially and rapidly (2 numerals/second) and were instructed to convey the sequence average. As predicted by a dual, but not a single-component account, we found a non-monotonic influence of set-size on accuracy. Moreover, we observed a marked decrease in RT as set-size increases and RT-accuracy tradeoff in the 4-, but not in the 16-number condition. These results indicate that in accordance with the normative directive, participants spontaneously employ analytic/sequential thinking in the 4-number condition and intuitive/holistic thinking in the 16-number condition. When the presentation rate is extreme (10 items/sec) we find that, while performance still remains high, the estimations are now based on intuitive processing. The results are accounted for by a computational model postulating population-coding underlying intuitive-averaging and working-memory-mediated symbolic procedures underlying analytical-averaging, with flexible allocation between the two.
RBind: computational network method to predict RNA binding sites.
Wang, Kaili; Jian, Yiren; Wang, Huiwen; Zeng, Chen; Zhao, Yunjie
2018-04-26
Non-coding RNA molecules play essential roles by interacting with other molecules to perform various biological functions. However, it is difficult to determine RNA structures due to their flexibility. At present, the number of experimentally solved RNA-ligand and RNA-protein structures is still insufficient. Therefore, binding sites prediction of non-coding RNA is required to understand their functions. Current RNA binding site prediction algorithms produce many false positive nucleotides that are distance away from the binding sites. Here, we present a network approach, RBind, to predict the RNA binding sites. We benchmarked RBind in RNA-ligand and RNA-protein datasets. The average accuracy of 0.82 in RNA-ligand and 0.63 in RNA-protein testing showed that this network strategy has a reliable accuracy for binding sites prediction. The codes and datasets are available at https://zhaolab.com.cn/RBind. yjzhaowh@mail.ccnu.edu.cn. Supplementary data are available at Bioinformatics online.
Weng, Ziqing; Wolc, Anna; Shen, Xia; Fernando, Rohan L; Dekkers, Jack C M; Arango, Jesus; Settar, Petek; Fulton, Janet E; O'Sullivan, Neil P; Garrick, Dorian J
2016-03-19
Genomic estimated breeding values (GEBV) based on single nucleotide polymorphism (SNP) genotypes are widely used in animal improvement programs. It is typically assumed that the larger the number of animals is in the training set, the higher is the prediction accuracy of GEBV. The aim of this study was to quantify genomic prediction accuracy depending on the number of ancestral generations included in the training set, and to determine the optimal number of training generations for different traits in an elite layer breeding line. Phenotypic records for 16 traits on 17,793 birds were used. All parents and some selection candidates from nine non-overlapping generations were genotyped for 23,098 segregating SNPs. An animal model with pedigree relationships (PBLUP) and the BayesB genomic prediction model were applied to predict EBV or GEBV at each validation generation (progeny of the most recent training generation) based on varying numbers of immediately preceding ancestral generations. Prediction accuracy of EBV or GEBV was assessed as the correlation between EBV and phenotypes adjusted for fixed effects, divided by the square root of trait heritability. The optimal number of training generations that resulted in the greatest prediction accuracy of GEBV was determined for each trait. The relationship between optimal number of training generations and heritability was investigated. On average, accuracies were higher with the BayesB model than with PBLUP. Prediction accuracies of GEBV increased as the number of closely-related ancestral generations included in the training set increased, but reached an asymptote or slightly decreased when distant ancestral generations were used in the training set. The optimal number of training generations was 4 or more for high heritability traits but less than that for low heritability traits. For less heritable traits, limiting the training datasets to individuals closely related to the validation population resulted in the best predictions. The effect of adding distant ancestral generations in the training set on prediction accuracy differed between traits and the optimal number of necessary training generations is associated with the heritability of traits.
Isma’eel, Hussain A.; Sakr, George E.; Almedawar, Mohamad M.; Fathallah, Jihan; Garabedian, Torkom; Eddine, Savo Bou Zein
2015-01-01
Background High dietary salt intake is directly linked to hypertension and cardiovascular diseases (CVDs). Predicting behaviors regarding salt intake habits is vital to guide interventions and increase their effectiveness. We aim to compare the accuracy of an artificial neural network (ANN) based tool that predicts behavior from key knowledge questions along with clinical data in a high cardiovascular risk cohort relative to the least square models (LSM) method. Methods We collected knowledge, attitude and behavior data on 115 patients. A behavior score was calculated to classify patients’ behavior towards reducing salt intake. Accuracy comparison between ANN and regression analysis was calculated using the bootstrap technique with 200 iterations. Results Starting from a 69-item questionnaire, a reduced model was developed and included eight knowledge items found to result in the highest accuracy of 62% CI (58-67%). The best prediction accuracy in the full and reduced models was attained by ANN at 66% and 62%, respectively, compared to full and reduced LSM at 40% and 34%, respectively. The average relative increase in accuracy over all in the full and reduced models is 82% and 102%, respectively. Conclusions Using ANN modeling, we can predict salt reduction behaviors with 66% accuracy. The statistical model has been implemented in an online calculator and can be used in clinics to estimate the patient’s behavior. This will help implementation in future research to further prove clinical utility of this tool to guide therapeutic salt reduction interventions in high cardiovascular risk individuals. PMID:26090333
Protein structure refinement using a quantum mechanics-based chemical shielding predictor.
Bratholm, Lars A; Jensen, Jan H
2017-03-01
The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ , 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1-0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods are slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improves the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches or interpreting protein structural dynamics from QM-derived chemical shift.
Rezaei-Darzi, Ehsan; Farzadfar, Farshad; Hashemi-Meshkini, Amir; Navidi, Iman; Mahmoudi, Mahmoud; Varmaghani, Mehdi; Mehdipour, Parinaz; Soudi Alamdari, Mahsa; Tayefi, Batool; Naderimagham, Shohreh; Soleymani, Fatemeh; Mesdaghinia, Alireza; Delavari, Alireza; Mohammad, Kazem
2014-12-01
This study aimed to evaluate and compare the prediction accuracy of two data mining techniques, including decision tree and neural network models in labeling diagnosis to gastrointestinal prescriptions in Iran. This study was conducted in three phases: data preparation, training phase, and testing phase. A sample from a database consisting of 23 million pharmacy insurance claim records, from 2004 to 2011 was used, in which a total of 330 prescriptions were assessed and used to train and test the models simultaneously. In the training phase, the selected prescriptions were assessed by both a physician and a pharmacist separately and assigned a diagnosis. To test the performance of each model, a k-fold stratified cross validation was conducted in addition to measuring their sensitivity and specificity. Generally, two methods had very similar accuracies. Considering the weighted average of true positive rate (sensitivity) and true negative rate (specificity), the decision tree had slightly higher accuracy in its ability for correct classification (83.3% and 96% versus 80.3% and 95.1%, respectively). However, when the weighted average of ROC area (AUC between each class and all other classes) was measured, the ANN displayed higher accuracies in predicting the diagnosis (93.8% compared with 90.6%). According to the result of this study, artificial neural network and decision tree model represent similar accuracy in labeling diagnosis to GI prescription.
[Forest lighting fire forecasting for Daxing'anling Mountains based on MAXENT model].
Sun, Yu; Shi, Ming-Chang; Peng, Huan; Zhu, Pei-Lin; Liu, Si-Lin; Wu, Shi-Lei; He, Cheng; Chen, Feng
2014-04-01
Daxing'anling Mountains is one of the areas with the highest occurrence of forest lighting fire in Heilongjiang Province, and developing a lightning fire forecast model to accurately predict the forest fires in this area is of importance. Based on the data of forest lightning fires and environment variables, the MAXENT model was used to predict the lightning fire in Daxing' anling region. Firstly, we studied the collinear diagnostic of each environment variable, evaluated the importance of the environmental variables using training gain and the Jackknife method, and then evaluated the prediction accuracy of the MAXENT model using the max Kappa value and the AUC value. The results showed that the variance inflation factor (VIF) values of lightning energy and neutralized charge were 5.012 and 6.230, respectively. They were collinear with the other variables, so the model could not be used for training. Daily rainfall, the number of cloud-to-ground lightning, and current intensity of cloud-to-ground lightning were the three most important factors affecting the lightning fires in the forest, while the daily average wind speed and the slope was of less importance. With the increase of the proportion of test data, the max Kappa and AUC values were increased. The max Kappa values were above 0.75 and the average value was 0.772, while all of the AUC values were above 0.5 and the average value was 0. 859. With a moderate level of prediction accuracy being achieved, the MAXENT model could be used to predict forest lightning fire in Daxing'anling Mountains.
Automated detection of brain atrophy patterns based on MRI for the prediction of Alzheimer's disease
Plant, Claudia; Teipel, Stefan J.; Oswald, Annahita; Böhm, Christian; Meindl, Thomas; Mourao-Miranda, Janaina; Bokde, Arun W.; Hampel, Harald; Ewers, Michael
2010-01-01
Subjects with mild cognitive impairment (MCI) have an increased risk to develop Alzheimer's disease (AD). Voxel-based MRI studies have demonstrated that widely distributed cortical and subcortical brain areas show atrophic changes in MCI, preceding the onset of AD-type dementia. Here we developed a novel data mining framework in combination with three different classifiers including support vector machine (SVM), Bayes statistics, and voting feature intervals (VFI) to derive a quantitative index of pattern matching for the prediction of the conversion from MCI to AD. MRI was collected in 32 AD patients, 24 MCI subjects and 18 healthy controls (HC). Nine out of 24 MCI subjects converted to AD after an average follow-up interval of 2.5 years. Using feature selection algorithms, brain regions showing the highest accuracy for the discrimination between AD and HC were identified, reaching a classification accuracy of up to 92%. The extracted AD clusters were used as a search region to extract those brain areas that are predictive of conversion to AD within MCI subjects. The most predictive brain areas included the anterior cingulate gyrus and orbitofrontal cortex. The best prediction accuracy, which was cross-validated via train-and-test, was 75% for the prediction of the conversion from MCI to AD. The present results suggest that novel multivariate methods of pattern matching reach a clinically relevant accuracy for the a priori prediction of the progression from MCI to AD. PMID:19961938
All-atom 3D structure prediction of transmembrane β-barrel proteins from sequences.
Hayat, Sikander; Sander, Chris; Marks, Debora S; Elofsson, Arne
2015-04-28
Transmembrane β-barrels (TMBs) carry out major functions in substrate transport and protein biogenesis but experimental determination of their 3D structure is challenging. Encouraged by successful de novo 3D structure prediction of globular and α-helical membrane proteins from sequence alignments alone, we developed an approach to predict the 3D structure of TMBs. The approach combines the maximum-entropy evolutionary coupling method for predicting residue contacts (EVfold) with a machine-learning approach (boctopus2) for predicting β-strands in the barrel. In a blinded test for 19 TMB proteins of known structure that have a sufficient number of diverse homologous sequences available, this combined method (EVfold_bb) predicts hydrogen-bonded residue pairs between adjacent β-strands at an accuracy of ∼70%. This accuracy is sufficient for the generation of all-atom 3D models. In the transmembrane barrel region, the average 3D structure accuracy [template-modeling (TM) score] of top-ranked models is 0.54 (ranging from 0.36 to 0.85), with a higher (44%) number of residue pairs in correct strand-strand registration than in earlier methods (18%). Although the nonbarrel regions are predicted less accurately overall, the evolutionary couplings identify some highly constrained loop residues and, for FecA protein, the barrel including the structure of a plug domain can be accurately modeled (TM score = 0.68). Lower prediction accuracy tends to be associated with insufficient sequence information and we therefore expect increasing numbers of β-barrel families to become accessible to accurate 3D structure prediction as the number of available sequences increases.
Yang, Jing; He, Bao-Ji; Jang, Richard; Zhang, Yang; Shen, Hong-Bin
2015-01-01
Abstract Motivation: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g. >3 bonds, is too low to effectively assist structure assembly simulations. Results: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ Contact: zhng@umich.edu or hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254435
Automatically identifying health outcome information in MEDLINE records.
Demner-Fushman, Dina; Few, Barbara; Hauser, Susan E; Thoma, George
2006-01-01
Understanding the effect of a given intervention on the patient's health outcome is one of the key elements in providing optimal patient care. This study presents a methodology for automatic identification of outcomes-related information in medical text and evaluates its potential in satisfying clinical information needs related to health care outcomes. An annotation scheme based on an evidence-based medicine model for critical appraisal of evidence was developed and used to annotate 633 MEDLINE citations. Textual, structural, and meta-information features essential to outcome identification were learned from the created collection and used to develop an automatic system. Accuracy of automatic outcome identification was assessed in an intrinsic evaluation and in an extrinsic evaluation, in which ranking of MEDLINE search results obtained using PubMed Clinical Queries relied on identified outcome statements. The accuracy and positive predictive value of outcome identification were calculated. Effectiveness of the outcome-based ranking was measured using mean average precision and precision at rank 10. Automatic outcome identification achieved 88% to 93% accuracy. The positive predictive value of individual sentences identified as outcomes ranged from 30% to 37%. Outcome-based ranking improved retrieval accuracy, tripling mean average precision and achieving 389% improvement in precision at rank 10. Preliminary results in outcome-based document ranking show potential validity of the evidence-based medicine-model approach in timely delivery of information critical to clinical decision support at the point of service.
Predicting online ratings based on the opinion spreading process
NASA Astrophysics Data System (ADS)
He, Xing-Sheng; Zhou, Ming-Yang; Zhuo, Zhao; Fu, Zhong-Qian; Liu, Jian-Guo
2015-10-01
Predicting users' online ratings is always a challenge issue and has drawn lots of attention. In this paper, we present a rating prediction method by combining the user opinion spreading process with the collaborative filtering algorithm, where user similarity is defined by measuring the amount of opinion a user transfers to another based on the primitive user-item rating matrix. The proposed method could produce a more precise rating prediction for each unrated user-item pair. In addition, we introduce a tunable parameter λ to regulate the preferential diffusion relevant to the degree of both opinion sender and receiver. The numerical results for Movielens and Netflix data sets show that this algorithm has a better accuracy than the standard user-based collaborative filtering algorithm using Cosine and Pearson correlation without increasing computational complexity. By tuning λ, our method could further boost the prediction accuracy when using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as measurements. In the optimal cases, on Movielens and Netflix data sets, the corresponding algorithmic accuracy (MAE and RMSE) are improved 11.26% and 8.84%, 13.49% and 10.52% compared to the item average method, respectively.
Screening for Reading Problems: The Utility of SEARCH.
ERIC Educational Resources Information Center
Morrison, Delmont; And Others
1988-01-01
The accuracy of SEARCH for identifying children at risk for developing learning disabilities was evaluated with 1,107 kindergarten children. Children identified as at risk were of average intelligence. SEARCH scores were significantly correlated with sequential and simultaneous information processing skills. SEARCH predicted adequacy of…
Huikang Wang; Luzheng Bi; Teng Teng
2017-07-01
This paper proposes a novel method of electroencephalography (EEG)-based driver emergency braking intention detection system for brain-controlled driving considering one electrode falling-off. First, whether one electrode falls off is discriminated based on EEG potentials. Then, the missing signals are estimated by using the signals collected from other channels based on multivariate linear regression. Finally, a linear decoder is applied to classify driver intentions. Experimental results show that the falling-off discrimination accuracy is 99.63% on average and the correlation coefficient and root mean squared error (RMSE) between the estimated and experimental data are 0.90 and 11.43 μV, respectively, on average. Given one electrode falls off, the system accuracy of the proposed intention prediction method is significantly higher than that of the original method (95.12% VS 79.11%) and is close to that (95.95%) of the original system under normal situations (i. e., no electrode falling-off).
Neumann, Marcus A.
2017-01-01
Motional averaging has been proven to be significant in predicting the chemical shifts in ab initio solid-state NMR calculations, and the applicability of motional averaging with molecular dynamics has been shown to depend on the accuracy of the molecular mechanical force field. The performance of a fully automatically generated tailor-made force field (TMFF) for the dynamic aspects of NMR crystallography is evaluated and compared with existing benchmarks, including static dispersion-corrected density functional theory calculations and the COMPASS force field. The crystal structure of free base cocaine is used as an example. The results reveal that, even though the TMFF outperforms the COMPASS force field for representing the energies and conformations of predicted structures, it does not give significant improvement in the accuracy of NMR calculations. Further studies should direct more attention to anisotropic chemical shifts and development of the method of solid-state NMR calculations. PMID:28250956
Predicting Individuals' Learning Success from Patterns of Pre-Learning MRI Activity
Vo, Loan T. K.; Walther, Dirk B.; Kramer, Arthur F.; Erickson, Kirk I.; Boot, Walter R.; Voss, Michelle W.; Prakash, Ruchika S.; Lee, Hyunkyu; Fabiani, Monica; Gratton, Gabriele; Simons, Daniel J.; Sutton, Bradley P.; Wang, Michelle Y.
2011-01-01
Performance in most complex cognitive and psychomotor tasks improves with training, yet the extent of improvement varies among individuals. Is it possible to forecast the benefit that a person might reap from training? Several behavioral measures have been used to predict individual differences in task improvement, but their predictive power is limited. Here we show that individual differences in patterns of time-averaged T2*-weighted MRI images in the dorsal striatum recorded at the initial stage of training predict subsequent learning success in a complex video game with high accuracy. These predictions explained more than half of the variance in learning success among individuals, suggesting that individual differences in neuroanatomy or persistent physiology predict whether and to what extent people will benefit from training in a complex task. Surprisingly, predictions from white matter were highly accurate, while voxels in the gray matter of the dorsal striatum did not contain any information about future training success. Prediction accuracy was higher in the anterior than the posterior half of the dorsal striatum. The link between trainability and the time-averaged T2*-weighted signal in the dorsal striatum reaffirms the role of this part of the basal ganglia in learning and executive functions, such as task-switching and task coordination processes. The ability to predict who will benefit from training by using neuroimaging data collected in the early training phase may have far-reaching implications for the assessment of candidates for specific training programs as well as the study of populations that show deficiencies in learning new skills. PMID:21264257
NASA Astrophysics Data System (ADS)
Kim, M. S.; Onda, Y.; Kim, J. K.
2015-01-01
SHALSTAB model applied to shallow landslides induced by rainfall to evaluate soil properties related with the effect of soil depth for a granite area in Jinbu region, Republic of Korea. Soil depth measured by a knocking pole test and two soil parameters from direct shear test (a and b) as well as one soil parameters from a triaxial compression test (c) were collected to determine the input parameters for the model. Experimental soil data were used for the first simulation (Case I) and, soil data represented the effect of measured soil depth and average soil depth from soil data of Case I were used in the second (Case II) and third simulations (Case III), respectively. All simulations were analysed using receiver operating characteristic (ROC) analysis to determine the accuracy of prediction. ROC analysis results for first simulation showed the low ROC values under 0.75 may be due to the internal friction angle and particularly the cohesion value. Soil parameters calculated from a stochastic hydro-geomorphological model were applied to the SHALSTAB model. The accuracy of Case II and Case III using ROC analysis showed higher accuracy values rather than first simulation. Our results clearly demonstrate that the accuracy of shallow landslide prediction can be improved when soil parameters represented the effect of soil thickness.
The application of a geometric optical canopy reflectance model to semiarid shrub vegetation
NASA Technical Reports Server (NTRS)
Franklin, Janet; Turner, Debra L.
1992-01-01
Estimates are obtained of the average plant size and density of shrub vegetation on the basis of SPOT High Resolution Visible Multispectral imagery from Chihuahuan desert areas, using the Li and Strahler (1985) model. The aggregated predictions for a number of stands within a class were accurate to within one or two standard errors of the observed average value. Accuracy was highest for those classes of vegetation where the nonrandom scrub pattern was characterized for the class on the basis of the average coefficient of the determination of density.
Bridge Structure Deformation Prediction Based on GNSS Data Using Kalman-ARIMA-GARCH Model
Li, Xiaoqing; Wang, Yu
2018-01-01
Bridges are an essential part of the ground transportation system. Health monitoring is fundamentally important for the safety and service life of bridges. A large amount of structural information is obtained from various sensors using sensing technology, and the data processing has become a challenging issue. To improve the prediction accuracy of bridge structure deformation based on data mining and to accurately evaluate the time-varying characteristics of bridge structure performance evolution, this paper proposes a new method for bridge structure deformation prediction, which integrates the Kalman filter, autoregressive integrated moving average model (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Firstly, the raw deformation data is directly pre-processed using the Kalman filter to reduce the noise. After that, the linear recursive ARIMA model is established to analyze and predict the structure deformation. Finally, the nonlinear recursive GARCH model is introduced to further improve the accuracy of the prediction. Simulation results based on measured sensor data from the Global Navigation Satellite System (GNSS) deformation monitoring system demonstrated that: (1) the Kalman filter is capable of denoising the bridge deformation monitoring data; (2) the prediction accuracy of the proposed Kalman-ARIMA-GARCH model is satisfactory, where the mean absolute error increases only from 3.402 mm to 5.847 mm with the increment of the prediction step; and (3) in comparision to the Kalman-ARIMA model, the Kalman-ARIMA-GARCH model results in superior prediction accuracy as it includes partial nonlinear characteristics (heteroscedasticity); the mean absolute error of five-step prediction using the proposed model is improved by 10.12%. This paper provides a new way for structural behavior prediction based on data processing, which can lay a foundation for the early warning of bridge health monitoring system based on sensor data using sensing technology. PMID:29351254
Bridge Structure Deformation Prediction Based on GNSS Data Using Kalman-ARIMA-GARCH Model.
Xin, Jingzhou; Zhou, Jianting; Yang, Simon X; Li, Xiaoqing; Wang, Yu
2018-01-19
Bridges are an essential part of the ground transportation system. Health monitoring is fundamentally important for the safety and service life of bridges. A large amount of structural information is obtained from various sensors using sensing technology, and the data processing has become a challenging issue. To improve the prediction accuracy of bridge structure deformation based on data mining and to accurately evaluate the time-varying characteristics of bridge structure performance evolution, this paper proposes a new method for bridge structure deformation prediction, which integrates the Kalman filter, autoregressive integrated moving average model (ARIMA), and generalized autoregressive conditional heteroskedasticity (GARCH). Firstly, the raw deformation data is directly pre-processed using the Kalman filter to reduce the noise. After that, the linear recursive ARIMA model is established to analyze and predict the structure deformation. Finally, the nonlinear recursive GARCH model is introduced to further improve the accuracy of the prediction. Simulation results based on measured sensor data from the Global Navigation Satellite System (GNSS) deformation monitoring system demonstrated that: (1) the Kalman filter is capable of denoising the bridge deformation monitoring data; (2) the prediction accuracy of the proposed Kalman-ARIMA-GARCH model is satisfactory, where the mean absolute error increases only from 3.402 mm to 5.847 mm with the increment of the prediction step; and (3) in comparision to the Kalman-ARIMA model, the Kalman-ARIMA-GARCH model results in superior prediction accuracy as it includes partial nonlinear characteristics (heteroscedasticity); the mean absolute error of five-step prediction using the proposed model is improved by 10.12%. This paper provides a new way for structural behavior prediction based on data processing, which can lay a foundation for the early warning of bridge health monitoring system based on sensor data using sensing technology.
Leegon, Jeffrey; Aronsky, Dominik
2006-01-01
The healthcare environment is constantly changing. Probabilistic clinical decision support systems need to recognize and incorporate the changing patterns and adjust the decision model to maintain high levels of accuracy. Using data from >75,000 ED patients during a 19-month study period we examined the impact of various static and dynamic training strategies on a decision support system designed to predict hospital admission status for ED patients. Training durations ranged from 1 to 12 weeks. During the study period major institutional changes occurred that affected the system's performance level. The average area under the receiver operating characteristic curve was higher and more stable when longer training periods were used. The system showed higher accuracy when retrained an updated with more recent data as compared to static training period. To adjust for temporal trends the accuracy of decision support systems can benefit from longer training periods and retraining with more recent data.
NASA Technical Reports Server (NTRS)
Franklin, J.; Duncan, J.
1992-01-01
The rate at which a light field decays in water is characterized by the diffuse attenuation coefficient k. The Li-Strahler discrete-object canopy reflectance model was tested in two sites, a shrub grass savanna and a degraded shrub savanna on bare soil, in the proposed HAPEX (Hydrologic Atmospheric Pilot Experiment) II/Sahel study area in Niger, West Africa. Average site reflectance was predicted for each site from the reflectances and cover proportions of four components: shrub canopy, background (soil or grass and soil), shaded canopy, and shaded background. Component reflectances were sampled in the SPOT wavebands using a hand-held radiometer. Predicted reflectance was compared to average site reflectance measured using the same radiometer mounted on a backpack with measurements recorded every 5 m along two 1-km transects, also in the SPOT (Systeme Probatoire d'Observation de la Terre) bands. Measurements and predictions were made for each of the three days during the summer growing season, approximately two weeks apart. Red, near infrared reflectance, and the NDVI (normalized difference vegetation index) were all predicted with a high degree of accuracy for the shrub/grass site and with reasonable accuracy for the degraded shrub site.
Saatchi, Mahdi; McClure, Mathew C; McKay, Stephanie D; Rolf, Megan M; Kim, JaeWoo; Decker, Jared E; Taxis, Tasia M; Chapple, Richard H; Ramey, Holly R; Northcutt, Sally L; Bauck, Stewart; Woodward, Brent; Dekkers, Jack C M; Fernando, Rohan L; Schnabel, Robert D; Garrick, Dorian J; Taylor, Jeremy F
2011-11-28
Genomic selection is a recently developed technology that is beginning to revolutionize animal breeding. The objective of this study was to estimate marker effects to derive prediction equations for direct genomic values for 16 routinely recorded traits of American Angus beef cattle and quantify corresponding accuracies of prediction. Deregressed estimated breeding values were used as observations in a weighted analysis to derive direct genomic values for 3570 sires genotyped using the Illumina BovineSNP50 BeadChip. These bulls were clustered into five groups using K-means clustering on pedigree estimates of additive genetic relationships between animals, with the aim of increasing within-group and decreasing between-group relationships. All five combinations of four groups were used for model training, with cross-validation performed in the group not used in training. Bivariate animal models were used for each trait to estimate the genetic correlation between deregressed estimated breeding values and direct genomic values. Accuracies of direct genomic values ranged from 0.22 to 0.69 for the studied traits, with an average of 0.44. Predictions were more accurate when animals within the validation group were more closely related to animals in the training set. When training and validation sets were formed by random allocation, the accuracies of direct genomic values ranged from 0.38 to 0.85, with an average of 0.65, reflecting the greater relationship between animals in training and validation. The accuracies of direct genomic values obtained from training on older animals and validating in younger animals were intermediate to the accuracies obtained from K-means clustering and random clustering for most traits. The genetic correlation between deregressed estimated breeding values and direct genomic values ranged from 0.15 to 0.80 for the traits studied. These results suggest that genomic estimates of genetic merit can be produced in beef cattle at a young age but the recurrent inclusion of genotyped sires in retraining analyses will be necessary to routinely produce for the industry the direct genomic values with the highest accuracy.
2011-01-01
Background Genomic selection is a recently developed technology that is beginning to revolutionize animal breeding. The objective of this study was to estimate marker effects to derive prediction equations for direct genomic values for 16 routinely recorded traits of American Angus beef cattle and quantify corresponding accuracies of prediction. Methods Deregressed estimated breeding values were used as observations in a weighted analysis to derive direct genomic values for 3570 sires genotyped using the Illumina BovineSNP50 BeadChip. These bulls were clustered into five groups using K-means clustering on pedigree estimates of additive genetic relationships between animals, with the aim of increasing within-group and decreasing between-group relationships. All five combinations of four groups were used for model training, with cross-validation performed in the group not used in training. Bivariate animal models were used for each trait to estimate the genetic correlation between deregressed estimated breeding values and direct genomic values. Results Accuracies of direct genomic values ranged from 0.22 to 0.69 for the studied traits, with an average of 0.44. Predictions were more accurate when animals within the validation group were more closely related to animals in the training set. When training and validation sets were formed by random allocation, the accuracies of direct genomic values ranged from 0.38 to 0.85, with an average of 0.65, reflecting the greater relationship between animals in training and validation. The accuracies of direct genomic values obtained from training on older animals and validating in younger animals were intermediate to the accuracies obtained from K-means clustering and random clustering for most traits. The genetic correlation between deregressed estimated breeding values and direct genomic values ranged from 0.15 to 0.80 for the traits studied. Conclusions These results suggest that genomic estimates of genetic merit can be produced in beef cattle at a young age but the recurrent inclusion of genotyped sires in retraining analyses will be necessary to routinely produce for the industry the direct genomic values with the highest accuracy. PMID:22122853
Clinical time series prediction: towards a hierarchical dynamical system framework
Liu, Zitao; Hauskrecht, Milos
2014-01-01
Objective Developing machine learning and data mining algorithms for building temporal models of clinical time series is important for understanding of the patient condition, the dynamics of a disease, effect of various patient management interventions and clinical decision making. In this work, we propose and develop a novel hierarchical framework for modeling clinical time series data of varied length and with irregularly sampled observations. Materials and methods Our hierarchical dynamical system framework for modeling clinical time series combines advantages of the two temporal modeling approaches: the linear dynamical system and the Gaussian process. We model the irregularly sampled clinical time series by using multiple Gaussian process sequences in the lower level of our hierarchical framework and capture the transitions between Gaussian processes by utilizing the linear dynamical system. The experiments are conducted on the complete blood count (CBC) panel data of 1000 post-surgical cardiac patients during their hospitalization. Our framework is evaluated and compared to multiple baseline approaches in terms of the mean absolute prediction error and the absolute percentage error. Results We tested our framework by first learning the time series model from data for the patient in the training set, and then applying the model in order to predict future time series values on the patients in the test set. We show that our model outperforms multiple existing models in terms of its predictive accuracy. Our method achieved a 3.13% average prediction accuracy improvement on ten CBC lab time series when it was compared against the best performing baseline. A 5.25% average accuracy improvement was observed when only short-term predictions were considered. Conclusion A new hierarchical dynamical system framework that lets us model irregularly sampled time series data is a promising new direction for modeling clinical time series and for improving their predictive performance. PMID:25534671
A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures
2014-01-01
Background Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. Results We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. Conclusions Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php. PMID:24884954
Genomic prediction in a nuclear population of layers using single-step models.
Yan, Yiyuan; Wu, Guiqin; Liu, Aiqiao; Sun, Congjiao; Han, Wenpeng; Li, Guangqi; Yang, Ning
2018-02-01
Single-step genomic prediction method has been proposed to improve the accuracy of genomic prediction by incorporating information of both genotyped and ungenotyped animals. The objective of this study is to compare the prediction performance of single-step model with a 2-step models and the pedigree-based models in a nuclear population of layers. A total of 1,344 chickens across 4 generations were genotyped by a 600 K SNP chip. Four traits were analyzed, i.e., body weight at 28 wk (BW28), egg weight at 28 wk (EW28), laying rate at 38 wk (LR38), and Haugh unit at 36 wk (HU36). In predicting offsprings, individuals from generation 1 to 3 were used as training data and females from generation 4 were used as validation set. The accuracies of predicted breeding values by pedigree BLUP (PBLUP), genomic BLUP (GBLUP), SSGBLUP and single-step blending (SSBlending) were compared for both genotyped and ungenotyped individuals. For genotyped females, GBLUP performed no better than PBLUP because of the small size of training data, while the 2 single-step models predicted more accurately than the PBLUP model. The average predictive ability of SSGBLUP and SSBlending were 16.0% and 10.8% higher than the PBLUP model across traits, respectively. Furthermore, the predictive abilities for ungenotyped individuals were also enhanced. The average improvements of prediction abilities were 5.9% and 1.5% for SSGBLUP and SSBlending model, respectively. It was concluded that single-step models, especially the SSGBLUP model, can yield more accurate prediction of genetic merits and are preferable for practical implementation of genomic selection in layers. © 2017 Poultry Science Association Inc.
Powerful Voter Selection for Making Multistep Delegate Ballot Fair
NASA Astrophysics Data System (ADS)
Yamakawa, Hiroshi
For decision by majority, each voter often exercises his right by delegating to trustable other voters. Multi-step delegates rule allows indirect delegating through more than one voter, and this helps each voter finding his delegate voters. In this paper, we propose powerful voter selection method depending on the multi-step delegate rule. This method sequentially selects voters who is most delegated indirectly. Multi-agent simulation demonstrate that we can achieve highly fair poll results from small number of vote by using proposed method. Here, fairness is prediction accuracy to sum of all voters preferences for choices. In simulation, each voter selects choices arranged on one dimensional preference axis for voting. Acquaintance relationships among voters were generated as a random network, and each voter delegates some of his acquaintances who has similar preferences. We obtained simulation results from various acquaintance networks, and then averaged these results. Firstly, if each voter has enough acquaintances in average, proposed method can help predicting sum of all voters' preferences of choices from small number of vote. Secondly, if the number of each voter's acquaintances increases corresponding to an increase in the number of voters, prediction accuracy (fairness) from small number of vote can be kept in appropriate level.
Ellis, Katherine; Godbole, Suneeta; Marshall, Simon; Lanckriet, Gert; Staudenmayer, John; Kerr, Jacqueline
2014-01-01
Active travel is an important area in physical activity research, but objective measurement of active travel is still difficult. Automated methods to measure travel behaviors will improve research in this area. In this paper, we present a supervised machine learning method for transportation mode prediction from global positioning system (GPS) and accelerometer data. We collected a dataset of about 150 h of GPS and accelerometer data from two research assistants following a protocol of prescribed trips consisting of five activities: bicycling, riding in a vehicle, walking, sitting, and standing. We extracted 49 features from 1-min windows of this data. We compared the performance of several machine learning algorithms and chose a random forest algorithm to classify the transportation mode. We used a moving average output filter to smooth the output predictions over time. The random forest algorithm achieved 89.8% cross-validated accuracy on this dataset. Adding the moving average filter to smooth output predictions increased the cross-validated accuracy to 91.9%. Machine learning methods are a viable approach for automating measurement of active travel, particularly for measuring travel activities that traditional accelerometer data processing methods misclassify, such as bicycling and vehicle travel.
Serrancolí, Gil; Kinney, Allison L.; Fregly, Benjamin J.; Font-Llagunes, Josep M.
2016-01-01
Though walking impairments are prevalent in society, clinical treatments are often ineffective at restoring lost function. For this reason, researchers have begun to explore the use of patient-specific computational walking models to develop more effective treatments. However, the accuracy with which models can predict internal body forces in muscles and across joints depends on how well relevant model parameter values can be calibrated for the patient. This study investigated how knowledge of internal knee contact forces affects calibration of neuromusculoskeletal model parameter values and subsequent prediction of internal knee contact and leg muscle forces during walking. Model calibration was performed using a novel two-level optimization procedure applied to six normal walking trials from the Fourth Grand Challenge Competition to Predict In Vivo Knee Loads. The outer-level optimization adjusted time-invariant model parameter values to minimize passive muscle forces, reserve actuator moments, and model parameter value changes with (Approach A) and without (Approach B) tracking of experimental knee contact forces. Using the current guess for model parameter values but no knee contact force information, the inner-level optimization predicted time-varying muscle activations that were close to experimental muscle synergy patterns and consistent with the experimental inverse dynamic loads (both approaches). For all the six gait trials, Approach A predicted knee contact forces with high accuracy for both compartments (average correlation coefficient r = 0.99 and root mean square error (RMSE) = 52.6 N medial; average r = 0.95 and RMSE = 56.6 N lateral). In contrast, Approach B overpredicted contact force magnitude for both compartments (average RMSE = 323 N medial and 348 N lateral) and poorly matched contact force shape for the lateral compartment (average r = 0.90 medial and −0.10 lateral). Approach B had statistically higher lateral muscle forces and lateral optimal muscle fiber lengths but lower medial, central, and lateral normalized muscle fiber lengths compared to Approach A. These findings suggest that poorly calibrated model parameter values may be a major factor limiting the ability of neuromusculoskeletal models to predict knee contact and leg muscle forces accurately for walking. PMID:27210105
Matsuba, Shinji; Tabuchi, Hitoshi; Ohsugi, Hideharu; Enno, Hiroki; Ishitobi, Naofumi; Masumoto, Hiroki; Kiuchi, Yoshiaki
2018-05-09
To predict exudative age-related macular degeneration (AMD), we combined a deep convolutional neural network (DCNN), a machine-learning algorithm, with Optos, an ultra-wide-field fundus imaging system. First, to evaluate the diagnostic accuracy of DCNN, 364 photographic images (AMD: 137) were amplified and the area under the curve (AUC), sensitivity and specificity were examined. Furthermore, in order to compare the diagnostic abilities between DCNN and six ophthalmologists, we prepared yield 84 sheets comprising 50% of normal and wet-AMD data each, and calculated the correct answer rate, specificity, sensitivity, and response times. DCNN exhibited 100% sensitivity and 97.31% specificity for wet-AMD images, with an average AUC of 99.76%. Moreover, comparing the diagnostic abilities of DCNN versus six ophthalmologists, the average accuracy of the DCNN was 100%. On the other hand, the accuracy of ophthalmologists, determined only by Optos images without a fundus examination, was 81.9%. A combination of DCNN with Optos images is not better than a medical examination; however, it can identify exudative AMD with a high level of accuracy. Our system is considered useful for screening and telemedicine.
Bayesian averaging over Decision Tree models for trauma severity scoring.
Schetinin, V; Jakaite, L; Krzanowski, W
2018-01-01
Health care practitioners analyse possible risks of misleading decisions and need to estimate and quantify uncertainty in predictions. We have examined the "gold" standard of screening a patient's conditions for predicting survival probability, based on logistic regression modelling, which is used in trauma care for clinical purposes and quality audit. This methodology is based on theoretical assumptions about data and uncertainties. Models induced within such an approach have exposed a number of problems, providing unexplained fluctuation of predicted survival and low accuracy of estimating uncertainty intervals within which predictions are made. Bayesian method, which in theory is capable of providing accurate predictions and uncertainty estimates, has been adopted in our study using Decision Tree models. Our approach has been tested on a large set of patients registered in the US National Trauma Data Bank and has outperformed the standard method in terms of prediction accuracy, thereby providing practitioners with accurate estimates of the predictive posterior densities of interest that are required for making risk-aware decisions. Copyright © 2017 Elsevier B.V. All rights reserved.
Internal Medicine Residents Do Not Accurately Assess Their Medical Knowledge
ERIC Educational Resources Information Center
Jones, Roger; Panda, Mukta; Desbiens, Norman
2008-01-01
Background: Medical knowledge is essential for appropriate patient care; however, the accuracy of internal medicine (IM) residents' assessment of their medical knowledge is unknown. Methods: IM residents predicted their overall percentile performance 1 week (on average) before and after taking the in-training exam (ITE), an objective and well…
2017-01-01
The accurate prediction of protein chemical shifts using a quantum mechanics (QM)-based method has been the subject of intense research for more than 20 years but so far empirical methods for chemical shift prediction have proven more accurate. In this paper we show that a QM-based predictor of a protein backbone and CB chemical shifts (ProCS15, PeerJ, 2016, 3, e1344) is of comparable accuracy to empirical chemical shift predictors after chemical shift-based structural refinement that removes small structural errors. We present a method by which quantum chemistry based predictions of isotropic chemical shielding values (ProCS15) can be used to refine protein structures using Markov Chain Monte Carlo (MCMC) simulations, relating the chemical shielding values to the experimental chemical shifts probabilistically. Two kinds of MCMC structural refinement simulations were performed using force field geometry optimized X-ray structures as starting points: simulated annealing of the starting structure and constant temperature MCMC simulation followed by simulated annealing of a representative ensemble structure. Annealing of the CHARMM structure changes the CA-RMSD by an average of 0.4 Å but lowers the chemical shift RMSD by 1.0 and 0.7 ppm for CA and N. Conformational averaging has a relatively small effect (0.1–0.2 ppm) on the overall agreement with carbon chemical shifts but lowers the error for nitrogen chemical shifts by 0.4 ppm. If an amino acid specific offset is included the ProCS15 predicted chemical shifts have RMSD values relative to experiments that are comparable to popular empirical chemical shift predictors. The annealed representative ensemble structures differ in CA-RMSD relative to the initial structures by an average of 2.0 Å, with >2.0 Å difference for six proteins. In four of the cases, the largest structural differences arise in structurally flexible regions of the protein as determined by NMR, and in the remaining two cases, the large structural change may be due to force field deficiencies. The overall accuracy of the empirical methods are slightly improved by annealing the CHARMM structure with ProCS15, which may suggest that the minor structural changes introduced by ProCS15-based annealing improves the accuracy of the protein structures. Having established that QM-based chemical shift prediction can deliver the same accuracy as empirical shift predictors we hope this can help increase the accuracy of related approaches such as QM/MM or linear scaling approaches or interpreting protein structural dynamics from QM-derived chemical shift. PMID:28451325
Modelling personality, plasticity and predictability in shelter dogs
2017-01-01
Behavioural assessments of shelter dogs (Canis lupus familiaris) typically comprise standardized test batteries conducted at one time point, but test batteries have shown inconsistent predictive validity. Longitudinal behavioural assessments offer an alternative. We modelled longitudinal observational data on shelter dog behaviour using the framework of behavioural reaction norms, partitioning variance into personality (i.e. inter-individual differences in behaviour), plasticity (i.e. inter-individual differences in average behaviour) and predictability (i.e. individual differences in residual intra-individual variation). We analysed data on interactions of 3263 dogs (n = 19 281) with unfamiliar people during their first month after arrival at the shelter. Accounting for personality, plasticity (linear and quadratic trends) and predictability improved the predictive accuracy of the analyses compared to models quantifying personality and/or plasticity only. While dogs were, on average, highly sociable with unfamiliar people and sociability increased over days since arrival, group averages were unrepresentative of all dogs and predictions made at the individual level entailed considerable uncertainty. Effects of demographic variables (e.g. age) on personality, plasticity and predictability were observed. Behavioural repeatability was higher one week after arrival compared to arrival day. Our results highlight the value of longitudinal assessments on shelter dogs and identify measures that could improve the predictive validity of behavioural assessments in shelters. PMID:28989764
Bayesian model aggregation for ensemble-based estimates of protein pKa values
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gosink, Luke J.; Hogan, Emilie A.; Pulsipher, Trenton C.
2014-03-01
This paper investigates an ensemble-based technique called Bayesian Model Averaging (BMA) to improve the performance of protein amino acid pmore » $$K_a$$ predictions. Structure-based p$$K_a$$ calculations play an important role in the mechanistic interpretation of protein structure and are also used to determine a wide range of protein properties. A diverse set of methods currently exist for p$$K_a$$ prediction, ranging from empirical statistical models to {\\it ab initio} quantum mechanical approaches. However, each of these methods are based on a set of assumptions that have inherent bias and sensitivities that can effect a model's accuracy and generalizability for p$$K_a$$ prediction in complicated biomolecular systems. We use BMA to combine eleven diverse prediction methods that each estimate pKa values of amino acids in staphylococcal nuclease. These methods are based on work conducted for the pKa Cooperative and the pKa measurements are based on experimental work conducted by the Garc{\\'i}a-Moreno lab. Our study demonstrates that the aggregated estimate obtained from BMA outperforms all individual prediction methods in our cross-validation study with improvements from 40-70\\% over other method classes. This work illustrates a new possible mechanism for improving the accuracy of p$$K_a$$ prediction and lays the foundation for future work on aggregate models that balance computational cost with prediction accuracy.« less
Chan, Johanna L; Lin, Li; Feiler, Michael; Wolf, Andrew I; Cardona, Diana M; Gellad, Ziad F
2012-11-07
To evaluate accuracy of in vivo diagnosis of adenomatous vs non-adenomatous polyps using i-SCAN digital chromoendoscopy compared with high-definition white light. This is a single-center comparative effectiveness pilot study. Polyps (n = 103) from 75 average-risk adult outpatients undergoing screening or surveillance colonoscopy between December 1, 2010 and April 1, 2011 were evaluated by two participating endoscopists in an academic outpatient endoscopy center. Polyps were evaluated both with high-definition white light and with i-SCAN to make an in vivo prediction of adenomatous vs non-adenomatous pathology. We determined diagnostic characteristics of i-SCAN and high-definition white light, including sensitivity, specificity, and accuracy, with regards to identifying adenomatous vs non-adenomatous polyps. Histopathologic diagnosis was the gold standard comparison. One hundred and three small polyps, detected from forty-three patients, were included in the analysis. The average size of the polyps evaluated in the analysis was 3.7 mm (SD 1.3 mm, range 2 mm to 8 mm). Formal histopathology revealed that 54/103 (52.4%) were adenomas, 26/103 (25.2%) were hyperplastic, and 23/103 (22.3%) were other diagnoses include "lymphoid aggregates", "non-specific colitis," and "no pathologic diagnosis." Overall, the combined accuracy of endoscopists for predicting adenomas was identical between i-SCAN (71.8%, 95%CI: 62.1%-80.3%) and high-definition white light (71.8%, 95%CI: 62.1%-80.3%). However, the accuracy of each endoscopist differed substantially, where endoscopist A demonstrated 63.0% overall accuracy (95%CI: 50.9%-74.0%) as compared with endoscopist B demonstrating 93.3% overall accuracy (95%CI: 77.9%-99.2%), irrespective of imaging modality. Neither endoscopist demonstrated a significant learning effect with i-SCAN during the study. Though endoscopist A increased accuracy using i-SCAN from 59% (95%CI: 42.1%-74.4%) in the first half to 67.6% (95%CI: 49.5%-82.6%) in the second half, and endoscopist B decreased accuracy using i-SCAN from 100% (95%CI: 80.5%-100.0%) in the first half to 84.6% (95%CI: 54.6%-98.1%) in the second half, neither of these differences were statistically significant. i-SCAN and high-definition white light had similar efficacy predicting polyp histology. Endoscopist training likely plays a critical role in diagnostic test characteristics and deserves further study.
Preciat Gonzalez, German A.; El Assal, Lemmer R. P.; Noronha, Alberto; ...
2017-06-14
The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at the underlying level of atom mappings opens the possibility for a broader range of biological, biomedical and biotechnological applications than with stoichiometry alone. Complete manual acquisition of atom mapping data for a genome-scale metabolic network is a laborious process. However, manymore » algorithms exist to predict atom mappings. How do their predictions compare to each other and to manually curated atom mappings? For more than four thousand metabolic reactions in the latest human metabolic reconstruction, Recon 3D, we compared the atom mappings predicted by six atom mapping algorithms. We also compared these predictions to those obtained by manual curation of atom mappings for over five hundred reactions distributed among all top level Enzyme Commission number classes. Five of the evaluated algorithms had similarly high prediction accuracy of over 91% when compared to manually curated atom mapped reactions. On average, the accuracy of the prediction was highest for reactions catalysed by oxidoreductases and lowest for reactions catalysed by ligases. In addition to prediction accuracy, the algorithms were evaluated on their accessibility, their advanced features, such as the ability to identify equivalent atoms, and their ability to map hydrogen atoms. In addition to prediction accuracy, we found that software accessibility and advanced features were fundamental to the selection of an atom mapping algorithm in practice.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Preciat Gonzalez, German A.; El Assal, Lemmer R. P.; Noronha, Alberto
The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at the underlying level of atom mappings opens the possibility for a broader range of biological, biomedical and biotechnological applications than with stoichiometry alone. Complete manual acquisition of atom mapping data for a genome-scale metabolic network is a laborious process. However, manymore » algorithms exist to predict atom mappings. How do their predictions compare to each other and to manually curated atom mappings? For more than four thousand metabolic reactions in the latest human metabolic reconstruction, Recon 3D, we compared the atom mappings predicted by six atom mapping algorithms. We also compared these predictions to those obtained by manual curation of atom mappings for over five hundred reactions distributed among all top level Enzyme Commission number classes. Five of the evaluated algorithms had similarly high prediction accuracy of over 91% when compared to manually curated atom mapped reactions. On average, the accuracy of the prediction was highest for reactions catalysed by oxidoreductases and lowest for reactions catalysed by ligases. In addition to prediction accuracy, the algorithms were evaluated on their accessibility, their advanced features, such as the ability to identify equivalent atoms, and their ability to map hydrogen atoms. In addition to prediction accuracy, we found that software accessibility and advanced features were fundamental to the selection of an atom mapping algorithm in practice.« less
Preciat Gonzalez, German A; El Assal, Lemmer R P; Noronha, Alberto; Thiele, Ines; Haraldsdóttir, Hulda S; Fleming, Ronan M T
2017-06-14
The mechanism of each chemical reaction in a metabolic network can be represented as a set of atom mappings, each of which relates an atom in a substrate metabolite to an atom of the same element in a product metabolite. Genome-scale metabolic network reconstructions typically represent biochemistry at the level of reaction stoichiometry. However, a more detailed representation at the underlying level of atom mappings opens the possibility for a broader range of biological, biomedical and biotechnological applications than with stoichiometry alone. Complete manual acquisition of atom mapping data for a genome-scale metabolic network is a laborious process. However, many algorithms exist to predict atom mappings. How do their predictions compare to each other and to manually curated atom mappings? For more than four thousand metabolic reactions in the latest human metabolic reconstruction, Recon 3D, we compared the atom mappings predicted by six atom mapping algorithms. We also compared these predictions to those obtained by manual curation of atom mappings for over five hundred reactions distributed among all top level Enzyme Commission number classes. Five of the evaluated algorithms had similarly high prediction accuracy of over 91% when compared to manually curated atom mapped reactions. On average, the accuracy of the prediction was highest for reactions catalysed by oxidoreductases and lowest for reactions catalysed by ligases. In addition to prediction accuracy, the algorithms were evaluated on their accessibility, their advanced features, such as the ability to identify equivalent atoms, and their ability to map hydrogen atoms. In addition to prediction accuracy, we found that software accessibility and advanced features were fundamental to the selection of an atom mapping algorithm in practice.
Song, H; Li, L; Ma, P; Zhang, S; Su, G; Lund, M S; Zhang, Q; Ding, X
2018-06-01
This study investigated the efficiency of genomic prediction with adding the markers identified by genome-wide association study (GWAS) using a data set of imputed high-density (HD) markers from 54K markers in Chinese Holsteins. Among 3,056 Chinese Holsteins with imputed HD data, 2,401 individuals born before October 1, 2009, were used for GWAS and a reference population for genomic prediction, and the 220 younger cows were used as a validation population. In total, 1,403, 1,536, and 1,383 significant single nucleotide polymorphisms (SNP; false discovery rate at 0.05) associated with conformation final score, mammary system, and feet and legs were identified, respectively. About 2 to 3% genetic variance of 3 traits was explained by these significant SNP. Only a very small proportion of significant SNP identified by GWAS was included in the 54K marker panel. Three new marker sets (54K+) were herein produced by adding significant SNP obtained by linear mixed model for each trait into the 54K marker panel. Genomic breeding values were predicted using a Bayesian variable selection (BVS) model. The accuracies of genomic breeding value by BVS based on the 54K+ data were 2.0 to 5.2% higher than those based on the 54K data. The imputed HD markers yielded 1.4% higher accuracy on average (BVS) than the 54K data. Both the 54K+ and HD data generated lower bias of genomic prediction, and the 54K+ data yielded the lowest bias in all situations. Our results show that the imputed HD data were not very useful for improving the accuracy of genomic prediction and that adding the significant markers derived from the imputed HD marker panel could improve the accuracy of genomic prediction and decrease the bias of genomic prediction. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Aboagye-Sarfo, Patrick; Mai, Qun; Sanfilippo, Frank M; Preen, David B; Stewart, Louise M; Fatovich, Daniel M
2015-10-01
To develop multivariate vector-ARMA (VARMA) forecast models for predicting emergency department (ED) demand in Western Australia (WA) and compare them to the benchmark univariate autoregressive moving average (ARMA) and Winters' models. Seven-year monthly WA state-wide public hospital ED presentation data from 2006/07 to 2012/13 were modelled. Graphical and VARMA modelling methods were used for descriptive analysis and model fitting. The VARMA models were compared to the benchmark univariate ARMA and Winters' models to determine their accuracy to predict ED demand. The best models were evaluated by using error correction methods for accuracy. Descriptive analysis of all the dependent variables showed an increasing pattern of ED use with seasonal trends over time. The VARMA models provided a more precise and accurate forecast with smaller confidence intervals and better measures of accuracy in predicting ED demand in WA than the ARMA and Winters' method. VARMA models are a reliable forecasting method to predict ED demand for strategic planning and resource allocation. While the ARMA models are a closely competing alternative, they under-estimated future ED demand. Copyright © 2015 Elsevier Inc. All rights reserved.
2014-01-01
Background Although the X chromosome is the second largest bovine chromosome, markers on the X chromosome are not used for genomic prediction in some countries and populations. In this study, we presented a method for computing genomic relationships using X chromosome markers, investigated the accuracy of imputation from a low density (7K) to the 54K SNP (single nucleotide polymorphism) panel, and compared the accuracy of genomic prediction with and without using X chromosome markers. Methods The impact of considering X chromosome markers on prediction accuracy was assessed using data from Nordic Holstein bulls and different sets of SNPs: (a) the 54K SNPs for reference and test animals, (b) SNPs imputed from the 7K to the 54K SNP panel for test animals, (c) SNPs imputed from the 7K to the 54K panel for half of the reference animals, and (d) the 7K SNP panel for all animals. Beagle and Findhap were used for imputation. GBLUP (genomic best linear unbiased prediction) models with or without X chromosome markers and with or without a residual polygenic effect were used to predict genomic breeding values for 15 traits. Results Averaged over the two imputation datasets, correlation coefficients between imputed and true genotypes for autosomal markers, pseudo-autosomal markers, and X-specific markers were 0.971, 0.831 and 0.935 when using Findhap, and 0.983, 0.856 and 0.937 when using Beagle. Estimated reliabilities of genomic predictions based on the imputed datasets using Findhap or Beagle were very close to those using the real 54K data. Genomic prediction using all markers gave slightly higher reliabilities than predictions without X chromosome markers. Based on our data which included only bulls, using a G matrix that accounted for sex-linked relationships did not improve prediction, compared with a G matrix that did not account for sex-linked relationships. A model that included a polygenic effect did not recover the loss of prediction accuracy from exclusion of X chromosome markers. Conclusions The results from this study suggest that markers on the X chromosome contribute to accuracy of genomic predictions and should be used for routine genomic evaluation. PMID:25080199
A compound reconstructed prediction model for nonstationary climate processes
NASA Astrophysics Data System (ADS)
Wang, Geli; Yang, Peicai
2005-07-01
Based on the idea of climate hierarchy and the theory of state space reconstruction, a local approximation prediction model with the compound structure is built for predicting some nonstationary climate process. By means of this model and the data sets consisting of north Indian Ocean sea-surface temperature, Asian zonal circulation index and monthly mean precipitation anomaly from 37 observation stations in the Inner Mongolia area of China (IMC), a regional prediction experiment for the winter precipitation of IMC is also carried out. When using the same sign ratio R between the prediction field and the actual field to measure the prediction accuracy, an averaged R of 63% given by 10 predictions samples is reached.
Malacarne, D; Pesenti, R; Paolucci, M; Parodi, S
1993-01-01
For a database of 826 chemicals tested for carcinogenicity, we fragmented the structural formula of the chemicals into all possible contiguous-atom fragments with size between two and eight (nonhydrogen) atoms. The fragmentation was obtained using a new software program based on graph theory. We used 80% of the chemicals as a training set and 20% as a test set. The two sets were obtained by random sorting. From the training sets, an average (8 computer runs with independently sorted chemicals) of 315 different fragments were significantly (p < 0.125) associated with carcinogenicity or lack thereof. Even using this relatively low level of statistical significance, 23% of the molecules of the test sets lacked significant fragments. For 77% of the molecules of the test sets, we used the presence of significant fragments to predict carcinogenicity. The average level of accuracy of the predictions in the test sets was 67.5%. Chemicals containing only positive fragments were predicted with an accuracy of 78.7%. The level of accuracy was around 60% for chemicals characterized by contradictory fragments or only negative fragments. In a parallel manner, we performed eight paired runs in which carcinogenicity was attributed randomly to the molecules of the training sets. The fragments generated by these pseudo-training sets were devoid of any predictivity in the corresponding test sets. Using an independent software program, we confirmed (for the complex biological endpoint of carcinogenicity) the validity of a structure-activity relationship approach of the type proposed by Klopman and Rosenkranz with their CASE program. Images Figure 1. Figure 2. Figure 3. Figure 4. Figure 5. Figure 6. PMID:8275991
Predicting Football Matches Results using Bayesian Networks for English Premier League (EPL)
NASA Astrophysics Data System (ADS)
Razali, Nazim; Mustapha, Aida; Yatim, Faiz Ahmad; Aziz, Ruhaya Ab
2017-08-01
The issues of modeling asscoiation football prediction model has become increasingly popular in the last few years and many different approaches of prediction models have been proposed with the point of evaluating the attributes that lead a football team to lose, draw or win the match. There are three types of approaches has been considered for predicting football matches results which include statistical approaches, machine learning approaches and Bayesian approaches. Lately, many studies regarding football prediction models has been produced using Bayesian approaches. This paper proposes a Bayesian Networks (BNs) to predict the results of football matches in term of home win (H), away win (A) and draw (D). The English Premier League (EPL) for three seasons of 2010-2011, 2011-2012 and 2012-2013 has been selected and reviewed. K-fold cross validation has been used for testing the accuracy of prediction model. The required information about the football data is sourced from a legitimate site at http://www.football-data.co.uk. BNs achieved predictive accuracy of 75.09% in average across three seasons. It is hoped that the results could be used as the benchmark output for future research in predicting football matches results.
In vivo dose measurement using TLDs and MOSFET dosimeters for cardiac radiosurgery.
Gardner, Edward A; Sumanaweera, Thilaka S; Blanck, Oliver; Iwamura, Alyson K; Steel, James P; Dieterich, Sonja; Maguire, Patrick
2012-05-10
In vivo measurements were made of the dose delivered to animal models in an effort to develop a method for treating cardiac arrhythmia using radiation. This treatment would replace RF energy (currently used to create cardiac scar) with ionizing radiation. In the current study, the pulmonary vein ostia of animal models were irradiated with 6 MV X-rays in order to produce a scar that would block aberrant signals characteristic of atrial fibrillation. The CyberKnife radiosurgery system was used to deliver planned treatments of 20-35 Gy in a single fraction to four animals. The Synchrony system was used to track respiratory motion of the heart, while the contractile motion of the heart was untracked. The dose was measured on the epicardial surface near the right pulmonary vein and on the esophagus using surgically implanted TLD dosimeters, or in the coronary sinus using a MOSFET dosimeter placed using a catheter. The doses measured on the epicardium with TLDs averaged 5% less than predicted for those locations, while doses measured in the coronary sinus with the MOSFET sensor nearest the target averaged 6% less than the predicted dose. The measurements on the esophagus averaged 25% less than predicted. These results provide an indication of the accuracy with which the treatment planning methods accounted for the motion of the target, with its respiratory and cardiac components. This is the first report on the accuracy of CyberKnife dose delivery to cardiac targets.
NASA Astrophysics Data System (ADS)
Krasnoshchekov, Sergey V.; Schutski, Roman S.; Craig, Norman C.; Sibaev, Marat; Crittenden, Deborah L.
2018-02-01
Three dihalogenated methane derivatives (CH2F2, CH2FCl, and CH2Cl2) were used as model systems to compare and assess the accuracy of two different approaches for predicting observed fundamental frequencies: canonical operator Van Vleck vibrational perturbation theory (CVPT) and vibrational configuration interaction (VCI). For convenience and consistency, both methods employ the Watson Hamiltonian in rectilinear normal coordinates, expanding the potential energy surface (PES) as a Taylor series about equilibrium and constructing the wavefunction from a harmonic oscillator product basis. At the highest levels of theory considered here, fourth-order CVPT and VCI in a harmonic oscillator basis with up to 10 quanta of vibrational excitation in conjunction with a 4-mode representation sextic force field (SFF-4MR) computed at MP2/cc-pVTZ with replacement CCSD(T)/aug-cc-pVQZ harmonic force constants, the agreement between computed fundamentals is closer to 0.3 cm-1 on average, with a maximum difference of 1.7 cm-1. The major remaining accuracy-limiting factors are the accuracy of the underlying electronic structure model, followed by the incompleteness of the PES expansion. Nonetheless, computed and experimental fundamentals agree to within 5 cm-1, with an average difference of 2 cm-1, confirming the utility and accuracy of both theoretical models. One exception to this rule is the formally IR-inactive but weakly allowed through Coriolis-coupling H-C-H out-of-plane twisting mode of dichloromethane, whose spectrum we therefore revisit and reassign. We also investigate convergence with respect to order of CVPT, VCI excitation level, and order of PES expansion, concluding that premature truncation substantially decreases accuracy, although VCI(6)/SFF-4MR results are still of acceptable accuracy, and some error cancellation is observed with CVPT2 using a quartic force field.
CD-Based Indices for Link Prediction in Complex Network.
Wang, Tao; Wang, Hongjue; Wang, Xiaoxia
2016-01-01
Lots of similarity-based algorithms have been designed to deal with the problem of link prediction in the past decade. In order to improve prediction accuracy, a novel cosine similarity index CD based on distance between nodes and cosine value between vectors is proposed in this paper. Firstly, node coordinate matrix can be obtained by node distances which are different from distance matrix and row vectors of the matrix are regarded as coordinates of nodes. Then, cosine value between node coordinates is used as their similarity index. A local community density index LD is also proposed. Then, a series of CD-based indices include CD-LD-k, CD*LD-k, CD-k and CDI are presented and applied in ten real networks. Experimental results demonstrate the effectiveness of CD-based indices. The effects of network clustering coefficient and assortative coefficient on prediction accuracy of indices are analyzed. CD-LD-k and CD*LD-k can improve prediction accuracy without considering the assortative coefficient of network is negative or positive. According to analysis of relative precision of each method on each network, CD-LD-k and CD*LD-k indices have excellent average performance and robustness. CD and CD-k indices perform better on positive assortative networks than on negative assortative networks. For negative assortative networks, we improve and refine CD index, referred as CDI index, combining the advantages of CD index and evolutionary mechanism of the network model BA. Experimental results reveal that CDI index can increase prediction accuracy of CD on negative assortative networks.
CD-Based Indices for Link Prediction in Complex Network
Wang, Tao; Wang, Hongjue; Wang, Xiaoxia
2016-01-01
Lots of similarity-based algorithms have been designed to deal with the problem of link prediction in the past decade. In order to improve prediction accuracy, a novel cosine similarity index CD based on distance between nodes and cosine value between vectors is proposed in this paper. Firstly, node coordinate matrix can be obtained by node distances which are different from distance matrix and row vectors of the matrix are regarded as coordinates of nodes. Then, cosine value between node coordinates is used as their similarity index. A local community density index LD is also proposed. Then, a series of CD-based indices include CD-LD-k, CD*LD-k, CD-k and CDI are presented and applied in ten real networks. Experimental results demonstrate the effectiveness of CD-based indices. The effects of network clustering coefficient and assortative coefficient on prediction accuracy of indices are analyzed. CD-LD-k and CD*LD-k can improve prediction accuracy without considering the assortative coefficient of network is negative or positive. According to analysis of relative precision of each method on each network, CD-LD-k and CD*LD-k indices have excellent average performance and robustness. CD and CD-k indices perform better on positive assortative networks than on negative assortative networks. For negative assortative networks, we improve and refine CD index, referred as CDI index, combining the advantages of CD index and evolutionary mechanism of the network model BA. Experimental results reveal that CDI index can increase prediction accuracy of CD on negative assortative networks. PMID:26752405
Xiaopeng, Q I; Liang, Wei; Barker, Laurie; Lekiachvili, Akaki; Xingyou, Zhang
Temperature changes are known to have significant impacts on human health. Accurate estimates of population-weighted average monthly air temperature for US counties are needed to evaluate temperature's association with health behaviours and disease, which are sampled or reported at the county level and measured on a monthly-or 30-day-basis. Most reported temperature estimates were calculated using ArcGIS, relatively few used SAS. We compared the performance of geostatistical models to estimate population-weighted average temperature in each month for counties in 48 states using ArcGIS v9.3 and SAS v 9.2 on a CITGO platform. Monthly average temperature for Jan-Dec 2007 and elevation from 5435 weather stations were used to estimate the temperature at county population centroids. County estimates were produced with elevation as a covariate. Performance of models was assessed by comparing adjusted R 2 , mean squared error, root mean squared error, and processing time. Prediction accuracy for split validation was above 90% for 11 months in ArcGIS and all 12 months in SAS. Cokriging in SAS achieved higher prediction accuracy and lower estimation bias as compared to cokriging in ArcGIS. County-level estimates produced by both packages were positively correlated (adjusted R 2 range=0.95 to 0.99); accuracy and precision improved with elevation as a covariate. Both methods from ArcGIS and SAS are reliable for U.S. county-level temperature estimates; However, ArcGIS's merits in spatial data pre-processing and processing time may be important considerations for software selection, especially for multi-year or multi-state projects.
Tan, Cheng; Wu, Zhenfang; Ren, Jiangli; Huang, Zhuolin; Liu, Dewu; He, Xiaoyan; Prakapenka, Dzianis; Zhang, Ran; Li, Ning; Da, Yang; Hu, Xiaoxiang
2017-03-29
The number of teats in pigs is related to a sow's ability to rear piglets to weaning age. Several studies have identified genes and genomic regions that affect teat number in swine but few common results were reported. The objective of this study was to identify genetic factors that affect teat number in pigs, evaluate the accuracy of genomic prediction, and evaluate the contribution of significant genes and genomic regions to genomic broad-sense heritability and prediction accuracy using 41,108 autosomal single nucleotide polymorphisms (SNPs) from genotyping-by-sequencing on 2936 Duroc boars. Narrow-sense heritability and dominance heritability of teat number estimated by genomic restricted maximum likelihood were 0.365 ± 0.030 and 0.035 ± 0.019, respectively. The accuracy of genomic predictions, calculated as the average correlation between the genomic best linear unbiased prediction and phenotype in a tenfold validation study, was 0.437 ± 0.064 for the model with additive and dominance effects and 0.435 ± 0.064 for the model with additive effects only. Genome-wide association studies (GWAS) using three methods of analysis identified 85 significant SNP effects for teat number on chromosomes 1, 6, 7, 10, 11, 12 and 14. The region between 102.9 and 106.0 Mb on chromosome 7, which was reported in several studies, had the most significant SNP effects in or near the PTGR2, FAM161B, LIN52, VRTN, FCF1, AREL1 and LRRC74A genes. This region accounted for 10.0% of the genomic additive heritability and 8.0% of the accuracy of prediction. The second most significant chromosome region not reported by previous GWAS was the region between 77.7 and 79.7 Mb on chromosome 11, where SNPs in the FGF14 gene had the most significant effect and accounted for 5.1% of the genomic additive heritability and 5.2% of the accuracy of prediction. The 85 significant SNPs accounted for 28.5 to 28.8% of the genomic additive heritability and 35.8 to 36.8% of the accuracy of prediction. The three methods used for the GWAS identified 85 significant SNPs with additive effects on teat number, including SNPs in a previously reported chromosomal region and SNPs in novel chromosomal regions. Most significant SNPs with larger estimated effects also had larger contributions to the total genomic heritability and accuracy of prediction than other SNPs.
A state-based probabilistic model for tumor respiratory motion prediction
NASA Astrophysics Data System (ADS)
Kalet, Alan; Sandison, George; Wu, Huanmei; Schmitz, Ruth
2010-12-01
This work proposes a new probabilistic mathematical model for predicting tumor motion and position based on a finite state representation using the natural breathing states of exhale, inhale and end of exhale. Tumor motion was broken down into linear breathing states and sequences of states. Breathing state sequences and the observables representing those sequences were analyzed using a hidden Markov model (HMM) to predict the future sequences and new observables. Velocities and other parameters were clustered using a k-means clustering algorithm to associate each state with a set of observables such that a prediction of state also enables a prediction of tumor velocity. A time average model with predictions based on average past state lengths was also computed. State sequences which are known a priori to fit the data were fed into the HMM algorithm to set a theoretical limit of the predictive power of the model. The effectiveness of the presented probabilistic model has been evaluated for gated radiation therapy based on previously tracked tumor motion in four lung cancer patients. Positional prediction accuracy is compared with actual position in terms of the overall RMS errors. Various system delays, ranging from 33 to 1000 ms, were tested. Previous studies have shown duty cycles for latencies of 33 and 200 ms at around 90% and 80%, respectively, for linear, no prediction, Kalman filter and ANN methods as averaged over multiple patients. At 1000 ms, the previously reported duty cycles range from approximately 62% (ANN) down to 34% (no prediction). Average duty cycle for the HMM method was found to be 100% and 91 ± 3% for 33 and 200 ms latency and around 40% for 1000 ms latency in three out of four breathing motion traces. RMS errors were found to be lower than linear and no prediction methods at latencies of 1000 ms. The results show that for system latencies longer than 400 ms, the time average HMM prediction outperforms linear, no prediction, and the more general HMM-type predictive models. RMS errors for the time average model approach the theoretical limit of the HMM, and predicted state sequences are well correlated with sequences known to fit the data.
Comparison of Newer IOL Power Calculation Methods for Eyes With Previous Radial Keratotomy
Ma, Jack X.; Tang, Maolong; Wang, Li; Weikert, Mitchell P.; Huang, David; Koch, Douglas D.
2016-01-01
Purpose To evaluate the accuracy of the optical coherence tomography–based (OCT formula) and Barrett True K (True K) intraocular lens (IOL) calculation formulas in eyes with previous radial keratotomy (RK). Methods In 95 eyes of 65 patients, using the actual refraction following cataract surgery as target refraction, the predicted IOL power for each method was calculated. The IOL prediction error (PE) was obtained by subtracting the predicted IOL power from the implanted IOL power. The arithmetic IOL PE and median refractive PE were calculated and compared. Results All formulas except the True K produced hyperopic IOL PEs at 1 month, which decreased at ≥4 months (all P < 0.05). For the double-K Holladay 1, OCT formula, True K, and average of these three formulas (Average), the median absolute refractive PEs were, respectively, 0.78 diopters (D), 0.74 D, 0.60 D, and 0.59 D at 1 month; 0.69 D, 0.77 D, 0.77 D, and 0.61 D at 2 to 3 months; and 0.34 D, 0.65 D, 0.69 D, and 0.46 D at ≥4 months. The Average produced significantly smaller refractive PE than did the double-K Holladay 1 at 1 month (P < 0.05). There were no significant differences in refractive PEs among formulas at 4 months. Conclusions The OCT formula and True K were comparable to the double-K Holladay 1 method on the ASCRS (American Society of Cataract and Refractive Surgery) calculator. The Average IOL power on the ASCRS calculator may be considered when selecting the IOL power. Further improvements in the accuracy of IOL power calculation in RK eyes are desirable. PMID:27409468
Kim, Scott Y H
2014-04-01
The Patient Preference Predictor (PPP) proposal places a high priority on the accuracy of predicting patients' preferences and finds the performance of surrogates inadequate. However, the quest to develop a highly accurate, individualized statistical model has significant obstacles. First, it will be impossible to validate the PPP beyond the limit imposed by 60%-80% reliability of people's preferences for future medical decisions--a figure no better than the known average accuracy of surrogates. Second, evidence supports the view that a sizable minority of persons may not even have preferences to predict. Third, many, perhaps most, people express their autonomy just as much by entrusting their loved ones to exercise their judgment than by desiring to specifically control future decisions. Surrogate decision making faces none of these issues and, in fact, it may be more efficient, accurate, and authoritative than is commonly assumed.
Energy prediction equations are inadequate for obese Hispanic youth.
Klein, Catherine J; Villavicencio, Stephan A; Schweitzer, Amy; Bethepu, Joel S; Hoffman, Heather J; Mirza, Nazrat M
2011-08-01
Assessing energy requirements is a fundamental activity in clinical dietetics practice. A study was designed to determine whether published linear regression equations were accurate for predicting resting energy expenditure (REE) in fasted Hispanic children with obesity (aged 7 to 15 years). REE was measured using indirect calorimetry; body composition was estimated with whole-body air displacement plethysmography. REE was predicted using four equations: Institute of Medicine for healthy-weight children (IOM-HW), IOM for overweight and obese children (IOM-OS), Harris-Benedict, and Schofield. Accuracy of the prediction was calculated as the absolute value of the difference between the measured and predicted REE divided by the measured REE, expressed as a percentage. Predicted values within 85% to 115% of measured were defined as accurate. Participants (n=58; 53% boys) were mean age 11.8±2.1 years, had 43.5%±5.1% body fat, and had a body mass index of 31.5±5.8 (98.6±1.1 body mass index percentile). Measured REE was 2,339±680 kcal/day; predicted REE was 1,815±401 kcal/day (IOM-HW), 1,794±311 kcal/day (IOM-OS), 1,151±300 kcal/day (Harris-Benedict), and, 1,771±316 kcal/day (Schofield). Measured REE adjusted for body weight averaged 32.0±8.4 kcal/kg/day (95% confidence interval 29.8 to 34.2). Published equations predicted REE within 15% accuracy for only 36% to 40% of 58 participants, except for Harris-Benedict, which did not achieve accuracy for any participant. The most frequently accurate values were obtained using IOM-HW, which predicted REE within 15% accuracy for 55% (17/31) of boys. Published equations did not accurately predict REE for youth in the study sample. Further studies are warranted to formulate accurate energy prediction equations for this population. Copyright © 2011 American Dietetic Association. Published by Elsevier Inc. All rights reserved.
Jin, H; Wu, S; Vidyanti, I; Di Capua, P; Wu, B
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Depression is a common and often undiagnosed condition for patients with diabetes. It is also a condition that significantly impacts healthcare outcomes, use, and cost as well as elevating suicide risk. Therefore, a model to predict depression among diabetes patients is a promising and valuable tool for providers to proactively assess depressive symptoms and identify those with depression. This study seeks to develop a generalized multilevel regression model, using a longitudinal data set from a recent large-scale clinical trial, to predict depression severity and presence of major depression among patients with diabetes. Severity of depression was measured by the Patient Health Questionnaire PHQ-9 score. Predictors were selected from 29 candidate factors to develop a 2-level Poisson regression model that can make population-average predictions for all patients and subject-specific predictions for individual patients with historical records. Newly obtained patient records can be incorporated with historical records to update the prediction model. Root-mean-square errors (RMSE) were used to evaluate predictive accuracy of PHQ-9 scores. The study also evaluated the classification ability of using the predicted PHQ-9 scores to classify patients as having major depression. Two time-invariant and 10 time-varying predictors were selected for the model. Incorporating historical records and using them to update the model may improve both predictive accuracy of PHQ-9 scores and classification ability of the predicted scores. Subject-specific predictions (for individual patients with historical records) achieved RMSE about 4 and areas under the receiver operating characteristic (ROC) curve about 0.9 and are better than population-average predictions. The study developed a generalized multilevel regression model to predict depression and demonstrated that using generalized multilevel regression based on longitudinal patient records can achieve high predictive ability.
GPS Satellite Orbit Prediction at User End for Real-Time PPP System.
Yang, Hongzhou; Gao, Yang
2017-08-30
This paper proposed the high-precision satellite orbit prediction process at the user end for the real-time precise point positioning (PPP) system. Firstly, the structure of a new real-time PPP system will be briefly introduced in the paper. Then, the generation of satellite initial parameters (IP) at the sever end will be discussed, which includes the satellite position, velocity, and the solar radiation pressure (SRP) parameters for each satellite. After that, the method for orbit prediction at the user end, with dynamic models including the Earth's gravitational force, lunar gravitational force, solar gravitational force, and the SRP, are presented. For numerical integration, both the single-step Runge-Kutta and multi-step Adams-Bashforth-Moulton integrator methods are implemented. Then, the comparison between the predicted orbit and the international global navigation satellite system (GNSS) service (IGS) final products are carried out. The results show that the prediction accuracy can be maintained for several hours, and the average prediction error of the 31 satellites are 0.031, 0.032, and 0.033 m for the radial, along-track and cross-track directions over 12 h, respectively. Finally, the PPP in both static and kinematic modes are carried out to verify the accuracy of the predicted satellite orbit. The average root mean square error (RMSE) for the static PPP of the 32 globally distributed IGS stations are 0.012, 0.015, and 0.021 m for the north, east, and vertical directions, respectively; while the RMSE of the kinematic PPP with the predicted orbit are 0.031, 0.069, and 0.167 m in the north, east and vertical directions, respectively.
Sampling design optimisation for rainfall prediction using a non-stationary geostatistical model
NASA Astrophysics Data System (ADS)
Wadoux, Alexandre M. J.-C.; Brus, Dick J.; Rico-Ramirez, Miguel A.; Heuvelink, Gerard B. M.
2017-09-01
The accuracy of spatial predictions of rainfall by merging rain-gauge and radar data is partly determined by the sampling design of the rain-gauge network. Optimising the locations of the rain-gauges may increase the accuracy of the predictions. Existing spatial sampling design optimisation methods are based on minimisation of the spatially averaged prediction error variance under the assumption of intrinsic stationarity. Over the past years, substantial progress has been made to deal with non-stationary spatial processes in kriging. Various well-documented geostatistical models relax the assumption of stationarity in the mean, while recent studies show the importance of considering non-stationarity in the variance for environmental processes occurring in complex landscapes. We optimised the sampling locations of rain-gauges using an extension of the Kriging with External Drift (KED) model for prediction of rainfall fields. The model incorporates both non-stationarity in the mean and in the variance, which are modelled as functions of external covariates such as radar imagery, distance to radar station and radar beam blockage. Spatial predictions are made repeatedly over time, each time recalibrating the model. The space-time averaged KED variance was minimised by Spatial Simulated Annealing (SSA). The methodology was tested using a case study predicting daily rainfall in the north of England for a one-year period. Results show that (i) the proposed non-stationary variance model outperforms the stationary variance model, and (ii) a small but significant decrease of the rainfall prediction error variance is obtained with the optimised rain-gauge network. In particular, it pays off to place rain-gauges at locations where the radar imagery is inaccurate, while keeping the distribution over the study area sufficiently uniform.
GPS Satellite Orbit Prediction at User End for Real-Time PPP System
Yang, Hongzhou; Gao, Yang
2017-01-01
This paper proposed the high-precision satellite orbit prediction process at the user end for the real-time precise point positioning (PPP) system. Firstly, the structure of a new real-time PPP system will be briefly introduced in the paper. Then, the generation of satellite initial parameters (IP) at the sever end will be discussed, which includes the satellite position, velocity, and the solar radiation pressure (SRP) parameters for each satellite. After that, the method for orbit prediction at the user end, with dynamic models including the Earth’s gravitational force, lunar gravitational force, solar gravitational force, and the SRP, are presented. For numerical integration, both the single-step Runge–Kutta and multi-step Adams–Bashforth–Moulton integrator methods are implemented. Then, the comparison between the predicted orbit and the international global navigation satellite system (GNSS) service (IGS) final products are carried out. The results show that the prediction accuracy can be maintained for several hours, and the average prediction error of the 31 satellites are 0.031, 0.032, and 0.033 m for the radial, along-track and cross-track directions over 12 h, respectively. Finally, the PPP in both static and kinematic modes are carried out to verify the accuracy of the predicted satellite orbit. The average root mean square error (RMSE) for the static PPP of the 32 globally distributed IGS stations are 0.012, 0.015, and 0.021 m for the north, east, and vertical directions, respectively; while the RMSE of the kinematic PPP with the predicted orbit are 0.031, 0.069, and 0.167 m in the north, east and vertical directions, respectively. PMID:28867771
A Comparative Study of Data Mining Techniques on Football Match Prediction
NASA Astrophysics Data System (ADS)
Rosli, Che Mohamad Firdaus Che Mohd; Zainuri Saringat, Mohd; Razali, Nazim; Mustapha, Aida
2018-05-01
Data prediction have become a trend in today’s business or organization. This paper is set to predict match outcomes for association football from the perspective of football club managers and coaches. This paper explored different data mining techniques used for predicting the match outcomes where the target class is win, draw and lose. The main objective of this research is to find the most accurate data mining technique that fits the nature of football data. The techniques tested are Decision Trees, Neural Networks, Bayesian Network, and k-Nearest Neighbors. The results from the comparative experiments showed that Decision Trees produced the highest average prediction accuracy in the domain of football match prediction by 99.56%.
Samudrala, Ram
2015-01-01
We have examined the effect of eight different protein classes (channels, GPCRs, kinases, ligases, nuclear receptors, proteases, phosphatases, transporters) on the benchmarking performance of the CANDO drug discovery and repurposing platform (http://protinfo.org/cando). The first version of the CANDO platform utilizes a matrix of predicted interactions between 48278 proteins and 3733 human ingestible compounds (including FDA approved drugs and supplements) that map to 2030 indications/diseases using a hierarchical chem and bio-informatic fragment based docking with dynamics protocol (> one billion predicted interactions considered). The platform uses similarity of compound-proteome interaction signatures as indicative of similar functional behavior and benchmarking accuracy is calculated across 1439 indications/diseases with more than one approved drug. The CANDO platform yields a significant correlation (0.99, p-value < 0.0001) between the number of proteins considered and benchmarking accuracy obtained indicating the importance of multitargeting for drug discovery. Average benchmarking accuracies range from 6.2 % to 7.6 % for the eight classes when the top 10 ranked compounds are considered, in contrast to a range of 5.5 % to 11.7 % obtained for the comparison/control sets consisting of 10, 100, 1000, and 10000 single best performing proteins. These results are generally two orders of magnitude better than the average accuracy of 0.2% obtained when randomly generated (fully scrambled) matrices are used. Different indications perform well when different classes are used but the best accuracies (up to 11.7% for the top 10 ranked compounds) are achieved when a combination of classes are used containing the broadest distribution of protein folds. Our results illustrate the utility of the CANDO approach and the consideration of different protein classes for devising indication specific protocols for drug repurposing as well as drug discovery. PMID:25694071
Nuutinen, Mikko; Leskelä, Riikka-Leena; Suojalehto, Ella; Tirronen, Anniina; Komssi, Vesa
2017-04-13
In previous years a substantial number of studies have identified statistically important predictors of nursing home admission (NHA). However, as far as we know, the analyses have been done at the population-level. No prior research has analysed the prediction accuracy of a NHA model for individuals. This study is an analysis of 3056 longer-term home care customers in the city of Tampere, Finland. Data were collected from the records of social and health service usage and RAI-HC (Resident Assessment Instrument - Home Care) assessment system during January 2011 and September 2015. The aim was to find out the most efficient variable subsets to predict NHA for individuals and validate the accuracy. The variable subsets of predicting NHA were searched by sequential forward selection (SFS) method, a variable ranking metric and the classifiers of logistic regression (LR), support vector machine (SVM) and Gaussian naive Bayes (GNB). The validation of the results was guaranteed using randomly balanced data sets and cross-validation. The primary performance metrics for the classifiers were the prediction accuracy and AUC (average area under the curve). The LR and GNB classifiers achieved 78% accuracy for predicting NHA. The most important variables were RAI MAPLE (Method for Assigning Priority Levels), functional impairment (RAI IADL, Activities of Daily Living), cognitive impairment (RAI CPS, Cognitive Performance Scale), memory disorders (diagnoses G30-G32 and F00-F03) and the use of community-based health-service and prior hospital use (emergency visits and periods of care). The accuracy of the classifier for individuals was high enough to convince the officials of the city of Tampere to integrate the predictive model based on the findings of this study as a part of home care information system. Further work need to be done to evaluate variables that are modifiable and responsive to interventions.
Young, Mariel; Johannesdottir, Fjola; Poole, Ken; Shaw, Colin; Stock, J T
2018-02-01
Femoral head diameter is commonly used to estimate body mass from the skeleton. The three most frequently employed methods, designed by Ruff, Grine, and McHenry, were developed using different populations to address different research questions. They were not specifically designed for application to female remains, and their accuracy for this purpose has rarely been assessed or compared in living populations. This study analyzes the accuracy of these methods using a sample of modern British women through the use of pelvic CT scans (n = 97) and corresponding information about the individuals' known height and weight. Results showed that all methods provided reasonably accurate body mass estimates (average percent prediction errors under 20%) for the normal weight and overweight subsamples, but were inaccurate for the obese and underweight subsamples (average percent prediction errors over 20%). When women of all body mass categories were combined, the methods provided reasonable estimates (average percent prediction errors between 16 and 18%). The results demonstrate that different methods provide more accurate results within specific body mass index (BMI) ranges. The McHenry Equation provided the most accurate estimation for women of small body size, while the original Ruff Equation is most likely to be accurate if the individual was obese or severely obese. The refined Ruff Equation was the most accurate predictor of body mass on average for the entire sample, indicating that it should be utilized when there is no knowledge of the individual's body size or if the individual is assumed to be of a normal body size. The study also revealed a correlation between pubis length and body mass, and an equation for body mass estimation using pubis length was accurate in a dummy sample, suggesting that pubis length can also be used to acquire reliable body mass estimates. This has implications for how we interpret body mass in fossil hominins and has particular relevance to the interpretation of the long pubic ramus that is characteristic of Neandertals. Copyright © 2017 Elsevier Ltd. All rights reserved.
Predicting Atomic Decay Rates Using an Informational-Entropic Approach
NASA Astrophysics Data System (ADS)
Gleiser, Marcelo; Jiang, Nan
2018-06-01
We show that a newly proposed Shannon-like entropic measure of shape complexity applicable to spatially-localized or periodic mathematical functions known as configurational entropy (CE) can be used as a predictor of spontaneous decay rates for one-electron atoms. The CE is constructed from the Fourier transform of the atomic probability density. For the hydrogen atom with degenerate states labeled with the principal quantum number n, we obtain a scaling law relating the n-averaged decay rates to the respective CE. The scaling law allows us to predict the n-averaged decay rate without relying on the traditional computation of dipole matrix elements. We tested the predictive power of our approach up to n = 20, obtaining an accuracy better than 3.7% within our numerical precision, as compared to spontaneous decay tables listed in the literature.
Predicting Atomic Decay Rates Using an Informational-Entropic Approach
NASA Astrophysics Data System (ADS)
Gleiser, Marcelo; Jiang, Nan
2018-02-01
We show that a newly proposed Shannon-like entropic measure of shape complexity applicable to spatially-localized or periodic mathematical functions known as configurational entropy (CE) can be used as a predictor of spontaneous decay rates for one-electron atoms. The CE is constructed from the Fourier transform of the atomic probability density. For the hydrogen atom with degenerate states labeled with the principal quantum number n, we obtain a scaling law relating the n-averaged decay rates to the respective CE. The scaling law allows us to predict the n-averaged decay rate without relying on the traditional computation of dipole matrix elements. We tested the predictive power of our approach up to n = 20, obtaining an accuracy better than 3.7% within our numerical precision, as compared to spontaneous decay tables listed in the literature.
Nazarian, Dalar; Ganesh, P.; Sholl, David S.
2015-09-30
We compiled a test set of chemically and topologically diverse Metal–Organic Frameworks (MOFs) with high accuracy experimentally derived crystallographic structure data. The test set was used to benchmark the performance of Density Functional Theory (DFT) functionals (M06L, PBE, PW91, PBE-D2, PBE-D3, and vdW-DF2) for predicting lattice parameters, unit cell volume, bonded parameters and pore descriptors. On average PBE-D2, PBE-D3, and vdW-DF2 predict more accurate structures, but all functionals predicted pore diameters within 0.5 Å of the experimental diameter for every MOF in the test set. The test set was also used to assess the variance in performance of DFT functionalsmore » for elastic properties and atomic partial charges. The DFT predicted elastic properties such as minimum shear modulus and Young's modulus can differ by an average of 3 and 9 GPa for rigid MOFs such as those in the test set. Moreover, we calculated the partial charges by vdW-DF2 deviate the most from other functionals while there is no significant difference between the partial charges calculated by M06L, PBE, PW91, PBE-D2 and PBE-D3 for the MOFs in the test set. We find that while there are differences in the magnitude of the properties predicted by the various functionals, these discrepancies are small compared to the accuracy necessary for most practical applications.« less
Ellis, Katherine; Godbole, Suneeta; Marshall, Simon; Lanckriet, Gert; Staudenmayer, John; Kerr, Jacqueline
2014-01-01
Background: Active travel is an important area in physical activity research, but objective measurement of active travel is still difficult. Automated methods to measure travel behaviors will improve research in this area. In this paper, we present a supervised machine learning method for transportation mode prediction from global positioning system (GPS) and accelerometer data. Methods: We collected a dataset of about 150 h of GPS and accelerometer data from two research assistants following a protocol of prescribed trips consisting of five activities: bicycling, riding in a vehicle, walking, sitting, and standing. We extracted 49 features from 1-min windows of this data. We compared the performance of several machine learning algorithms and chose a random forest algorithm to classify the transportation mode. We used a moving average output filter to smooth the output predictions over time. Results: The random forest algorithm achieved 89.8% cross-validated accuracy on this dataset. Adding the moving average filter to smooth output predictions increased the cross-validated accuracy to 91.9%. Conclusion: Machine learning methods are a viable approach for automating measurement of active travel, particularly for measuring travel activities that traditional accelerometer data processing methods misclassify, such as bicycling and vehicle travel. PMID:24795875
Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models.
Cuevas, Jaime; Crossa, José; Soberanis, Víctor; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino; Campos, Gustavo de Los; Montesinos-López, O A; Burgueño, Juan
2016-11-01
In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single-environment analyses and extended to account for G × E interaction (GBLUP-G × E, RKHS KA-G × E and RKHS EB-G × E) in wheat ( L.) and maize ( L.) data sets. For single-environment analyses of wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA-G × E and RKHS EB-G × E models did show up to 60 to 68% superiority over the corresponding single environment for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-G × E. For the maize data set, the prediction accuracy of RKHS EB-G × E and RKHS KA-G × E was, on average, 5 to 6% higher than that of GBLUP-G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that accounts for small, more complex marker main effects and marker-specific interaction effects. Copyright © 2016 Crop Science Society of America.
Janoff, Daniel M; Davol, Patrick; Hazzard, James; Lemmers, Michael J; Paduch, Darius A; Barry, John M
2004-01-01
Computerized tomography (CT) with 3-dimensional (3-D) reconstruction has gained acceptance as an imaging study to evaluate living renal donors. We report our experience with this technique in 199 consecutive patients to validate its predictions of arterial anatomy and kidney volumes. Between January 1997 and March 2002, 199 living donor nephrectomies were performed at our institution using an open technique. During the operation arterial anatomy was recorded as well as kidney weight in 98 patients and displacement volume in 27. Each donor had been evaluated preoperatively by CT angiography with 3-D reconstruction. Arterial anatomy described by a staff radiologist was compared with intraoperative findings. CT estimated volumes were reported. Linear correlation graphs were generated to assess the reliability of CT volume predictions. The accuracy of CT angiography for predicting arterial anatomy was 90.5%. However, as the number of renal arteries increased, predictive accuracy decreased. The ability of CT to predict multiple arteries remained high with a positive predictive value of 95.2%. Calculated CT volume and kidney weight significantly correlated (0.654). However, the coefficient of variation index (how much average CT volume differed from measured intraoperative volume) was 17.8%. CT angiography with 3-D reconstruction accurately predicts arterial vasculature in more than 90% of patients and it can be used to compare renal volumes. However, accuracy decreases with multiple renal arteries and volume comparisons may be inaccurate when the difference in kidney volumes is within 17.8%.
Ostrowski, Michalł; Wilkowska, Ewa; Baczek, Tomasz
2010-12-01
In vivo-in vitro correlation (IVIVC) is an effective tool to predict absorption behavior of active substances from pharmaceutical dosage forms. The model for immediate release dosage form containing amoxicillin was used in the presented study to check if the calculation method of absorption profiles can influence final results achieved. The comparison showed that an averaging of individual absorption profiles performed by Wagner-Nelson (WN) conversion method can lead to lose the discrimination properties of the model. The approach considering individual plasma concentration versus time profiles enabled to average absorption profiles prior WN conversion. In turn, that enabled to find differences between dispersible tablets and capsules. It was concluded that in the case of immediate release dosage form, the decision to use averaging method should be based on an individual situation; however, it seems that the influence of such a procedure on the discrimination properties of the model is then more significant. © 2010 Wiley-Liss, Inc. and the American Pharmacists Association
A Systems Modeling Approach to Forecast Corn Economic Optimum Nitrogen Rate.
Puntel, Laila A; Sawyer, John E; Barker, Daniel W; Thorburn, Peter J; Castellano, Michael J; Moore, Kenneth J; VanLoocke, Andrew; Heaton, Emily A; Archontoulis, Sotirios V
2018-01-01
Historically crop models have been used to evaluate crop yield responses to nitrogen (N) rates after harvest when it is too late for the farmers to make in-season adjustments. We hypothesize that the use of a crop model as an in-season forecast tool will improve current N decision-making. To explore this, we used the Agricultural Production Systems sIMulator (APSIM) calibrated with long-term experimental data for central Iowa, USA (16-years in continuous corn and 15-years in soybean-corn rotation) combined with actual weather data up to a specific crop stage and historical weather data thereafter. The objectives were to: (1) evaluate the accuracy and uncertainty of corn yield and economic optimum N rate (EONR) predictions at four forecast times (planting time, 6th and 12th leaf, and silking phenological stages); (2) determine whether the use of analogous historical weather years based on precipitation and temperature patterns as opposed to using a 35-year dataset could improve the accuracy of the forecast; and (3) quantify the value added by the crop model in predicting annual EONR and yields using the site-mean EONR and the yield at the EONR to benchmark predicted values. Results indicated that the mean corn yield predictions at planting time ( R 2 = 0.77) using 35-years of historical weather was close to the observed and predicted yield at maturity ( R 2 = 0.81). Across all forecasting times, the EONR predictions were more accurate in corn-corn than soybean-corn rotation (relative root mean square error, RRMSE, of 25 vs. 45%, respectively). At planting time, the APSIM model predicted the direction of optimum N rates (above, below or at average site-mean EONR) in 62% of the cases examined ( n = 31) with an average error range of ±38 kg N ha -1 (22% of the average N rate). Across all forecast times, prediction error of EONR was about three times higher than yield predictions. The use of the 35-year weather record was better than using selected historical weather years to forecast (RRMSE was on average 3% lower). Overall, the proposed approach of using the crop model as a forecasting tool could improve year-to-year predictability of corn yields and optimum N rates. Further improvements in modeling and set-up protocols are needed toward more accurate forecast, especially for extreme weather years with the most significant economic and environmental cost.
A Systems Modeling Approach to Forecast Corn Economic Optimum Nitrogen Rate
Puntel, Laila A.; Sawyer, John E.; Barker, Daniel W.; Thorburn, Peter J.; Castellano, Michael J.; Moore, Kenneth J.; VanLoocke, Andrew; Heaton, Emily A.; Archontoulis, Sotirios V.
2018-01-01
Historically crop models have been used to evaluate crop yield responses to nitrogen (N) rates after harvest when it is too late for the farmers to make in-season adjustments. We hypothesize that the use of a crop model as an in-season forecast tool will improve current N decision-making. To explore this, we used the Agricultural Production Systems sIMulator (APSIM) calibrated with long-term experimental data for central Iowa, USA (16-years in continuous corn and 15-years in soybean-corn rotation) combined with actual weather data up to a specific crop stage and historical weather data thereafter. The objectives were to: (1) evaluate the accuracy and uncertainty of corn yield and economic optimum N rate (EONR) predictions at four forecast times (planting time, 6th and 12th leaf, and silking phenological stages); (2) determine whether the use of analogous historical weather years based on precipitation and temperature patterns as opposed to using a 35-year dataset could improve the accuracy of the forecast; and (3) quantify the value added by the crop model in predicting annual EONR and yields using the site-mean EONR and the yield at the EONR to benchmark predicted values. Results indicated that the mean corn yield predictions at planting time (R2 = 0.77) using 35-years of historical weather was close to the observed and predicted yield at maturity (R2 = 0.81). Across all forecasting times, the EONR predictions were more accurate in corn-corn than soybean-corn rotation (relative root mean square error, RRMSE, of 25 vs. 45%, respectively). At planting time, the APSIM model predicted the direction of optimum N rates (above, below or at average site-mean EONR) in 62% of the cases examined (n = 31) with an average error range of ±38 kg N ha−1 (22% of the average N rate). Across all forecast times, prediction error of EONR was about three times higher than yield predictions. The use of the 35-year weather record was better than using selected historical weather years to forecast (RRMSE was on average 3% lower). Overall, the proposed approach of using the crop model as a forecasting tool could improve year-to-year predictability of corn yields and optimum N rates. Further improvements in modeling and set-up protocols are needed toward more accurate forecast, especially for extreme weather years with the most significant economic and environmental cost. PMID:29706974
From sequence to enzyme mechanism using multi-label machine learning.
De Ferrari, Luna; Mitchell, John B O
2014-05-19
In this work we predict enzyme function at the level of chemical mechanism, providing a finer granularity of annotation than traditional Enzyme Commission (EC) classes. Hence we can predict not only whether a putative enzyme in a newly sequenced organism has the potential to perform a certain reaction, but how the reaction is performed, using which cofactors and with susceptibility to which drugs or inhibitors, details with important consequences for drug and enzyme design. Work that predicts enzyme catalytic activity based on 3D protein structure features limits the prediction of mechanism to proteins already having either a solved structure or a close relative suitable for homology modelling. In this study, we evaluate whether sequence identity, InterPro or Catalytic Site Atlas sequence signatures provide enough information for bulk prediction of enzyme mechanism. By splitting MACiE (Mechanism, Annotation and Classification in Enzymes database) mechanism labels to a finer granularity, which includes the role of the protein chain in the overall enzyme complex, the method can predict at 96% accuracy (and 96% micro-averaged precision, 99.9% macro-averaged recall) the MACiE mechanism definitions of 248 proteins available in the MACiE, EzCatDb (Database of Enzyme Catalytic Mechanisms) and SFLD (Structure Function Linkage Database) databases using an off-the-shelf K-Nearest Neighbours multi-label algorithm. We find that InterPro signatures are critical for accurate prediction of enzyme mechanism. We also find that incorporating Catalytic Site Atlas attributes does not seem to provide additional accuracy. The software code (ml2db), data and results are available online at http://sourceforge.net/projects/ml2db/ and as supplementary files.
Day, J D; Weaver, L D; Franti, C E
1995-01-01
The objective of this prospective cohort study was to determine the sensitivity, specificity, accuracy, and predictive value of twin pregnancy diagnosis by rectal palpation and to examine fetal survival, culling rates, and gestational lengths of cows diagnosed with twins. In this prospective study, 5309 cows on 14 farms in California were followed from pregnancy diagnosis to subsequent abortion or calving. The average sensitivity, specificity, accuracy, and predictive value of twin pregnancy diagnosis were 49.3%, 99.4%, 96.0%, and 86.1%, respectively. The abortion rate for single pregnancies of 12.0% differed significantly from those for bicornual twin pregnancies and unicornual twin pregnancies of 26.2% and 32.4%, respectively (P < 0.05). The early calf mortality between cows calving with singles (3.2%) and twins (15.7%) were significantly different (P < 0.005). The difference in fetal survival between single pregnancies and all twin pregnancies resulted in 0.42 and 0.29 viable heifers per pregnancy, respectively. The average gestation for single, bicornual, and unicornual pregnancies that did not abort before drying-off was 278, 272, and 270 days, respectively. Results of this study show that there is an increased fetal wastage associated with twin pregnancies and suggest a need for further research exploring management strategies for cows carrying twins. PMID:7728734
Moghram, Basem Ameen; Nabil, Emad; Badr, Amr
2018-01-01
T-cell epitope structure identification is a significant challenging immunoinformatic problem within epitope-based vaccine design. Epitopes or antigenic peptides are a set of amino acids that bind with the Major Histocompatibility Complex (MHC) molecules. The aim of this process is presented by Antigen Presenting Cells to be inspected by T-cells. MHC-molecule-binding epitopes are responsible for triggering the immune response to antigens. The epitope's three-dimensional (3D) molecular structure (i.e., tertiary structure) reflects its proper function. Therefore, the identification of MHC class-II epitopes structure is a significant step towards epitope-based vaccine design and understanding of the immune system. In this paper, we propose a new technique using a Genetic Algorithm for Predicting the Epitope Structure (GAPES), to predict the structure of MHC class-II epitopes based on their sequence. The proposed Elitist-based genetic algorithm for predicting the epitope's tertiary structure is based on Ab-Initio Empirical Conformational Energy Program for Peptides (ECEPP) Force Field Model. The developed secondary structure prediction technique relies on Ramachandran Plot. We used two alignment algorithms: the ROSS alignment and TM-Score alignment. We applied four different alignment approaches to calculate the similarity scores of the dataset under test. We utilized the support vector machine (SVM) classifier as an evaluation of the prediction performance. The prediction accuracy and the Area Under Receiver Operating Characteristic (ROC) Curve (AUC) were calculated as measures of performance. The calculations are performed on twelve similarity-reduced datasets of the Immune Epitope Data Base (IEDB) and a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES was reliable and very accurate. We achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 in the IEDB dataset. Also, we achieved an accuracy of 95.125% and an AUC of 0.987 on the HLA-DRB1*0101 allele of the Wang benchmark dataset. The results indicate that the proposed prediction technique "GAPES" is a promising technique that will help researchers and scientists to predict the protein structure and it will assist them in the intelligent design of new epitope-based vaccines. Copyright © 2017 Elsevier B.V. All rights reserved.
Hua, Zhi-Gang; Lin, Yan; Yuan, Ya-Zhou; Yang, De-Chang; Wei, Wen; Guo, Feng-Biao
2015-01-01
In 2003, we developed an ab initio program, ZCURVE 1.0, to find genes in bacterial and archaeal genomes. In this work, we present the updated version (i.e. ZCURVE 3.0). Using 422 prokaryotic genomes, the average accuracy was 93.7% with the updated version, compared with 88.7% with the original version. Such results also demonstrate that ZCURVE 3.0 is comparable with Glimmer 3.02 and may provide complementary predictions to it. In fact, the joint application of the two programs generated better results by correctly finding more annotated genes while also containing fewer false-positive predictions. As the exclusive function, ZCURVE 3.0 contains one post-processing program that can identify essential genes with high accuracy (generally >90%). We hope ZCURVE 3.0 will receive wide use with the web-based running mode. The updated ZCURVE can be freely accessed from http://cefg.uestc.edu.cn/zcurve/ or http://tubic.tju.edu.cn/zcurveb/ without any restrictions. PMID:25977299
Tamura, Takeyuki; Akutsu, Tatsuya
2007-11-30
Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. This is a problem of predicting which part in a cell a given protein is transported to, where an amino acid sequence of the protein is given as an input. This problem is becoming more important since information on subcellular location is helpful for annotation of proteins and genes and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracies. In this paper, we propose a novel and general predicting method by combining techniques for sequence alignment and feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. Through fivefold cross validation tests, the obtained overall accuracies and average MCC were 0.9096 and 0.8655 respectively. We also applied our method to other datasets including that of WoLF PSORT. Although there is a predictor which uses the information of gene ontology and yields higher accuracy than ours, our accuracies are higher than existing predictors which use only sequence information. Since such information as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for subcellular location prediction of newly-discovered proteins. Furthermore, the idea of combination of alignment and amino acid frequency is novel and general so that it may be applied to other problems in bioinformatics. Our method for plant is also implemented as a web-system and available on http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.
Endelman, Jeffrey B; Carley, Cari A Schmitz; Bethke, Paul C; Coombs, Joseph J; Clough, Mark E; da Silva, Washington L; De Jong, Walter S; Douches, David S; Frederick, Curtis M; Haynes, Kathleen G; Holm, David G; Miller, J Creighton; Muñoz, Patricio R; Navarro, Felix M; Novy, Richard G; Palta, Jiwan P; Porter, Gregory A; Rak, Kyle T; Sathuvalli, Vidyasagar R; Thompson, Asunta L; Yencho, G Craig
2018-05-01
As one of the world's most important food crops, the potato ( Solanum tuberosum L.) has spurred innovation in autotetraploid genetics, including in the use of SNP arrays to determine allele dosage at thousands of markers. By combining genotype and pedigree information with phenotype data for economically important traits, the objectives of this study were to (1) partition the genetic variance into additive vs. nonadditive components, and (2) determine the accuracy of genome-wide prediction. Between 2012 and 2017, a training population of 571 clones was evaluated for total yield, specific gravity, and chip fry color. Genomic covariance matrices for additive ( G ), digenic dominant ( D ), and additive × additive epistatic ( G # G ) effects were calculated using 3895 markers, and the numerator relationship matrix ( A ) was calculated from a 13-generation pedigree. Based on model fit and prediction accuracy, mixed model analysis with G was superior to A for yield and fry color but not specific gravity. The amount of additive genetic variance captured by markers was 20% of the total genetic variance for specific gravity, compared to 45% for yield and fry color. Within the training population, including nonadditive effects improved accuracy and/or bias for all three traits when predicting total genotypic value. When six F 1 populations were used for validation, prediction accuracy ranged from 0.06 to 0.63 and was consistently lower (0.13 on average) without allele dosage information. We conclude that genome-wide prediction is feasible in potato and that it will improve selection for breeding value given the substantial amount of nonadditive genetic variance in elite germplasm. Copyright © 2018 by the Genetics Society of America.
Ho, Junming; Zwicker, Vincent E; Yuen, Karen K Y; Jolliffe, Katrina A
2017-10-06
Robust quantum chemical methods are employed to predict the pK a 's of several families of dual hydrogen-bonding organocatalysts/anion receptors, including deltamides and croconamides as well as their thio derivatives. The average accuracy of these predictions is ∼1 pK a unit and allows for a comparison of the acidity between classes of receptors and for quantitative studies of substituent effects. These computational insights further explain the relationship between pK a and chloride anion affinity of these receptors that will be important for designing future anion receptors and organocatalysts.
Li, Eldon Y; Tung, Chen-Yuan; Chang, Shu-Hsun
2016-08-01
The quest for an effective system capable of monitoring and predicting the trends of epidemic diseases is a critical issue for communities worldwide. With the prevalence of Internet access, more and more researchers today are using data from both search engines and social media to improve the prediction accuracy. In particular, a prediction market system (PMS) exploits the wisdom of crowds on the Internet to effectively accomplish relatively high accuracy. This study presents the architecture of a PMS and demonstrates the matching mechanism of logarithmic market scoring rules. The system was implemented to predict infectious diseases in Taiwan with the wisdom of crowds in order to improve the accuracy of epidemic forecasting. The PMS architecture contains three design components: database clusters, market engine, and Web applications. The system accumulated knowledge from 126 health professionals for 31 weeks to predict five disease indicators: the confirmed cases of dengue fever, the confirmed cases of severe and complicated influenza, the rate of enterovirus infections, the rate of influenza-like illnesses, and the confirmed cases of severe and complicated enterovirus infection. Based on the winning ratio, the PMS predicts the trends of three out of five disease indicators more accurately than does the existing system that uses the five-year average values of historical data for the same weeks. In addition, the PMS with the matching mechanism of logarithmic market scoring rules is easy to understand for health professionals and applicable to predict all the five disease indicators. The PMS architecture of this study affords organizations and individuals to implement it for various purposes in our society. The system can continuously update the data and improve prediction accuracy in monitoring and forecasting the trends of epidemic diseases. Future researchers could replicate and apply the PMS demonstrated in this study to more infectious diseases and wider geographical areas, especially the under-developed countries across Asia and Africa. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Cordeiro, Daniela Valença; Lima, Verônica Castro; Castro, Dinorah P; Castro, Leonardo C; Pacheco, Maria Angélica; Lee, Jae Min; Dimantas, Marcelo I; Prata, Tiago Santos
2011-01-01
To evaluate the influence of optic disc size on the diagnostic accuracy of macular ganglion cell complex (GCC) and conventional peripapillary retinal nerve fiber layer (pRNFL) analyses provided by spectral domain optical coherence tomography (SD-OCT) in glaucoma. Eighty-two glaucoma patients and 30 healthy subjects were included. All patients underwent GCC (7 × 7 mm macular grid, consisting of RNFL, ganglion cell and inner plexiform layers) and pRNFL thickness measurement (3.45 mm circular scan) by SD-OCT. One eye was randomly selected for analysis. Initially, receiver operating characteristic (ROC) curves were generated for different GCC and pRNFL parameters. The effect of disc area on the diagnostic accuracy of these parameters was evaluated using a logistic ROC regression model. Subsequently, 1.5, 2.0, and 2.5 mm(2) disc sizes were arbitrarily chosen (based on data distribution) and the predicted areas under the ROC curves (AUCs) and sensitivities were compared at fixed specificities for each. Average mean deviation index for glaucomatous eyes was -5.3 ± 5.2 dB. Similar AUCs were found for the best pRNFL (average thickness = 0.872) and GCC parameters (average thickness = 0.824; P = 0.19). The coefficient representing disc area in the ROC regression model was not statistically significant for average pRNFL thickness (-0.176) or average GCC thickness (0.088; P ≥ 0.56). AUCs for fixed disc areas (1.5, 2.0, and 2.5 mm(2)) were 0.904, 0.891, and 0.875 for average pRNFL thickness and 0.834, 0.842, and 0.851 for average GCC thickness, respectively. The highest sensitivities - at 80% specificity for average pRNFL (84.5%) and GCC thicknesses (74.5%) - were found with disc sizes fixed at 1.5 mm(2) and 2.5 mm(2). Diagnostic accuracy was similar between pRNFL and GCC thickness parameters. Although not statistically significant, there was a trend for a better diagnostic accuracy of pRNFL thickness measurement in cases of smaller discs. For GCC analysis, an inverse effect was observed.
ERIC Educational Resources Information Center
Cheung, Vivian H. Y.; Salih, Salih A.; Crouch, Alisa; Karunanithi, Mohanraj K.; Gray, Len
2012-01-01
The aim of this study is to determine whether clinicians' estimates of patients' walking time agree with those determined by accelerometer devices. The walking time was measured using a waist-mounted accelerometer device everyday during the patients' waking hours. At each weekly meeting, clinicians estimated the patients' average daily walking…
Vallejo, Roger L; Silva, Rafael M O; Evenhuis, Jason P; Gao, Guangtu; Liu, Sixin; Parsons, James E; Martin, Kyle E; Wiens, Gregory D; Lourenco, Daniela A L; Leeds, Timothy D; Palti, Yniv
2018-06-05
Previously accurate genomic predictions for Bacterial cold water disease (BCWD) resistance in rainbow trout were obtained using a medium-density single nucleotide polymorphism (SNP) array. Here, the impact of lower-density SNP panels on the accuracy of genomic predictions was investigated in a commercial rainbow trout breeding population. Using progeny performance data, the accuracy of genomic breeding values (GEBV) using 35K, 10K, 3K, 1K, 500, 300 and 200 SNP panels as well as a panel with 70 quantitative trait loci (QTL)-flanking SNP was compared. The GEBVs were estimated using the Bayesian method BayesB, single-step GBLUP (ssGBLUP) and weighted ssGBLUP (wssGBLUP). The accuracy of GEBVs remained high despite the sharp reductions in SNP density, and even with 500 SNP accuracy was higher than the pedigree-based prediction (0.50-0.56 versus 0.36). Furthermore, the prediction accuracy with the 70 QTL-flanking SNP (0.65-0.72) was similar to the panel with 35K SNP (0.65-0.71). Genomewide linkage disequilibrium (LD) analysis revealed strong LD (r 2 ≥ 0.25) spanning on average over 1 Mb across the rainbow trout genome. This long-range LD likely contributed to the accurate genomic predictions with the low-density SNP panels. Population structure analysis supported the hypothesis that long-range LD in this population may be caused by admixture. Results suggest that lower-cost, low-density SNP panels can be used for implementing genomic selection for BCWD resistance in rainbow trout breeding programs. © 2018 The Authors. This article is a U.S. Government work and is in the public domain in the USA. Journal of Animal Breeding and Genetics published by Blackwell Verlag GmbH.
The wisdom of crowds for visual search
Juni, Mordechai Z.; Eckstein, Miguel P.
2017-01-01
Decision-making accuracy typically increases through collective integration of people’s judgments into group decisions, a phenomenon known as the wisdom of crowds. For simple perceptual laboratory tasks, classic signal detection theory specifies the upper limit for collective integration benefits obtained by weighted averaging of people’s confidences, and simple majority voting can often approximate that limit. Life-critical perceptual decisions often involve searching large image data (e.g., medical, security, and aerial imagery), but the expected benefits and merits of using different pooling algorithms are unknown for such tasks. Here, we show that expected pooling benefits are significantly greater for visual search than for single-location perceptual tasks and the prediction given by classic signal detection theory. In addition, we show that simple majority voting obtains inferior accuracy benefits for visual search relative to averaging and weighted averaging of observers’ confidences. Analysis of gaze behavior across observers suggests that the greater collective integration benefits for visual search arise from an interaction between the foveated properties of the human visual system (high foveal acuity and low peripheral acuity) and observers’ nonexhaustive search patterns, and can be predicted by an extended signal detection theory framework with trial to trial sampling from a varying mixture of high and low target detectabilities across observers (SDT-MIX). These findings advance our theoretical understanding of how to predict and enhance the wisdom of crowds for real world search tasks and could apply more generally to any decision-making task for which the minority of group members with high expertise varies from decision to decision. PMID:28490500
In vivo dose measurement using TLDs and MOSFET dosimeters for cardiac radiosurgery
Sumanaweera, Thilaka S.; Blanck, Oliver; Iwamura, Alyson K.; Steel, James P.; Dieterich, Sonja; Maguire, Patrick
2012-01-01
In vivo measurements were made of the dose delivered to animal models in an effort to develop a method for treating cardiac arrhythmia using radiation. This treatment would replace RF energy (currently used to create cardiac scar) with ionizing radiation. In the current study, the pulmonary vein ostia of animal models were irradiated with 6 MV X‐rays in order to produce a scar that would block aberrant signals characteristic of atrial fibrillation. The CyberKnife radiosurgery system was used to deliver planned treatments of 20–35 Gy in a single fraction to four animals. The Synchrony system was used to track respiratory motion of the heart, while the contractile motion of the heart was untracked. The dose was measured on the epicardial surface near the right pulmonary vein and on the esophagus using surgically implanted TLD dosimeters, or in the coronary sinus using a MOSFET dosimeter placed using a catheter. The doses measured on the epicardium with TLDs averaged 5% less than predicted for those locations, while doses measured in the coronary sinus with the MOSFET sensor nearest the target averaged 6% less than the predicted dose. The measurements on the esophagus averaged 25% less than predicted. These results provide an indication of the accuracy with which the treatment planning methods accounted for the motion of the target, with its respiratory and cardiac components. This is the first report on the accuracy of CyberKnife dose delivery to cardiac targets. PACS numbers: 87.53.Ly, 87.53.Bn PMID:22584173
Park, Jonghyeok; Kim, Hackjin; Sohn, Jeong-Woo; Choi, Jong-ryul; Kim, Sung-Phil
2018-01-01
Humans often attempt to predict what others prefer based on a narrow slice of experience, called thin-slicing. According to the theoretical bases for how humans can predict the preference of others, one tends to estimate the other's preference using a perceived difference between the other and self. Previous neuroimaging studies have revealed that the network of dorsal medial prefrontal cortex (dmPFC) and right temporoparietal junction (rTPJ) is related to the ability of predicting others' preference. However, it still remains unknown about the temporal patterns of neural activities for others' preference prediction through thin-slicing. To investigate such temporal aspects of neural activities, we investigated human electroencephalography (EEG) recorded during the task of predicting the preference of others while only a facial picture of others was provided. Twenty participants (all female, average age: 21.86) participated in the study. In each trial of the task, participants were shown a picture of either a target person or self for 3 s, followed by the presentation of a movie poster over which participants predicted the target person's preference as liking or disliking. The time-frequency EEG analysis was employed to analyze temporal changes in the amplitudes of brain oscillations. Participants could predict others' preference for movies with accuracy of 56.89 ± 3.16% and 10 out of 20 participants exhibited prediction accuracy higher than a chance level (95% interval). There was a significant difference in the power of the parietal alpha (10~13 Hz) oscillation 0.6~0.8 s after the onset of poster presentation between the cases when participants predicted others' preference and when they reported self-preference (p < 0.05). The power of brain oscillations at any frequency band and time period during the trial did not show a significant correlation with individual prediction accuracy. However, when we measured differences of the power between the trials of predicting other's preference and reporting self-preference, the right temporal beta oscillations 1.6~1.8 s after the onset of facial picture presentation exhibited a significant correlation with individual accuracy. Our results suggest that right temporoparietal beta oscillations may be correlated with one's ability to predict what others prefer with minimal information. PMID:29479312
Park, Jonghyeok; Kim, Hackjin; Sohn, Jeong-Woo; Choi, Jong-Ryul; Kim, Sung-Phil
2018-01-01
Humans often attempt to predict what others prefer based on a narrow slice of experience, called thin-slicing. According to the theoretical bases for how humans can predict the preference of others, one tends to estimate the other's preference using a perceived difference between the other and self. Previous neuroimaging studies have revealed that the network of dorsal medial prefrontal cortex (dmPFC) and right temporoparietal junction (rTPJ) is related to the ability of predicting others' preference. However, it still remains unknown about the temporal patterns of neural activities for others' preference prediction through thin-slicing. To investigate such temporal aspects of neural activities, we investigated human electroencephalography (EEG) recorded during the task of predicting the preference of others while only a facial picture of others was provided. Twenty participants (all female, average age: 21.86) participated in the study. In each trial of the task, participants were shown a picture of either a target person or self for 3 s, followed by the presentation of a movie poster over which participants predicted the target person's preference as liking or disliking. The time-frequency EEG analysis was employed to analyze temporal changes in the amplitudes of brain oscillations. Participants could predict others' preference for movies with accuracy of 56.89 ± 3.16% and 10 out of 20 participants exhibited prediction accuracy higher than a chance level (95% interval). There was a significant difference in the power of the parietal alpha (10~13 Hz) oscillation 0.6~0.8 s after the onset of poster presentation between the cases when participants predicted others' preference and when they reported self-preference ( p < 0.05). The power of brain oscillations at any frequency band and time period during the trial did not show a significant correlation with individual prediction accuracy. However, when we measured differences of the power between the trials of predicting other's preference and reporting self-preference, the right temporal beta oscillations 1.6~1.8 s after the onset of facial picture presentation exhibited a significant correlation with individual accuracy. Our results suggest that right temporoparietal beta oscillations may be correlated with one's ability to predict what others prefer with minimal information.
Blum, Meike; Distl, Ottmar
2014-01-01
In the present study, breeding values for canine congenital sensorineural deafness, the presence of blue eyes and patches have been predicted using multivariate animal models to test the reliability of the breeding values for planned matings. The dataset consisted of 6669 German Dalmatian dogs born between 1988 and 2009. Data were provided by the Dalmatian kennel clubs which are members of the German Association for Dog Breeding and Husbandry (VDH). The hearing status for all dogs was evaluated using brainstem auditory evoked potentials. The reliability using the prediction error variance of breeding values and the realized reliability of the prediction of the phenotype of future progeny born in each one year between 2006 and 2009 were used as parameters to evaluate the goodness of prediction through breeding values. All animals from the previous birth years were used for prediction of the breeding values of the progeny in each of the up-coming birth years. The breeding values based on pedigree records achieved an average reliability of 0.19 for the future 1951 progeny. The predictive accuracy (R2) for the hearing status of single future progeny was at 1.3%. Combining breeding values for littermates increased the predictive accuracy to 3.5%. Corresponding values for maternal and paternal half-sib groups were at 3.2 and 7.3%. The use of breeding values for planned matings increases the phenotypic selection response over mass selection. The breeding values of sires may be used for planned matings because reliabilities and predictive accuracies for future paternal progeny groups were highest.
Hu, Chen; Steingrimsson, Jon Arni
2018-01-01
A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
Xiaopeng, QI; Liang, WEI; BARKER, Laurie; LEKIACHVILI, Akaki; Xingyou, ZHANG
2015-01-01
Temperature changes are known to have significant impacts on human health. Accurate estimates of population-weighted average monthly air temperature for US counties are needed to evaluate temperature’s association with health behaviours and disease, which are sampled or reported at the county level and measured on a monthly—or 30-day—basis. Most reported temperature estimates were calculated using ArcGIS, relatively few used SAS. We compared the performance of geostatistical models to estimate population-weighted average temperature in each month for counties in 48 states using ArcGIS v9.3 and SAS v 9.2 on a CITGO platform. Monthly average temperature for Jan-Dec 2007 and elevation from 5435 weather stations were used to estimate the temperature at county population centroids. County estimates were produced with elevation as a covariate. Performance of models was assessed by comparing adjusted R2, mean squared error, root mean squared error, and processing time. Prediction accuracy for split validation was above 90% for 11 months in ArcGIS and all 12 months in SAS. Cokriging in SAS achieved higher prediction accuracy and lower estimation bias as compared to cokriging in ArcGIS. County-level estimates produced by both packages were positively correlated (adjusted R2 range=0.95 to 0.99); accuracy and precision improved with elevation as a covariate. Both methods from ArcGIS and SAS are reliable for U.S. county-level temperature estimates; However, ArcGIS’s merits in spatial data pre-processing and processing time may be important considerations for software selection, especially for multi-year or multi-state projects. PMID:26167169
Data-Driven High-Throughput Prediction of the 3D Structure of Small Molecules: Review and Progress
Andronico, Alessio; Randall, Arlo; Benz, Ryan W.; Baldi, Pierre
2011-01-01
Accurate prediction of the 3D structure of small molecules is essential in order to understand their physical, chemical, and biological properties including how they interact with other molecules. Here we survey the field of high-throughput methods for 3D structure prediction and set up new target specifications for the next generation of methods. We then introduce COSMOS, a novel data-driven prediction method that utilizes libraries of fragment and torsion angle parameters. We illustrate COSMOS using parameters extracted from the Cambridge Structural Database (CSD) by analyzing their distribution and then evaluating the system’s performance in terms of speed, coverage, and accuracy. Results show that COSMOS represents a significant improvement when compared to the state-of-the-art, particularly in terms of coverage of complex molecular structures, including metal-organics. COSMOS can predict structures for 96.4% of the molecules in the CSD [99.6% organic, 94.6% metal-organic] whereas the widely used commercial method CORINA predicts structures for 68.5% [98.5% organic, 51.6% metal-organic]. On the common subset of molecules predicted by both methods COSMOS makes predictions with an average speed per molecule of 0.15s [0.10s organic, 0.21s metal-organic], and an average RMSD of 1.57Å [1.26Å organic, 1.90Å metal-organic], and CORINA makes predictions with an average speed per molecule of 0.13s [0.18s organic, 0.08s metal-organic], and an average RMSD of 1.60Å [1.13Å organic, 2.11Å metal-organic]. COSMOS is available through the ChemDB chemoinformatics web portal at: http://cdb.ics.uci.edu/. PMID:21417267
Reynolds averaged turbulence modelling using deep neural networks with embedded invariance
Ling, Julia; Kurzawski, Andrew; Templeton, Jeremy
2016-10-18
There exists significant demand for improved Reynolds-averaged Navier–Stokes (RANS) turbulence models that are informed by and can represent a richer set of turbulence physics. This paper presents a method of using deep neural networks to learn a model for the Reynolds stress anisotropy tensor from high-fidelity simulation data. A novel neural network architecture is proposed which uses a multiplicative layer with an invariant tensor basis to embed Galilean invariance into the predicted anisotropy tensor. It is demonstrated that this neural network architecture provides improved prediction accuracy compared with a generic neural network architecture that does not embed this invariance property.more » Furthermore, the Reynolds stress anisotropy predictions of this invariant neural network are propagated through to the velocity field for two test cases. For both test cases, significant improvement versus baseline RANS linear eddy viscosity and nonlinear eddy viscosity models is demonstrated.« less
Reynolds averaged turbulence modelling using deep neural networks with embedded invariance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ling, Julia; Kurzawski, Andrew; Templeton, Jeremy
There exists significant demand for improved Reynolds-averaged Navier–Stokes (RANS) turbulence models that are informed by and can represent a richer set of turbulence physics. This paper presents a method of using deep neural networks to learn a model for the Reynolds stress anisotropy tensor from high-fidelity simulation data. A novel neural network architecture is proposed which uses a multiplicative layer with an invariant tensor basis to embed Galilean invariance into the predicted anisotropy tensor. It is demonstrated that this neural network architecture provides improved prediction accuracy compared with a generic neural network architecture that does not embed this invariance property.more » Furthermore, the Reynolds stress anisotropy predictions of this invariant neural network are propagated through to the velocity field for two test cases. For both test cases, significant improvement versus baseline RANS linear eddy viscosity and nonlinear eddy viscosity models is demonstrated.« less
NASA Technical Reports Server (NTRS)
Bell, Thomas L.; Kundu, Prasun K.; Kummerow, Christian D.; Einaudi, Franco (Technical Monitor)
2000-01-01
Quantitative use of satellite-derived maps of monthly rainfall requires some measure of the accuracy of the satellite estimates. The rainfall estimate for a given map grid box is subject to both remote-sensing error and, in the case of low-orbiting satellites, sampling error due to the limited number of observations of the grid box provided by the satellite. A simple model of rain behavior predicts that Root-mean-square (RMS) random error in grid-box averages should depend in a simple way on the local average rain rate, and the predicted behavior has been seen in simulations using surface rain-gauge and radar data. This relationship was examined using satellite SSM/I data obtained over the western equatorial Pacific during TOGA COARE. RMS error inferred directly from SSM/I rainfall estimates was found to be larger than predicted from surface data, and to depend less on local rain rate than was predicted. Preliminary examination of TRMM microwave estimates shows better agreement with surface data. A simple method of estimating rms error in satellite rainfall estimates is suggested, based on quantities that can be directly computed from the satellite data.
A hybrid localization technique for patient tracking.
Rodionov, Denis; Kolev, George; Bushminkin, Kirill
2013-01-01
Nowadays numerous technologies are employed for tracking patients and assets in hospitals or nursing homes. Each of them has advantages and drawbacks. For example, WiFi localization has relatively good accuracy but cannot be used in case of power outage or in the areas with poor WiFi coverage. Magnetometer positioning or cellular network does not have such problems but they are not as accurate as localization with WiFi. This paper describes technique that simultaneously employs different localization technologies for enhancing stability and average accuracy of localization. The proposed algorithm is based on fingerprinting method paired with data fusion and prediction algorithms for estimating the object location. The core idea of the algorithm is technology fusion using error estimation methods. For testing accuracy and performance of the algorithm testing simulation environment has been implemented. Significant accuracy improvement was showed in practical scenarios.
An improved switching converter model. Ph.D. Thesis. Final Report
NASA Technical Reports Server (NTRS)
Shortt, D. J.
1982-01-01
The nonlinear modeling and analysis of dc-dc converters in the continuous mode and discontinuous mode was done by averaging and discrete sampling techniques. A model was developed by combining these two techniques. This model, the discrete average model, accurately predicts the envelope of the output voltage and is easy to implement in circuit and state variable forms. The proposed model is shown to be dependent on the type of duty cycle control. The proper selection of the power stage model, between average and discrete average, is largely a function of the error processor in the feedback loop. The accuracy of the measurement data taken by a conventional technique is affected by the conditions at which the data is collected.
Time series analysis of temporal networks
NASA Astrophysics Data System (ADS)
Sikdar, Sandipan; Ganguly, Niloy; Mukherjee, Animesh
2016-01-01
A common but an important feature of all real-world networks is that they are temporal in nature, i.e., the network structure changes over time. Due to this dynamic nature, it becomes difficult to propose suitable growth models that can explain the various important characteristic properties of these networks. In fact, in many application oriented studies only knowing these properties is sufficient. For instance, if one wishes to launch a targeted attack on a network, this can be done even without the knowledge of the full network structure; rather an estimate of some of the properties is sufficient enough to launch the attack. We, in this paper show that even if the network structure at a future time point is not available one can still manage to estimate its properties. We propose a novel method to map a temporal network to a set of time series instances, analyze them and using a standard forecast model of time series, try to predict the properties of a temporal network at a later time instance. To our aim, we consider eight properties such as number of active nodes, average degree, clustering coefficient etc. and apply our prediction framework on them. We mainly focus on the temporal network of human face-to-face contacts and observe that it represents a stochastic process with memory that can be modeled as Auto-Regressive-Integrated-Moving-Average (ARIMA). We use cross validation techniques to find the percentage accuracy of our predictions. An important observation is that the frequency domain properties of the time series obtained from spectrogram analysis could be used to refine the prediction framework by identifying beforehand the cases where the error in prediction is likely to be high. This leads to an improvement of 7.96% (for error level ≤20%) in prediction accuracy on an average across all datasets. As an application we show how such prediction scheme can be used to launch targeted attacks on temporal networks. Contribution to the Topical Issue "Temporal Network Theory and Applications", edited by Petter Holme.
Cluster-based analysis improves predictive validity of spike-triggered receptive field estimates
Malone, Brian J.
2017-01-01
Spectrotemporal receptive field (STRF) characterization is a central goal of auditory physiology. STRFs are often approximated by the spike-triggered average (STA), which reflects the average stimulus preceding a spike. In many cases, the raw STA is subjected to a threshold defined by gain values expected by chance. However, such correction methods have not been universally adopted, and the consequences of specific gain-thresholding approaches have not been investigated systematically. Here, we evaluate two classes of statistical correction techniques, using the resulting STRF estimates to predict responses to a novel validation stimulus. The first, more traditional technique eliminated STRF pixels (time-frequency bins) with gain values expected by chance. This correction method yielded significant increases in prediction accuracy, including when the threshold setting was optimized for each unit. The second technique was a two-step thresholding procedure wherein clusters of contiguous pixels surviving an initial gain threshold were then subjected to a cluster mass threshold based on summed pixel values. This approach significantly improved upon even the best gain-thresholding techniques. Additional analyses suggested that allowing threshold settings to vary independently for excitatory and inhibitory subfields of the STRF resulted in only marginal additional gains, at best. In summary, augmenting reverse correlation techniques with principled statistical correction choices increased prediction accuracy by over 80% for multi-unit STRFs and by over 40% for single-unit STRFs, furthering the interpretational relevance of the recovered spectrotemporal filters for auditory systems analysis. PMID:28877194
Sweat loss prediction using a multi-model approach
NASA Astrophysics Data System (ADS)
Xu, Xiaojiang; Santee, William R.
2011-07-01
A new multi-model approach (MMA) for sweat loss prediction is proposed to improve prediction accuracy. MMA was computed as the average of sweat loss predicted by two existing thermoregulation models: i.e., the rational model SCENARIO and the empirical model Heat Strain Decision Aid (HSDA). Three independent physiological datasets, a total of 44 trials, were used to compare predictions by MMA, SCENARIO, and HSDA. The observed sweat losses were collected under different combinations of uniform ensembles, environmental conditions (15-40°C, RH 25-75%), and exercise intensities (250-600 W). Root mean square deviation (RMSD), residual plots, and paired t tests were used to compare predictions with observations. Overall, MMA reduced RMSD by 30-39% in comparison with either SCENARIO or HSDA, and increased the prediction accuracy to 66% from 34% or 55%. Of the MMA predictions, 70% fell within the range of mean observed value ± SD, while only 43% of SCENARIO and 50% of HSDA predictions fell within the same range. Paired t tests showed that differences between observations and MMA predictions were not significant, but differences between observations and SCENARIO or HSDA predictions were significantly different for two datasets. Thus, MMA predicted sweat loss more accurately than either of the two single models for the three datasets used. Future work will be to evaluate MMA using additional physiological data to expand the scope of populations and conditions.
Catchment-scale groundwater recharge and vegetation water use efficiency
NASA Astrophysics Data System (ADS)
Troch, P. A. A.; Dwivedi, R.; Liu, T.; Meira, A.; Roy, T.; Valdés-Pineda, R.; Durcik, M.; Arciniega, S.; Brena-Naranjo, J. A.
2017-12-01
Precipitation undergoes a two-step partitioning when it falls on the land surface. At the land surface and in the shallow subsurface, rainfall or snowmelt can either runoff as infiltration/saturation excess or quick subsurface flow. The rest will be stored temporarily in the root zone. From the root zone, water can leave the catchment as evapotranspiration or percolate further and recharge deep storage (e.g. fractured bedrock aquifer). Quantifying the average amount of water that recharges deep storage and sustains low flows is extremely challenging, as we lack reliable methods to quantify this flux at the catchment scale. It was recently shown, however, that for semi-arid catchments in Mexico, an index of vegetation water use efficiency, i.e. the Horton index (HI), could predict deep storage dynamics. Here we test this finding using 247 MOPEX catchments across the conterminous US, including energy-limited catchments. Our results show that the observed HI is indeed a reliable predictor of deep storage dynamics in space and time. We further investigate whether the HI can also predict average recharge rates across the conterminous US. We find that the HI can reliably predict the average recharge rate, estimated from the 50th percentile flow of the flow duration curve. Our results compare favorably with estimates of average recharge rates from the US Geological Survey. Previous research has shown that HI can be reliably estimated based on aridity index, mean slope and mean elevation of a catchment (Voepel et al., 2011). We recalibrated Voepel's model and used it to predict the HI for our 247 catchments. We then used these predicted values of the HI to estimate average recharge rates for our catchments, and compared them with those estimated from observed HI. We find that the accuracies of our predictions based on observed and predicted HI are similar. This provides an estimation method of catchment-scale average recharge rates based on easily derived catchment characteristics, such as climate and topography, and free of discharge measurements.
Time averaging of NMR chemical shifts in the MLF peptide in the solid state.
De Gortari, Itzam; Portella, Guillem; Salvatella, Xavier; Bajaj, Vikram S; van der Wel, Patrick C A; Yates, Jonathan R; Segall, Matthew D; Pickard, Chris J; Payne, Mike C; Vendruscolo, Michele
2010-05-05
Since experimental measurements of NMR chemical shifts provide time and ensemble averaged values, we investigated how these effects should be included when chemical shifts are computed using density functional theory (DFT). We measured the chemical shifts of the N-formyl-L-methionyl-L-leucyl-L-phenylalanine-OMe (MLF) peptide in the solid state, and then used the X-ray structure to calculate the (13)C chemical shifts using the gauge including projector augmented wave (GIPAW) method, which accounts for the periodic nature of the crystal structure, obtaining an overall accuracy of 4.2 ppm. In order to understand the origin of the difference between experimental and calculated chemical shifts, we carried out first-principles molecular dynamics simulations to characterize the molecular motion of the MLF peptide on the picosecond time scale. We found that (13)C chemical shifts experience very rapid fluctuations of more than 20 ppm that are averaged out over less than 200 fs. Taking account of these fluctuations in the calculation of the chemical shifts resulted in an accuracy of 3.3 ppm. To investigate the effects of averaging over longer time scales we sampled the rotameric states populated by the MLF peptides in the solid state by performing a total of 5 micros classical molecular dynamics simulations. By averaging the chemical shifts over these rotameric states, we increased the accuracy of the chemical shift calculations to 3.0 ppm, with less than 1 ppm error in 10 out of 22 cases. These results suggests that better DFT-based predictions of chemical shifts of peptides and proteins will be achieved by developing improved computational strategies capable of taking into account the averaging process up to the millisecond time scale on which the chemical shift measurements report.
Shafizadeh-Moghadam, Hossein; Tayyebi, Amin; Helbich, Marco
2017-06-01
Transition index maps (TIMs) are key products in urban growth simulation models. However, their operationalization is still conflicting. Our aim was to compare the prediction accuracy of three TIM-based spatially explicit land cover change (LCC) models in the mega city of Mumbai, India. These LCC models include two data-driven approaches, namely artificial neural networks (ANNs) and weight of evidence (WOE), and one knowledge-based approach which integrates an analytical hierarchical process with fuzzy membership functions (FAHP). Using the relative operating characteristics (ROC), the performance of these three LCC models were evaluated. The results showed 85%, 75%, and 73% accuracy for the ANN, FAHP, and WOE. The ANN was clearly superior compared to the other LCC models when simulating urban growth for the year 2010; hence, ANN was used to predict urban growth for 2020 and 2030. Projected urban growth maps were assessed using statistical measures, including figure of merit, average spatial distance deviation, producer accuracy, and overall accuracy. Based on our findings, we recomend ANNs as an and accurate method for simulating future patterns of urban growth.
Möstl, C; Isavnin, A; Boakes, P D; Kilpua, E K J; Davies, J A; Harrison, R A; Barnes, D; Krupar, V; Eastwood, J P; Good, S W; Forsyth, R J; Bothmer, V; Reiss, M A; Amerstorfer, T; Winslow, R M; Anderson, B J; Philpott, L C; Rodriguez, L; Rouillard, A P; Gallagher, P; Nieves-Chinchilla, T; Zhang, T L
2017-07-01
We present an advance toward accurately predicting the arrivals of coronal mass ejections (CMEs) at the terrestrial planets, including Earth. For the first time, we are able to assess a CME prediction model using data over two thirds of a solar cycle of observations with the Heliophysics System Observatory. We validate modeling results of 1337 CMEs observed with the Solar Terrestrial Relations Observatory (STEREO) heliospheric imagers (HI) (science data) from 8 years of observations by five in situ observing spacecraft. We use the self-similar expansion model for CME fronts assuming 60° longitudinal width, constant speed, and constant propagation direction. With these assumptions we find that 23%-35% of all CMEs that were predicted to hit a certain spacecraft lead to clear in situ signatures, so that for one correct prediction, two to three false alarms would have been issued. In addition, we find that the prediction accuracy does not degrade with the HI longitudinal separation from Earth. Predicted arrival times are on average within 2.6 ± 16.6 h difference of the in situ arrival time, similar to analytical and numerical modeling, and a true skill statistic of 0.21. We also discuss various factors that may improve the accuracy of space weather forecasting using wide-angle heliospheric imager observations. These results form a first-order approximated baseline of the prediction accuracy that is possible with HI and other methods used for data by an operational space weather mission at the Sun-Earth L5 point.
Isavnin, A.; Boakes, P. D.; Kilpua, E. K. J.; Davies, J. A.; Harrison, R. A.; Barnes, D.; Krupar, V.; Eastwood, J. P.; Good, S. W.; Forsyth, R. J.; Bothmer, V.; Reiss, M. A.; Amerstorfer, T.; Winslow, R. M.; Anderson, B. J.; Philpott, L. C.; Rodriguez, L.; Rouillard, A. P.; Gallagher, P.; Nieves‐Chinchilla, T.; Zhang, T. L.
2017-01-01
Abstract We present an advance toward accurately predicting the arrivals of coronal mass ejections (CMEs) at the terrestrial planets, including Earth. For the first time, we are able to assess a CME prediction model using data over two thirds of a solar cycle of observations with the Heliophysics System Observatory. We validate modeling results of 1337 CMEs observed with the Solar Terrestrial Relations Observatory (STEREO) heliospheric imagers (HI) (science data) from 8 years of observations by five in situ observing spacecraft. We use the self‐similar expansion model for CME fronts assuming 60° longitudinal width, constant speed, and constant propagation direction. With these assumptions we find that 23%–35% of all CMEs that were predicted to hit a certain spacecraft lead to clear in situ signatures, so that for one correct prediction, two to three false alarms would have been issued. In addition, we find that the prediction accuracy does not degrade with the HI longitudinal separation from Earth. Predicted arrival times are on average within 2.6 ± 16.6 h difference of the in situ arrival time, similar to analytical and numerical modeling, and a true skill statistic of 0.21. We also discuss various factors that may improve the accuracy of space weather forecasting using wide‐angle heliospheric imager observations. These results form a first‐order approximated baseline of the prediction accuracy that is possible with HI and other methods used for data by an operational space weather mission at the Sun‐Earth L5 point. PMID:28983209
Combining forecast weights: Why and how?
NASA Astrophysics Data System (ADS)
Yin, Yip Chee; Kok-Haur, Ng; Hock-Eam, Lim
2012-09-01
This paper proposes a procedure called forecast weight averaging which is a specific combination of forecast weights obtained from different methods of constructing forecast weights for the purpose of improving the accuracy of pseudo out of sample forecasting. It is found that under certain specified conditions, forecast weight averaging can lower the mean squared forecast error obtained from model averaging. In addition, we show that in a linear and homoskedastic environment, this superior predictive ability of forecast weight averaging holds true irrespective whether the coefficients are tested by t statistic or z statistic provided the significant level is within the 10% range. By theoretical proofs and simulation study, we have shown that model averaging like, variance model averaging, simple model averaging and standard error model averaging, each produces mean squared forecast error larger than that of forecast weight averaging. Finally, this result also holds true marginally when applied to business and economic empirical data sets, Gross Domestic Product (GDP growth rate), Consumer Price Index (CPI) and Average Lending Rate (ALR) of Malaysia.
Predicting Length of Stay for Obstetric Patients via Electronic Medical Records.
Gao, Cheng; Kho, Abel N; Ivory, Catherine; Osmundson, Sarah; Malin, Bradley A; Chen, You
2017-01-01
Obstetric care refers to the care provided to patients during ante-, intra-, and postpartum periods. Predicting length of stay (LOS) for these patients during their hospitalizations can assist healthcare organizations in allocating hospital resources more effectively and efficiently, ultimately improving maternal care quality and reducing costs to patients. In this paper, we investigate the extent to which LOS can be forecast from a patient's medical history. We introduce a machine learning framework to incorporate a patient's prior conditions (e.g., diagnostic codes) as features in a predictive model for LOS. We evaluate the framework with three years of historical billing data from the electronic medical records of 9188 obstetric patients in a large academic medical center. The results indicate that our framework achieved an average accuracy of 49.3%, which is higher than the baseline accuracy 37.7% (that relies solely on a patient's age). The most predictive features were found to have statistically significant discriminative ability. These features included billing codes for normal delivery (indicative of shorter stay) and antepartum hypertension (indicative of longer stay).
Optimal predictions in everyday cognition: the wisdom of individuals or crowds?
Mozer, Michael C; Pashler, Harold; Homaei, Hadjar
2008-10-01
Griffiths and Tenenbaum (2006) asked individuals to make predictions about the duration or extent of everyday events (e.g., cake baking times), and reported that predictions were optimal, employing Bayesian inference based on veridical prior distributions. Although the predictions conformed strikingly to statistics of the world, they reflect averages over many individuals. On the conjecture that the accuracy of the group response is chiefly a consequence of aggregating across individuals, we constructed simple, heuristic approximations to the Bayesian model premised on the hypothesis that individuals have access merely to a sample of k instances drawn from the relevant distribution. The accuracy of the group response reported by Griffiths and Tenenbaum could be accounted for by supposing that individuals each utilize only two instances. Moreover, the variability of the group data is more consistent with this small-sample hypothesis than with the hypothesis that people utilize veridical or nearly veridical representations of the underlying prior distributions. Our analyses lead to a qualitatively different view of how individuals reason from past experience than the view espoused by Griffiths and Tenenbaum. 2008 Cognitive Science Society, Inc.
PockDrug: A Model for Predicting Pocket Druggability That Overcomes Pocket Estimation Uncertainties.
Borrel, Alexandre; Regad, Leslie; Xhaard, Henri; Petitjean, Michel; Camproux, Anne-Claude
2015-04-27
Predicting protein druggability is a key interest in the target identification phase of drug discovery. Here, we assess the pocket estimation methods' influence on druggability predictions by comparing statistical models constructed from pockets estimated using different pocket estimation methods: a proximity of either 4 or 5.5 Å to a cocrystallized ligand or DoGSite and fpocket estimation methods. We developed PockDrug, a robust pocket druggability model that copes with uncertainties in pocket boundaries. It is based on a linear discriminant analysis from a pool of 52 descriptors combined with a selection of the most stable and efficient models using different pocket estimation methods. PockDrug retains the best combinations of three pocket properties which impact druggability: geometry, hydrophobicity, and aromaticity. It results in an average accuracy of 87.9% ± 4.7% using a test set and exhibits higher accuracy (∼5-10%) than previous studies that used an identical apo set. In conclusion, this study confirms the influence of pocket estimation on pocket druggability prediction and proposes PockDrug as a new model that overcomes pocket estimation variability.
Beaulieu, J; Doerksen, T; Clément, S; MacKay, J; Bousquet, J
2014-01-01
Genomic selection (GS) is of interest in breeding because of its potential for predicting the genetic value of individuals and increasing genetic gains per unit of time. To date, very few studies have reported empirical results of GS potential in the context of large population sizes and long breeding cycles such as for boreal trees. In this study, we assessed the effectiveness of marker-aided selection in an undomesticated white spruce (Picea glauca (Moench) Voss) population of large effective size using a GS approach. A discovery population of 1694 trees representative of 214 open-pollinated families from 43 natural populations was phenotyped for 12 wood and growth traits and genotyped for 6385 single-nucleotide polymorphisms (SNPs) mined in 2660 gene sequences. GS models were built to predict estimated breeding values using all the available SNPs or SNP subsets of the largest absolute effects, and they were validated using various cross-validation schemes. The accuracy of genomic estimated breeding values (GEBVs) varied from 0.327 to 0.435 when the training and the validation data sets shared half-sibs that were on average 90% of the accuracies achieved through traditionally estimated breeding values. The trend was also the same for validation across sites. As expected, the accuracy of GEBVs obtained after cross-validation with individuals of unknown relatedness was lower with about half of the accuracy achieved when half-sibs were present. We showed that with the marker densities used in the current study, predictions with low to moderate accuracy could be obtained within a large undomesticated population of related individuals, potentially resulting in larger gains per unit of time with GS than with the traditional approach. PMID:24781808
How much is a word? Predicting ease of articulation planning from apraxic speech error patterns.
Ziegler, Wolfram; Aichert, Ingrid
2015-08-01
According to intuitive concepts, 'ease of articulation' is influenced by factors like word length or the presence of consonant clusters in an utterance. Imaging studies of speech motor control use these factors to systematically tax the speech motor system. Evidence from apraxia of speech, a disorder supposed to result from speech motor planning impairment after lesions to speech motor centers in the left hemisphere, supports the relevance of these and other factors in disordered speech planning and the genesis of apraxic speech errors. Yet, there is no unified account of the structural properties rendering a word easy or difficult to pronounce. To model the motor planning demands of word articulation by a nonlinear regression model trained to predict the likelihood of accurate word production in apraxia of speech. We used a tree-structure model in which vocal tract gestures are embedded in hierarchically nested prosodic domains to derive a recursive set of terms for the computation of the likelihood of accurate word production. The model was trained with accuracy data from a set of 136 words averaged over 66 samples from apraxic speakers. In a second step, the model coefficients were used to predict a test dataset of accuracy values for 96 new words, averaged over 120 samples produced by a different group of apraxic speakers. Accurate modeling of the first dataset was achieved in the training study (R(2)adj = .71). In the cross-validation, the test dataset was predicted with a high accuracy as well (R(2)adj = .67). The model shape, as reflected by the coefficient estimates, was consistent with current phonetic theories and with clinical evidence. In accordance with phonetic and psycholinguistic work, a strong influence of word stress on articulation errors was found. The proposed model provides a unified and transparent account of the motor planning requirements of word articulation. Copyright © 2015 Elsevier Ltd. All rights reserved.
Energy Expenditure in Critically Ill Elderly Patients: Indirect Calorimetry vs Predictive Equations.
Segadilha, Nara L A L; Rocha, Eduardo E M; Tanaka, Lilian M S; Gomes, Karla L P; Espinoza, Rodolfo E A; Peres, Wilza A F
2017-07-01
Predictive equations (PEs) are used for estimating resting energy expenditure (REE) when the measurements obtained from indirect calorimetry (IC) are not available. This study evaluated the degree of agreement and the accuracy between the REE measured by IC (REE-IC) and REE estimated by PE (REE-PE) in mechanically ventilated elderly patients admitted to the intensive care unit (ICU). REE-IC of 97 critically ill elderly patients was compared with REE-PE by 6 PEs: Harris and Benedict (HB) multiplied by the correction factor of 1.2; European Society for Clinical Nutrition and Metabolism (ESPEN) using the minimum (ESPENmi), average (ESPENme), and maximum (ESPENma) values; Mifflin-St Jeor; Ireton-Jones (IJ); Fredrix; and Lührmann. Degree of agreement between REE-PE and REE-IC was analyzed by the interclass correlation coefficient and the Bland-Altman test. The accuracy was calculated by the percentage of male and/or female patients whose REE-PE values differ by up to ±10% in relation to REE-IC. For both sexes, there was no difference for average REE-IC in kcal/kg when the values obtained with REE-PE by corrected HB and ESPENme were compared. A high level of agreement was demonstrated by corrected HB for both sexes, with greater accuracy for women. The best accuracy in the male group was obtained with the IJ equation but with a low level of agreement. The effectiveness of PEs is limited for estimating REE of critically ill elderly patients. Nonetheless, HB multiplied by a correction factor of 1.2 can be used until a specific PE for this group of patients is developed.
Single trial prediction of self-paced reaching directions from EEG signals.
Lew, Eileen Y L; Chavarriaga, Ricardo; Silvoni, Stefano; Millán, José Del R
2014-01-01
Early detection of movement intention could possibly minimize the delays in the activation of neuroprosthetic devices. As yet, single trial analysis using non-invasive approaches for understanding such movement preparation remains a challenging task. We studied the feasibility of predicting movement directions in self-paced upper limb center-out reaching tasks, i.e., spontaneous movements executed without an external cue that can better reflect natural motor behavior in humans. We reported results of non-invasive electroencephalography (EEG) recorded from mild stroke patients and able-bodied participants. Previous studies have shown that low frequency EEG oscillations are modulated by the intent to move and therefore, can be decoded prior to the movement execution. Motivated by these results, we investigated whether slow cortical potentials (SCPs) preceding movement onset can be used to classify reaching directions and evaluated the performance using 5-fold cross-validation. For able-bodied subjects, we obtained an average decoding accuracy of 76% (chance level of 25%) at 62.5 ms before onset using the amplitude of on-going SCPs with above chance level performances between 875 to 437.5 ms prior to onset. The decoding accuracy for the stroke patients was on average 47% with their paretic arms. Comparison of the decoding accuracy across different frequency ranges (i.e., SCPs, delta, theta, alpha, and gamma) yielded the best accuracy using SCPs filtered between 0.1 to 1 Hz. Across all the subjects, including stroke subjects, the best selected features were obtained mostly from the fronto-parietal regions, hence consistent with previous neurophysiological studies on arm reaching tasks. In summary, we concluded that SCPs allow the possibility of single trial decoding of reaching directions at least 312.5 ms before onset of reach.
Predicting Secchi disk depth from average beam attenuation in a deep, ultra-clear lake
Larson, G.L.; Hoffman, R.L.; Hargreaves, B.R.; Collier, R.W.
2007-01-01
We addressed potential sources of error in estimating the water clarity of mountain lakes by investigating the use of beam transmissometer measurements to estimate Secchi disk depth. The optical properties Secchi disk depth (SD) and beam transmissometer attenuation (BA) were measured in Crater Lake (Crater Lake National Park, Oregon, USA) at a designated sampling station near the maximum depth of the lake. A standard 20 cm black and white disk was used to measure SD. The transmissometer light source had a nearly monochromatic wavelength of 660 nm and a path length of 25 cm. We created a SD prediction model by regression of the inverse SD of 13 measurements recorded on days when environmental conditions were acceptable for disk deployment with BA averaged over the same depth range as the measured SD. The relationship between inverse SD and averaged BA was significant and the average 95% confidence interval for predicted SD relative to the measured SD was ??1.6 m (range = -4.6 to 5.5 m) or ??5.0%. Eleven additional sample dates tested the accuracy of the predictive model. The average 95% confidence interval for these sample dates was ??0.7 m (range = -3.5 to 3.8 m) or ??2.2%. The 1996-2000 time-series means for measured and predicted SD varied by 0.1 m, and the medians varied by 0.5 m. The time-series mean annual measured and predicted SD's also varied little, with intra-annual differences between measured and predicted mean annual SD ranging from -2.1 to 0.1 m. The results demonstrated that this prediction model reliably estimated Secchi disk depths and can be used to significantly expand optical observations in an environment where the conditions for standardized SD deployments are limited. ?? 2007 Springer Science+Business Media B.V.
NASA Astrophysics Data System (ADS)
Hardinata, Lingga; Warsito, Budi; Suparti
2018-05-01
Complexity of bankruptcy causes the accurate models of bankruptcy prediction difficult to be achieved. Various prediction models have been developed to improve the accuracy of bankruptcy predictions. Machine learning has been widely used to predict because of its adaptive capabilities. Artificial Neural Networks (ANN) is one of machine learning which proved able to complete inference tasks such as prediction and classification especially in data mining. In this paper, we propose the implementation of Jordan Recurrent Neural Networks (JRNN) to classify and predict corporate bankruptcy based on financial ratios. Feedback interconnection in JRNN enable to make the network keep important information well allowing the network to work more effectively. The result analysis showed that JRNN works very well in bankruptcy prediction with average success rate of 81.3785%.
Football experts versus sports economists: Whose forecasts are better?
Frick, Bernd; Wicker, Pamela
2016-08-01
Given the uncertainty of outcome in sport, predicting the outcome of sporting contests is a major topic in sport sciences. This study examines the accuracy of expert predictions in the German Bundesliga and compares their predictions to those of sports economists. Prior to the start of each season, a set of distinguished experts (head coaches and players) express their subjective evaluations of the teams in school grades. While experts may be driven by irrational sentiments and may therefore systematically over- or underestimate specific teams, sports economists use observable characteristics to predict season outcomes. The latter typically use team wage bills given the positive pay-performance relationship as well as other factors (average team age, tenure, appearances on national team, and attendance). Using data from 15 consecutive Bundesliga seasons, the predictive accuracy of expert evaluations and sports economists is analysed. The results of separate estimations show that relative grade and relative wage bill significantly affect relative points, while age, tenure, appearances, and attendance are insignificant. In a joint model, relative grade and relative wage bill are still statistically significant, suggesting that the two types of predictions are complements rather than substitutes. Consequently, football experts and sports economists seem to rely on completely different sources of information when making their predictions.
Rapid recipe formulation for plasma etching of new materials
NASA Astrophysics Data System (ADS)
Chopra, Meghali; Zhang, Zizhuo; Ekerdt, John; Bonnecaze, Roger T.
2016-03-01
A fast and inexpensive scheme for etch rate prediction using flexible continuum models and Bayesian statistics is demonstrated. Bulk etch rates of MgO are predicted using a steady-state model with volume-averaged plasma parameters and classical Langmuir surface kinetics. Plasma particle and surface kinetics are modeled within a global plasma framework using single component Metropolis Hastings methods and limited data. The accuracy of these predictions is evaluated with synthetic and experimental etch rate data for magnesium oxide in an ICP-RIE system. This approach is compared and superior to factorial models generated from JMP, a software package frequently employed for recipe creation and optimization.
Tian, Wenliang; Meng, Fandi; Liu, Li; Li, Ying; Wang, Fuhui
2017-01-01
A concept for prediction of organic coatings, based on the alternating hydrostatic pressure (AHP) accelerated tests, has been presented. An AHP accelerated test with different pressure values has been employed to evaluate coating degradation. And a back-propagation artificial neural network (BP-ANN) has been established to predict the service property and the service lifetime of coatings. The pressure value (P), immersion time (t) and service property (impedance modulus |Z|) are utilized as the parameters of the network. The average accuracies of the predicted service property and immersion time by the established network are 98.6% and 84.8%, respectively. The combination of accelerated test and prediction method by BP-ANN is promising to evaluate and predict coating property used in deep sea. PMID:28094340
Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat.
Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Huerta-Espino, Julio; Lan, Caixia; Bhavani, Sridhar; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E
2017-07-01
Genomic prediction for seedling and adult plant resistance to wheat rusts was compared to prediction using few markers as fixed effects in a least-squares approach and pedigree-based prediction. The unceasing plant-pathogen arms race and ephemeral nature of some rust resistance genes have been challenging for wheat (Triticum aestivum L.) breeding programs and farmers. Hence, it is important to devise strategies for effective evaluation and exploitation of quantitative rust resistance. One promising approach that could accelerate gain from selection for rust resistance is 'genomic selection' which utilizes dense genome-wide markers to estimate the breeding values (BVs) for quantitative traits. Our objective was to compare three genomic prediction models including genomic best linear unbiased prediction (GBLUP), GBLUP A that was GBLUP with selected loci as fixed effects and reproducing kernel Hilbert spaces-markers (RKHS-M) with least-squares (LS) approach, RKHS-pedigree (RKHS-P), and RKHS markers and pedigree (RKHS-MP) to determine the BVs for seedling and/or adult plant resistance (APR) to leaf rust (LR), stem rust (SR), and stripe rust (YR). The 333 lines in the 45th IBWSN and the 313 lines in the 46th IBWSN were genotyped using genotyping-by-sequencing and phenotyped in replicated trials. The mean prediction accuracies ranged from 0.31-0.74 for LR seedling, 0.12-0.56 for LR APR, 0.31-0.65 for SR APR, 0.70-0.78 for YR seedling, and 0.34-0.71 for YR APR. For most datasets, the RKHS-MP model gave the highest accuracies, while LS gave the lowest. GBLUP, GBLUP A, RKHS-M, and RKHS-P models gave similar accuracies. Using genome-wide marker-based models resulted in an average of 42% increase in accuracy over LS. We conclude that GS is a promising approach for improvement of quantitative rust resistance and can be implemented in the breeding pipeline.
ShinyGPAS: interactive genomic prediction accuracy simulator based on deterministic formulas.
Morota, Gota
2017-12-20
Deterministic formulas for the accuracy of genomic predictions highlight the relationships among prediction accuracy and potential factors influencing prediction accuracy prior to performing computationally intensive cross-validation. Visualizing such deterministic formulas in an interactive manner may lead to a better understanding of how genetic factors control prediction accuracy. The software to simulate deterministic formulas for genomic prediction accuracy was implemented in R and encapsulated as a web-based Shiny application. Shiny genomic prediction accuracy simulator (ShinyGPAS) simulates various deterministic formulas and delivers dynamic scatter plots of prediction accuracy versus genetic factors impacting prediction accuracy, while requiring only mouse navigation in a web browser. ShinyGPAS is available at: https://chikudaisei.shinyapps.io/shinygpas/ . ShinyGPAS is a shiny-based interactive genomic prediction accuracy simulator using deterministic formulas. It can be used for interactively exploring potential factors that influence prediction accuracy in genome-enabled prediction, simulating achievable prediction accuracy prior to genotyping individuals, or supporting in-class teaching. ShinyGPAS is open source software and it is hosted online as a freely available web-based resource with an intuitive graphical user interface.
QSAR Modeling of Rat Acute Toxicity by Oral Exposure
Zhu, Hao; Martin, Todd M.; Ye, Lin; Sedykh, Alexander; Young, Douglas M.; Tropsha, Alexander
2009-01-01
Few Quantitative Structure-Activity Relationship (QSAR) studies have successfully modeled large, diverse rodent toxicity endpoints. In this study, a comprehensive dataset of 7,385 compounds with their most conservative lethal dose (LD50) values has been compiled. A combinatorial QSAR approach has been employed to develop robust and predictive models of acute toxicity in rats caused by oral exposure to chemicals. To enable fair comparison between the predictive power of models generated in this study versus a commercial toxicity predictor, TOPKAT (Toxicity Prediction by Komputer Assisted Technology), a modeling subset of the entire dataset was selected that included all 3,472 compounds used in the TOPKAT’s training set. The remaining 3,913 compounds, which were not present in the TOPKAT training set, were used as the external validation set. QSAR models of five different types were developed for the modeling set. The prediction accuracy for the external validation set was estimated by determination coefficient R2 of linear regression between actual and predicted LD50 values. The use of the applicability domain threshold implemented in most models generally improved the external prediction accuracy but expectedly led to the decrease in chemical space coverage; depending on the applicability domain threshold, R2 ranged from 0.24 to 0.70. Ultimately, several consensus models were developed by averaging the predicted LD50 for every compound using all 5 models. The consensus models afforded higher prediction accuracy for the external validation dataset with the higher coverage as compared to individual constituent models. The validated consensus LD50 models developed in this study can be used as reliable computational predictors of in vivo acute toxicity. PMID:19845371
Peng, Yinghu; Zhang, Zhifeng; Gao, Yongchang; Chen, Zhenxian; Xin, Hua; Zhang, Qida; Fan, Xunjian; Jin, Zhongmin
2018-02-01
Ground reaction forces and moments (GRFs and GRMs) measured from force plates in a gait laboratory are usually used as the input conditions to predict the knee joint forces and moments via musculoskeletal (MSK) multibody dynamics (MBD) model. However, the measurements of the GRFs and GRMs data rely on force plates and sometimes are limited by the difficulty in some patient's gait patterns (e.g. treadmill gait). In addition, the force plate calibration error may influence the prediction accuracy of the MSK model. In this study, a prediction method of the GRFs and GRMs based on elastic contact element was integrated into a subject-specific MSK MBD modelling framework of total knee arthroplasty (TKA), and the GRFs and GRMs and knee contact forces (KCFs) during walking were predicted simultaneously with reasonable accuracy. The ground reaction forces and moments were predicted with an average root mean square errors (RMSEs) of 0.021 body weight (BW), 0.014 BW and 0.089 BW in the antero-posterior, medio-lateral and vertical directions and 0.005 BW•body height (BH), 0.011 BW•BH, 0.004 BW•BH in the sagittal, frontal and transverse planes, respectively. Meanwhile, the medial, lateral and total tibiofemoral (TF) contact forces were predicted by the developed MSK model with RMSEs of 0.025-0.032 BW, 0.018-0.022 BW, and 0.089-0.132 BW, respectively. The accuracy of the predicted medial TF contact force was improved by 12% using the present method. The proposed method can extend the application of the MSK model of TKA and is valuable for understanding the in vivo knee biomechanics and tribological conditions without the force plate data. Copyright © 2017 IPEM. Published by Elsevier Ltd. All rights reserved.
Quantitative structure-activity relationship modeling of rat acute toxicity by oral exposure.
Zhu, Hao; Martin, Todd M; Ye, Lin; Sedykh, Alexander; Young, Douglas M; Tropsha, Alexander
2009-12-01
Few quantitative structure-activity relationship (QSAR) studies have successfully modeled large, diverse rodent toxicity end points. In this study, a comprehensive data set of 7385 compounds with their most conservative lethal dose (LD(50)) values has been compiled. A combinatorial QSAR approach has been employed to develop robust and predictive models of acute toxicity in rats caused by oral exposure to chemicals. To enable fair comparison between the predictive power of models generated in this study versus a commercial toxicity predictor, TOPKAT (Toxicity Prediction by Komputer Assisted Technology), a modeling subset of the entire data set was selected that included all 3472 compounds used in TOPKAT's training set. The remaining 3913 compounds, which were not present in the TOPKAT training set, were used as the external validation set. QSAR models of five different types were developed for the modeling set. The prediction accuracy for the external validation set was estimated by determination coefficient R(2) of linear regression between actual and predicted LD(50) values. The use of the applicability domain threshold implemented in most models generally improved the external prediction accuracy but expectedly led to the decrease in chemical space coverage; depending on the applicability domain threshold, R(2) ranged from 0.24 to 0.70. Ultimately, several consensus models were developed by averaging the predicted LD(50) for every compound using all five models. The consensus models afforded higher prediction accuracy for the external validation data set with the higher coverage as compared to individual constituent models. The validated consensus LD(50) models developed in this study can be used as reliable computational predictors of in vivo acute toxicity.
The RAPIDD ebola forecasting challenge: Synthesis and lessons learnt.
Viboud, Cécile; Sun, Kaiyuan; Gaffey, Robert; Ajelli, Marco; Fumanelli, Laura; Merler, Stefano; Zhang, Qian; Chowell, Gerardo; Simonsen, Lone; Vespignani, Alessandro
2018-03-01
Infectious disease forecasting is gaining traction in the public health community; however, limited systematic comparisons of model performance exist. Here we present the results of a synthetic forecasting challenge inspired by the West African Ebola crisis in 2014-2015 and involving 16 international academic teams and US government agencies, and compare the predictive performance of 8 independent modeling approaches. Challenge participants were invited to predict 140 epidemiological targets across 5 different time points of 4 synthetic Ebola outbreaks, each involving different levels of interventions and "fog of war" in outbreak data made available for predictions. Prediction targets included 1-4 week-ahead case incidences, outbreak size, peak timing, and several natural history parameters. With respect to weekly case incidence targets, ensemble predictions based on a Bayesian average of the 8 participating models outperformed any individual model and did substantially better than a null auto-regressive model. There was no relationship between model complexity and prediction accuracy; however, the top performing models for short-term weekly incidence were reactive models with few parameters, fitted to a short and recent part of the outbreak. Individual model outputs and ensemble predictions improved with data accuracy and availability; by the second time point, just before the peak of the epidemic, estimates of final size were within 20% of the target. The 4th challenge scenario - mirroring an uncontrolled Ebola outbreak with substantial data reporting noise - was poorly predicted by all modeling teams. Overall, this synthetic forecasting challenge provided a deep understanding of model performance under controlled data and epidemiological conditions. We recommend such "peace time" forecasting challenges as key elements to improve coordination and inspire collaboration between modeling groups ahead of the next pandemic threat, and to assess model forecasting accuracy for a variety of known and hypothetical pathogens. Published by Elsevier B.V.
Predicting the Types of Ion Channel-Targeted Conotoxins Based on AVC-SVM Model.
Xianfang, Wang; Junmei, Wang; Xiaolei, Wang; Yue, Zhang
2017-01-01
The conotoxin proteins are disulfide-rich small peptides. Predicting the types of ion channel-targeted conotoxins has great value in the treatment of chronic diseases, epilepsy, and cardiovascular diseases. To solve the problem of information redundancy existing when using current methods, a new model is presented to predict the types of ion channel-targeted conotoxins based on AVC (Analysis of Variance and Correlation) and SVM (Support Vector Machine). First, the F value is used to measure the significance level of the feature for the result, and the attribute with smaller F value is filtered by rough selection. Secondly, redundancy degree is calculated by Pearson Correlation Coefficient. And the threshold is set to filter attributes with weak independence to get the result of the refinement. Finally, SVM is used to predict the types of ion channel-targeted conotoxins. The experimental results show the proposed AVC-SVM model reaches an overall accuracy of 91.98%, an average accuracy of 92.17%, and the total number of parameters of 68. The proposed model provides highly useful information for further experimental research. The prediction model will be accessed free of charge at our web server.
Predicting the Types of Ion Channel-Targeted Conotoxins Based on AVC-SVM Model
Xiaolei, Wang
2017-01-01
The conotoxin proteins are disulfide-rich small peptides. Predicting the types of ion channel-targeted conotoxins has great value in the treatment of chronic diseases, epilepsy, and cardiovascular diseases. To solve the problem of information redundancy existing when using current methods, a new model is presented to predict the types of ion channel-targeted conotoxins based on AVC (Analysis of Variance and Correlation) and SVM (Support Vector Machine). First, the F value is used to measure the significance level of the feature for the result, and the attribute with smaller F value is filtered by rough selection. Secondly, redundancy degree is calculated by Pearson Correlation Coefficient. And the threshold is set to filter attributes with weak independence to get the result of the refinement. Finally, SVM is used to predict the types of ion channel-targeted conotoxins. The experimental results show the proposed AVC-SVM model reaches an overall accuracy of 91.98%, an average accuracy of 92.17%, and the total number of parameters of 68. The proposed model provides highly useful information for further experimental research. The prediction model will be accessed free of charge at our web server. PMID:28497044
Study on the medical meteorological forecast of the number of hypertension inpatient based on SVR
NASA Astrophysics Data System (ADS)
Zhai, Guangyu; Chai, Guorong; Zhang, Haifeng
2017-06-01
The purpose of this study is to build a hypertension prediction model by discussing the meteorological factors for hypertension incidence. The research method is selecting the standard data of relative humidity, air temperature, visibility, wind speed and air pressure of Lanzhou from 2010 to 2012(calculating the maximum, minimum and average value with 5 days as a unit ) as the input variables of Support Vector Regression(SVR) and the standard data of hypertension incidence of the same period as the output dependent variables to obtain the optimal prediction parameters by cross validation algorithm, then by SVR algorithm learning and training, a SVR forecast model for hypertension incidence is built. The result shows that the hypertension prediction model is composed of 15 input independent variables, the training accuracy is 0.005, the final error is 0.0026389. The forecast accuracy based on SVR model is 97.1429%, which is higher than statistical forecast equation and neural network prediction method. It is concluded that SVR model provides a new method for hypertension prediction with its simple calculation, small error as well as higher historical sample fitting and Independent sample forecast capability.
Modeling time-to-event (survival) data using classification tree analysis.
Linden, Ariel; Yarnold, Paul R
2017-12-01
Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
Erbe, M; Hayes, B J; Matukumalli, L K; Goswami, S; Bowman, P J; Reich, C M; Mason, B A; Goddard, M E
2012-07-01
Achieving accurate genomic estimated breeding values for dairy cattle requires a very large reference population of genotyped and phenotyped individuals. Assembling such reference populations has been achieved for breeds such as Holstein, but is challenging for breeds with fewer individuals. An alternative is to use a multi-breed reference population, such that smaller breeds gain some advantage in accuracy of genomic estimated breeding values (GEBV) from information from larger breeds. However, this requires that marker-quantitative trait loci associations persist across breeds. Here, we assessed the gain in accuracy of GEBV in Jersey cattle as a result of using a combined Holstein and Jersey reference population, with either 39,745 or 624,213 single nucleotide polymorphism (SNP) markers. The surrogate used for accuracy was the correlation of GEBV with daughter trait deviations in a validation population. Two methods were used to predict breeding values, either a genomic BLUP (GBLUP_mod), or a new method, BayesR, which used a mixture of normal distributions as the prior for SNP effects, including one distribution that set SNP effects to zero. The GBLUP_mod method scaled both the genomic relationship matrix and the additive relationship matrix to a base at the time the breeds diverged, and regressed the genomic relationship matrix to account for sampling errors in estimating relationship coefficients due to a finite number of markers, before combining the 2 matrices. Although these modifications did result in less biased breeding values for Jerseys compared with an unmodified genomic relationship matrix, BayesR gave the highest accuracies of GEBV for the 3 traits investigated (milk yield, fat yield, and protein yield), with an average increase in accuracy compared with GBLUP_mod across the 3 traits of 0.05 for both Jerseys and Holsteins. The advantage was limited for either Jerseys or Holsteins in using 624,213 SNP rather than 39,745 SNP (0.01 for Holsteins and 0.03 for Jerseys, averaged across traits). Even this limited and nonsignificant advantage was only observed when BayesR was used. An alternative panel, which extracted the SNP in the transcribed part of the bovine genome from the 624,213 SNP panel (to give 58,532 SNP), performed better, with an increase in accuracy of 0.03 for Jerseys across traits. This panel captures much of the increased genomic content of the 624,213 SNP panel, with the advantage of a greatly reduced number of SNP effects to estimate. Taken together, using this panel, a combined breed reference and using BayesR rather than GBLUP_mod increased the accuracy of GEBV in Jerseys from 0.43 to 0.52, averaged across the 3 traits. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Porto, William F.; Pires, Állan S.; Franco, Octavio L.
2012-01-01
The antimicrobial peptides (AMP) have been proposed as an alternative to control resistant pathogens. However, due to multifunctional properties of several AMP classes, until now there has been no way to perform efficient AMP identification, except through in vitro and in vivo tests. Nevertheless, an indication of activity can be provided by prediction methods. In order to contribute to the AMP prediction field, the CS-AMPPred (Cysteine-Stabilized Antimicrobial Peptides Predictor) is presented here, consisting of an updated version of the Support Vector Machine (SVM) model for antimicrobial activity prediction in cysteine-stabilized peptides. The CS-AMPPred is based on five sequence descriptors: indexes of (i) α-helix and (ii) loop formation; and averages of (iii) net charge, (iv) hydrophobicity and (v) flexibility. CS-AMPPred was based on 310 cysteine-stabilized AMPs and 310 sequences extracted from PDB. The polynomial kernel achieves the best accuracy on 5-fold cross validation (85.81%), while the radial and linear kernels achieve 84.19%. Testing in a blind data set, the polynomial and radial kernels achieve an accuracy of 90.00%, while the linear model achieves 89.33%. The three models reach higher accuracies than previously described methods. A standalone version of CS-AMPPred is available for download at
Artificial Intelligence Can Predict Daily Trauma Volume and Average Acuity.
Stonko, David P; Dennis, Bradley M; Betzold, Richard D; Peetz, Allan B; Gunter, Oliver L; Guillamondegui, Oscar D
2018-04-19
The goal of this study was to integrate temporal and weather data in order to create an artificial neural network (ANN) to predict trauma volume, the number of emergent operative cases, and average daily acuity at a level 1 trauma center. Trauma admission data from TRACS and weather data from the National Oceanic and Atmospheric Administration (NOAA) was collected for all adult trauma patients from July 2013-June 2016. The ANN was constructed using temporal (time, day of week), and weather factors (daily high, active precipitation) to predict four points of daily trauma activity: number of traumas, number of penetrating traumas, average ISS, and number of immediate OR cases per day. We trained a two-layer feed-forward network with 10 sigmoid hidden neurons via the Levenberg-Marquardt backpropagation algorithm, and performed k-fold cross validation and accuracy calculations on 100 randomly generated partitions. 10,612 patients over 1,096 days were identified. The ANN accurately predicted the daily trauma distribution in terms of number of traumas, number of penetrating traumas, number of OR cases, and average daily ISS (combined training correlation coefficient r = 0.9018+/-0.002; validation r = 0.8899+/- 0.005; testing r = 0.8940+/-0.006). We were able to successfully predict trauma and emergent operative volume, and acuity using an ANN by integrating local weather and trauma admission data from a level 1 center. As an example, for June 30, 2016, it predicted 9.93 traumas (actual: 10), and a mean ISS score of 15.99 (actual: 13.12); see figure 3. This may prove useful for predicting trauma needs across the system and hospital administration when allocating limited resources. Level III STUDY TYPE: Prognostic/Epidemiological.
NASA Astrophysics Data System (ADS)
Bilalic, Rusmir
A novel application of support vector machines (SVMs), artificial neural networks (ANNs), and Gaussian processes (GPs) for machine learning (GPML) to model microcontroller unit (MCU) upset due to intentional electromagnetic interference (IEMI) is presented. In this approach, an MCU performs a counting operation (0-7) while electromagnetic interference in the form of a radio frequency (RF) pulse is direct-injected into the MCU clock line. Injection times with respect to the clock signal are the clock low, clock rising edge, clock high, and the clock falling edge periods in the clock window during which the MCU is performing initialization and executing the counting procedure. The intent is to cause disruption in the counting operation and model the probability of effect (PoE) using machine learning tools. Five experiments were executed as part of this research, each of which contained a set of 38,300 training points and 38,300 test points, for a total of 383,000 total points with the following experiment variables: injection times with respect to the clock signal, injected RF power, injected RF pulse width, and injected RF frequency. For the 191,500 training points, the average training error was 12.47%, while for the 191,500 test points the average test error was 14.85%, meaning that on average, the machine was able to predict MCU upset with an 85.15% accuracy. Leaving out the results for the worst-performing model (SVM with a linear kernel), the test prediction accuracy for the remaining machines is almost 89%. All three machine learning methods (ANNs, SVMs, and GPML) showed excellent and consistent results in their ability to model and predict the PoE on an MCU due to IEMI. The GP approach performed best during training with a 7.43% average training error, while the ANN technique was most accurate during the test with a 10.80% error.
Vehicle Mode and Driving Activity Detection Based on Analyzing Sensor Data of Smartphones.
Lu, Dang-Nhac; Nguyen, Duc-Nhan; Nguyen, Thi-Hau; Nguyen, Ha-Nam
2018-03-29
In this paper, we present a flexible combined system, namely the Vehicle mode-driving Activity Detection System (VADS), that is capable of detecting either the current vehicle mode or the current driving activity of travelers. Our proposed system is designed to be lightweight in computation and very fast in response to the changes of travelers' vehicle modes or driving events. The vehicle mode detection module is responsible for recognizing both motorized vehicles, such as cars, buses, and motorbikes, and non-motorized ones, for instance, walking, and bikes. It relies only on accelerometer data in order to minimize the energy consumption of smartphones. By contrast, the driving activity detection module uses the data collected from the accelerometer, gyroscope, and magnetometer of a smartphone to detect various driving activities, i.e., stopping, going straight, turning left, and turning right. Furthermore, we propose a method to compute the optimized data window size and the optimized overlapping ratio for each vehicle mode and each driving event from the training datasets. The experimental results show that this strategy significantly increases the overall prediction accuracy. Additionally, numerous experiments are carried out to compare the impact of different feature sets (time domain features, frequency domain features, Hjorth features) as well as the impact of various classification algorithms (Random Forest, Naïve Bayes, Decision tree J48, K Nearest Neighbor, Support Vector Machine) contributing to the prediction accuracy. Our system achieves an average accuracy of 98.33% in detecting the vehicle modes and an average accuracy of 98.95% in recognizing the driving events of motorcyclists when using the Random Forest classifier and a feature set containing time domain features, frequency domain features, and Hjorth features. Moreover, on a public dataset of HTC company in New Taipei, Taiwan, our framework obtains the overall accuracy of 97.33% that is considerably higher than that of the state-of the art.
Cuyabano, B C D; Su, G; Rosa, G J M; Lund, M S; Gianola, D
2015-10-01
This study compared the accuracy of genome-enabled prediction models using individual single nucleotide polymorphisms (SNP) or haplotype blocks as covariates when using either a single breed or a combined population of Nordic Red cattle. The main objective was to compare predictions of breeding values of complex traits using a combined training population with haplotype blocks, with predictions using a single breed as training population and individual SNP as predictors. To compare the prediction reliabilities, bootstrap samples were taken from the test data set. With the bootstrapped samples of prediction reliabilities, we built and graphed confidence ellipses to allow comparisons. Finally, measures of statistical distances were used to calculate the gain in predictive ability. Our analyses are innovative in the context of assessment of predictive models, allowing a better understanding of prediction reliabilities and providing a statistical basis to effectively calibrate whether one prediction scenario is indeed more accurate than another. An ANOVA indicated that use of haplotype blocks produced significant gains mainly when Bayesian mixture models were used but not when Bayesian BLUP was fitted to the data. Furthermore, when haplotype blocks were used to train prediction models in a combined Nordic Red cattle population, we obtained up to a statistically significant 5.5% average gain in prediction accuracy, over predictions using individual SNP and training the model with a single breed. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Experimental and casework validation of ambient temperature corrections in forensic entomology.
Johnson, Aidan P; Wallman, James F; Archer, Melanie S
2012-01-01
This paper expands on Archer (J Forensic Sci 49, 2004, 553), examining additional factors affecting ambient temperature correction of weather station data in forensic entomology. Sixteen hypothetical body discovery sites (BDSs) in Victoria and New South Wales (Australia), both in autumn and in summer, were compared to test whether the accuracy of correlation was affected by (i) length of correlation period; (ii) distance between BDS and weather station; and (iii) periodicity of ambient temperature measurements. The accuracy of correlations in data sets from real Victorian and NSW forensic entomology cases was also examined. Correlations increased weather data accuracy in all experiments, but significant differences in accuracy were found only between periodicity treatments. We found that a >5°C difference between average values of body in situ and correlation period weather station data was predictive of correlations that decreased the accuracy of ambient temperatures estimated using correlation. Practitioners should inspect their weather data sets for such differences. © 2011 American Academy of Forensic Sciences.
Bridging a translational gap: using machine learning to improve the prediction of PTSD.
Karstoft, Karen-Inge; Galatzer-Levy, Isaac R; Statnikov, Alexander; Li, Zhiguo; Shalev, Arieh Y
2015-03-16
Predicting Posttraumatic Stress Disorder (PTSD) is a pre-requisite for targeted prevention. Current research has identified group-level risk-indicators, many of which (e.g., head trauma, receiving opiates) concern but a subset of survivors. Identifying interchangeable sets of risk indicators may increase the efficiency of early risk assessment. The study goal is to use supervised machine learning (ML) to uncover interchangeable, maximally predictive combinations of early risk indicators. Data variables (features) reflecting event characteristics, emergency department (ED) records and early symptoms were collected in 957 trauma survivors within ten days of ED admission, and used to predict PTSD symptom trajectories during the following fifteen months. A Target Information Equivalence Algorithm (TIE*) identified all minimal sets of features (Markov Boundaries; MBs) that maximized the prediction of a non-remitting PTSD symptom trajectory when integrated in a support vector machine (SVM). The predictive accuracy of each set of predictors was evaluated in a repeated 10-fold cross-validation and expressed as average area under the Receiver Operating Characteristics curve (AUC) for all validation trials. The average number of MBs per cross validation was 800. MBs' mean AUC was 0.75 (95% range: 0.67-0.80). The average number of features per MB was 18 (range: 12-32) with 13 features present in over 75% of the sets. Our findings support the hypothesized existence of multiple and interchangeable sets of risk indicators that equally and exhaustively predict non-remitting PTSD. ML's ability to increase prediction versatility is a promising step towards developing algorithmic, knowledge-based, personalized prediction of post-traumatic psychopathology.
Wu, J; Awate, S P; Licht, D J; Clouchoux, C; du Plessis, A J; Avants, B B; Vossough, A; Gee, J C; Limperopoulos, C
2015-07-01
Traditional methods of dating a pregnancy based on history or sonographic assessment have a large variation in the third trimester. We aimed to assess the ability of various quantitative measures of brain cortical folding on MR imaging in determining fetal gestational age in the third trimester. We evaluated 8 different quantitative cortical folding measures to predict gestational age in 33 healthy fetuses by using T2-weighted fetal MR imaging. We compared the accuracy of the prediction of gestational age by these cortical folding measures with the accuracy of prediction by brain volume measurement and by a previously reported semiquantitative visual scale of brain maturity. Regression models were constructed, and measurement biases and variances were determined via a cross-validation procedure. The cortical folding measures are accurate in the estimation and prediction of gestational age (mean of the absolute error, 0.43 ± 0.45 weeks) and perform better than (P = .024) brain volume (mean of the absolute error, 0.72 ± 0.61 weeks) or sonography measures (SDs approximately 1.5 weeks, as reported in literature). Prediction accuracy is comparable with that of the semiquantitative visual assessment score (mean, 0.57 ± 0.41 weeks). Quantitative cortical folding measures such as global average curvedness can be an accurate and reliable estimator of gestational age and brain maturity for healthy fetuses in the third trimester and have the potential to be an indicator of brain-growth delays for at-risk fetuses and preterm neonates. © 2015 by American Journal of Neuroradiology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tessum, C. W.; Hill, J. D.; Marshall, J. D.
We present results from and evaluate the performance of a 12-month, 12 km horizontal resolution year 2005 air pollution simulation for the contiguous United States using the WRF-Chem (Weather Research and Forecasting with Chemistry) meteorology and chemical transport model (CTM). We employ the 2005 US National Emissions Inventory, the Regional Atmospheric Chemistry Mechanism (RACM), and the Modal Aerosol Dynamics Model for Europe (MADE) with a volatility basis set (VBS) secondary aerosol module. Overall, model performance is comparable to contemporary modeling efforts used for regulatory and health-effects analysis, with an annual average daytime ozone (O 3) mean fractional bias (MFB) ofmore » 12% and an annual average fine particulate matter (PM 2.5) MFB of −1%. WRF-Chem, as configured here, tends to overpredict total PM 2.5 at some high concentration locations and generally overpredicts average 24 h O 3 concentrations. Performance is better at predicting daytime-average and daily peak O 3 concentrations, which are more relevant for regulatory and health effects analyses relative to annual average values. Predictive performance for PM 2.5 subspecies is mixed: the model overpredicts particulate sulfate (MFB = 36%), underpredicts particulate nitrate (MFB = −110%) and organic carbon (MFB = −29%), and relatively accurately predicts particulate ammonium (MFB = 3%) and elemental carbon (MFB = 3%), so that the accuracy in total PM 2.5 predictions is to some extent a function of offsetting over- and underpredictions of PM 2.5 subspecies. Model predictive performance for PM 2.5 and its subspecies is in general worse in winter and in the western US than in other seasons and regions, suggesting spatial and temporal opportunities for future WRF-Chem model development and evaluation.« less
Tessum, C. W.; Hill, J. D.; Marshall, J. D.
2015-04-07
We present results from and evaluate the performance of a 12-month, 12 km horizontal resolution year 2005 air pollution simulation for the contiguous United States using the WRF-Chem (Weather Research and Forecasting with Chemistry) meteorology and chemical transport model (CTM). We employ the 2005 US National Emissions Inventory, the Regional Atmospheric Chemistry Mechanism (RACM), and the Modal Aerosol Dynamics Model for Europe (MADE) with a volatility basis set (VBS) secondary aerosol module. Overall, model performance is comparable to contemporary modeling efforts used for regulatory and health-effects analysis, with an annual average daytime ozone (O 3) mean fractional bias (MFB) ofmore » 12% and an annual average fine particulate matter (PM 2.5) MFB of −1%. WRF-Chem, as configured here, tends to overpredict total PM 2.5 at some high concentration locations and generally overpredicts average 24 h O 3 concentrations. Performance is better at predicting daytime-average and daily peak O 3 concentrations, which are more relevant for regulatory and health effects analyses relative to annual average values. Predictive performance for PM 2.5 subspecies is mixed: the model overpredicts particulate sulfate (MFB = 36%), underpredicts particulate nitrate (MFB = −110%) and organic carbon (MFB = −29%), and relatively accurately predicts particulate ammonium (MFB = 3%) and elemental carbon (MFB = 3%), so that the accuracy in total PM 2.5 predictions is to some extent a function of offsetting over- and underpredictions of PM 2.5 subspecies. Model predictive performance for PM 2.5 and its subspecies is in general worse in winter and in the western US than in other seasons and regions, suggesting spatial and temporal opportunities for future WRF-Chem model development and evaluation.« less
Hua, Zhi-Gang; Lin, Yan; Yuan, Ya-Zhou; Yang, De-Chang; Wei, Wen; Guo, Feng-Biao
2015-07-01
In 2003, we developed an ab initio program, ZCURVE 1.0, to find genes in bacterial and archaeal genomes. In this work, we present the updated version (i.e. ZCURVE 3.0). Using 422 prokaryotic genomes, the average accuracy was 93.7% with the updated version, compared with 88.7% with the original version. Such results also demonstrate that ZCURVE 3.0 is comparable with Glimmer 3.02 and may provide complementary predictions to it. In fact, the joint application of the two programs generated better results by correctly finding more annotated genes while also containing fewer false-positive predictions. As the exclusive function, ZCURVE 3.0 contains one post-processing program that can identify essential genes with high accuracy (generally >90%). We hope ZCURVE 3.0 will receive wide use with the web-based running mode. The updated ZCURVE can be freely accessed from http://cefg.uestc.edu.cn/zcurve/ or http://tubic.tju.edu.cn/zcurveb/ without any restrictions. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Predicting drug-induced liver injury using ensemble learning methods and molecular fingerprints.
Ai, Haixin; Chen, Wen; Zhang, Li; Huang, Liangchao; Yin, Zimo; Hu, Huan; Zhao, Qi; Zhao, Jian; Liu, Hongsheng
2018-05-21
Drug-induced liver injury (DILI) is a major safety concern in the drug-development process, and various methods have been proposed to predict the hepatotoxicity of compounds during the early stages of drug trials. In this study, we developed an ensemble model using three machine learning algorithms and 12 molecular fingerprints from a dataset containing 1,241 diverse compounds. The ensemble model achieved an average accuracy of 71.1±2.6%, sensitivity of 79.9±3.6%, specificity of 60.3±4.8%, and area under the receiver operating characteristic curve (AUC) of 0.764±0.026 in five-fold cross-validation and an accuracy of 84.3%, sensitivity of 86.9%, specificity of 75.4%, and AUC of 0.904 in an external validation dataset of 286 compounds collected from the Liver Toxicity Knowledge Base (LTKB). Compared with previous methods, the ensemble model achieved relatively high accuracy and sensitivity. We also identified several substructures related to DILI. In addition, we provide a web server offering access to our models (http://ccsipb.lnu.edu.cn/toxicity/HepatoPred-EL/).
Tissue shrinkage in microwave ablation of liver: an ex vivo predictive model.
Amabile, Claudio; Farina, Laura; Lopresto, Vanni; Pinto, Rosanna; Cassarino, Simone; Tosoratti, Nevio; Goldberg, S Nahum; Cavagnaro, Marta
2017-02-01
The aim of this study was to develop a predictive model of the shrinkage of liver tissues in microwave ablation. Thirty-seven cuboid specimens of ex vivo bovine liver of size ranging from 2 cm to 8 cm were heated exploiting different techniques: 1) using a microwave oven (2.45 GHz) operated at 420 W, 500 W and 700 W for 8 to 20 min, achieving complete carbonisation of the specimens, 2) using a radiofrequency ablation apparatus (450 kHz) operated at 70 W for a time ranging from 6 to 7.5 min obtaining white coagulation of the specimens, and 3) using a microwave (2.45 GHz) ablation apparatus operated at 60 W for 10 min. Measurements of specimen dimensions, carbonised and coagulated regions were performed using a ruler with an accuracy of 1 mm. Based on the results of the first two experiments a predictive model for the contraction of liver tissue from microwave ablation was constructed and compared to the result of the third experiment. For carbonised tissue, a linear contraction of 31 ± 6% was obtained independently of the heating source, power and operation time. Radiofrequency experiments determined that the average percentage linear contraction of white coagulated tissue was 12 ± 5%. The average accuracy of our model was determined to be 3 mm (5%). The proposed model allows the prediction of the shrinkage of liver tissues upon microwave ablation given the extension of the carbonised and coagulated zones. This may be useful in helping to predict whether sufficient tissue volume is ablated in clinical practice.
Paracrine communication maximizes cellular response fidelity in wound signaling
Handly, L Naomi; Pilko, Anna; Wollman, Roy
2015-01-01
Population averaging due to paracrine communication can arbitrarily reduce cellular response variability. Yet, variability is ubiquitously observed, suggesting limits to paracrine averaging. It remains unclear whether and how biological systems may be affected by such limits of paracrine signaling. To address this question, we quantify the signal and noise of Ca2+ and ERK spatial gradients in response to an in vitro wound within a novel microfluidics-based device. We find that while paracrine communication reduces gradient noise, it also reduces the gradient magnitude. Accordingly we predict the existence of a maximum gradient signal to noise ratio. Direct in vitro measurement of paracrine communication verifies these predictions and reveals that cells utilize optimal levels of paracrine signaling to maximize the accuracy of gradient-based positional information. Our results demonstrate the limits of population averaging and show the inherent tradeoff in utilizing paracrine communication to regulate cellular response fidelity. DOI: http://dx.doi.org/10.7554/eLife.09652.001 PMID:26448485
AMS 4.0: consensus prediction of post-translational modifications in protein sequences.
Plewczynski, Dariusz; Basu, Subhadip; Saha, Indrajit
2012-08-01
We present here the 2011 update of the AutoMotif Service (AMS 4.0) that predicts the wide selection of 88 different types of the single amino acid post-translational modifications (PTM) in protein sequences. The selection of experimentally confirmed modifications is acquired from the latest UniProt and Phospho.ELM databases for training. The sequence vicinity of each modified residue is represented using amino acids physico-chemical features encoded using high quality indices (HQI) obtaining by automatic clustering of known indices extracted from AAindex database. For each type of the numerical representation, the method builds the ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising different objectives during the training (for example the recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of machine learning algorithm, and the data fusion of different training objects representations, in order to boost the overall prediction accuracy of conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7 % as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most-difficult sequence motifs types it is able to improve the prediction performance by almost 32 %, when compared with previously used single machine learning methods. Summarising, the brainstorming consensus meta-learning methodology on the average boosts the AUC score up to around 89 %, averaged over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with the comparison to previously published methods and state-of-the-art software tools. The source code and precompiled binaries of brainstorming tool are available at http://code.google.com/p/automotifserver/ under Apache 2.0 licensing.
Berman, Jesse D; Peters, Thomas M; Koehler, Kirsten A
2018-05-28
To design a method that uses preliminary hazard mapping data to optimize the number and location of sensors within a network for a long-term assessment of occupational concentrations, while preserving temporal variability, accuracy, and precision of predicted hazards. Particle number concentrations (PNCs) and respirable mass concentrations (RMCs) were measured with direct-reading instruments in a large heavy-vehicle manufacturing facility at 80-82 locations during 7 mapping events, stratified by day and season. Using kriged hazard mapping, a statistical approach identified optimal orders for removing locations to capture temporal variability and high prediction precision of PNC and RMC concentrations. We compared optimal-removal, random-removal, and least-optimal-removal orders to bound prediction performance. The temporal variability of PNC was found to be higher than RMC with low correlation between the two particulate metrics (ρ = 0.30). Optimal-removal orders resulted in more accurate PNC kriged estimates (root mean square error [RMSE] = 49.2) at sample locations compared with random-removal order (RMSE = 55.7). For estimates at locations having concentrations in the upper 10th percentile, the optimal-removal order preserved average estimated concentrations better than random- or least-optimal-removal orders (P < 0.01). However, estimated average concentrations using an optimal-removal were not statistically different than random-removal when averaged over the entire facility. No statistical difference was observed for optimal- and random-removal methods for RMCs that were less variable in time and space than PNCs. Optimized removal performed better than random-removal in preserving high temporal variability and accuracy of hazard map for PNC, but not for the more spatially homogeneous RMC. These results can be used to reduce the number of locations used in a network of static sensors for long-term monitoring of hazards in the workplace, without sacrificing prediction performance.
Integration of RNA-Seq and RPPA data for survival time prediction in cancer patients.
Isik, Zerrin; Ercan, Muserref Ece
2017-10-01
Integration of several types of patient data in a computational framework can accelerate the identification of more reliable biomarkers, especially for prognostic purposes. This study aims to identify biomarkers that can successfully predict the potential survival time of a cancer patient by integrating the transcriptomic (RNA-Seq), proteomic (RPPA), and protein-protein interaction (PPI) data. The proposed method -RPBioNet- employs a random walk-based algorithm that works on a PPI network to identify a limited number of protein biomarkers. Later, the method uses gene expression measurements of the selected biomarkers to train a classifier for the survival time prediction of patients. RPBioNet was applied to classify kidney renal clear cell carcinoma (KIRC), glioblastoma multiforme (GBM), and lung squamous cell carcinoma (LUSC) patients based on their survival time classes (long- or short-term). The RPBioNet method correctly identified the survival time classes of patients with between 66% and 78% average accuracy for three data sets. RPBioNet operates with only 20 to 50 biomarkers and can achieve on average 6% higher accuracy compared to the closest alternative method, which uses only RNA-Seq data in the biomarker selection. Further analysis of the most predictive biomarkers highlighted genes that are common for both cancer types, as they may be driver proteins responsible for cancer progression. The novelty of this study is the integration of a PPI network with mRNA and protein expression data to identify more accurate prognostic biomarkers that can be used for clinical purposes in the future. Copyright © 2017 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shiraishi, Satomi; Moore, Kevin L., E-mail: kevinmoore@ucsd.edu
Purpose: To demonstrate knowledge-based 3D dose prediction for external beam radiotherapy. Methods: Using previously treated plans as training data, an artificial neural network (ANN) was trained to predict a dose matrix based on patient-specific geometric and planning parameters, such as the closest distance (r) to planning target volume (PTV) and organ-at-risks (OARs). Twenty-three prostate and 43 stereotactic radiosurgery/radiotherapy (SRS/SRT) cases with at least one nearby OAR were studied. All were planned with volumetric-modulated arc therapy to prescription doses of 81 Gy for prostate and 12–30 Gy for SRS. Using these clinically approved plans, ANNs were trained to predict dose matrixmore » and the predictive accuracy was evaluated using the dose difference between the clinical plan and prediction, δD = D{sub clin} − D{sub pred}. The mean (〈δD{sub r}〉), standard deviation (σ{sub δD{sub r}}), and their interquartile range (IQR) for the training plans were evaluated at a 2–3 mm interval from the PTV boundary (r{sub PTV}) to assess prediction bias and precision. Initially, unfiltered models which were trained using all plans in the cohorts were created for each treatment site. The models predict approximately the average quality of OAR sparing. Emphasizing a subset of plans that exhibited superior to the average OAR sparing during training, refined models were created to predict high-quality rectum sparing for prostate and brainstem sparing for SRS. Using the refined model, potentially suboptimal plans were identified where the model predicted further sparing of the OARs was achievable. Replans were performed to test if the OAR sparing could be improved as predicted by the model. Results: The refined models demonstrated highly accurate dose distribution prediction. For prostate cases, the average prediction bias for all voxels irrespective of organ delineation ranged from −1% to 0% with maximum IQR of 3% over r{sub PTV} ∈ [ − 6, 30] mm. The average prediction error was less than 10% for the same r{sub PTV} range. For SRS cases, the average prediction bias ranged from −0.7% to 1.5% with maximum IQR of 5% over r{sub PTV} ∈ [ − 4, 32] mm. The average prediction error was less than 8%. Four potentially suboptimal plans were identified for each site and subsequent replanning demonstrated improved sparing of rectum and brainstem. Conclusions: The study demonstrates highly accurate knowledge-based 3D dose predictions for radiotherapy plans.« less
Prediction of Potential Hit Song and Musical Genre Using Artificial Neural Networks
NASA Astrophysics Data System (ADS)
Monterola, Christopher; Abundo, Cheryl; Tugaff, Jeric; Venturina, Lorcel Ericka
Accurately quantifying the goodness of music based on the seemingly subjective taste of the public is a multi-million industry. Recording companies can make sound decisions on which songs or artists to prioritize if accurate forecasting is achieved. We extract 56 single-valued musical features (e.g. pitch and tempo) from 380 Original Pilipino Music (OPM) songs (190 are hit songs) released from 2004 to 2006. Based on an effect size criterion which measures a variable's discriminating power, the 20 highest ranked features are fed to a classifier tasked to predict hit songs. We show that regardless of musical genre, a trained feed-forward neural network (NN) can predict potential hit songs with an average accuracy of ΦNN = 81%. The accuracy is about +20% higher than those of standard classifiers such as linear discriminant analysis (LDA, ΦLDA = 61%) and classification and regression trees (CART, ΦCART = 57%). Both LDA and CART are above the proportional chance criterion (PCC, ΦPCC = 50%) but are slightly below the suggested acceptable classifier requirement of 1.25*ΦPCC = 63%. Utilizing a similar procedure, we demonstrate that different genres (ballad, alternative rock or rock) of OPM songs can be automatically classified with near perfect accuracy using LDA or NN but only around 77% using CART.
Building machine learning force fields for nanoclusters
NASA Astrophysics Data System (ADS)
Zeni, Claudio; Rossi, Kevin; Glielmo, Aldo; Fekete, Ádám; Gaston, Nicola; Baletto, Francesca; De Vita, Alessandro
2018-06-01
We assess Gaussian process (GP) regression as a technique to model interatomic forces in metal nanoclusters by analyzing the performance of 2-body, 3-body, and many-body kernel functions on a set of 19-atom Ni cluster structures. We find that 2-body GP kernels fail to provide faithful force estimates, despite succeeding in bulk Ni systems. However, both 3- and many-body kernels predict forces within an ˜0.1 eV/Å average error even for small training datasets and achieve high accuracy even on out-of-sample, high temperature structures. While training and testing on the same structure always provide satisfactory accuracy, cross-testing on dissimilar structures leads to higher prediction errors, posing an extrapolation problem. This can be cured using heterogeneous training on databases that contain more than one structure, which results in a good trade-off between versatility and overall accuracy. Starting from a 3-body kernel trained this way, we build an efficient non-parametric 3-body force field that allows accurate prediction of structural properties at finite temperatures, following a newly developed scheme [A. Glielmo et al., Phys. Rev. B 95, 214302 (2017)]. We use this to assess the thermal stability of Ni19 nanoclusters at a fractional cost of full ab initio calculations.
NASA Astrophysics Data System (ADS)
Hemmat Esfe, Mohammad; Tatar, Afshin; Ahangar, Mohammad Reza Hassani; Rostamian, Hossein
2018-02-01
Since the conventional thermal fluids such as water, oil, and ethylene glycol have poor thermal properties, the tiny solid particles are added to these fluids to increase their heat transfer improvement. As viscosity determines the rheological behavior of a fluid, studying the parameters affecting the viscosity is crucial. Since the experimental measurement of viscosity is expensive and time consuming, predicting this parameter is the apt method. In this work, three artificial intelligence methods containing Genetic Algorithm-Radial Basis Function Neural Networks (GA-RBF), Least Square Support Vector Machine (LS-SVM) and Gene Expression Programming (GEP) were applied to predict the viscosity of TiO2/SAE 50 nano-lubricant with Non-Newtonian power-law behavior using experimental data. The correlation factor (R2), Average Absolute Relative Deviation (AARD), Root Mean Square Error (RMSE), and Margin of Deviation were employed to investigate the accuracy of the proposed models. RMSE values of 0.58, 1.28, and 6.59 and R2 values of 0.99998, 0.99991, and 0.99777 reveal the accuracy of the proposed models for respective GA-RBF, CSA-LSSVM, and GEP methods. Among the developed models, the GA-RBF shows the best accuracy.
Diaphragm depth in normal subjects.
Shahgholi, Leili; Baria, Michael R; Sorenson, Eric J; Harper, Caitlin J; Watson, James C; Strommen, Jeffrey A; Boon, Andrea J
2014-05-01
Needle electromyography (EMG) of the diaphragm carries the potential risk of pneumothorax. Knowing the approximate depth of the diaphragm should increase the test's safety and accuracy. Distances from the skin to the diaphragm and from the outer surface of the rib to the diaphragm were measured using B mode ultrasound in 150 normal subjects. When measured at the lower intercostal spaces, diaphragm depth varied between 0.78 and 4.91 cm beneath the skin surface and between 0.25 and 1.48 cm below the outer surface of the rib. Using linear regression modeling, body mass index (BMI) could be used to predict diaphragm depth from the skin to within an average of 1.15 mm. Diaphragm depth from the skin can vary by more than 4 cm. When image guidance is not available to enhance accuracy and safety of diaphragm EMG, it is possible to reliably predict the depth of the diaphragm based on BMI. Copyright © 2013 Wiley Periodicals, Inc.
Random Forests for Global and Regional Crop Yield Predictions.
Jeong, Jig Han; Resop, Jonathan P; Mueller, Nathaniel D; Fleisher, David H; Yun, Kyungdahm; Butler, Ethan E; Timlin, Dennis J; Shim, Kyo-Moon; Gerber, James S; Reddy, Vangimalla R; Kim, Soo-Hyung
2016-01-01
Accurate predictions of crop yield are critical for developing effective agricultural and food policies at the regional and global scales. We evaluated a machine-learning method, Random Forests (RF), for its ability to predict crop yield responses to climate and biophysical variables at global and regional scales in wheat, maize, and potato in comparison with multiple linear regressions (MLR) serving as a benchmark. We used crop yield data from various sources and regions for model training and testing: 1) gridded global wheat grain yield, 2) maize grain yield from US counties over thirty years, and 3) potato tuber and maize silage yield from the northeastern seaboard region. RF was found highly capable of predicting crop yields and outperformed MLR benchmarks in all performance statistics that were compared. For example, the root mean square errors (RMSE) ranged between 6 and 14% of the average observed yield with RF models in all test cases whereas these values ranged from 14% to 49% for MLR models. Our results show that RF is an effective and versatile machine-learning method for crop yield predictions at regional and global scales for its high accuracy and precision, ease of use, and utility in data analysis. RF may result in a loss of accuracy when predicting the extreme ends or responses beyond the boundaries of the training data.
A Prediction Method of Binding Free Energy of Protein and Ligand
NASA Astrophysics Data System (ADS)
Yang, Kun; Wang, Xicheng
2010-05-01
Predicting the binding free energy is an important problem in bimolecular simulation. Such prediction would be great benefit in understanding protein functions, and may be useful for computational prediction of ligand binding strengths, e.g., in discovering pharmaceutical drugs. Free energy perturbation (FEP)/thermodynamics integration (TI) is a classical method to explicitly predict free energy. However, this method need plenty of time to collect datum, and that attempts to deal with some simple systems and small changes of molecular structures. Another one for estimating ligand binding affinities is linear interaction energy (LIE) method. This method employs averages of interaction potential energy terms from molecular dynamics simulations or other thermal conformational sampling techniques. Incorporation of systematic deviations from electrostatic linear response, derived from free energy perturbation studies, into the absolute binding free energy expression significantly enhances the accuracy of the approach. However, it also is time-consuming work. In this paper, a new prediction method based on steered molecular dynamics (SMD) with direction optimization is developed to compute binding free energy. Jarzynski's equality is used to derive the PMF or free-energy. The results for two numerical examples are presented, showing that the method has good accuracy and efficiency. The novel method can also simulate whole binding proceeding and give some important structural information about development of new drugs.
Colloff, Melissa F.; Karoğlu, Nilda; Zelek, Katarzyna; Ryder, Hannah; Humphries, Joyce E.; Takarangi, Melanie K.T.
2017-01-01
Summary Acute alcohol intoxication during encoding can impair subsequent identification accuracy, but results across studies have been inconsistent, with studies often finding no effect. Little is also known about how alcohol intoxication affects the identification confidence–accuracy relationship. We randomly assigned women (N = 153) to consume alcohol (dosed to achieve a 0.08% blood alcohol content) or tonic water, controlling for alcohol expectancy. Women then participated in an interactive hypothetical sexual assault scenario and, 24 hours or 7 days later, attempted to identify the assailant from a perpetrator present or a perpetrator absent simultaneous line‐up and reported their decision confidence. Overall, levels of identification accuracy were similar across the alcohol and tonic water groups. However, women who had consumed tonic water as opposed to alcohol identified the assailant with higher confidence on average. Further, calibration analyses suggested that confidence is predictive of accuracy regardless of alcohol consumption. The theoretical and applied implications of our results are discussed.© 2017 The Authors Applied Cognitive Psychology Published by John Wiley & Sons Ltd. PMID:28781426
NASA Astrophysics Data System (ADS)
Hancock, Matthew C.; Magnan, Jerry F.
2017-03-01
To determine the potential usefulness of quantified diagnostic image features as inputs to a CAD system, we investigate the predictive capabilities of statistical learning methods for classifying nodule malignancy, utilizing the Lung Image Database Consortium (LIDC) dataset, and only employ the radiologist-assigned diagnostic feature values for the lung nodules therein, as well as our derived estimates of the diameter and volume of the nodules from the radiologists' annotations. We calculate theoretical upper bounds on the classification accuracy that is achievable by an ideal classifier that only uses the radiologist-assigned feature values, and we obtain an accuracy of 85.74 (+/-1.14)% which is, on average, 4.43% below the theoretical maximum of 90.17%. The corresponding area-under-the-curve (AUC) score is 0.932 (+/-0.012), which increases to 0.949 (+/-0.007) when diameter and volume features are included, along with the accuracy to 88.08 (+/-1.11)%. Our results are comparable to those in the literature that use algorithmically-derived image-based features, which supports our hypothesis that lung nodules can be classified as malignant or benign using only quantified, diagnostic image features, and indicates the competitiveness of this approach. We also analyze how the classification accuracy depends on specific features, and feature subsets, and we rank the features according to their predictive power, statistically demonstrating the top four to be spiculation, lobulation, subtlety, and calcification.
Logan, Kenneth J; Willis, Julie R
2011-12-01
The purpose of this study was to examine the extent to which adults who do not stutter can predict communication-related attitudes of adults who do stutter. 40 participants (mean age of 22.5 years) evaluated speech samples from an adult with mild stuttering and an adult with severe stuttering via audio-only (n=20) or audio-visual (n=20) modes to predict how the adults had responded on the S24 scale of communication attitudes. Participants correctly predicted which speaker had the more favorable S24 score, and the predicted scores were significantly different between the severity conditions. Across the four subgroups, predicted S24 scores differed from actual scores by 4-9 points. Predicted values were greater than the actual values for 3 of 4 subgroups, but still relatively positive in relation to the S24 norm sample. Stimulus presentation mode interacted with stuttering severity to affect prediction accuracy. The participants predicted the speakers' negative self-attributions more accurately than their positive self-attributions. Findings suggest that adults who do not stutter estimate the communication-related attitudes of specific adults who stutter in a manner that is generally accurate, though, in some conditions, somewhat less favorable than the speaker's actual ratings. At a group level, adults who do not stutter demonstrate the ability to discern minimal versus average levels of attitudinal impact for speakers who stutter. The participants' complex prediction patterns are discussed in relation to stereotype accuracy and classic views of negative stereotyping. The reader will be able to (a) summarize main findings on research related to listeners' attitudes toward people who stutter, (b) describe the extent to which people who do not stutter can predict the communication attitudes of people who do stutter; and (c) discuss how findings from the present study relate to previous findings on stereotypes about people who stutter. Copyright © 2011 Elsevier Inc. All rights reserved.
SU-E-J-191: Motion Prediction Using Extreme Learning Machine in Image Guided Radiotherapy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jia, J; Cao, R; Pei, X
Purpose: Real-time motion tracking is a critical issue in image guided radiotherapy due to the time latency caused by image processing and system response. It is of great necessity to fast and accurately predict the future position of the respiratory motion and the tumor location. Methods: The prediction of respiratory position was done based on the positioning and tracking module in ARTS-IGRT system which was developed by FDS Team (www.fds.org.cn). An approach involving with the extreme learning machine (ELM) was adopted to predict the future respiratory position as well as the tumor’s location by training the past trajectories. For themore » training process, a feed-forward neural network with one single hidden layer was used for the learning. First, the number of hidden nodes was figured out for the single layered feed forward network (SLFN). Then the input weights and hidden layer biases of the SLFN were randomly assigned to calculate the hidden neuron output matrix. Finally, the predicted movement were obtained by applying the output weights and compared with the actual movement. Breathing movement acquired from the external infrared markers was used to test the prediction accuracy. And the implanted marker movement for the prostate cancer was used to test the implementation of the tumor motion prediction. Results: The accuracy of the predicted motion and the actual motion was tested. Five volunteers with different breathing patterns were tested. The average prediction time was 0.281s. And the standard deviation of prediction accuracy was 0.002 for the respiratory motion and 0.001 for the tumor motion. Conclusion: The extreme learning machine method can provide an accurate and fast prediction of the respiratory motion and the tumor location and therefore can meet the requirements of real-time tumor-tracking in image guided radiotherapy.« less
Limitations of signal averaging due to temporal correlation in laser remote-sensing measurements.
Menyuk, N; Killinger, D K; Menyuk, C R
1982-09-15
Laser remote sensing involves the measurement of laser-beam transmission through the atmosphere and is subject to uncertainties caused by strong fluctuations due primarily to speckle, glint, and atmospheric-turbulence effects. These uncertainties are generally reduced by taking average values of increasing numbers of measurements. An experiment was carried out to directly measure the effect of signal averaging on back-scattered laser return signals from a diffusely reflecting target using a direct-detection differential-absorption lidar (DIAL) system. The improvement in accuracy obtained by averaging over increasing numbers of data points was found to be smaller than that predicted for independent measurements. The experimental results are shown to be in excellent agreement with a theoretical analysis which considers the effect of temporal correlation. The analysis indicates that small but long-term temporal correlation severely limits the improvement available through signal averaging.
NASA Astrophysics Data System (ADS)
Cappelli, Mark; Young, Christopher
2016-10-01
We present continued efforts towards introducing physical models for cross-magnetic field electron transport into Hall thruster discharge simulations. In particular, we seek to evaluate whether such models accurately capture ion dynamics, both averaged and resolved in time, through comparisons with measured ion velocity distributions which are now becoming available for several devices. Here, we describe a turbulent electron transport model that is integrated into 2-D hybrid fluid/PIC simulations of a 72 mm diameter laboratory thruster operating at 400 W. We also compare this model's predictions with one recently proposed by Lafluer et al.. Introducing these models into 2-D hybrid simulations is relatively straightforward and leverages the existing framework for solving the electron fluid equations. The models are tested for their ability to capture the time-averaged experimental discharge current and its fluctuations due to ionization instabilities. Model predictions are also more rigorously evaluated against recent laser-induced fluorescence measurements of time-resolved ion velocity distributions.
Hybrid Reynolds-Averaged/Large Eddy Simulation of the Flow in a Model SCRamjet Cavity Flameholder
NASA Technical Reports Server (NTRS)
Baurle, R. A.
2016-01-01
Steady-state and scale-resolving simulations have been performed for flow in and around a model scramjet combustor flameholder. Experimental data available for this configuration include velocity statistics obtained from particle image velocimetry. Several turbulence models were used for the steady-state Reynolds-averaged simulations which included both linear and non-linear eddy viscosity models. The scale-resolving simulations used a hybrid Reynolds-averaged/large eddy simulation strategy that is designed to be a large eddy simulation everywhere except in the inner portion (log layer and below) of the boundary layer. Hence, this formulation can be regarded as a wall-modeled large eddy simulation. This e ort was undertaken to not only assess the performance of the hybrid Reynolds-averaged / large eddy simulation modeling approach in a flowfield of interest to the scramjet research community, but to also begin to understand how this capability can best be used to augment standard Reynolds-averaged simulations. The numerical errors were quantified for the steady-state simulations, and at least qualitatively assessed for the scale-resolving simulations prior to making any claims of predictive accuracy relative to the measurements. The steady-state Reynolds-averaged results displayed a high degree of variability when comparing the flameholder fuel distributions obtained from each turbulence model. This prompted the consideration of applying the higher-fidelity scale-resolving simulations as a surrogate "truth" model to calibrate the Reynolds-averaged closures in a non-reacting setting prior to their use for the combusting simulations. In general, the Reynolds-averaged velocity profile predictions at the lowest fueling level matched the particle imaging measurements almost as well as was observed for the non-reacting condition. However, the velocity field predictions proved to be more sensitive to the flameholder fueling rate than was indicated in the measurements.
A standardized model for predicting flap failure using indocyanine green dye
NASA Astrophysics Data System (ADS)
Zimmermann, Terence M.; Moore, Lindsay S.; Warram, Jason M.; Greene, Benjamin J.; Nakhmani, Arie; Korb, Melissa L.; Rosenthal, Eben L.
2016-03-01
Techniques that provide a non-invasive method for evaluation of intraoperative skin flap perfusion are currently available but underutilized. We hypothesize that intraoperative vascular imaging can be used to reliably assess skin flap perfusion and elucidate areas of future necrosis by means of a standardized critical perfusion threshold. Five animal groups (negative controls, n=4; positive controls, n=5; chemotherapy group, n=5; radiation group, n=5; chemoradiation group, n=5) underwent pre-flap treatments two weeks prior to undergoing random pattern dorsal fasciocutaneous flaps with a length to width ratio of 2:1 (3 x 1.5 cm). Flap perfusion was assessed via laser-assisted indocyanine green dye angiography and compared to standard clinical assessment for predictive accuracy of flap necrosis. For estimating flap-failure, clinical prediction achieved a sensitivity of 79.3% and a specificity of 90.5%. When average flap perfusion was more than three standard deviations below the average flap perfusion for the negative control group at the time of the flap procedure (144.3+/-17.05 absolute perfusion units), laser-assisted indocyanine green dye angiography achieved a sensitivity of 81.1% and a specificity of 97.3%. When absolute perfusion units were seven standard deviations below the average flap perfusion for the negative control group, specificity of necrosis prediction was 100%. Quantitative absolute perfusion units can improve specificity for intraoperative prediction of viable tissue. Using this strategy, a positive predictive threshold of flap failure can be standardized for clinical use.
Rajagopal, Rekha; Ranganathan, Vidhyapriya
2018-06-05
Automation in cardiac arrhythmia classification helps medical professionals make accurate decisions about the patient's health. The aim of this work was to design a hybrid classification model to classify cardiac arrhythmias. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through Daubechies wavelet transform, and arrhythmia classification using a collaborative decision from the K nearest neighbor classifier (KNN) and a support vector machine (SVM). The proposed model is able to classify 5 arrhythmia classes as per the ANSI/AAMI EC57: 1998 classification standard. Level 1 of the proposed model involves classification using the KNN and the classifier is trained with examples from all classes. Level 2 involves classification using an SVM and is trained specifically to classify overlapped classes. The final classification of a test heartbeat pertaining to a particular class is done using the proposed KNN/SVM hybrid model. The experimental results demonstrated that the average sensitivity of the proposed model was 92.56%, the average specificity 99.35%, the average positive predictive value 98.13%, the average F-score 94.5%, and the average accuracy 99.78%. The results obtained using the proposed model were compared with the results of discriminant, tree, and KNN classifiers. The proposed model is able to achieve a high classification accuracy.
Prediction on sunspot activity based on fuzzy information granulation and support vector machine
NASA Astrophysics Data System (ADS)
Peng, Lingling; Yan, Haisheng; Yang, Zhigang
2018-04-01
In order to analyze the range of sunspots, a combined prediction method of forecasting the fluctuation range of sunspots based on fuzzy information granulation (FIG) and support vector machine (SVM) was put forward. Firstly, employing the FIG to granulate sample data and extract va)alid information of each window, namely the minimum value, the general average value and the maximum value of each window. Secondly, forecasting model is built respectively with SVM and then cross method is used to optimize these parameters. Finally, the fluctuation range of sunspots is forecasted with the optimized SVM model. Case study demonstrates that the model have high accuracy and can effectively predict the fluctuation of sunspots.
Predicting apricot phenology using meteorological data.
Ruml, Mirjana; Milatović, Dragan; Vulić, Todor; Vuković, Ana
2011-09-01
The main objective of this study was to develop feasible, easy to apply models for early prediction of full flowering (FF) and maturing (MA) in apricot (Prunus armeniaca L.). Phenological data for 20 apricot cultivars grown in the Belgrade region were modeled against averages of daily temperature records over ten seasons for FF and eight seasons for MA. A much stronger correlation was found between the phenological timing and temperature at the very beginning than at the end of phenophases. Also, the length of developmental periods were better correlated to daily maximum than to daily minimum and mean air temperatures. Using prediction models based on daily maximum temperatures averaged over 30-, 45- and 60-day periods, starting from 1 January for FF prediction and from the date of FF for MA prediction, the onset of examined phenophases in apricot cultivars could be predicted from a few weeks to up to 2 months ahead with acceptable accuracy. The mean absolute differences between the observations and cross-validated predictions obtained by 30-, 45- and 60-day models were 8.6, 6.9 and 5.7 days for FF and 6.1, 3.6 and 2.8 days for MA, respectively. The validity of the results was confirmed using an independent data set for the year 2009.
Predicting apricot phenology using meteorological data
NASA Astrophysics Data System (ADS)
Ruml, Mirjana; Milatović, Dragan; Vulić, Todor; Vuković, Ana
2011-09-01
The main objective of this study was to develop feasible, easy to apply models for early prediction of full flowering (FF) and maturing (MA) in apricot ( Prunus armeniaca L.). Phenological data for 20 apricot cultivars grown in the Belgrade region were modeled against averages of daily temperature records over ten seasons for FF and eight seasons for MA. A much stronger correlation was found between the phenological timing and temperature at the very beginning than at the end of phenophases. Also, the length of developmental periods were better correlated to daily maximum than to daily minimum and mean air temperatures. Using prediction models based on daily maximum temperatures averaged over 30-, 45- and 60-day periods, starting from 1 January for FF prediction and from the date of FF for MA prediction, the onset of examined phenophases in apricot cultivars could be predicted from a few weeks to up to 2 months ahead with acceptable accuracy. The mean absolute differences between the observations and cross-validated predictions obtained by 30-, 45- and 60-day models were 8.6, 6.9 and 5.7 days for FF and 6.1, 3.6 and 2.8 days for MA, respectively. The validity of the results was confirmed using an independent data set for the year 2009.
Ling, Julia; Templeton, Jeremy Alan
2015-08-04
Reynolds Averaged Navier Stokes (RANS) models are widely used in industry to predict fluid flows, despite their acknowledged deficiencies. Not only do RANS models often produce inaccurate flow predictions, but there are very limited diagnostics available to assess RANS accuracy for a given flow configuration. If experimental or higher fidelity simulation results are not available for RANS validation, there is no reliable method to evaluate RANS accuracy. This paper explores the potential of utilizing machine learning algorithms to identify regions of high RANS uncertainty. Three different machine learning algorithms were evaluated: support vector machines, Adaboost decision trees, and random forests.more » The algorithms were trained on a database of canonical flow configurations for which validated direct numerical simulation or large eddy simulation results were available, and were used to classify RANS results on a point-by-point basis as having either high or low uncertainty, based on the breakdown of specific RANS modeling assumptions. Classifiers were developed for three different basic RANS eddy viscosity model assumptions: the isotropy of the eddy viscosity, the linearity of the Boussinesq hypothesis, and the non-negativity of the eddy viscosity. It is shown that these classifiers are able to generalize to flows substantially different from those on which they were trained. As a result, feature selection techniques, model evaluation, and extrapolation detection are discussed in the context of turbulence modeling applications.« less
Baker, Erich J; Walter, Nicole A R; Salo, Alex; Rivas Perea, Pablo; Moore, Sharon; Gonzales, Steven; Grant, Kathleen A
2017-03-01
The Monkey Alcohol Tissue Research Resource (MATRR) is a repository and analytics platform for detailed data derived from well-documented nonhuman primate (NHP) alcohol self-administration studies. This macaque model has demonstrated categorical drinking norms reflective of human drinking populations, resulting in consumption pattern classifications of very heavy drinking (VHD), heavy drinking (HD), binge drinking (BD), and low drinking (LD) individuals. Here, we expand on previous findings that suggest ethanol drinking patterns during initial drinking to intoxication can reliably predict future drinking category assignment. The classification strategy uses a machine-learning approach to examine an extensive set of daily drinking attributes during 90 sessions of induction across 7 cohorts of 5 to 8 monkeys for a total of 50 animals. A Random Forest classifier is employed to accurately predict categorical drinking after 12 months of self-administration. Predictive outcome accuracy is approximately 78% when classes are aggregated into 2 groups, "LD and BD" and "HD and VHD." A subsequent 2-step classification model distinguishes individual LD and BD categories with 90% accuracy and between HD and VHD categories with 95% accuracy. Average 4-category classification accuracy is 74%, and provides putative distinguishing behavioral characteristics between groupings. We demonstrate that data derived from the induction phase of this ethanol self-administration protocol have significant predictive power for future ethanol consumption patterns. Importantly, numerous predictive factors are longitudinal, measuring the change of drinking patterns through 3 stages of induction. Factors during induction that predict future heavy drinkers include being younger at the time of first intoxication and developing a shorter latency to first ethanol drink. Overall, this analysis identifies predictive characteristics in future very heavy drinkers that optimize intoxication, such as having increasingly fewer bouts with more drinks. This analysis also identifies characteristic avoidance of intoxicating topographies in future low drinkers, such as increasing number of bouts and waiting longer before the first ethanol drink. Copyright © 2017 The Authors Alcoholism: Clinical & Experimental Research published by Wiley Periodicals, Inc. on behalf of Research Society on Alcoholism.
Real-Time Ensemble Forecasting of Coronal Mass Ejections Using the Wsa-Enlil+Cone Model
NASA Astrophysics Data System (ADS)
Mays, M. L.; Taktakishvili, A.; Pulkkinen, A. A.; Odstrcil, D.; MacNeice, P. J.; Rastaetter, L.; LaSota, J. A.
2014-12-01
Ensemble forecasting of coronal mass ejections (CMEs) provides significant information in that it provides an estimation of the spread or uncertainty in CME arrival time predictions. Real-time ensemble modeling of CME propagation is performed by forecasters at the Space Weather Research Center (SWRC) using the WSA-ENLIL+cone model available at the Community Coordinated Modeling Center (CCMC). To estimate the effect of uncertainties in determining CME input parameters on arrival time predictions, a distribution of n (routinely n=48) CME input parameter sets are generated using the CCMC Stereo CME Analysis Tool (StereoCAT) which employs geometrical triangulation techniques. These input parameters are used to perform n different simulations yielding an ensemble of solar wind parameters at various locations of interest, including a probability distribution of CME arrival times (for hits), and geomagnetic storm strength (for Earth-directed hits). We present the results of ensemble simulations for a total of 38 CME events in 2013-2014. For 28 of the ensemble runs containing hits, the observed CME arrival was within the range of ensemble arrival time predictions for 14 runs (half). The average arrival time prediction was computed for each of the 28 ensembles predicting hits and using the actual arrival time, an average absolute error of 10.0 hours (RMSE=11.4 hours) was found for all 28 ensembles, which is comparable to current forecasting errors. Some considerations for the accuracy of ensemble CME arrival time predictions include the importance of the initial distribution of CME input parameters, particularly the mean and spread. When the observed arrivals are not within the predicted range, this still allows the ruling out of prediction errors caused by tested CME input parameters. Prediction errors can also arise from ambient model parameters such as the accuracy of the solar wind background, and other limitations. Additionally the ensemble modeling sysem was used to complete a parametric event case study of the sensitivity of the CME arrival time prediction to free parameters for ambient solar wind model and CME. The parameter sensitivity study suggests future directions for the system, such as running ensembles using various magnetogram inputs to the WSA model.
NASA Astrophysics Data System (ADS)
Aminah, Agustin Siti; Pawitan, Gandhi; Tantular, Bertho
2017-03-01
So far, most of the data published by Statistics Indonesia (BPS) as data providers for national statistics are still limited to the district level. Less sufficient sample size for smaller area levels to make the measurement of poverty indicators with direct estimation produced high standard error. Therefore, the analysis based on it is unreliable. To solve this problem, the estimation method which can provide a better accuracy by combining survey data and other auxiliary data is required. One method often used for the estimation is the Small Area Estimation (SAE). There are many methods used in SAE, one of them is Empirical Best Linear Unbiased Prediction (EBLUP). EBLUP method of maximum likelihood (ML) procedures does not consider the loss of degrees of freedom due to estimating β with β ^. This drawback motivates the use of the restricted maximum likelihood (REML) procedure. This paper proposed EBLUP with REML procedure for estimating poverty indicators by modeling the average of household expenditures per capita and implemented bootstrap procedure to calculate MSE (Mean Square Error) to compare the accuracy EBLUP method with the direct estimation method. Results show that EBLUP method reduced MSE in small area estimation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lacaze, Guilhem; Oefelein, Joseph
Large-eddy-simulation (LES) is quickly becoming a method of choice for studying complex thermo-physics in a wide range of propulsion and power systems. It provides a means to study coupled turbulent combustion and flow processes in parameter spaces that are unattainable using direct-numerical-simulation (DNS), with a degree of fidelity that can be far more accurate than conventional engineering methods such as the Reynolds-averaged Navier-Stokes (RANS) approx- imation. However, development of predictive LES is complicated by the complex interdependence of different type of errors coming from numerical methods, algorithms, models and boundary con- ditions. On the other hand, control of accuracy hasmore » become a critical aspect in the development of predictive LES for design. The objective of this project is to create a framework of metrics aimed at quantifying the quality and accuracy of state-of-the-art LES in a manner that addresses the myriad of competing interdependencies. In a typical simulation cycle, only 20% of the computational time is actually usable. The rest is spent in case preparation, assessment, and validation, because of the lack of guidelines. This work increases confidence in the accuracy of a given solution while min- imizing the time obtaining the solution. The approach facilitates control of the tradeoffs between cost, accuracy, and uncertainties as a function of fidelity and methods employed. The analysis is coupled with advanced Uncertainty Quantification techniques employed to estimate confidence in model predictions and calibrate model's parameters. This work has provided positive conse- quences on the accuracy of the results delivered by LES and will soon have a broad impact on research supported both by the DOE and elsewhere.« less
NASA Astrophysics Data System (ADS)
Di Pasquale, Nicodemo; Davie, Stuart J.; Popelier, Paul L. A.
2018-06-01
Using the machine learning method kriging, we predict the energies of atoms in ion-water clusters, consisting of either Cl- or Na+ surrounded by a number of water molecules (i.e., without Na+Cl- interaction). These atomic energies are calculated following the topological energy partitioning method called Interacting Quantum Atoms (IQAs). Kriging predicts atomic properties (in this case IQA energies) by a model that has been trained over a small set of geometries with known property values. The results presented here are part of the development of an advanced type of force field, called FFLUX, which offers quantum mechanical information to molecular dynamics simulations without the limiting computational cost of ab initio calculations. The results reported for the prediction of the IQA components of the energy in the test set exhibit an accuracy of a few kJ/mol, corresponding to an average error of less than 5%, even when a large cluster of water molecules surrounding an ion is considered. Ions represent an important chemical system and this work shows that they can be correctly taken into account in the framework of the FFLUX force field.
[Application of ARIMA model to predict number of malaria cases in China].
Hui-Yu, H; Hua-Qin, S; Shun-Xian, Z; Lin, A I; Yan, L U; Yu-Chun, C; Shi-Zhu, L I; Xue-Jiao, T; Chun-Li, Y; Wei, H U; Jia-Xu, C
2017-08-15
Objective To study the application of autoregressive integrated moving average (ARIMA) model to predict the monthly reported malaria cases in China, so as to provide a reference for prevention and control of malaria. Methods SPSS 24.0 software was used to construct the ARIMA models based on the monthly reported malaria cases of the time series of 20062015 and 2011-2015, respectively. The data of malaria cases from January to December, 2016 were used as validation data to compare the accuracy of the two ARIMA models. Results The models of the monthly reported cases of malaria in China were ARIMA (2, 1, 1) (1, 1, 0) 12 and ARIMA (1, 0, 0) (1, 1, 0) 12 respectively. The comparison between the predictions of the two models and actual situation of malaria cases showed that the ARIMA model based on the data of 2011-2015 had a higher accuracy of forecasting than the model based on the data of 2006-2015 had. Conclusion The establishment and prediction of ARIMA model is a dynamic process, which needs to be adjusted unceasingly according to the accumulated data, and in addition, the major changes of epidemic characteristics of infectious diseases must be considered.
3D Cloud Field Prediction using A-Train Data and Machine Learning Techniques
NASA Astrophysics Data System (ADS)
Johnson, C. L.
2017-12-01
Validation of cloud process parameterizations used in global climate models (GCMs) would greatly benefit from observed 3D cloud fields at the size comparable to that of a GCM grid cell. For the highest resolution simulations, surface grid cells are on the order of 100 km by 100 km. CloudSat/CALIPSO data provides 1 km width of detailed vertical cloud fraction profile (CFP) and liquid and ice water content (LWC/IWC). This work utilizes four machine learning algorithms to create nonlinear regressions of CFP, LWC, and IWC data using radiances, surface type and location of measurement as predictors and applies the regression equations to off-track locations generating 3D cloud fields for 100 km by 100 km domains. The CERES-CloudSat-CALIPSO-MODIS (C3M) merged data set for February 2007 is used. Support Vector Machines, Artificial Neural Networks, Gaussian Processes and Decision Trees are trained on 1000 km of continuous C3M data. Accuracy is computed using existing vertical profiles that are excluded from the training data and occur within 100 km of the training data. Accuracy of the four algorithms is compared. Average accuracy for one day of predicted data is 86% for the most successful algorithm. The methodology for training the algorithms, determining valid prediction regions and applying the equations off-track is discussed. Predicted 3D cloud fields are provided as inputs to the Ed4 NASA LaRC Fu-Liou radiative transfer code and resulting TOA radiances compared to observed CERES/MODIS radiances. Differences in computed radiances using predicted profiles and observed radiances are compared.
A 30-day-ahead forecast model for grass pollen in north London, United Kingdom.
Smith, Matt; Emberlin, Jean
2006-03-01
A 30-day-ahead forecast method has been developed for grass pollen in north London. The total period of the grass pollen season is covered by eight multiple regression models, each covering a 10-day period running consecutively from 21 May to 8 August. This means that three models were used for each 30-day forecast. The forecast models were produced using grass pollen and environmental data from 1961 to 1999 and tested on data from 2000 and 2002. Model accuracy was judged in two ways: the number of times the forecast model was able to successfully predict the severity (relative to the 1961-1999 dataset as a whole) of grass pollen counts in each of the eight forecast periods on a scale of 1 to 4; the number of times the forecast model was able to predict whether grass pollen counts were higher or lower than the mean. The models achieved 62.5% accuracy in both assessment years when predicting the relative severity of grass pollen counts on a scale of 1 to 4, which equates to six of the eight 10-day periods being forecast correctly. The models attained 87.5% and 100% accuracy in 2000 and 2002, respectively, when predicting whether grass pollen counts would be higher or lower than the mean. Attempting to predict pollen counts during distinct 10-day periods throughout the grass pollen season is a novel approach. The models also employed original methodology in the use of winter averages of the North Atlantic Oscillation to forecast 10-day means of allergenic pollen counts.
Moore, D F; Harwood, V J; Ferguson, D M; Lukasik, J; Hannah, P; Getrich, M; Brownell, M
2005-01-01
The accuracy of ribotyping and antibiotic resistance analysis (ARA) for prediction of sources of faecal bacterial pollution in an urban southern California watershed was determined using blinded proficiency samples. Antibiotic resistance patterns and HindIII ribotypes of Escherichia coli (n = 997), and antibiotic resistance patterns of Enterococcus spp. (n = 3657) were used to construct libraries from sewage samples and from faeces of seagulls, dogs, cats, horses and humans within the watershed. The three libraries were analysed to determine the accuracy of host source prediction. The internal accuracy of the libraries (average rate of correct classification, ARCC) with six source categories was 44% for E. coli ARA, 69% for E. coli ribotyping and 48% for Enterococcus ARA. Each library's predictive ability towards isolates that were not part of the library was determined using a blinded proficiency panel of 97 E. coli and 99 Enterococcus isolates. Twenty-eight per cent (by ARA) and 27% (by ribotyping) of the E. coli proficiency isolates were assigned to the correct source category. Sixteen per cent were assigned to the same source category by both methods, and 6% were assigned to the correct category. Addition of 2480 E. coli isolates to the ARA library did not improve the ARCC or proficiency accuracy. In contrast, 45% of Enterococcus proficiency isolates were correctly identified by ARA. None of the methods performed well enough on the proficiency panel to be judged ready for application to environmental samples. Most microbial source tracking (MST) studies published have demonstrated library accuracy solely by the internal ARCC measurement. Low rates of correct classification for E. coli proficiency isolates compared with the ARCCs of the libraries indicate that testing of bacteria from samples that are not represented in the library, such as blinded proficiency samples, is necessary to accurately measure predictive ability. The library-based MST methods used in this study may not be suited for determination of the source(s) of faecal pollution in large, urban watersheds.
Graph pyramids for protein function prediction
2015-01-01
Background Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well the protein classification accuracy. Quantitatively, among 14,086 test sequences, on an average the proposed method misclassified only 21.1 sequences whereas baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522
Graph pyramids for protein function prediction.
Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun
2015-01-01
Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well the protein classification accuracy. Quantitatively, among 14,086 test sequences, on an average the proposed method misclassified only 21.1 sequences whereas baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data.
Lung Ultrasound for Diagnosing Pneumothorax in the Critically Ill Neonate.
Raimondi, Francesco; Rodriguez Fanjul, Javier; Aversa, Salvatore; Chirico, Gaetano; Yousef, Nadya; De Luca, Daniele; Corsini, Iuri; Dani, Carlo; Grappone, Lidia; Orfeo, Luigi; Migliaro, Fiorella; Vallone, Gianfranco; Capasso, Letizia
2016-08-01
To evaluate the accuracy of lung ultrasound for the diagnosis of pneumothorax in the sudden decompensating patient. In an international, prospective study, sudden decompensation was defined as a prolonged significant desaturation (oxygen saturation <65% for more than 40 seconds) and bradycardia or sudden increase of oxygen requirement by at least 50% in less than 10 minutes with a final fraction of inspired oxygen ≥0.7 to keep stable saturations. All eligible patients had an ultrasound scan before undergoing a chest radiograph, which was the reference standard. Forty-two infants (birth weight = 1531 ± 812 g; gestational age = 31 ± 3.5 weeks) were enrolled in 6 centers; pneumothorax was detected in 26 (62%). Lung ultrasound accuracy in diagnosing pneumothorax was as follows: sensitivity 100%, specificity 100%, positive predictive value 100%, and negative predictive value 100%. Clinical evaluation of pneumothorax showed sensitivity 84%, specificity 56%, positive predictive value 76%, and negative predictive value 69%. After sudden decompensation, a lung ultrasound scan was performed in an average time of 5.3 ± 5.6 minutes vs 19 ± 11.7 minutes required for a chest radiography. Emergency drainage was performed after an ultrasound scan but before radiography in 9 cases. Lung ultrasound shows high accuracy in detecting pneumothorax in the critical infant, outperforming clinical evaluation and reducing time to imaging diagnosis and drainage. Copyright © 2016 Elsevier Inc. All rights reserved.
Yan, Long; Wang, Hong; Zhang, Xuan; Li, Ming-Yue; He, Juan
2017-01-01
Influence of meteorological variables on the transmission of bacillary dysentery (BD) is under investigated topic and effective forecasting models as public health tool are lacking. This paper aimed to quantify the relationship between meteorological variables and BD cases in Beijing and to establish an effective forecasting model. A time series analysis was conducted in the Beijing area based upon monthly data on weather variables (i.e. temperature, rainfall, relative humidity, vapor pressure, and wind speed) and on the number of BD cases during the period 1970-2012. Autoregressive integrated moving average models with explanatory variables (ARIMAX) were built based on the data from 1970 to 2004. Prediction of monthly BD cases from 2005 to 2012 was made using the established models. The prediction accuracy was evaluated by the mean square error (MSE). Firstly, temperature with 2-month and 7-month lags and rainfall with 12-month lag were found positively correlated with the number of BD cases in Beijing. Secondly, ARIMAX model with covariates of temperature with 7-month lag (β = 0.021, 95% confidence interval(CI): 0.004-0.038) and rainfall with 12-month lag (β = 0.023, 95% CI: 0.009-0.037) displayed the highest prediction accuracy. The ARIMAX model developed in this study showed an accurate goodness of fit and precise prediction accuracy in the short term, which would be beneficial for government departments to take early public health measures to prevent and control possible BD popularity.
Ma, Xiaolei; Dai, Zhuang; He, Zhengbing; Ma, Jihui; Wang, Yong; Wang, Yunpeng
2017-04-10
This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with a high accuracy. Spatiotemporal traffic dynamics are converted to images describing the time and space relations of traffic flow via a two-dimensional time-space matrix. A CNN is applied to the image following two consecutive steps: abstract traffic feature extraction and network-wide traffic speed prediction. The effectiveness of the proposed method is evaluated by taking two real-world transportation networks, the second ring road and north-east transportation network in Beijing, as examples, and comparing the method with four prevailing algorithms, namely, ordinary least squares, k-nearest neighbors, artificial neural network, and random forest, and three deep learning architectures, namely, stacked autoencoder, recurrent neural network, and long-short-term memory network. The results show that the proposed method outperforms other algorithms by an average accuracy improvement of 42.91% within an acceptable execution time. The CNN can train the model in a reasonable time and, thus, is suitable for large-scale transportation networks.
Ma, Xiaolei; Dai, Zhuang; He, Zhengbing; Ma, Jihui; Wang, Yong; Wang, Yunpeng
2017-01-01
This paper proposes a convolutional neural network (CNN)-based method that learns traffic as images and predicts large-scale, network-wide traffic speed with a high accuracy. Spatiotemporal traffic dynamics are converted to images describing the time and space relations of traffic flow via a two-dimensional time-space matrix. A CNN is applied to the image following two consecutive steps: abstract traffic feature extraction and network-wide traffic speed prediction. The effectiveness of the proposed method is evaluated by taking two real-world transportation networks, the second ring road and north-east transportation network in Beijing, as examples, and comparing the method with four prevailing algorithms, namely, ordinary least squares, k-nearest neighbors, artificial neural network, and random forest, and three deep learning architectures, namely, stacked autoencoder, recurrent neural network, and long-short-term memory network. The results show that the proposed method outperforms other algorithms by an average accuracy improvement of 42.91% within an acceptable execution time. The CNN can train the model in a reasonable time and, thus, is suitable for large-scale transportation networks. PMID:28394270
Combining dynamic and ECG-gated ⁸²Rb-PET for practical implementation in the clinic.
Sayre, George A; Bacharach, Stephen L; Dae, Michael W; Seo, Youngho
2012-01-01
For many cardiac clinics, list-mode PET is impractical. Therefore, separate dynamic and ECG-gated acquisitions are needed to detect harmful stenoses, indicate affected coronary arteries, and estimate stenosis severity. However, physicians usually order gated studies only because of dose, time, and cost limitations. These gated studies are limited to detection. In an effort to remove these limitations, we developed a novel curve-fitting algorithm [incomplete data (ICD)] to accurately calculate coronary flow reserve (CFR) from a combined dynamic-ECG protocol of a length equal to a typical gated scan. We selected several retrospective dynamic studies to simulate shortened dynamic acquisitions of the combined protocol and compared (a) the accuracy of ICD and a nominal method in extrapolating the complete functional form of arterial input functions (AIFs); and (b) the accuracy of ICD and ICD-AP (ICD with a-posteriori knowledge of complete-data AIFs) in predicting CFRs. According to the Akaike information criterion, AIFs predicted by ICD were more accurate than those predicted by the nominal method in 11 out of 12 studies. CFRs predicted by ICD and ICD-AP were similar to complete-data predictions (PICD=0.94 and PICD-AP=0.91) and had similar average errors (eICD=2.82% and eICD-AP=2.79%). According to a nuclear cardiologist and an expert analyst of PET data, both ICD and ICD-AP predicted CFR values with sufficient accuracy for the clinic. Therefore, by using our method, physicians in cardiac clinics would have access to the necessary amount of information to differentiate between single-vessel and triple-vessel disease for treatment decision making.
Pahlavian, Soroush Heidari; Bunck, Alexander C.; Thyagaraj, Suraj; Giese, Daniel; Loth, Francis; Hedderich, Dennis M.; Kröger, Jan Robert; Martin, Bryn A.
2016-01-01
Abnormal alterations in cerebrospinal fluid (CSF) flow are thought to play an important role in pathophysiology of various craniospinal disorders such as hydrocephalus and Chiari malformation. Three directional phase contrast MRI (4D Flow) has been proposed as one method for quantification of the CSF dynamics in healthy and disease states, but prior to further implementation of this technique, its accuracy in measuring CSF velocity magnitude and distribution must be evaluated. In this study, an MR-compatible experimental platform was developed based on an anatomically detailed 3D printed model of the cervical subarachnoid space and subject specific flow boundary conditions. Accuracy of 4D Flow measurements was assessed by comparison of CSF velocities obtained within the in vitro model with the numerically predicted velocities calculated from a spatially averaged computational fluid dynamics (CFD) model based on the same geometry and flow boundary conditions. Good agreement was observed between CFD and 4D Flow in terms of spatial distribution and peak magnitude of through-plane velocities with an average difference of 7.5% and 10.6% for peak systolic and diastolic velocities, respectively. Regression analysis showed lower accuracy of 4D Flow measurement at the timeframes corresponding to low CSF flow rate and poor correlation between CFD and 4D Flow in-plane velocities. PMID:27043214
NASA Astrophysics Data System (ADS)
Luiza Bondar, M.; Hoogeman, Mischa; Schillemans, Wilco; Heijmen, Ben
2013-08-01
For online adaptive radiotherapy of cervical cancer, fast and accurate image segmentation is required to facilitate daily treatment adaptation. Our aim was twofold: (1) to test and compare three intra-patient automated segmentation methods for the cervix-uterus structure in CT-images and (2) to improve the segmentation accuracy by including prior knowledge on the daily bladder volume or on the daily coordinates of implanted fiducial markers. The tested methods were: shape deformation (SD) and atlas-based segmentation (ABAS) using two non-rigid registration methods: demons and a hierarchical algorithm. Tests on 102 CT-scans of 13 patients demonstrated that the segmentation accuracy significantly increased by including the bladder volume predicted with a simple 1D model based on a manually defined bladder top. Moreover, manually identified implanted fiducial markers significantly improved the accuracy of the SD method. For patients with large cervix-uterus volume regression, the use of CT-data acquired toward the end of the treatment was required to improve segmentation accuracy. Including prior knowledge, the segmentation results of SD (Dice similarity coefficient 85 ± 6%, error margin 2.2 ± 2.3 mm, average time around 1 min) and of ABAS using hierarchical non-rigid registration (Dice 82 ± 10%, error margin 3.1 ± 2.3 mm, average time around 30 s) support their use for image guided online adaptive radiotherapy of cervical cancer.
Bondar, M Luiza; Hoogeman, Mischa; Schillemans, Wilco; Heijmen, Ben
2013-08-07
For online adaptive radiotherapy of cervical cancer, fast and accurate image segmentation is required to facilitate daily treatment adaptation. Our aim was twofold: (1) to test and compare three intra-patient automated segmentation methods for the cervix-uterus structure in CT-images and (2) to improve the segmentation accuracy by including prior knowledge on the daily bladder volume or on the daily coordinates of implanted fiducial markers. The tested methods were: shape deformation (SD) and atlas-based segmentation (ABAS) using two non-rigid registration methods: demons and a hierarchical algorithm. Tests on 102 CT-scans of 13 patients demonstrated that the segmentation accuracy significantly increased by including the bladder volume predicted with a simple 1D model based on a manually defined bladder top. Moreover, manually identified implanted fiducial markers significantly improved the accuracy of the SD method. For patients with large cervix-uterus volume regression, the use of CT-data acquired toward the end of the treatment was required to improve segmentation accuracy. Including prior knowledge, the segmentation results of SD (Dice similarity coefficient 85 ± 6%, error margin 2.2 ± 2.3 mm, average time around 1 min) and of ABAS using hierarchical non-rigid registration (Dice 82 ± 10%, error margin 3.1 ± 2.3 mm, average time around 30 s) support their use for image guided online adaptive radiotherapy of cervical cancer.
Chen, Peng; Li, Jinyan
2010-05-17
Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures, understanding protein foldings, and therefore advancing the annotation of protein functions. In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on the key idea called sequence profile centers (SPCs). Each SPC is the average sequence profiles of residue pairs belonging to the same contact class or non-contact class. GaCs train on multiple but different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, having roughly the same sizes as the positive ones, are constructed by random sampling over the original imbalanced negative data. As a result, about 21.5% long-range contacts are correctly predicted. We also found that the ensemble of GaCs indeed makes an accuracy improvement by around 5.6% over the single GaC. Classifiers with the use of sequence profile centers may advance the long-range contact prediction. In line with this approach, key structural features in proteins would be determined with high efficiency and accuracy.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christian, Mark H; Hadjerioua, Boualem; Lee, Kyutae
2015-01-01
The following paper represents the results of an investigation into the impact of the number and placement of Current Meter (CM) flow sensors on the accuracy to which they are capable of predicting the overall flow rate. Flow measurement accuracy is of particular importance in multiunit plants because it plays a pivotal role in determining the operational efficiency characteristics of each unit, allowing the operator to select the unit (or combination of units) which most efficiently meet demand. Several case studies have demonstrated that optimization of unit dispatch has the potential to increase plant efficiencies from between 1 to 4.4more » percent [2] [3]. Unfortunately current industry standards do not have an established methodology to measure the flow rate through hydropower units with short converging intakes (SCI); the only direction provided is that CM sensors should be used. The most common application of CM is horizontally, along a trolley which is incrementally lowered across a measurement cross section. As such, the measurement resolution is defined horizontally and vertically by the number of CM and the number of measurement increments respectively. There has not been any published research on the role of resolution in either direction on the accuracy of flow measurement. The work below investigates the effectiveness of flow measurement in a SCI by performing a case study in which point velocity measurements were extracted from a physical plant and then used to calculate a series of reference flow distributions. These distributions were then used to perform sensitivity studies on the relation between the number of CM and the accuracy to which the flow rate was predicted. The following research uncovered that a minimum of 795 plants contain SCI, a quantity which represents roughly 12% of total domestic hydropower capacity. In regards to measurement accuracy, it was determined that accuracy ceases to increase considerably due to strict increases in vertical resolution beyond the application of 49 transects. Moreover the research uncovered that the application of 5 CM (when applied at 49 vertical transects) resulted in an average accuracy of 95.6% and the application of additional sensors resulted in a linear increase in accuracy up to 17 CM which had an average accuracy of 98.5%. Beyond 17 CM incremental increases in accuracy due to the addition of CM was found decrease exponentially. Future work that will be performed in this area will investigate the use of computational fluid dynamics to acquire a broader range of flow fields within SCI.« less
Boerner, Vinzent; Johnston, David J; Tier, Bruce
2014-10-24
The major obstacles for the implementation of genomic selection in Australian beef cattle are the variety of breeds and in general, small numbers of genotyped and phenotyped individuals per breed. The Australian Beef Cooperative Research Center (Beef CRC) investigated these issues by deriving genomic prediction equations (PE) from a training set of animals that covers a range of breeds and crosses including Angus, Murray Grey, Shorthorn, Hereford, Brahman, Belmont Red, Santa Gertrudis and Tropical Composite. This paper presents accuracies of genomically estimated breeding values (GEBV) that were calculated from these PE in the commercial pure-breed beef cattle seed stock sector. PE derived by the Beef CRC from multi-breed and pure-breed training populations were applied to genotyped Angus, Limousin and Brahman sires and young animals, but with no pure-breed Limousin in the training population. The accuracy of the resulting GEBV was assessed by their genetic correlation to their phenotypic target trait in a bi-variate REML approach that models GEBV as trait observations. Accuracies of most GEBV for Angus and Brahman were between 0.1 and 0.4, with accuracies for abattoir carcass traits generally greater than for live animal body composition traits and reproduction traits. Estimated accuracies greater than 0.5 were only observed for Brahman abattoir carcass traits and for Angus carcass rib fat. Averaged across traits within breeds, accuracies of GEBV were highest when PE from the pooled across-breed training population were used. However, for the Angus and Brahman breeds the difference in accuracy from using pure-breed PE was small. For the Limousin breed no reasonable results could be achieved for any trait. Although accuracies were generally low compared to published accuracies estimated within breeds, they are in line with those derived in other multi-breed populations. Thus PE developed by the Beef CRC can contribute to the implementation of genomic selection in Australian beef cattle breeding.
Fuzzy model approach for estimating time of hospitalization due to cardiovascular diseases.
Coutinho, Karine Mayara Vieira; Rizol, Paloma Maria Silva Rocha; Nascimento, Luiz Fernando Costa; de Medeiros, Andréa Paula Peneluppi
2015-08-01
A fuzzy linguistic model based on the Mamdani method with input variables, particulate matter, sulfur dioxide, temperature and wind obtained from CETESB with two membership functions each was built to predict the average hospitalization time due to cardiovascular diseases related to exposure to air pollutants in São José dos Campos in the State of São Paulo in 2009. The output variable is the average length of hospitalization obtained from DATASUS with six membership functions. The average time given by the model was compared to actual data using lags of 0 to 4 days. This model was built using the Matlab v. 7.5 fuzzy toolbox. Its accuracy was assessed with the ROC curve. Hospitalizations with a mean time of 7.9 days (SD = 4.9) were recorded in 1119 cases. The data provided revealed a significant correlation with the actual data according to the lags of 0 to 4 days. The pollutant that showed the greatest accuracy was sulfur dioxide. This model can be used as the basis of a specialized system to assist the city health authority in assessing the risk of hospitalizations due to air pollutants.
Vibration Signature Analysis of a Faulted Gear Transmission System
NASA Technical Reports Server (NTRS)
Choy, F. K.; Huang, S.; Zakrajsek, J. J.; Handschuh, R. F.; Townsend, D. P.
1994-01-01
A comprehensive procedure in predicting faults in gear transmission systems under normal operating conditions is presented. Experimental data was obtained from a spiral bevel gear fatigue test rig at NASA Lewis Research Center. Time synchronous averaged vibration data was recorded throughout the test as the fault progressed from a small single pit to severe pitting over several teeth, and finally tooth fracture. A numerical procedure based on the Winger-Ville distribution was used to examine the time averaged vibration data. Results from the Wigner-Ville procedure are compared to results from a variety of signal analysis techniques which include time domain analysis methods and frequency analysis methods. Using photographs of the gear tooth at various stages of damage, the limitations and accuracy of the various techniques are compared and discussed. Conclusions are drawn from the comparison of the different approaches as well as the applicability of the Wigner-Ville method in predicting gear faults.
Timing constraints on remote sensing of wildland fire burned area in the southeastern US
Picotte, J.J.; Robertson, K.
2011-01-01
Remote sensing using Landsat Thematic Mapper (TM) satellite imagery is increasingly used for mapping wildland fire burned area and burn severity, owing to its frequency of collection, relatively high resolution, and availability free of charge. However, rapid response of vegetation following fire and frequent cloud cover pose challenges to this approach in the southeastern US. We assessed these timing constraints by using a series of Landsat TM images to determine how rapidly the remotely sensed burn scar signature fades following prescribed burns in wet flatwoods and depression swamp community types in the Apalachicola National Forest, Florida, USA during 2006. We used both the Normalized Burn Ratio (NBR) of reflectance bands sensitive to vegetation and exposed soil cover, as well as the change in NBR from before to after fire (dNBR), to estimate burned area. We also determined the average and maximum amount of time following fire required to obtain a cloud-free image for burns in each month of the year, as well as the predicted effect of this time lag on percent accuracy of burn scar estimates. Using both NBR and dNBR, the detectable area decreased linearly 9% per month on average over the first four months following fire. Our findings suggest that the NBR and dNBR methods for monitoring burned area in common southeastern US vegetation community types are limited to an average of 78-90% accuracy among months of the year, with individual burns having values as low as 38%, if restricted to use of Landsat 5 TM imagery. However, the majority of burns can still be mapped at accuracies similar to those in other regions of the US, and access to additional sources of satellite imagery would improve overall accuracy. ?? 2011 by the authors.
Timing constraints on remote sensing of wildland fire burned area in the southeastern US
Picotte, Joshua J.; Robertson, Kevin
2011-01-01
Remote sensing using Landsat Thematic Mapper (TM) satellite imagery is increasingly used for mapping wildland fire burned area and burn severity, owing to its frequency of collection, relatively high resolution, and availability free of charge. However, rapid response of vegetation following fire and frequent cloud cover pose challenges to this approach in the southeastern US. We assessed these timing constraints by using a series of Landsat TM images to determine how rapidly the remotely sensed burn scar signature fades following prescribed burns in wet flatwoods and depression swamp community types in the Apalachicola National Forest, Florida, USA during 2006. We used both the Normalized Burn Ratio (NBR) of reflectance bands sensitive to vegetation and exposed soil cover, as well as the change in NBR from before to after fire (dNBR), to estimate burned area. We also determined the average and maximum amount of time following fire required to obtain a cloud-free image for burns in each month of the year, as well as the predicted effect of this time lag on percent accuracy of burn scar estimates. Using both NBR and dNBR, the detectable area decreased linearly 9% per month on average over the first four months following fire. Our findings suggest that the NBR and dNBR methods for monitoring burned area in common southeastern US vegetation community types are limited to an average of 78–90% accuracy among months of the year, with individual burns having values as low as 38%, if restricted to use of Landsat 5 TM imagery. However, the majority of burns can still be mapped at accuracies similar to those in other regions of the US, and access to additional sources of satellite imagery would improve overall accuracy.
A genetic programming approach to oral cancer prognosis.
Tan, Mei Sze; Tan, Jing Wei; Chang, Siow-Wee; Yap, Hwa Jen; Abdul Kareem, Sameem; Zain, Rosnah Binti
2016-01-01
The potential of genetic programming (GP) on various fields has been attained in recent years. In bio-medical field, many researches in GP are focused on the recognition of cancerous cells and also on gene expression profiling data. In this research, the aim is to study the performance of GP on the survival prediction of a small sample size of oral cancer prognosis dataset, which is the first study in the field of oral cancer prognosis. GP is applied on an oral cancer dataset that contains 31 cases collected from the Malaysia Oral Cancer Database and Tissue Bank System (MOCDTBS). The feature subsets that is automatically selected through GP were noted and the influences of this subset on the results of GP were recorded. In addition, a comparison between the GP performance and that of the Support Vector Machine (SVM) and logistic regression (LR) are also done in order to verify the predictive capabilities of the GP. The result shows that GP performed the best (average accuracy of 83.87% and average AUROC of 0.8341) when the features selected are smoking, drinking, chewing, histological differentiation of SCC, and oncogene p63. In addition, based on the comparison results, we found that the GP outperformed the SVM and LR in oral cancer prognosis. Some of the features in the dataset are found to be statistically co-related. This is because the accuracy of the GP prediction drops when one of the feature in the best feature subset is excluded. Thus, GP provides an automatic feature selection function, which chooses features that are highly correlated to the prognosis of oral cancer. This makes GP an ideal prediction model for cancer clinical and genomic data that can be used to aid physicians in their decision making stage of diagnosis or prognosis.
Puton, Tomasz; Kozlowski, Lukasz P.; Rother, Kristian M.; Bujnicki, Janusz M.
2013-01-01
We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on the average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On the average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in the new editions of CompaRNA benchmarks. PMID:23435231
Predicting therapy success for treatment as usual and blended treatment in the domain of depression.
van Breda, Ward; Bremer, Vincent; Becker, Dennis; Hoogendoorn, Mark; Funk, Burkhardt; Ruwaard, Jeroen; Riper, Heleen
2018-06-01
In this paper, we explore the potential of predicting therapy success for patients in mental health care. Such predictions can eventually improve the process of matching effective therapy types to individuals. In the EU project E-COMPARED, a variety of information is gathered about patients suffering from depression. We use this data, where 276 patients received treatment as usual and 227 received blended treatment, to investigate to what extent we are able to predict therapy success. We utilize different encoding strategies for preprocessing, varying feature selection techniques, and different statistical procedures for this purpose. Significant predictive power is found with average AUC values up to 0.7628 for treatment as usual and 0.7765 for blended treatment. Adding daily assessment data for blended treatment does currently not add predictive accuracy. Cost effectiveness analysis is needed to determine the added potential for real-world applications.
A Discrete-Time Average Model Based Predictive Control for Quasi-Z-Source Inverter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Yushan; Abu-Rub, Haitham; Xue, Yaosuo
A discrete-time average model-based predictive control (DTA-MPC) is proposed for a quasi-Z-source inverter (qZSI). As a single-stage inverter topology, the qZSI regulates the dc-link voltage and the ac output voltage through the shoot-through (ST) duty cycle and the modulation index. Several feedback strategies have been dedicated to produce these two control variables, among which the most popular are the proportional–integral (PI)-based control and the conventional model-predictive control (MPC). However, in the former, there are tradeoffs between fast response and stability; the latter is robust, but at the cost of high calculation burden and variable switching frequency. Moreover, they require anmore » elaborated design or fine tuning of controller parameters. The proposed DTA-MPC predicts future behaviors of the ST duty cycle and modulation signals, based on the established discrete-time average model of the quasi-Z-source (qZS) inductor current, the qZS capacitor voltage, and load currents. The prediction actions are applied to the qZSI modulator in the next sampling instant, without the need of other controller parameters’ design. A constant switching frequency and significantly reduced computations are achieved with high performance. Transient responses and steady-state accuracy of the qZSI system under the proposed DTA-MPC are investigated and compared with the PI-based control and the conventional MPC. Simulation and experimental results verify the effectiveness of the proposed approach for the qZSI.« less
A Discrete-Time Average Model Based Predictive Control for Quasi-Z-Source Inverter
Liu, Yushan; Abu-Rub, Haitham; Xue, Yaosuo; ...
2017-12-25
A discrete-time average model-based predictive control (DTA-MPC) is proposed for a quasi-Z-source inverter (qZSI). As a single-stage inverter topology, the qZSI regulates the dc-link voltage and the ac output voltage through the shoot-through (ST) duty cycle and the modulation index. Several feedback strategies have been dedicated to produce these two control variables, among which the most popular are the proportional–integral (PI)-based control and the conventional model-predictive control (MPC). However, in the former, there are tradeoffs between fast response and stability; the latter is robust, but at the cost of high calculation burden and variable switching frequency. Moreover, they require anmore » elaborated design or fine tuning of controller parameters. The proposed DTA-MPC predicts future behaviors of the ST duty cycle and modulation signals, based on the established discrete-time average model of the quasi-Z-source (qZS) inductor current, the qZS capacitor voltage, and load currents. The prediction actions are applied to the qZSI modulator in the next sampling instant, without the need of other controller parameters’ design. A constant switching frequency and significantly reduced computations are achieved with high performance. Transient responses and steady-state accuracy of the qZSI system under the proposed DTA-MPC are investigated and compared with the PI-based control and the conventional MPC. Simulation and experimental results verify the effectiveness of the proposed approach for the qZSI.« less
Moghaddar, N; van der Werf, J H J
2017-12-01
The objectives of this study were to estimate the additive and dominance variance component of several weight and ultrasound scanned body composition traits in purebred and combined cross-bred sheep populations based on single nucleotide polymorphism (SNP) marker genotypes and then to investigate the effect of fitting additive and dominance effects on accuracy of genomic evaluation. Additive and dominance variance components were estimated in a mixed model equation based on "average information restricted maximum likelihood" using additive and dominance (co)variances between animals calculated from 48,599 SNP marker genotypes. Genomic prediction was based on genomic best linear unbiased prediction (GBLUP), and the accuracy of prediction was assessed based on a random 10-fold cross-validation. Across different weight and scanned body composition traits, dominance variance ranged from 0.0% to 7.3% of the phenotypic variance in the purebred population and from 7.1% to 19.2% in the combined cross-bred population. In the combined cross-bred population, the range of dominance variance decreased to 3.1% and 9.9% after accounting for heterosis effects. Accounting for dominance effects significantly improved the likelihood of the fitting model in the combined cross-bred population. This study showed a substantial dominance genetic variance for weight and ultrasound scanned body composition traits particularly in cross-bred population; however, improvement in the accuracy of genomic breeding values was small and statistically not significant. Dominance variance estimates in combined cross-bred population could be overestimated if heterosis is not fitted in the model. © 2017 Blackwell Verlag GmbH.
A Probabilistic Strategy for Understanding Action Selection
Kim, Byounghoon; Basso, Michele A.
2010-01-01
Brain regions involved in transforming sensory signals into movement commands are the likely sites where decisions are formed. Once formed, a decision must be read-out from the activity of populations of neurons to produce a choice of action. How this occurs remains unresolved. We recorded from four superior colliculus (SC) neurons simultaneously while monkeys performed a target selection task. We implemented three models to gain insight into the computational principles underlying population coding of action selection. We compared the population vector average (PVA), winner-takes-all (WTA) and a Bayesian model, maximum a posteriori estimate (MAP) to determine which predicted choices most often. The probabilistic model predicted more trials correctly than both the WTA and the PVA. The MAP model predicted 81.88% whereas WTA predicted 71.11% and PVA/OLE predicted the least number of trials at 55.71 and 69.47%. Recovering MAP estimates using simulated, non-uniform priors that correlated with monkeys’ choice performance, improved the accuracy of the model by 2.88%. A dynamic analysis revealed that the MAP estimate evolved over time and the posterior probability of the saccade choice reached a maximum at the time of the saccade. MAP estimates also scaled with choice performance accuracy. Although there was overlap in the prediction abilities of all the models, we conclude that movement choice from populations of neurons may be best understood by considering frameworks based on probability. PMID:20147560
A New Scheme to Characterize and Identify Protein Ubiquitination Sites.
Nguyen, Van-Nui; Huang, Kai-Yao; Huang, Chien-Hsun; Lai, K Robert; Lee, Tzong-Yi
2017-01-01
Protein ubiquitination, involving the conjugation of ubiquitin on lysine residue, serves as an important modulator of many cellular functions in eukaryotes. Recent advancements in proteomic technology have stimulated increasing interest in identifying ubiquitination sites. However, most computational tools for predicting ubiquitination sites are focused on small-scale data. With an increasing number of experimentally verified ubiquitination sites, we were motivated to design a predictive model for identifying lysine ubiquitination sites for large-scale proteome dataset. This work assessed not only single features, such as amino acid composition (AAC), amino acid pair composition (AAPC) and evolutionary information, but also the effectiveness of incorporating two or more features into a hybrid approach to model construction. The support vector machine (SVM) was applied to generate the prediction models for ubiquitination site identification. Evaluation by five-fold cross-validation showed that the SVM models learned from the combination of hybrid features delivered a better prediction performance. Additionally, a motif discovery tool, MDDLogo, was adopted to characterize the potential substrate motifs of ubiquitination sites. The SVM models integrating the MDDLogo-identified substrate motifs could yield an average accuracy of 68.70 percent. Furthermore, the independent testing result showed that the MDDLogo-clustered SVM models could provide a promising accuracy (78.50 percent) and perform better than other prediction tools. Two cases have demonstrated the effective prediction of ubiquitination sites with corresponding substrate motifs.
A fragmentation and reassembly method for ab initio phasing.
Shrestha, Rojan; Zhang, Kam Y J
2015-02-01
Ab initio phasing with de novo models has become a viable approach for structural solution from protein crystallographic diffraction data. This approach takes advantage of the known protein sequence information, predicts de novo models and uses them for structure determination by molecular replacement. However, even the current state-of-the-art de novo modelling method has a limit as to the accuracy of the model predicted, which is sometimes insufficient to be used as a template for successful molecular replacement. A fragment-assembly phasing method has been developed that starts from an ensemble of low-accuracy de novo models, disassembles them into fragments, places them independently in the crystallographic unit cell by molecular replacement and then reassembles them into a whole structure that can provide sufficient phase information to enable complete structure determination by automated model building. Tests on ten protein targets showed that the method could solve structures for eight of these targets, although the predicted de novo models cannot be used as templates for successful molecular replacement since the best model for each target is on average more than 4.0 Å away from the native structure. The method has extended the applicability of the ab initio phasing by de novo models approach. The method can be used to solve structures when the best de novo models are still of low accuracy.
Chen, Yeh-Hsin; Schwartz, Joel D.; Rood, Richard B.; O’Neill, Marie S.
2014-01-01
Background: Heat wave and health warning systems are activated based on forecasts of health-threatening hot weather. Objective: We estimated heat–mortality associations based on forecast and observed weather data in Detroit, Michigan, and compared the accuracy of forecast products for predicting heat waves. Methods: We derived and compared apparent temperature (AT) and heat wave days (with heat waves defined as ≥ 2 days of daily mean AT ≥ 95th percentile of warm-season average) from weather observations and six different forecast products. We used Poisson regression with and without adjustment for ozone and/or PM10 (particulate matter with aerodynamic diameter ≤ 10 μm) to estimate and compare associations of daily all-cause mortality with observed and predicted AT and heat wave days. Results: The 1-day-ahead forecast of a local operational product, Revised Digital Forecast, had about half the number of false positives compared with all other forecasts. On average, controlling for heat waves, days with observed AT = 25.3°C were associated with 3.5% higher mortality (95% CI: –1.6, 8.8%) than days with AT = 8.5°C. Observed heat wave days were associated with 6.2% higher mortality (95% CI: –0.4, 13.2%) than non–heat wave days. The accuracy of predictions varied, but associations between mortality and forecast heat generally tended to overestimate heat effects, whereas associations with forecast heat waves tended to underestimate heat wave effects, relative to associations based on observed weather metrics. Conclusions: Our findings suggest that incorporating knowledge of local conditions may improve the accuracy of predictions used to activate heat wave and health warning systems. Citation: Zhang K, Chen YH, Schwartz JD, Rood RB, O’Neill MS. 2014. Using forecast and observed weather data to assess performance of forecast products in identifying heat waves and estimating heat wave effects on mortality. Environ Health Perspect 122:912–918; http://dx.doi.org/10.1289/ehp.1306858 PMID:24833618
NASA Technical Reports Server (NTRS)
Baurle, R. A.
2015-01-01
Steady-state and scale-resolving simulations have been performed for flow in and around a model scramjet combustor flameholder. The cases simulated corresponded to those used to examine this flowfield experimentally using particle image velocimetry. A variety of turbulence models were used for the steady-state Reynolds-averaged simulations which included both linear and non-linear eddy viscosity models. The scale-resolving simulations used a hybrid Reynolds-averaged / large eddy simulation strategy that is designed to be a large eddy simulation everywhere except in the inner portion (log layer and below) of the boundary layer. Hence, this formulation can be regarded as a wall-modeled large eddy simulation. This effort was undertaken to formally assess the performance of the hybrid Reynolds-averaged / large eddy simulation modeling approach in a flowfield of interest to the scramjet research community. The numerical errors were quantified for both the steady-state and scale-resolving simulations prior to making any claims of predictive accuracy relative to the measurements. The steady-state Reynolds-averaged results showed a high degree of variability when comparing the predictions obtained from each turbulence model, with the non-linear eddy viscosity model (an explicit algebraic stress model) providing the most accurate prediction of the measured values. The hybrid Reynolds-averaged/large eddy simulation results were carefully scrutinized to ensure that even the coarsest grid had an acceptable level of resolution for large eddy simulation, and that the time-averaged statistics were acceptably accurate. The autocorrelation and its Fourier transform were the primary tools used for this assessment. The statistics extracted from the hybrid simulation strategy proved to be more accurate than the Reynolds-averaged results obtained using the linear eddy viscosity models. However, there was no predictive improvement noted over the results obtained from the explicit Reynolds stress model. Fortunately, the numerical error assessment at most of the axial stations used to compare with measurements clearly indicated that the scale-resolving simulations were improving (i.e. approaching the measured values) as the grid was refined. Hence, unlike a Reynolds-averaged simulation, the hybrid approach provides a mechanism to the end-user for reducing model-form errors.
Assessing the accuracy of ANFIS, EEMD-GRNN, PCR, and MLR models in predicting PM2.5
NASA Astrophysics Data System (ADS)
Ausati, Shadi; Amanollahi, Jamil
2016-10-01
Since Sanandaj is considered one of polluted cities of Iran, prediction of any type of pollution especially prediction of suspended particles of PM2.5, which are the cause of many diseases, could contribute to health of society by timely announcements and prior to increase of PM2.5. In order to predict PM2.5 concentration in the Sanandaj air the hybrid models consisting of an ensemble empirical mode decomposition and general regression neural network (EEMD-GRNN), Adaptive Neuro-Fuzzy Inference System (ANFIS), principal component regression (PCR), and linear model such as multiple liner regression (MLR) model were used. In these models the data of suspended particles of PM2.5 were the dependent variable and the data related to air quality including PM2.5, PM10, SO2, NO2, CO, O3 and meteorological data including average minimum temperature (Min T), average maximum temperature (Max T), average atmospheric pressure (AP), daily total precipitation (TP), daily relative humidity level of the air (RH) and daily wind speed (WS) for the year 2014 in Sanandaj were the independent variables. Among the used models, EEMD-GRNN model with values of R2 = 0.90, root mean square error (RMSE) = 4.9218 and mean absolute error (MAE) = 3.4644 in the training phase and with values of R2 = 0.79, RMSE = 5.0324 and MAE = 3.2565 in the testing phase, exhibited the best function in predicting this phenomenon. It can be concluded that hybrid models have accurate results to predict PM2.5 concentration compared with linear model.
A Lagrangian stochastic model for aerial spray transport above an oak forest
Wang, Yansen; Miller, David R.; Anderson, Dean E.; McManus, Michael L.
1995-01-01
An aerial spray droplets' transport model has been developed by applying recent advances in Lagrangian stochastic simulation of heavy particles. A two-dimensional Lagrangian stochastic model was adopted to simulate the spray droplet dispersion in atmospheric turbulence by adjusting the Lagrangian integral time scale along the drop trajectory. The other major physical processes affecting the transport of spray droplets above a forest canopy, the aircraft wingtip vortices and the droplet evaporation, were also included in each time step of the droplets' transport.The model was evaluated using data from an aerial spray field experiment. In generally neutral stability conditions, the accuracy of the model predictions varied from run-to-run as expected. The average root-mean-square error was 24.61 IU cm−2, and the average relative error was 15%. The model prediction was adequate in two-dimensional steady wind conditions, but was less accurate in variable wind condition. The results indicated that the model can simulate successfully the ensemble; average transport of aerial spray droplets under neutral, steady atmospheric wind conditions.
Automated Protocol for Large-Scale Modeling of Gene Expression Data.
Hall, Michelle Lynn; Calkins, David; Sherman, Woody
2016-11-28
With the continued rise of phenotypic- and genotypic-based screening projects, computational methods to analyze, process, and ultimately make predictions in this field take on growing importance. Here we show how automated machine learning workflows can produce models that are predictive of differential gene expression as a function of a compound structure using data from A673 cells as a proof of principle. In particular, we present predictive models with an average accuracy of greater than 70% across a highly diverse ∼1000 gene expression profile. In contrast to the usual in silico design paradigm, where one interrogates a particular target-based response, this work opens the opportunity for virtual screening and lead optimization for desired multitarget gene expression profiles.
Modeling Sediment Detention Ponds Using Reactor Theory and Advection-Diffusion Concepts
NASA Astrophysics Data System (ADS)
Wilson, Bruce N.; Barfield, Billy J.
1985-04-01
An algorithm is presented to model the sedimentation process in detention ponds. This algorithm is based on a mass balance for an infinitesimal layer that couples reactor theory concepts with advection-diffusion processes. Reactor theory concepts are used to (1) determine residence time of sediment particles and to (2) mix influent sediment with previously stored flow. Advection-diffusion processes are used to model the (1) settling characteristics of sediment and the (2) vertical diffusion of sediment due to turbulence. Predicted results of the model are compared to those observed on two pilot scale ponds for a total of 12 runs. The average percent error between predicted and observed trap efficiency was 5.2%. Overall, the observed sedimentology values were predicted with reasonable accuracy.
Deposition of aerially applied BT in an oak forest and its prediction with the FSCBG model
Anderson, Dean E.; Miller, David R.; Wang, Yansen; Yendol, William G.; Mierzejewski, Karl; McManus, Michael L.
1992-01-01
Data are provided from 17 single-swath aerial spray trials that were conducted over a fully leafed, 16-m tall, mixed oak forest. The distribution of cross-swath spray deposits was sampled at the top of the canopy and below the canopy. Micrometeorological conditions were measured above and within the canopy during the spray trials. The USDA Forest Service FSCBG (Forest Service-Cramer-Barry-Grim) model was run to predict the target sampler catch for each trial using forest stand, airplane-application-equipment configuration, and micrometeorological conditions as inputs. Observations showed an average cross-swath deposition of 100 IU cm−2 with large run-to-run variability in deposition patterns, magnitudes, and drift. Eleven percent of the spray material that reached the top of the canopy penetrated through the tree canopy to the forest floor.The FSCBG predictions of the ensemble-averaged deposition were within 17% of the measured deposition at the canopy top and within 8% on the ground beneath the canopy. Run-to-run deposit predictions by FSCBG were considerably less variable than the measured deposits. Individual run predictions were much less accurate than the ensemble-averaged predictions as demonstrated by an average root-mean-square-error (rmse) of 27.9 IU CM−2 at the top of the canopy. Comparisons of the differences between predicted and observed deposits indicated that the model accuracy was sensitive to atmospheric stability conditions. In neutral and stable conditions, a regular pattern of error was indicated by overprediction of the canopy-top deposit at distances from 0 to 20 m downwind from the flight line and underprediction of the deposit both farther downwind than 20 m and upwind of the flight line. In unstable conditions the model generally underpredicted the deposit downwind from the flight line, but showed no regular pattern of error.
Accuracy of genomic predictions in Gyr (Bos indicus) dairy cattle.
Boison, S A; Utsunomiya, A T H; Santos, D J A; Neves, H H R; Carvalheiro, R; Mészáros, G; Utsunomiya, Y T; do Carmo, A S; Verneque, R S; Machado, M A; Panetto, J C C; Garcia, J F; Sölkner, J; da Silva, M V G B
2017-07-01
Genomic selection may accelerate genetic progress in breeding programs of indicine breeds when compared with traditional selection methods. We present results of genomic predictions in Gyr (Bos indicus) dairy cattle of Brazil for milk yield (MY), fat yield (FY), protein yield (PY), and age at first calving using information from bulls and cows. Four different single nucleotide polymorphism (SNP) chips were studied. Additionally, the effect of the use of imputed data on genomic prediction accuracy was studied. A total of 474 bulls and 1,688 cows were genotyped with the Illumina BovineHD (HD; San Diego, CA) and BovineSNP50 (50K) chip, respectively. Genotypes of cows were imputed to HD using FImpute v2.2. After quality check of data, 496,606 markers remained. The HD markers present on the GeneSeek SGGP-20Ki (15,727; Lincoln, NE), 50K (22,152), and GeneSeek GGP-75Ki (65,018) were subset and used to assess the effect of lower SNP density on accuracy of prediction. Deregressed breeding values were used as pseudophenotypes for model training. Data were split into reference and validation to mimic a forward prediction scheme. The reference population consisted of animals whose birth year was ≤2004 and consisted of either only bulls (TR1) or a combination of bulls and dams (TR2), whereas the validation set consisted of younger bulls (born after 2004). Genomic BLUP was used to estimate genomic breeding values (GEBV) and reliability of GEBV (R 2 PEV ) was based on the prediction error variance approach. Reliability of GEBV ranged from ∼0.46 (FY and PY) to 0.56 (MY) with TR1 and from 0.51 (PY) to 0.65 (MY) with TR2. When averaged across all traits, R 2 PEV were substantially higher (R 2 PEV of TR1 = 0.50 and TR2 = 0.57) compared with reliabilities of parent averages (0.35) computed from pedigree data and based on diagonals of the coefficient matrix (prediction error variance approach). Reliability was similar for all the 4 marker panels using either TR1 or TR2, except that imputed HD cow data set led to an inflation of reliability. Reliability of GEBV could be increased by enlarging the limited bull reference population with cow information. A reduced panel of ∼15K markers resulted in reliabilities similar to using HD markers. Reliability of GEBV could be increased by enlarging the limited bull reference population with cow information. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin
2007-12-01
Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracy-toplasmatic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most of other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
Systematic bias of correlation coefficient may explain negative accuracy of genomic prediction.
Zhou, Yao; Vales, M Isabel; Wang, Aoxue; Zhang, Zhiwu
2017-09-01
Accuracy of genomic prediction is commonly calculated as the Pearson correlation coefficient between the predicted and observed phenotypes in the inference population by using cross-validation analysis. More frequently than expected, significant negative accuracies of genomic prediction have been reported in genomic selection studies. These negative values are surprising, given that the minimum value for prediction accuracy should hover around zero when randomly permuted data sets are analyzed. We reviewed the two common approaches for calculating the Pearson correlation and hypothesized that these negative accuracy values reflect potential bias owing to artifacts caused by the mathematical formulas used to calculate prediction accuracy. The first approach, Instant accuracy, calculates correlations for each fold and reports prediction accuracy as the mean of correlations across fold. The other approach, Hold accuracy, predicts all phenotypes in all fold and calculates correlation between the observed and predicted phenotypes at the end of the cross-validation process. Using simulated and real data, we demonstrated that our hypothesis is true. Both approaches are biased downward under certain conditions. The biases become larger when more fold are employed and when the expected accuracy is low. The bias of Instant accuracy can be corrected using a modified formula. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
The value of vital sign trends for detecting clinical deterioration on the wards
Churpek, Matthew M; Adhikari, Richa; Edelson, Dana P
2016-01-01
Aim Early detection of clinical deterioration on the wards may improve outcomes, and most early warning scores only utilize a patient’s current vital signs. The added value of vital sign trends over time is poorly characterized. We investigated whether adding trends improves accuracy and which methods are optimal for modelling trends. Methods Patients admitted to five hospitals over a five-year period were included in this observational cohort study, with 60% of the data used for model derivation and 40% for validation. Vital signs were utilized to predict the combined outcome of cardiac arrest, intensive care unit transfer, and death. The accuracy of models utilizing both the current value and different trend methods were compared using the area under the receiver operating characteristic curve (AUC). Results A total of 269,999 patient admissions were included, which resulted in 16,452 outcomes. Overall, trends increased accuracy compared to a model containing only current vital signs (AUC 0.78 vs. 0.74; p<0.001). The methods that resulted in the greatest average increase in accuracy were the vital sign slope (AUC improvement 0.013) and minimum value (AUC improvement 0.012), while the change from the previous value resulted in an average worsening of the AUC (change in AUC −0.002). The AUC increased most for systolic blood pressure when trends were added (AUC improvement 0.05). Conclusion Vital sign trends increased the accuracy of models designed to detect critical illness on the wards. Our findings have important implications for clinicians at the bedside and for the development of early warning scores. PMID:26898412
The value of vital sign trends for detecting clinical deterioration on the wards.
Churpek, Matthew M; Adhikari, Richa; Edelson, Dana P
2016-05-01
Early detection of clinical deterioration on the wards may improve outcomes, and most early warning scores only utilize a patient's current vital signs. The added value of vital sign trends over time is poorly characterized. We investigated whether adding trends improves accuracy and which methods are optimal for modelling trends. Patients admitted to five hospitals over a five-year period were included in this observational cohort study, with 60% of the data used for model derivation and 40% for validation. Vital signs were utilized to predict the combined outcome of cardiac arrest, intensive care unit transfer, and death. The accuracy of models utilizing both the current value and different trend methods were compared using the area under the receiver operating characteristic curve (AUC). A total of 269,999 patient admissions were included, which resulted in 16,452 outcomes. Overall, trends increased accuracy compared to a model containing only current vital signs (AUC 0.78 vs. 0.74; p<0.001). The methods that resulted in the greatest average increase in accuracy were the vital sign slope (AUC improvement 0.013) and minimum value (AUC improvement 0.012), while the change from the previous value resulted in an average worsening of the AUC (change in AUC -0.002). The AUC increased most for systolic blood pressure when trends were added (AUC improvement 0.05). Vital sign trends increased the accuracy of models designed to detect critical illness on the wards. Our findings have important implications for clinicians at the bedside and for the development of early warning scores. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Assessing the accuracy of predictive models for numerical data: Not r nor r2, why not? Then what?
2017-01-01
Assessing the accuracy of predictive models is critical because predictive models have been increasingly used across various disciplines and predictive accuracy determines the quality of resultant predictions. Pearson product-moment correlation coefficient (r) and the coefficient of determination (r2) are among the most widely used measures for assessing predictive models for numerical data, although they are argued to be biased, insufficient and misleading. In this study, geometrical graphs were used to illustrate what were used in the calculation of r and r2 and simulations were used to demonstrate the behaviour of r and r2 and to compare three accuracy measures under various scenarios. Relevant confusions about r and r2, has been clarified. The calculation of r and r2 is not based on the differences between the predicted and observed values. The existing error measures suffer various limitations and are unable to tell the accuracy. Variance explained by predictive models based on cross-validation (VEcv) is free of these limitations and is a reliable accuracy measure. Legates and McCabe’s efficiency (E1) is also an alternative accuracy measure. The r and r2 do not measure the accuracy and are incorrect accuracy measures. The existing error measures suffer limitations. VEcv and E1 are recommended for assessing the accuracy. The applications of these accuracy measures would encourage accuracy-improved predictive models to be developed to generate predictions for evidence-informed decision-making. PMID:28837692
Snorkel: Rapid Training Data Creation with Weak Supervision.
Ratner, Alexander; Bach, Stephen H; Ehrenberg, Henry; Fries, Jason; Wu, Sen; Ré, Christopher
2017-11-01
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of- the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8× faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8× speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
Apollo 16, LM-11 descent propulsion system final flight evaluation
NASA Technical Reports Server (NTRS)
Avvenire, A. T.
1974-01-01
The performance of the LM-11 descent propulsion system during the Apollo 16 missions was evaluated and found satisfactory. The average engine effective specific impulse was 0.1 second higher than predicted, but well within the predicted one sigma uncertainty of 0.2 seconds. Several flight measurement discrepancies existed during the flight as follows: (1) the chamber pressure transducer had a noticeable drift, exhibiting a maximum error of about 1.5 psi at approximately 130 seconds after engine ignition, (2) the fuel and oxidizer interface pressure measurements appeared to be low during the entire flight, and (3) the fuel propellant quantity gaging system did not perform within expected accuracies.
RANS Simulation of the Separated Flow over a Bump with Active Control
NASA Technical Reports Server (NTRS)
Iaccarino, Gianluca; Marongiu, Claudio; Catalano, Pietro; Amato, Marcello
2003-01-01
The objective of this paper is to investigate the accuracy of Reynolds-Averaged Navier- Stokes (RANS) techniques in predicting the effect of steady and unsteady flow control devices. This is part of a larger effort in applying numerical simulation tools to investigate of the performance of synthetic jets in high Reynolds number turbulent flows. RANS techniques have been successful in predicting isolated synthetic jets as reported by Kral et al. Nevertheless, due to the complex, and inherently unsteady nature of the interaction between the synthetic jet and the external boundary layer flow, it is not clear whether RANS models can represent the turbulence statistics correctly.
NASA Astrophysics Data System (ADS)
Jiao, Peng; Yang, Er; Ni, Yong Xin
2018-06-01
The overland flow resistance on grassland slope of 20° was studied by using simulated rainfall experiments. Model of overland flow resistance coefficient was established based on BP neural network. The input variations of model were rainfall intensity, flow velocity, water depth, and roughness of slope surface, and the output variations was overland flow resistance coefficient. Model was optimized by Genetic Algorithm. The results show that the model can be used to calculate overland flow resistance coefficient, and has high simulation accuracy. The average prediction error of the optimized model of test set is 8.02%, and the maximum prediction error was 18.34%.
Changes in the relation between snow station observations and basin scale snow water resources
NASA Astrophysics Data System (ADS)
Sexstone, G. A.; Penn, C. A.; Clow, D. W.; Moeser, D.; Liston, G. E.
2017-12-01
Snow monitoring stations that measure snow water equivalent or snow depth provide fundamental observations used for predicting water availability and flood risk in mountainous regions. In the western United States, snow station observations provided by the Natural Resources Conservation Service Snow Telemetry (SNOTEL) network are relied upon for forecasting spring and summer streamflow volume. Streamflow forecast accuracy has declined for many regions over the last several decades. Changes in snow accumulation and melt related to climate, land use, and forest cover are not accounted for in current forecasts, and are likely sources of error. Therefore, understanding and updating relations between snow station observations and basin scale snow water resources is crucial to improve accuracy of streamflow prediction. In this study, we investigated the representativeness of snow station observations when compared to simulated basin-wide snow water resources within the Rio Grande headwaters of Colorado. We used the combination of a process-based snow model (SnowModel), field-based measurements, and remote sensing observations to compare the spatiotemporal variability of simulated basin-wide snow accumulation and melt with that of SNOTEL station observations. Results indicated that observations are comparable to simulated basin-average winter precipitation but overestimate both the simulated basin-average snow water equivalent and snowmelt rate. Changes in the representation of snow station observations over time in the Rio Grande headwaters were also investigated and compared to observed streamflow and streamflow forecasting errors. Results from this study provide important insight in the context of non-stationarity for future water availability assessments and streamflow predictions.
Statistical validation of a solar wind propagation model from 1 to 10 AU
NASA Astrophysics Data System (ADS)
Zieger, Bertalan; Hansen, Kenneth C.
2008-08-01
A one-dimensional (1-D) numerical magnetohydrodynamic (MHD) code is applied to propagate the solar wind from 1 AU through 10 AU, i.e., beyond the heliocentric distance of Saturn's orbit, in a non-rotating frame of reference. The time-varying boundary conditions at 1 AU are obtained from hourly solar wind data observed near the Earth. Although similar MHD simulations have been carried out and used by several authors, very little work has been done to validate the statistical accuracy of such solar wind predictions. In this paper, we present an extensive analysis of the prediction efficiency, using 12 selected years of solar wind data from the major heliospheric missions Pioneer, Voyager, and Ulysses. We map the numerical solution to each spacecraft in space and time, and validate the simulation, comparing the propagated solar wind parameters with in-situ observations. We do not restrict our statistical analysis to the times of spacecraft alignment, as most of the earlier case studies do. Our superposed epoch analysis suggests that the prediction efficiency is significantly higher during periods with high recurrence index of solar wind speed, typically in the late declining phase of the solar cycle. Among the solar wind variables, the solar wind speed can be predicted to the highest accuracy, with a linear correlation of 0.75 on average close to the time of opposition. We estimate the accuracy of shock arrival times to be as high as 10-15 hours within ±75 d from apparent opposition during years with high recurrence index. During solar activity maximum, there is a clear bias for the model to predicted shocks arriving later than observed in the data, suggesting that during these periods, there is an additional acceleration mechanism in the solar wind that is not included in the model.
CFD Computations for a Generic High-Lift Configuration Using TetrUSS
NASA Technical Reports Server (NTRS)
Pandya, Mohagna J.; Abdol-Hamid, Khaled S.; Parlette, Edward B.
2011-01-01
Assessment of the accuracy of computational results for a generic high-lift trapezoidal wing with a single slotted flap and slat is presented. The paper is closely aligned with the focus of the 1st AIAA CFD High Lift Prediction Workshop (HiLiftPW-1) which was to assess the accuracy of CFD methods for multi-element high-lift configurations. The unstructured grid Reynolds-Averaged Navier-Stokes solver TetrUSS/USM3D is used for the computational results. USM3D results are obtained assuming fully turbulent flow using the Spalart-Allmaras (SA) and Shear Stress Transport (SST) turbulence models. Computed solutions have been obtained at seven different angles-of-attack ranging from 6 -37 . Three grids providing progressively higher grid resolution are used to quantify the effect of grid resolution on the lift, drag, pitching moment, surface pressure and stall angle. SA results, as compared to SST results, exhibit better agreement with the measured data. However, both turbulence models under-predict upper surface pressures near the wing tip region.
Zednikova Mala, Pavla; Veleminska, Jana
2018-01-01
This study measured the accuracy of traditional and validated newly proposed methods for globe positioning in lateral view. Eighty lateral head cephalograms of adult subjects from Central Europe were taken, and the actual and predicted dimensions were compared. The anteroposterior eyeball position was estimated as the most accurate method based on the proportion of the orbital height (SEE = 1.9 mm) and was followed by the "tangent to the iris method" showing SEE = 2.4 mm. The traditional "tangent to the cornea method" underestimated the eyeball projection by SEE = 5.8 mm. Concerning the superoinferior eyeball position, the results showed a deviation from a central to a more superior position by 0.3 mm, on average, and the traditional method of central positioning of the globe could not be rejected as inaccurate (SEE = 0.3 mm). Based on regression analyzes or proportionality of the orbital height, the SEE = 2.1 mm. © 2017 American Academy of Forensic Sciences.
Gironés, Xavier; Carbó-Dorca, Ramon; Ponec, Robert
2003-01-01
A new approach allowing the theoretical modeling of the electronic substituent effect is proposed. The approach is based on the use of fragment Quantum Self-Similarity Measures (MQS-SM) calculated from domain averaged Fermi Holes as new theoretical descriptors allowing for the replacement of Hammett sigma constants in QSAR models. To demonstrate the applicability of this new approach its formalism was applied to the description of the substituent effect on the dissociation of a broad series of meta and para substituted benzoic acids. The accuracy and the predicting power of this new approach was tested on the comparison with a recent exhaustive study by Sullivan et al. It has been shown that the accuracy and the predicting power of both procedures is comparable, but, in contrast to a five-parameter correlation equation necessary to describe the data in the study, our approach is more simple and, in fact, only a simple one-parameter correlation equation is required.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teo, Troy; Alayoubi, Nadia; Bruce, Neil
Purpose: In image-guided adaptive radiotherapy systems, prediction of tumor motion is required to compensate for system latencies. However, due to the non-stationary nature of respiration, it is a challenge to predict the associated tumor motions. In this work, a systematic design of the neural network (NN) using a mixture of online data acquired during the initial period of the tumor trajectory, coupled with a generalized model optimized using a group of patient data (obtained offline) is presented. Methods: The average error surface obtained from seven patients was used to determine the input data size and number of hidden neurons formore » the generalized NN. To reduce training time, instead of using random weights to initialize learning (method 1), weights inherited from previous training batches (method 2) were used to predict tumor position for each sliding window. Results: The generalized network was established with 35 input data (∼4.66s) and 20 hidden nodes. For a prediction horizon of 650 ms, mean absolute errors of 0.73 mm and 0.59 mm were obtained for method 1 and 2 respectively. An average initial learning period of 8.82 s is obtained. Conclusions: A network with a relatively short initial learning time was achieved. Its accuracy is comparable to previous studies. This network could be used as a plug-and play predictor in which (a) tumor positions can be predicted as soon as treatment begins and (b) the need for pretreatment data and optimization for individual patients can be avoided.« less
Genetically optimizing weather predictions
NASA Astrophysics Data System (ADS)
Potter, S. B.; Staats, Kai; Romero-Colmenero, Encarni
2016-07-01
humidity, air pressure, wind speed and wind direction) into a database. Built upon this database, we have developed a remarkably simple approach to derive a functional weather predictor. The aim is provide up to the minute local weather predictions in order to e.g. prepare dome environment conditions ready for night time operations or plan, prioritize and update weather dependent observing queues. In order to predict the weather for the next 24 hours, we take the current live weather readings and search the entire archive for similar conditions. Predictions are made against an averaged, subsequent 24 hours of the closest matches for the current readings. We use an Evolutionary Algorithm to optimize our formula through weighted parameters. The accuracy of the predictor is routinely tested and tuned against the full, updated archive to account for seasonal trends and total, climate shifts. The live (updated every 5 minutes) SALT weather predictor can be viewed here: http://www.saao.ac.za/ sbp/suthweather_predict.html
A Method for WD40 Repeat Detection and Secondary Structure Prediction
Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong
2013-01-01
WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program WDSP (WD40 repeat protein Structure Predictor) has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in jack-knife test reaches 93.7%. A disease related protein LRRK2 was used as a representive example to demonstrate the structure prediction. PMID:23776530
Applying a weighted random forests method to extract karst sinkholes from LiDAR data
NASA Astrophysics Data System (ADS)
Zhu, Junfeng; Pierskalla, William P.
2016-02-01
Detailed mapping of sinkholes provides critical information for mitigating sinkhole hazards and understanding groundwater and surface water interactions in karst terrains. LiDAR (Light Detection and Ranging) measures the earth's surface in high-resolution and high-density and has shown great potentials to drastically improve locating and delineating sinkholes. However, processing LiDAR data to extract sinkholes requires separating sinkholes from other depressions, which can be laborious because of the sheer number of the depressions commonly generated from LiDAR data. In this study, we applied the random forests, a machine learning method, to automatically separate sinkholes from other depressions in a karst region in central Kentucky. The sinkhole-extraction random forest was grown on a training dataset built from an area where LiDAR-derived depressions were manually classified through a visual inspection and field verification process. Based on the geometry of depressions, as well as natural and human factors related to sinkholes, 11 parameters were selected as predictive variables to form the dataset. Because the training dataset was imbalanced with the majority of depressions being non-sinkholes, a weighted random forests method was used to improve the accuracy of predicting sinkholes. The weighted random forest achieved an average accuracy of 89.95% for the training dataset, demonstrating that the random forest can be an effective sinkhole classifier. Testing of the random forest in another area, however, resulted in moderate success with an average accuracy rate of 73.96%. This study suggests that an automatic sinkhole extraction procedure like the random forest classifier can significantly reduce time and labor costs and makes its more tractable to map sinkholes using LiDAR data for large areas. However, the random forests method cannot totally replace manual procedures, such as visual inspection and field verification.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garnon, J., E-mail: juliengarnon@gmail.com; Ramamurthy, N., E-mail: nitin-ramamurthy@hotmail.com; Caudrelier J, J., E-mail: caudjean@yahoo.fr
2016-05-15
ObjectiveTo evaluate the diagnostic accuracy and safety of magnetic resonance imaging (MRI)-guided percutaneous biopsy of mediastinal masses performed using a wide-bore high-field scanner.Materials and MethodsThis is a retrospective study of 16 consecutive patients (8 male, 8 female; mean age 74 years) who underwent MRI-guided core needle biopsy of a mediastinal mass between February 2010 and January 2014. Size and location of lesion, approach taken, time for needle placement, overall duration of procedure, and post-procedural complications were evaluated. Technical success rates and correlation with surgical pathology (where available) were assessed.ResultsTarget lesions were located in the anterior (n = 13), middle (n = 2), and posterior mediastinummore » (n = 1), respectively. Mean size was 7.2 cm (range 3.6–11 cm). Average time for needle placement was 9.4 min (range 3–18 min); average duration of entire procedure was 42 min (range 27–62 min). 2–5 core samples were obtained from each lesion (mean 2.6). Technical success rate was 100 %, with specimens successfully obtained in all 16 patients. There were no immediate complications. Histopathology revealed malignancy in 12 cases (4 of which were surgically confirmed), benign lesions in 3 cases (1 of which was false negative following surgical resection), and one inconclusive specimen (treated as inaccurate since repeat CT-guided biopsy demonstrated thymic hyperplasia). Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy in our study were 92.3, 100, 100, 66.7, and 87.5 %, respectively.ConclusionMRI-guided mediastinal biopsy is a safe procedure with high diagnostic accuracy, which may offer a non-ionizing alternative to CT guidance.« less
Palucci Vieira, Luiz H; de Andrade, Vitor L; Aquino, Rodrigo L; Moraes, Renato; Barbieri, Fabio A; Cunha, Sérgio A; Bedo, Bruno L; Santiago, Paulo R
2017-12-01
The main aim of this study was to verify the relationship between the classification of coaches and actual performance in field tests that measure the kicking performance in young soccer players, using the K-means clustering technique. Twenty-three U-14 players performed 8 tests to measure their kicking performance. Four experienced coaches provided a rating for each player as follows: 1: poor; 2: below average; 3: average; 4: very good; 5: excellent as related to three parameters (i.e. accuracy, power and ability to put spin on the ball). The scores interval established from k-means cluster metric was useful to originating five groups of performance level, since ANOVA revealed significant differences between clusters generated (P<0.01). Accuracy seems to be moderately predicted by the penalty kick, free kick, kicking the ball rolling and Wall Volley Test (0.44≤r≤0.56), while the ability to put spin on the ball can be measured by the free kick and the corner kick tests (0.52≤r≤0.61). Body measurements, age and PHV did not systematically influence the performance. The Wall Volley Test seems to be a good predictor of other tests. Five tests showed reasonable construct validity and can be used to predict the accuracy (penalty kick, free kick, kicking a rolling ball and Wall Volley Test) and ability to put spin on the ball (free kick and corner kick tests) when kicking in soccer. In contrast, the goal kick, kicking the ball when airborne and the vertical kick tests exhibited low power of discrimination and using them should be viewed with caution.
Stocco, G; Cipolat-Gotet, C; Bonfatti, V; Schiavon, S; Bittante, G; Cecchinato, A
2016-11-01
The aims of this study were (1) to assess variability in the major mineral components of buffalo milk, (2) to estimate the effect of certain environmental sources of variation on the major minerals during lactation, and (3) to investigate the possibility of using Fourier-transform infrared (FTIR) spectroscopy as an indirect, noninvasive tool for routine prediction of the mineral content of buffalo milk. A total of 173 buffaloes reared in 5 herds were sampled once during the morning milking. Milk samples were analyzed for Ca, P, K, and Mg contents within 3h of sample collection using inductively coupled plasma optical emission spectrometry. A Milkoscan FT2 (Foss, Hillerød, Denmark) was used to acquire milk spectra over the spectral range from 5,000 to 900 wavenumber/cm. Prediction models were built using a partial least square approach, and cross-validation was used to assess the prediction accuracy of FTIR. Prediction models were validated using a 4-fold random cross-validation, thus dividing the calibration-test set in 4 folds, using one of them to check the results (prediction models) and the remaining 3 to develop the calibration models. Buffalo milk minerals averaged 162, 117, 86, and 14.4mg/dL of milk for Ca, P, K, and Mg, respectively. Herd and days in milk were the most important sources of variation in the traits investigated. Parity slightly affected only Ca content. Coefficients of determination of cross-validation between the FTIR-predicted and the measured values were 0.71, 0.70, and 0.72 for Ca, Mg, and P, respectively, whereas prediction accuracy was lower for K (0.55). Our findings reveal FTIR to be an unsuitable tool when milk mineral content needs to be predicted with high accuracy. Predictions may play a role as indicator traits in selective breeding (if the additive genetic correlation between FTIR predictions and measures of milk minerals is high enough) or in monitoring the milk of buffalo populations for dairy industry purposes. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Predicting Key Events in the Popularity Evolution of Online Information.
Hu, Ying; Hu, Changjun; Fu, Shushen; Fang, Mingzhe; Xu, Wenwen
2017-01-01
The popularity of online information generally experiences a rising and falling evolution. This paper considers the "burst", "peak", and "fade" key events together as a representative summary of popularity evolution. We propose a novel prediction task-predicting when popularity undergoes these key events. It is of great importance to know when these three key events occur, because doing so helps recommendation systems, online marketing, and containment of rumors. However, it is very challenging to solve this new prediction task due to two issues. First, popularity evolution has high variation and can follow various patterns, so how can we identify "burst", "peak", and "fade" in different patterns of popularity evolution? Second, these events usually occur in a very short time, so how can we accurately yet promptly predict them? In this paper we address these two issues. To handle the first one, we use a simple moving average to smooth variation, and then a universal method is presented for different patterns to identify the key events in popularity evolution. To deal with the second one, we extract different types of features that may have an impact on the key events, and then a correlation analysis is conducted in the feature selection step to remove irrelevant and redundant features. The remaining features are used to train a machine learning model. The feature selection step improves prediction accuracy, and in order to emphasize prediction promptness, we design a new evaluation metric which considers both accuracy and promptness to evaluate our prediction task. Experimental and comparative results show the superiority of our prediction solution.
Predicting Key Events in the Popularity Evolution of Online Information
Fu, Shushen; Fang, Mingzhe; Xu, Wenwen
2017-01-01
The popularity of online information generally experiences a rising and falling evolution. This paper considers the “burst”, “peak”, and “fade” key events together as a representative summary of popularity evolution. We propose a novel prediction task—predicting when popularity undergoes these key events. It is of great importance to know when these three key events occur, because doing so helps recommendation systems, online marketing, and containment of rumors. However, it is very challenging to solve this new prediction task due to two issues. First, popularity evolution has high variation and can follow various patterns, so how can we identify “burst”, “peak”, and “fade” in different patterns of popularity evolution? Second, these events usually occur in a very short time, so how can we accurately yet promptly predict them? In this paper we address these two issues. To handle the first one, we use a simple moving average to smooth variation, and then a universal method is presented for different patterns to identify the key events in popularity evolution. To deal with the second one, we extract different types of features that may have an impact on the key events, and then a correlation analysis is conducted in the feature selection step to remove irrelevant and redundant features. The remaining features are used to train a machine learning model. The feature selection step improves prediction accuracy, and in order to emphasize prediction promptness, we design a new evaluation metric which considers both accuracy and promptness to evaluate our prediction task. Experimental and comparative results show the superiority of our prediction solution. PMID:28046121
NASA Astrophysics Data System (ADS)
Holden, Z.; Cushman, S.; Evans, J.; Littell, J. S.
2009-12-01
The resolution of current climate interpolation models limits our ability to adequately account for temperature variability in complex mountainous terrain. We empirically derive 30 meter resolution models of June-October day and nighttime temperature and April nighttime Vapor Pressure Deficit (VPD) using hourly data from 53 Hobo dataloggers stratified by topographic setting in mixed conifer forests near Bonners Ferry, ID. 66%, of the variability in average June-October daytime temperature is explained by 3 variables (elevation, relative slope position and topographic roughness) derived from 30 meter digital elevation models. 69% of the variability in nighttime temperatures among stations is explained by elevation, relative slope position and topographic dissection (450 meter window). 54% of variability in April nighttime VPD is explained by elevation, soil wetness and the NDVIc derived from Landsat. We extract temperature and VPD predictions at 411 intensified Forest Inventory and Analysis plots (FIA). We use these variables with soil wetness and solar radiation indices derived from a 30 meter DEM to predict the presence and absence of 10 common forest tree species and 25 shrub species. Classification accuracies range from 87% for Pinus ponderosa , to > 97% for most other tree species. Shrub model accuracies are also high with greater than 90% accuracy for the majority of species. Species distribution models based on the physical variables that drive species occurrence, rather than their topographic surrogates, will eventually allow us to predict potential future distributions of these species with warming climate at fine spatial scales.
Forecasting malaria in a highly endemic country using environmental and clinical predictors.
Zinszer, Kate; Kigozi, Ruth; Charland, Katia; Dorsey, Grant; Brewer, Timothy F; Brownstein, John S; Kamya, Moses R; Buckeridge, David L
2015-06-18
Malaria thrives in poor tropical and subtropical countries where local resources are limited. Accurate disease forecasts can provide public and clinical health services with the information needed to implement targeted approaches for malaria control that make effective use of limited resources. The objective of this study was to determine the relevance of environmental and clinical predictors of malaria across different settings in Uganda. Forecasting models were based on health facility data collected by the Uganda Malaria Surveillance Project and satellite-derived rainfall, temperature, and vegetation estimates from 2006 to 2013. Facility-specific forecasting models of confirmed malaria were developed using multivariate autoregressive integrated moving average models and produced weekly forecast horizons over a 52-week forecasting period. The model with the most accurate forecasts varied by site and by forecast horizon. Clinical predictors were retained in the models with the highest predictive power for all facility sites. The average error over the 52 forecasting horizons ranged from 26 to 128% whereas the cumulative burden forecast error ranged from 2 to 22%. Clinical data, such as drug treatment, could be used to improve the accuracy of malaria predictions in endemic settings when coupled with environmental predictors. Further exploration of malaria forecasting is necessary to improve its accuracy and value in practice, including examining other environmental and intervention predictors, including insecticide-treated nets.
Reducing unnecessary lab testing in the ICU with artificial intelligence.
Cismondi, F; Celi, L A; Fialho, A S; Vieira, S M; Reti, S R; Sousa, J M C; Finkelstein, S N
2013-05-01
To reduce unnecessary lab testing by predicting when a proposed future lab test is likely to contribute information gain and thereby influence clinical management in patients with gastrointestinal bleeding. Recent studies have demonstrated that frequent laboratory testing does not necessarily relate to better outcomes. Data preprocessing, feature selection, and classification were performed and an artificial intelligence tool, fuzzy modeling, was used to identify lab tests that do not contribute an information gain. There were 11 input variables in total. Ten of these were derived from bedside monitor trends heart rate, oxygen saturation, respiratory rate, temperature, blood pressure, and urine collections, as well as infusion products and transfusions. The final input variable was a previous value from one of the eight lab tests being predicted: calcium, PTT, hematocrit, fibrinogen, lactate, platelets, INR and hemoglobin. The outcome for each test was a binary framework defining whether a test result contributed information gain or not. Predictive modeling was applied to recognize unnecessary lab tests in a real world ICU database extract comprising 746 patients with gastrointestinal bleeding. Classification accuracy of necessary and unnecessary lab tests of greater than 80% was achieved for all eight lab tests. Sensitivity and specificity were satisfactory for all the outcomes. An average reduction of 50% of the lab tests was obtained. This is an improvement from previously reported similar studies with average performance 37% by [1-3]. Reducing frequent lab testing and the potential clinical and financial implications are an important issue in intensive care. In this work we present an artificial intelligence method to predict the benefit of proposed future laboratory tests. Using ICU data from 746 patients with gastrointestinal bleeding, and eleven measurements, we demonstrate high accuracy in predicting the likely information to be gained from proposed future lab testing for eight common GI related lab tests. Future work will explore applications of this approach to a range of underlying medical conditions and laboratory tests. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Reducing unnecessary lab testing in the ICU with artificial intelligence
Cismondi, F.; Celi, L.A.; Fialho, A.S.; Vieira, S.M.; Reti, S.R.; Sousa, J.M.C.; Finkelstein, S.N.
2017-01-01
Objectives To reduce unnecessary lab testing by predicting when a proposed future lab test is likely to contribute information gain and thereby influence clinical management in patients with gastrointestinal bleeding. Recent studies have demonstrated that frequent laboratory testing does not necessarily relate to better outcomes. Design Data preprocessing, feature selection, and classification were performed and an artificial intelligence tool, fuzzy modeling, was used to identify lab tests that do not contribute an information gain. There were 11 input variables in total. Ten of these were derived from bedside monitor trends heart rate, oxygen saturation, respiratory rate, temperature, blood pressure, and urine collections, as well as infusion products and transfusions. The final input variable was a previous value from one of the eight lab tests being predicted: calcium, PTT, hematocrit, fibrinogen, lactate, platelets, INR and hemoglobin. The outcome for each test was a binary framework defining whether a test result contributed information gain or not. Patients Predictive modeling was applied to recognize unnecessary lab tests in a real world ICU database extract comprising 746 patients with gastrointestinal bleeding. Main results Classification accuracy of necessary and unnecessary lab tests of greater than 80% was achieved for all eight lab tests. Sensitivity and specificity were satisfactory for all the outcomes. An average reduction of 50% of the lab tests was obtained. This is an improvement from previously reported similar studies with average performance 37% by [1–3]. Conclusions Reducing frequent lab testing and the potential clinical and financial implications are an important issue in intensive care. In this work we present an artificial intelligence method to predict the benefit of proposed future laboratory tests. Using ICU data from 746 patients with gastrointestinal bleeding, and eleven measurements, we demonstrate high accuracy in predicting the likely information to be gained from proposed future lab testing for eight common GI related lab tests. Future work will explore applications of this approach to a range of underlying medical conditions and laboratory tests. PMID:23273628
An investigation of interface transferring mechanism of surface-bonded fiber Bragg grating sensors
NASA Astrophysics Data System (ADS)
Wu, Rujun; Fu, Kunkun; Chen, Tian
2017-08-01
Surface-bonded fiber Bragg grating sensor has been widely used in measuring strain in materials. The existence of fiber Bragg grating sensor affects strain distribution of the host material, which may result in a decrease in strain measurement accuracy. To improve the measurement accuracy, a theoretical model of strain transfer from the host material to optical fiber was developed, incorporating the influence of the fiber Bragg grating sensor. Subsequently, theoretical predictions were validated by comparing with data from finite element analysis and the existing experiment [F. Ansari and Y. Libo, J. Eng. Mech. 124(4), 385-394 (1998)]. Finally, the effect of parameters of fiber Bragg grating sensors on the average strain transfer rate was discussed.
Li, Wen-bing; Yao, Lin-tao; Liu, Mu-hua; Huang, Lin; Yao, Ming-yin; Chen, Tian-bing; He, Xiu-wen; Yang, Ping; Hu, Hui-qin; Nie, Jiang-hui
2015-05-01
Cu in navel orange was detected rapidly by laser-induced breakdown spectroscopy (LIBS) combined with partial least squares (PLS) for quantitative analysis, then the effect on the detection accuracy of the model with different spectral data ptetreatment methods was explored. Spectral data for the 52 Gannan navel orange samples were pretreated by different data smoothing, mean centralized and standard normal variable transform. Then 319~338 nm wavelength section containing characteristic spectral lines of Cu was selected to build PLS models, the main evaluation indexes of models such as regression coefficient (r), root mean square error of cross validation (RMSECV) and the root mean square error of prediction (RMSEP) were compared and analyzed. Three indicators of PLS model after 13 points smoothing and processing of the mean center were found reaching 0. 992 8, 3. 43 and 3. 4 respectively, the average relative error of prediction model is only 5. 55%, and in one word, the quality of calibration and prediction of this model are the best results. The results show that selecting the appropriate data pre-processing method, the prediction accuracy of PLS quantitative model of fruits and vegetables detected by LIBS can be improved effectively, providing a new method for fast and accurate detection of fruits and vegetables by LIBS.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sun, W; Jiang, M; Yin, F
Purpose: Dynamic tracking of moving organs, such as lung and liver tumors, under radiation therapy requires prediction of organ motions prior to delivery. The shift of moving organ may change a lot due to huge transform of respiration at different periods. This study aims to reduce the influence of that changes using adjustable training signals and multi-layer perceptron neural network (ASMLP). Methods: Respiratory signals obtained using a Real-time Position Management(RPM) device were used for this study. The ASMLP uses two multi-layer perceptron neural networks(MLPs) to infer respiration position alternately and the training sample will be updated with time. Firstly, amore » Savitzky-Golay finite impulse response smoothing filter was established to smooth the respiratory signal. Secondly, two same MLPs were developed to estimate respiratory position from its previous positions separately. Weights and thresholds were updated to minimize network errors according to Leverberg-Marquart optimization algorithm through backward propagation method. Finally, MLP 1 was used to predict 120∼150s respiration position using 0∼120s training signals. At the same time, MLP 2 was trained using 30∼150s training signals. Then MLP is used to predict 150∼180s training signals according to 30∼150s training signals. The respiration position is predicted as this way until it was finished. Results: In this experiment, the two methods were used to predict 2.5 minute respiratory signals. For predicting 1s ahead of response time, correlation coefficient was improved from 0.8250(MLP method) to 0.8856(ASMLP method). Besides, a 30% improvement of mean absolute error between MLP(0.1798 on average) and ASMLP(0.1267 on average) was achieved. For predicting 2s ahead of response time, correlation coefficient was improved from 0.61415 to 0.7098.Mean absolute error of MLP method(0.3111 on average) was reduced by 35% using ASMLP method(0.2020 on average). Conclusion: The preliminary results demonstrate that the ASMLP respiratory prediction method is more accurate than MLP method and can improve the respiration forecast accuracy.« less
Kårstad, S B; Kvello, O; Wichstrøm, L; Berg-Nielsen, T S
2014-05-01
Parents' ability to correctly perceive their child's skills has implications for how the child develops. In some studies, parents have shown to overestimate their child's abilities in areas such as IQ, memory and language. Emotion Comprehension (EC) is a skill central to children's emotion regulation, initially learned from their parents. In this cross-sectional study we first tested children's EC and then asked parents to estimate the child's performance. Thus, a measure of accuracy between child performance and parents' estimates was obtained. Subsequently, we obtained information on child and parent factors that might predict parents' accuracy in estimating their child's EC. Child EC and parental accuracy of estimation was tested by studying a community sample of 882 4-year-olds who completed the Test of Emotion Comprehension (TEC). The parents were instructed to guess their children's responses on the TEC. Predictors of parental accuracy of estimation were child actual performance on the TEC, child language comprehension, observed parent-child interaction, the education level of the parent, and child mental health. Ninety-one per cent of the parents overestimated their children's EC. On average, parents estimated that their 4-year-old children would display the level of EC corresponding to a 7-year-old. Accuracy of parental estimation was predicted by child high performance on the TEC, child advanced language comprehension, and more optimal parent-child interaction. Parents' ability to estimate the level of their child's EC was characterized by a substantial overestimation. The more competent the child, and the more sensitive and structuring the parent was interacting with the child, the more accurate the parent was in the estimation of their child's EC. © 2013 John Wiley & Sons Ltd.
Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding
2013-01-01
Background In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. Results The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best. Conclusions The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies. PMID:24314298
Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding.
Ould Estaghvirou, Sidi Boubacar; Ogutu, Joseph O; Schulz-Streeck, Torben; Knaak, Carsten; Ouzunova, Milena; Gordillo, Andres; Piepho, Hans-Peter
2013-12-06
In genomic prediction, an important measure of accuracy is the correlation between the predicted and the true breeding values. Direct computation of this quantity for real datasets is not possible, because the true breeding value is unknown. Instead, the correlation between the predicted breeding values and the observed phenotypic values, called predictive ability, is often computed. In order to indirectly estimate predictive accuracy, this latter correlation is usually divided by an estimate of the square root of heritability. In this study we use simulation to evaluate estimates of predictive accuracy for seven methods, four (1 to 4) of which use an estimate of heritability to divide predictive ability computed by cross-validation. Between them the seven methods cover balanced and unbalanced datasets as well as correlated and uncorrelated genotypes. We propose one new indirect method (4) and two direct methods (5 and 6) for estimating predictive accuracy and compare their performances and those of four other existing approaches (three indirect (1 to 3) and one direct (7)) with simulated true predictive accuracy as the benchmark and with each other. The size of the estimated genetic variance and hence heritability exerted the strongest influence on the variation in the estimated predictive accuracy. Increasing the number of genotypes considerably increases the time required to compute predictive accuracy by all the seven methods, most notably for the five methods that require cross-validation (Methods 1, 2, 3, 4 and 6). A new method that we propose (Method 5) and an existing method (Method 7) used in animal breeding programs were the fastest and gave the least biased, most precise and stable estimates of predictive accuracy. Of the methods that use cross-validation Methods 4 and 6 were often the best. The estimated genetic variance and the number of genotypes had the greatest influence on predictive accuracy. Methods 5 and 7 were the fastest and produced the least biased, the most precise, robust and stable estimates of predictive accuracy. These properties argue for routinely using Methods 5 and 7 to assess predictive accuracy in genomic selection studies.
Fleming, A; Schenkel, F S; Koeck, A; Malchiodi, F; Ali, R A; Corredig, M; Mallard, B; Sargolzaei, M; Miglior, F
2017-05-01
The objective of this study was to estimate the heritability of milk fat globule (MFG) size and mid-infrared (MIR) predicted MFG size in Holstein cattle. The genetic correlations between measured and predicted MFG size with milk fat and protein percentage were also investigated. Average MFG size was measured in 1,583 milk samples taken from 254 Holstein cows from 29 herds across Canada. Size was expressed as volume moment mean (D[4,3]) and surface moment mean (D[3,2]). Analyzed milk samples also had average MFG size predicted from their MIR spectral records. Fat and protein percentages were obtained for all test-day milk samples in the cow's lactation. Univariate and bivariate repeatability animal models were used to estimate heritability and genetic correlations. Moderate heritabilities of 0.364 and 0.466 were found for D[4,3] and D[3,2], respectively, and a strong genetic correlation was found between the 2 traits (0.98). The heritabilities for the MIR-predicted MFG size were lower than those estimated for the measured MFG size at 0.300 for predicted D[4,3] and 0.239 for predicted D[3,2]. The genetic correlation between measured and predicted D[4,3] was 0.685; the correlation was slightly higher between measured and predicted D[3,2] at 0.764, likely due to the better prediction accuracy of D[3,2]. Milk fat percentage had moderate genetic correlations with both D[4,3] and D[3,2] (0.538 and 0.681, respectively). The genetic correlation between predicted MFG size and fat percentage was much stronger (greater than 0.97 for both predicted D[4,3] and D[3,2]). The stronger correlation suggests a limitation for the use of the predicted values of MFG size as indicator traits for true average MFG size in milk in selection programs. Larger samples sizes are required to provide better evidence of the estimated genetic parameters. A genetic component appears to exist for the average MFG size in bovine milk, and the variation could be exploited in selection programs. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Fazel, Seena; Singh, Jay P; Doll, Helen; Grann, Martin
2012-07-24
To investigate the predictive validity of tools commonly used to assess the risk of violence, sexual, and criminal behaviour. Systematic review and tabular meta-analysis of replication studies following PRISMA guidelines. PsycINFO, Embase, Medline, and United States Criminal Justice Reference Service Abstracts. We included replication studies from 1 January 1995 to 1 January 2011 if they provided contingency data for the offending outcome that the tools were designed to predict. We calculated the diagnostic odds ratio, sensitivity, specificity, area under the curve, positive predictive value, negative predictive value, the number needed to detain to prevent one offence, as well as a novel performance indicator-the number safely discharged. We investigated potential sources of heterogeneity using metaregression and subgroup analyses. Risk assessments were conducted on 73 samples comprising 24,847 participants from 13 countries, of whom 5879 (23.7%) offended over an average of 49.6 months. When used to predict violent offending, risk assessment tools produced low to moderate positive predictive values (median 41%, interquartile range 27-60%) and higher negative predictive values (91%, 81-95%), and a corresponding median number needed to detain of 2 (2-4) and number safely discharged of 10 (4-18). Instruments designed to predict violent offending performed better than those aimed at predicting sexual or general crime. Although risk assessment tools are widely used in clinical and criminal justice settings, their predictive accuracy varies depending on how they are used. They seem to identify low risk individuals with high levels of accuracy, but their use as sole determinants of detention, sentencing, and release is not supported by the current evidence. Further research is needed to examine their contribution to treatment and management.
Přibyl, J; Madsen, P; Bauer, J; Přibylová, J; Simečková, M; Vostrý, L; Zavadilová, L
2013-03-01
Estimated breeding values (EBV) for first-lactation milk production of Holstein cattle in the Czech Republic were calculated using a conventional animal model and by single-step prediction of the genomic enhanced breeding value. Two overlapping data sets of milk production data were evaluated: (1) calving years 1991 to 2006, with 861,429 lactations and 1,918,901 animals in the pedigree and (2) calving years 1991 to 2010, with 1,097,319 lactations and 1,906,576 animals in the pedigree. Global Interbull (Uppsala, Sweden) deregressed proofs of 114,189 bulls were used in the analyses. Reliabilities of Interbull values were equivalent to an average of 8.53 effective records, which were used in a weighted analysis. A total of 1,341 bulls were genotyped using the Illumina BovineSNP50 BeadChip V2 (Illumina Inc., San Diego, CA). Among the genotyped bulls were 332 young bulls with no daughters in the first data set but more than 50 daughters (88.41, on average) with performance records in the second data set. For young bulls, correlations of EBV and genomic enhanced breeding value before and after progeny testing, corresponding average expected reliabilities, and effective daughter contributions (EDC) were calculated. The reliability of prediction pedigree EBV of young bulls was 0.41, corresponding to EDC=10.6. Including Interbull deregressed proofs improved the reliability of prediction by EDC=13.4 and including genotyping improved prediction reliability by EDC=6.2. Total average expected reliability of prediction reached 0.67, corresponding to EDC=30.2. The combination of domestic and Interbull sources for both genotyped and nongenotyped animals is valuable for improving the accuracy of genetic prediction in small populations of dairy cattle. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf
2017-09-01
Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previously occurred accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, leading, in most cases, to build less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents occurred over a six-year period from 2008 to 2013 in Abu Dhabi were used throughout this study. In order to reduce the amount of variation in data, hierarchical clustering was applied on the data set to organize it into six different forms, each with different number of clusters (i.e., clusters from 1 to 6). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of models in each from of data. The results show that when testing the models using the training set, clustering prior to classification achieves (11%-16%) more accuracy than without using clustering, while the percentage split achieves (2%-5%) more accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.
Intra- and Inter-Fractional Variation Prediction of Lung Tumors Using Fuzzy Deep Learning
Park, Seonyeong; Lee, Suk Jin; Weiss, Elisabeth
2016-01-01
Tumor movements should be accurately predicted to improve delivery accuracy and reduce unnecessary radiation exposure to healthy tissue during radiotherapy. The tumor movements pertaining to respiration are divided into intra-fractional variation occurring in a single treatment session and inter-fractional variation arising between different sessions. Most studies of patients’ respiration movements deal with intra-fractional variation. Previous studies on inter-fractional variation are hardly mathematized and cannot predict movements well due to inconstant variation. Moreover, the computation time of the prediction should be reduced. To overcome these limitations, we propose a new predictor for intra- and inter-fractional data variation, called intra- and inter-fraction fuzzy deep learning (IIFDL), where FDL, equipped with breathing clustering, predicts the movement accurately and decreases the computation time. Through the experimental results, we validated that the IIFDL improved root-mean-square error (RMSE) by 29.98% and prediction overshoot by 70.93%, compared with existing methods. The results also showed that the IIFDL enhanced the average RMSE and overshoot by 59.73% and 83.27%, respectively. In addition, the average computation time of IIFDL was 1.54 ms for both intra- and inter-fractional variation, which was much smaller than the existing methods. Therefore, the proposed IIFDL might achieve real-time estimation as well as better tracking techniques in radiotherapy. PMID:27170914
NASA Astrophysics Data System (ADS)
Jia, Song; Xu, Tian-he; Sun, Zhang-zhen; Li, Jia-jing
2017-02-01
UT1-UTC is an important part of the Earth Orientation Parameters (EOP). The high-precision predictions of UT1-UTC play a key role in practical applications of deep space exploration, spacecraft tracking and satellite navigation and positioning. In this paper, a new prediction method with combination of Gray Model (GM(1, 1)) and Autoregressive Integrated Moving Average (ARIMA) is developed. The main idea is as following. Firstly, the UT1-UTC data are preprocessed by removing the leap second and Earth's zonal harmonic tidal to get UT1R-TAI data. Periodic terms are estimated and removed by the least square to get UT2R-TAI. Then the linear terms of UT2R-TAI data are modeled by the GM(1, 1), and the residual terms are modeled by the ARIMA. Finally, the UT2R-TAI prediction can be performed based on the combined model of GM(1, 1) and ARIMA, and the UT1-UTC predictions are obtained by adding the corresponding periodic terms, leap second correction and the Earth's zonal harmonic tidal correction. The results show that the proposed model can be used to predict UT1-UTC effectively with higher middle and long-term (from 32 to 360 days) accuracy than those of LS + AR, LS + MAR and WLS + MAR.
Antibody modeling using the prediction of immunoglobulin structure (PIGS) web server [corrected].
Marcatili, Paolo; Olimpieri, Pier Paolo; Chailyan, Anna; Tramontano, Anna
2014-12-01
Antibodies (or immunoglobulins) are crucial for defending organisms from pathogens, but they are also key players in many medical, diagnostic and biotechnological applications. The ability to predict their structure and the specific residues involved in antigen recognition has several useful applications in all of these areas. Over the years, we have developed or collaborated in developing a strategy that enables researchers to predict the 3D structure of antibodies with a very satisfactory accuracy. The strategy is completely automated and extremely fast, requiring only a few minutes (∼10 min on average) to build a structural model of an antibody. It is based on the concept of canonical structures of antibody loops and on our understanding of the way light and heavy chains pack together.
Development of a mobile application for oral cancer screening.
Gomes, Mayra Sousa; Bonan, Paulo Rogério Ferreti; Ferreira, Vitor Yuri Nicolau; de Lucena Pereira, Laudenice; Correia, Ricardo João Cruz; da Silva Teixeira, Hélder Bruno; Pereira, Daniel Cláudio; Bonan, Paulo
2017-01-01
To develop a mobile application (app) for oral cancer screening. The app was developed using Android system version 4.4.2, with JAVA language. Information concerning sociodemographic data and risk factors for oral cancer development, e.g., tobacco and alcohol use, sun exposure and other contributing factors, such as unprotected oral sex, oral pain and denture use, were included. We surveyed a population at high risk for oral cancer development and then evaluated the sensitivity/specificity/accuracy and predictive values of clinical oral diagnosis between two blinded trained examiners, who used movies and data from the app, and in loco oral examination as gold-standard. A total of 55 individuals at high risk for oral cancer development were surveyed. Of these, 31% presented homogeneous/heterogeneous white lesions with potential of malignancy. The clinical diagnoses performed by the two examiners using videos were found to have sensitivity of 82%-100% (average 91%), specificity of 81%-100% (average 90.5%), and accuracy of 87.27%-95.54% (average 90.90%), as compared with the gold-standard. The Kappa agreement value between the gold-standard and the examiner with the best agreement was 0.597. Mobile apps including videos and data collection interfaces could be an interesting alternative in oral cancer research development.
Biddle, Daniel J; Robillard, Rébecca; Hermens, Daniel F; Hickie, Ian B; Glozier, Nicholas
2015-09-01
Validation of self-report assessment of habitual sleep duration and onset time in young people with mental ill-health. Validation sample. Specialized early intervention centers for young people in Sydney, Australia. One hundred and forty-six young people with mental ill-health. N/A. Self-reported habitual sleep duration and onset time were compared against at least 7 days of actigraphy monitoring. Average bias in and calibration of subjective measures were assessed, along with correlation of subjective and objective measures. Differences by age, sex, mental-disorder type, and reported insomnia were also explored. On average, subjective estimates of sleep were unbiased. Overall, each additional hour of objective habitual sleep duration predicted 41 minutes more subjective habitual sleep duration, and each hour later objective habitual sleep onset occurred predicted a 43-minute later subjective habitual sleep onset. There were subgroup differences: subjective habitual sleep duration in self-reported insomnia was shorter than objective duration by 30 minutes (SD = 119), on average. Calibration of habitual sleep duration was worse for those with mood disorders than with other primary diagnoses (t = -2.39, P = .018). Correlation between subjective and objective measures was strong for sleep onset time (Á = .667, P < .001) and moderate for sleep duration (r = .332, P < .001). For the mood disorder group, subjective and objective sleep durations were uncorrelated. Self-reports seem valid for large-scale studies of habitual sleep duration and onset in help-seeking young people, but assessment of habitual sleep duration requires objective measures where individual accuracy is important. Copyright © 2015 National Sleep Foundation. Published by Elsevier Inc. All rights reserved.
Hong, Huixiao; Shen, Jie; Ng, Hui Wen; Sakkiah, Sugunadevi; Ye, Hao; Ge, Weigong; Gong, Ping; Xiao, Wenming; Tong, Weida
2016-03-25
Endocrine disruptors such as polychlorinated biphenyls (PCBs), diethylstilbestrol (DES) and dichlorodiphenyltrichloroethane (DDT) are agents that interfere with the endocrine system and cause adverse health effects. Huge public health concern about endocrine disruptors has arisen. One of the mechanisms of endocrine disruption is through binding of endocrine disruptors with the hormone receptors in the target cells. Entrance of endocrine disruptors into target cells is the precondition of endocrine disruption. The binding capability of a chemical with proteins in the blood affects its entrance into the target cells and, thus, is very informative for the assessment of potential endocrine disruption of chemicals. α-fetoprotein is one of the major serum proteins that binds to a variety of chemicals such as estrogens. To better facilitate assessment of endocrine disruption of environmental chemicals, we developed a model for α-fetoprotein binding activity prediction using the novel pattern recognition method (Decision Forest) and the molecular descriptors calculated from two-dimensional structures by Mold² software. The predictive capability of the model has been evaluated through internal validation using 125 training chemicals (average balanced accuracy of 69%) and external validations using 22 chemicals (balanced accuracy of 71%). Prediction confidence analysis revealed the model performed much better at high prediction confidence. Our results indicate that the model is useful (when predictions are in high confidence) in endocrine disruption risk assessment of environmental chemicals though improvement by increasing number of training chemicals is needed.
Zhang, Li; Ai, Haixin; Chen, Wen; Yin, Zimo; Hu, Huan; Zhu, Junfeng; Zhao, Jian; Zhao, Qi; Liu, Hongsheng
2017-05-18
Carcinogenicity refers to a highly toxic end point of certain chemicals, and has become an important issue in the drug development process. In this study, three novel ensemble classification models, namely Ensemble SVM, Ensemble RF, and Ensemble XGBoost, were developed to predict carcinogenicity of chemicals using seven types of molecular fingerprints and three machine learning methods based on a dataset containing 1003 diverse compounds with rat carcinogenicity. Among these three models, Ensemble XGBoost is found to be the best, giving an average accuracy of 70.1 ± 2.9%, sensitivity of 67.0 ± 5.0%, and specificity of 73.1 ± 4.4% in five-fold cross-validation and an accuracy of 70.0%, sensitivity of 65.2%, and specificity of 76.5% in external validation. In comparison with some recent methods, the ensemble models outperform some machine learning-based approaches and yield equal accuracy and higher specificity but lower sensitivity than rule-based expert systems. It is also found that the ensemble models could be further improved if more data were available. As an application, the ensemble models are employed to discover potential carcinogens in the DrugBank database. The results indicate that the proposed models are helpful in predicting the carcinogenicity of chemicals. A web server called CarcinoPred-EL has been built for these models ( http://ccsipb.lnu.edu.cn/toxicity/CarcinoPred-EL/ ).
Rogers, Katherine H; Biesanz, Jeremy C
2015-12-01
There are strong differences between individuals in the tendency to view the personality of others as similar to the average person. That is, some people tend to form more normatively accurate impressions than do others. However, the process behind the formation of normatively accurate first impressions is not yet fully understood. Given that the average individual's personality is highly socially desirable (Borkenau & Zaltauskas, 2009; Wood, Gosling & Potter, 2007), individuals may achieve high normative accuracy by viewing others as similar to the average person or by viewing them in an overly socially desirable manner. The average self-reported personality profile and social desirability, despite being strongly correlated, independently and strongly predict first impressions. Further, some individuals have a more accurate understanding of the average individual's personality than do others. Perceivers with more accurate knowledge about the average individual's personality rated the personality of specific others more normatively accurately (more similar to the average person), suggesting that individual differences in normative judgments include a component of accurate knowledge regarding the average personality. In contrast, perceivers who explicitly evaluated others more positively formed more socially desirable impressions, but not more normatively accurate impressions. (c) 2015 APA, all rights reserved).
LRSSLMDA: Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction
Huang, Li
2017-01-01
Predicting novel microRNA (miRNA)-disease associations is clinically significant due to miRNAs’ potential roles of diagnostic biomarkers and therapeutic targets for various human diseases. Previous studies have demonstrated the viability of utilizing different types of biological data to computationally infer new disease-related miRNAs. Yet researchers face the challenge of how to effectively integrate diverse datasets and make reliable predictions. In this study, we presented a computational model named Laplacian Regularized Sparse Subspace Learning for MiRNA-Disease Association prediction (LRSSLMDA), which projected miRNAs/diseases’ statistical feature profile and graph theoretical feature profile to a common subspace. It used Laplacian regularization to preserve the local structures of the training data and a L1-norm constraint to select important miRNA/disease features for prediction. The strength of dimensionality reduction enabled the model to be easily extended to much higher dimensional datasets than those exploited in this study. Experimental results showed that LRSSLMDA outperformed ten previous models: the AUC of 0.9178 in global leave-one-out cross validation (LOOCV) and the AUC of 0.8418 in local LOOCV indicated the model’s superior prediction accuracy; and the average AUC of 0.9181+/-0.0004 in 5-fold cross validation justified its accuracy and stability. In addition, three types of case studies further demonstrated its predictive power. Potential miRNAs related to Colon Neoplasms, Lymphoma, Kidney Neoplasms, Esophageal Neoplasms and Breast Neoplasms were predicted by LRSSLMDA. Respectively, 98%, 88%, 96%, 98% and 98% out of the top 50 predictions were validated by experimental evidences. Therefore, we conclude that LRSSLMDA would be a valuable computational tool for miRNA-disease association prediction. PMID:29253885
Evaluation of wave runup predictions from numerical and parametric models
Stockdon, Hilary F.; Thompson, David M.; Plant, Nathaniel G.; Long, Joseph W.
2014-01-01
Wave runup during storms is a primary driver of coastal evolution, including shoreline and dune erosion and barrier island overwash. Runup and its components, setup and swash, can be predicted from a parameterized model that was developed by comparing runup observations to offshore wave height, wave period, and local beach slope. Because observations during extreme storms are often unavailable, a numerical model is used to simulate the storm-driven runup to compare to the parameterized model and then develop an approach to improve the accuracy of the parameterization. Numerically simulated and parameterized runup were compared to observations to evaluate model accuracies. The analysis demonstrated that setup was accurately predicted by both the parameterized model and numerical simulations. Infragravity swash heights were most accurately predicted by the parameterized model. The numerical model suffered from bias and gain errors that depended on whether a one-dimensional or two-dimensional spatial domain was used. Nonetheless, all of the predictions were significantly correlated to the observations, implying that the systematic errors can be corrected. The numerical simulations did not resolve the incident-band swash motions, as expected, and the parameterized model performed best at predicting incident-band swash heights. An assimilated prediction using a weighted average of the parameterized model and the numerical simulations resulted in a reduction in prediction error variance. Finally, the numerical simulations were extended to include storm conditions that have not been previously observed. These results indicated that the parameterized predictions of setup may need modification for extreme conditions; numerical simulations can be used to extend the validity of the parameterized predictions of infragravity swash; and numerical simulations systematically underpredict incident swash, which is relatively unimportant under extreme conditions.
CSmetaPred: a consensus method for prediction of catalytic residues.
Choudhary, Preeti; Kumar, Shailesh; Bachhawat, Anand Kumar; Pandit, Shashi Bhushan
2017-12-22
Knowledge of catalytic residues can play an essential role in elucidating mechanistic details of an enzyme. However, experimental identification of catalytic residues is a tedious and time-consuming task, which can be expedited by computational predictions. Despite significant development in active-site prediction methods, one of the remaining issues is ranked positions of putative catalytic residues among all ranked residues. In order to improve ranking of catalytic residues and their prediction accuracy, we have developed a meta-approach based method CSmetaPred. In this approach, residues are ranked based on the mean of normalized residue scores derived from four well-known catalytic residue predictors. The mean residue score of CSmetaPred is combined with predicted pocket information to improve prediction performance in meta-predictor, CSmetaPred_poc. Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision Recall (PR) curves. The visual and quantitative analysis of ROC and PR curves shows that meta-predictors outperform their constituent methods and CSmetaPred_poc is the best of evaluated methods. For instance, on CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves highest Mean Average Specificity (MAS), a scalar measure for ROC curve, of 0.97 (0.96). Importantly, median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. Considering residues ranked ≤20 classified as true positive in binary classification, CSmetaPred_poc achieves prediction accuracy of 0.94 on CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc predicts all catalytic residues within top 20 ranks for ~73% of enzymes. Furthermore, benchmarking of prediction on comparative modelled structures showed that models result in better prediction than only sequence based predictions. These analyses suggest that CSmetaPred_poc is able to rank putative catalytic residues at lower (better) ranked positions, which can facilitate and expedite their experimental characterization. The benchmarking studies showed that employing meta-approach in combining residue-level scores derived from well-known catalytic residue predictors can improve prediction accuracy as well as provide improved ranked positions of known catalytic residues. Hence, such predictions can assist experimentalist to prioritize residues for mutational studies in their efforts to characterize catalytic residues. Both meta-predictors are available as webserver at: http://14.139.227.206/csmetapred/ .
Rajgaria, R.; Wei, Y.; Floudas, C. A.
2010-01-01
An integer linear optimization model is presented to predict residue contacts in β, α + β, and α/β proteins. The total energy of a protein is expressed as sum of a Cα – Cα distance dependent contact energy contribution and a hydrophobic contribution. The model selects contacts that assign lowest energy to the protein structure while satisfying a set of constraints that are included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the β-sheet alignments. These β-sheet alignments are used as constraints for contacts between residues of β-sheets. This model was tested on three independent protein test sets and CASP8 test proteins consisting of β, α + β, α/β proteins and was found to perform very well. The average accuracy of the predictions (separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets and they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate the protein tertiary structure prediction. This proposed residue contact prediction model is incorporated into the first principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins. PMID:20225257
Correlation of Mixture Temperature Data Obtained from Bare Intake-manifold Thermocouples
NASA Technical Reports Server (NTRS)
White, H. Jack; Gammon, Goldie L
1946-01-01
A relatively simple equation has been found to express with fair accuracy, variation in manifold-charge temperature with charge in engine operating conditions. This equation and associated curves have been checked by multi cylinder-engine data, both test stand and flight, over a wide range of operating conditions. Average mixture temperatures, predicted by the equations of this report, agree reasonably well with results within the same range of carburetor-air temperatures from laboratories and test stands other than the NACA.
Lee, S. A.; Kong, C.; Adeola, O.; Kim, B. G.
2016-01-01
Estimation of feed intake (FI) for individual animals within a pen is needed in situations where more than one animal share a feeder during feeding trials. A partitioning method (PM) was previously published as a model to estimate the individual FI (IFI). Briefly, the IFI of a pig within the pen was calculated by partitioning IFI into IFI for maintenance (IFIm) and IFI for growth. In the PM, IFIm is determined based on the metabolic body weight (BW), which is calculated using the coefficient of 106 and exponent of 0.75. Two simulation studies were conducted to test the hypothesis that the use of different coefficients and exponents for metabolic BW to calculate IFIm improves the accuracy of the estimates of IFI for pigs, and that PM is applied to pigs fed in group-housing systems. The accuracy of prediction represented by difference between actual and estimated IFI was compared using PM, ratio (RM), or averaging method (AM). In simulation studies 1 and 2, the PM estimated IFI better than the AM and RM during most of the periods (p<0.05). The use of 0.60 as the exponent and the coefficient of 197 to calculate metabolic BW did not improve the accuracy of the IFI estimates in both simulation studies 1 and 2. The results imply that the use of 197 kcal×kg BW0.60 as metabolizable energy for maintenance in PM does not improve the accuracy of IFI estimations compared with the use of 106 kcal×kg BW0.75 and that the PM estimates the IFI of pigs with greater accuracy compared with the averaging or ratio methods in group-housing systems. PMID:27608642
Accuracy of Referring Provider and Endoscopist Impressions of Colonoscopy Indication.
Naveed, Mariam; Clary, Meredith; Ahn, Chul; Kubiliun, Nisa; Agrawal, Deepak; Cryer, Byron; Murphy, Caitlin; Singal, Amit G
2017-07-01
Background: Referring provider and endoscopist impressions of colonoscopy indication are used for clinical care, reimbursement, and quality reporting decisions; however, the accuracy of these impressions is unknown. This study assessed the sensitivity, specificity, positive and negative predictive value, and overall accuracy of methods to classify colonoscopy indication, including referring provider impression, endoscopist impression, and administrative algorithm compared with gold standard chart review. Methods: We randomly sampled 400 patients undergoing a colonoscopy at a Veterans Affairs health system between January 2010 and December 2010. Referring provider and endoscopist impressions of colonoscopy indication were compared with gold-standard chart review. Indications were classified into 4 mutually exclusive categories: diagnostic, surveillance, high-risk screening, or average-risk screening. Results: Of 400 colonoscopies, 26% were performed for average-risk screening, 7% for high-risk screening, 26% for surveillance, and 41% for diagnostic indications. Accuracy of referring provider and endoscopist impressions of colonoscopy indication were 87% and 84%, respectively, which were significantly higher than that of the administrative algorithm (45%; P <.001 for both). There was substantial agreement between endoscopist and referring provider impressions (κ=0.76). All 3 methods showed high sensitivity (>90%) for determining screening (vs nonscreening) indication, but specificity of the administrative algorithm was lower (40.3%) compared with referring provider (93.7%) and endoscopist (84.0%) impressions. Accuracy of endoscopist, but not referring provider, impression was lower in patients with a family history of colon cancer than in those without (65% vs 84%; P =.001). Conclusions: Referring provider and endoscopist impressions of colonoscopy indication are both accurate and may be useful data to incorporate into algorithms classifying colonoscopy indication. Copyright © 2017 by the National Comprehensive Cancer Network.
Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H
2017-10-25
Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
Real-time Ensemble Forecasting of Coronal Mass Ejections using the WSA-ENLIL+Cone Model
NASA Astrophysics Data System (ADS)
Mays, M. L.; Taktakishvili, A.; Pulkkinen, A. A.; MacNeice, P. J.; Rastaetter, L.; Kuznetsova, M. M.; Odstrcil, D.
2013-12-01
Ensemble forecasting of coronal mass ejections (CMEs) provides significant information in that it provides an estimation of the spread or uncertainty in CME arrival time predictions due to uncertainties in determining CME input parameters. Ensemble modeling of CME propagation in the heliosphere is performed by forecasters at the Space Weather Research Center (SWRC) using the WSA-ENLIL cone model available at the Community Coordinated Modeling Center (CCMC). SWRC is an in-house research-based operations team at the CCMC which provides interplanetary space weather forecasting for NASA's robotic missions and performs real-time model validation. A distribution of n (routinely n=48) CME input parameters are generated using the CCMC Stereo CME Analysis Tool (StereoCAT) which employs geometrical triangulation techniques. These input parameters are used to perform n different simulations yielding an ensemble of solar wind parameters at various locations of interest (satellites or planets), including a probability distribution of CME shock arrival times (for hits), and geomagnetic storm strength (for Earth-directed hits). Ensemble simulations have been performed experimentally in real-time at the CCMC since January 2013. We present the results of ensemble simulations for a total of 15 CME events, 10 of which were performed in real-time. The observed CME arrival was within the range of ensemble arrival time predictions for 5 out of the 12 ensemble runs containing hits. The average arrival time prediction was computed for each of the twelve ensembles predicting hits and using the actual arrival time an average absolute error of 8.20 hours was found for all twelve ensembles, which is comparable to current forecasting errors. Some considerations for the accuracy of ensemble CME arrival time predictions include the importance of the initial distribution of CME input parameters, particularly the mean and spread. When the observed arrivals are not within the predicted range, this still allows the ruling out of prediction errors caused by tested CME input parameters. Prediction errors can also arise from ambient model parameters such as the accuracy of the solar wind background, and other limitations. Additionally the ensemble modeling setup was used to complete a parametric event case study of the sensitivity of the CME arrival time prediction to free parameters for ambient solar wind model and CME.
Alejo, Luz; Atkinson, John; Guzmán-Fierro, Víctor; Roeckel, Marlene
2018-05-16
Computational self-adapting methods (Support Vector Machines, SVM) are compared with an analytical method in effluent composition prediction of a two-stage anaerobic digestion (AD) process. Experimental data for the AD of poultry manure were used. The analytical method considers the protein as the only source of ammonia production in AD after degradation. Total ammonia nitrogen (TAN), total solids (TS), chemical oxygen demand (COD), and total volatile solids (TVS) were measured in the influent and effluent of the process. The TAN concentration in the effluent was predicted, this being the most inhibiting and polluting compound in AD. Despite the limited data available, the SVM-based model outperformed the analytical method for the TAN prediction, achieving a relative average error of 15.2% against 43% for the analytical method. Moreover, SVM showed higher prediction accuracy in comparison with Artificial Neural Networks. This result reveals the future promise of SVM for prediction in non-linear and dynamic AD processes. Graphical abstract ᅟ.
Macrocell path loss prediction using artificial intelligence techniques
NASA Astrophysics Data System (ADS)
Usman, Abraham U.; Okereke, Okpo U.; Omizegba, Elijah E.
2014-04-01
The prediction of propagation loss is a practical non-linear function approximation problem which linear regression or auto-regression models are limited in their ability to handle. However, some computational Intelligence techniques such as artificial neural networks (ANNs) and adaptive neuro-fuzzy inference systems (ANFISs) have been shown to have great ability to handle non-linear function approximation and prediction problems. In this study, the multiple layer perceptron neural network (MLP-NN), radial basis function neural network (RBF-NN) and an ANFIS network were trained using actual signal strength measurement taken at certain suburban areas of Bauchi metropolis, Nigeria. The trained networks were then used to predict propagation losses at the stated areas under differing conditions. The predictions were compared with the prediction accuracy of the popular Hata model. It was observed that ANFIS model gave a better fit in all cases having higher R2 values in each case and on average is more robust than MLP and RBF models as it generalises better to a different data.
Duan, Liwei; Zhang, Sheng; Lin, Zhaofen
2017-02-01
To explore the method and performance of using multiple indices to diagnose sepsis and to predict the prognosis of severe ill patients. Critically ill patients at first admission to intensive care unit (ICU) of Changzheng Hospital, Second Military Medical University, from January 2014 to September 2015 were enrolled if the following conditions were satisfied: (1) patients were 18-75 years old; (2) the length of ICU stay was more than 24 hours; (3) All records of the patients were available. Data of the patients was collected by searching the electronic medical record system. Logistic regression model was formulated to create the new combined predictive indicator and the receiver operating characteristic (ROC) curve for the new predictive indicator was built. The area under the ROC curve (AUC) for both the new indicator and original ones were compared. The optimal cut-off point was obtained where the Youden index reached the maximum value. Diagnostic parameters such as sensitivity, specificity and predictive accuracy were also calculated for comparison. Finally, individual values were substituted into the equation to test the performance in predicting clinical outcomes. A total of 362 patients (218 males and 144 females) were enrolled in our study and 66 patients died. The average age was (48.3±19.3) years old. (1) For the predictive model only containing categorical covariants [including procalcitonin (PCT), lipopolysaccharide (LPS), infection, white blood cells count (WBC) and fever], increased PCT, increased WBC and fever were demonstrated to be independent risk factors for sepsis in the logistic equation. The AUC for the new combined predictive indicator was higher than that of any other indictor, including PCT, LPS, infection, WBC and fever (0.930 vs. 0.661, 0.503, 0.570, 0.837, 0.800). The optimal cut-off value for the new combined predictive indicator was 0.518. Using the new indicator to diagnose sepsis, the sensitivity, specificity and diagnostic accuracy rate were 78.00%, 93.36% and 87.47%, respectively. One patient was randomly selected, and the clinical data was substituted into the probability equation for prediction. The calculated value was 0.015, which was less than the cut-off value (0.518), indicating that the prognosis was non-sepsis at an accuracy of 87.47%. (2) For the predictive model only containing continuous covariants, the logistic model which combined acute physiology and chronic health evaluation II (APACHE II) score and sequential organ failure assessment (SOFA) score to predict in-hospital death events, both APACHE II score and SOFA score were independent risk factors for death. The AUC for the new predictive indicator was higher than that of APACHE II score and SOFA score (0.834 vs. 0.812, 0.813). The optimal cut-off value for the new combined predictive indicator in predicting in-hospital death events was 0.236, and the corresponding sensitivity, specificity and diagnostic accuracy for the combined predictive indicator were 73.12%, 76.51% and 75.70%, respectively. One patient was randomly selected, and the APACHE II score and SOFA score was substituted into the probability equation for prediction. The calculated value was 0.570, which was higher than the cut-off value (0.236), indicating that the death prognosis at an accuracy of 75.70%. The combined predictive indicator, which is formulated by logistic regression models, is superior to any single indicator in predicting sepsis or in-hospital death events.
Peng, Jiang Chen; Ran, Zhi Hua; Shen, Jun
2015-09-01
Previous research has yielded conflicting data as to whether the natural history of inflammatory bowel disease follows a seasonal pattern. The purpose of this study was (1) to determine whether the frequency of onset and relapse of inflammatory bowel disease follows a seasonal pattern and (2) to establish a model to predict the frequency of onset, relapse, and severity of inflammatory bowel disease (IBD) with meteorological data based on artificial neural network (ANN). Patients with diagnosis of ulcerative colitis (UC) or Crohn's disease (CD) between 2003 and 2011 were investigated according to the occurrence of onset and flares of symptoms. The expected onset or relapse was calculated on a monthly basis over the study period. For artificial neural network (ANN), patients from 2003 to 2010 were assigned as training cohort and patients in 2011 were assigned as validation cohort. Mean square error (MSE) and mean absolute percentage error (MAPE) were used to evaluate the predictive accuracy. We found no seasonal pattern of onset (P = 0.248) and relapse (P = 0.394) among UC patients. But, the onset (P = 0.015) and relapse (P = 0.004) of CD were associated with seasonal pattern, with a peak in July and August. ANN had average accuracy to predict the frequency of onset (MSE = 0.076, MAPE = 37.58%) and severity of IBD (MSE = 0.065, MAPE = 42.15%) but high accuracy in predicting the frequency of relapse of IBD (MSE = 0.009, MAPE = 17.1%). The frequency of onset and relapse in IBD showed seasonality only in CD, with a peak in July and August, but not in UC. ANN may have its value in predicting the frequency of relapse among patients with IBD.
Predicting links based on knowledge dissemination in complex network
NASA Astrophysics Data System (ADS)
Zhou, Wen; Jia, Yifan
2017-04-01
Link prediction is the task of mining the missing links in networks or predicting the next vertex pair to be connected by a link. A lot of link prediction methods were inspired by evolutionary processes of networks. In this paper, a new mechanism for the formation of complex networks called knowledge dissemination (KD) is proposed with the assumption of knowledge disseminating through the paths of a network. Accordingly, a new link prediction method-knowledge dissemination based link prediction (KDLP)-is proposed to test KD. KDLP characterizes vertex similarity based on knowledge quantity (KQ) which measures the importance of a vertex through H-index. Extensive numerical simulations on six real-world networks demonstrate that KDLP is a strong link prediction method which performs at a higher prediction accuracy than four well-known similarity measures including common neighbors, local path index, average commute time and matrix forest index. Furthermore, based on the common conclusion that an excellent link prediction method reveals a good evolving mechanism, the experiment results suggest that KD is a considerable network evolving mechanism for the formation of complex networks.
Gene and translation initiation site prediction in metagenomic sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John
2012-01-01
Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translationmore » initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.« less
NASA Astrophysics Data System (ADS)
Jung, Hyunuk; Kum, Oyeon; Han, Youngyih; Park, Byungdo; Cheong, Kwang-Ho
2014-12-01
For a better understanding of the accuracy of state-of-the-art-radiation therapies, 2-dimensional dosimetry in a patient-like environment will be helpful. Therefore, the dosimetry of EBT3 films in non-water-equivalent tissues was investigated, and the accuracy of commercially-used dose-calculation algorithms was evaluated with EBT3 measurement. Dose distributions were measured with EBT3 films for an in-house-designed phantom that contained a lung or a bone substitute, i.e., an air cavity (3 × 3 × 3 cm3) or teflon (2 × 2 × 2 cm3 or 3 × 3 × 3 cm3), respectively. The phantom was irradiated with 6-MV X-rays with field sizes of 2 × 2, 3 × 3, and 5 × 5 cm2. The accuracy of EBT3 dosimetry was evaluated by comparing the measured dose with the dose obtained from Monte Carlo (MC) simulations. A dose-to-bone-equivalent material was obtained by multiplying the EBT3 measurements by the stopping power ratio (SPR). The EBT3 measurements were then compared with the predictions from four algorithms: Monte Carlo (MC) in iPlan, acuros XB (AXB), analytical anisotropic algorithm (AAA) in Eclipse, and superposition-convolution (SC) in Pinnacle. For the air cavity, the EBT3 measurements agreed with the MC calculation to within 2% on average. For teflon, the EBT3 measurements differed by 9.297% (±0.9229%) on average from the Monte Carlo calculation before dose conversion, and by 0.717% (±0.6546%) after applying the SPR. The doses calculated by using the MC, AXB, AAA, and SC algorithms for the air cavity differed from the EBT3 measurements on average by 2.174, 2.863, 18.01, and 8.391%, respectively; for teflon, the average differences were 3.447, 4.113, 7.589, and 5.102%. The EBT3 measurements corrected with the SPR agreed with 2% on average both within and beyond the heterogeneities with MC results, thereby indicating that EBT3 dosimetry can be used in heterogeneous media. The MC and the AXB dose calculation algorithms exhibited clinically-acceptable accuracy (<5%) in heterogeneities.
COUSCOus: improved protein contact prediction using an empirical Bayes covariance estimator.
Rawi, Reda; Mall, Raghvendra; Kunji, Khalid; El Anbari, Mohammed; Aupetit, Michael; Ullah, Ehsan; Bensmail, Halima
2016-12-15
The post-genomic era with its wealth of sequences gave rise to a broad range of protein residue-residue contact detecting methods. Although various coevolution methods such as PSICOV, DCA and plmDCA provide correct contact predictions, they do not completely overlap. Hence, new approaches and improvements of existing methods are needed to motivate further development and progress in the field. We present a new contact detecting method, COUSCOus, by combining the best shrinkage approach, the empirical Bayes covariance estimator and GLasso. Using the original PSICOV benchmark dataset, COUSCOus achieves mean accuracies of 0.74, 0.62 and 0.55 for the top L/10 predicted long, medium and short range contacts, respectively. In addition, COUSCOus attains mean areas under the precision-recall curves of 0.25, 0.29 and 0.30 for long, medium and short contacts and outperforms PSICOV. We also observed that COUSCOus outperforms PSICOV w.r.t. Matthew's correlation coefficient criterion on full list of residue contacts. Furthermore, COUSCOus achieves on average 10% more gain in prediction accuracy compared to PSICOV on an independent test set composed of CASP11 protein targets. Finally, we showed that when using a simple random forest meta-classifier, by combining contact detecting techniques and sequence derived features, PSICOV predictions should be replaced by the more accurate COUSCOus predictions. We conclude that the consideration of superior covariance shrinkage approaches will boost several research fields that apply the GLasso procedure, amongst the presented one of residue-residue contact prediction as well as fields such as gene network reconstruction.
PreTIS: A Tool to Predict Non-canonical 5’ UTR Translational Initiation Sites in Human and Mouse
Reuter, Kerstin; Helms, Volkhard
2016-01-01
Translation of mRNA sequences into proteins typically starts at an AUG triplet. In rare cases, translation may also start at alternative non–AUG codons located in the annotated 5’ UTR which leads to an increased regulatory complexity. Since ribosome profiling detects translational start sites at the nucleotide level, the properties of these start sites can then be used for the statistical evaluation of functional open reading frames. We developed a linear regression approach to predict in–frame and out–of–frame translational start sites within the 5’ UTR from mRNA sequence information together with their translation initiation confidence. Predicted start codons comprise AUG as well as near–cognate codons. The underlying datasets are based on published translational start sites for human HEK293 and mouse embryonic stem cells that were derived by the original authors from ribosome profiling data. The average prediction accuracy of true vs. false start sites for HEK293 cells was 80%. When applied to mouse mRNA sequences, the same model predicted translation initiation sites observed in mouse ES cells with an accuracy of 76%. Moreover, we illustrate the effect of in silico mutations in the flanking sequence context of a start site on the predicted initiation confidence. Our new webservice PreTIS visualizes alternative start sites and their respective ORFs and predicts their ability to initiate translation. Solely, the mRNA sequence is required as input. PreTIS is accessible at http://service.bioinformatik.uni-saarland.de/pretis. PMID:27768687
Relevance of genetic relationship in GWAS and genomic prediction.
Pereira, Helcio Duarte; Soriano Viana, José Marcelo; Andrade, Andréa Carla Bastos; Fonseca E Silva, Fabyano; Paes, Geísa Pinheiro
2018-02-01
The objective of this study was to analyze the relevance of relationship information on the identification of low heritability quantitative trait loci (QTLs) from a genome-wide association study (GWAS) and on the genomic prediction of complex traits in human, animal and cross-pollinating populations. The simulation-based data sets included 50 samples of 1000 individuals of seven populations derived from a common population with linkage disequilibrium. The populations had non-inbred and inbred progeny structure (50 to 200) with varying number of members (5 to 20). The individuals were genotyped for 10,000 single nucleotide polymorphisms (SNPs) and phenotyped for a quantitative trait controlled by 10 QTLs and 90 minor genes showing dominance. The SNP density was 0.1 cM and the narrow sense heritability was 25%. The QTL heritabilities ranged from 1.1 to 2.9%. We applied mixed model approaches for both GWAS and genomic prediction using pedigree-based and genomic relationship matrices. For GWAS, the observed false discovery rate was kept below the significance level of 5%, the power of detection for the low heritability QTLs ranged from 14 to 50%, and the average bias between significant SNPs and a QTL ranged from less than 0.01 to 0.23 cM. The QTL detection power was consistently higher using genomic relationship matrix. Regardless of population and training set size, genomic prediction provided higher prediction accuracy of complex trait when compared to pedigree-based prediction. The accuracy of genomic prediction when there is relatedness between individuals in the training set and the reference population is much higher than the value for unrelated individuals.
An, Ji-Yong; You, Zhu-Hong; Meng, Fan-Rong; Xu, Shu-Juan; Wang, Yin
2016-05-18
Protein-Protein Interactions (PPIs) play essential roles in most cellular processes. Knowledge of PPIs is becoming increasingly more important, which has prompted the development of technologies that are capable of discovering large-scale PPIs. Although many high-throughput biological technologies have been proposed to detect PPIs, there are unavoidable shortcomings, including cost, time intensity, and inherently high false positive and false negative rates. For the sake of these reasons, in silico methods are attracting much attention due to their good performances in predicting PPIs. In this paper, we propose a novel computational method known as RVM-AB that combines the Relevance Vector Machine (RVM) model and Average Blocks (AB) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the AB feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We performed five-fold cross-validation experiments on yeast and Helicobacter pylori datasets, and achieved very high accuracies of 92.98% and 95.58% respectively, which is significantly better than previous works. In addition, we also obtained good prediction accuracies of 88.31%, 89.46%, 91.08%, 91.55%, and 94.81% on other five independent datasets C. elegans, M. musculus, H. sapiens, H. pylori, and E. coli for cross-species prediction. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-AB method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool. To facilitate extensive studies for future proteomics research, we developed a freely available web server called RVMAB-PPI in Hypertext Preprocessor (PHP) for predicting PPIs. The web server including source code and the datasets are available at http://219.219.62.123:8888/ppi_ab/.
Kievit, A J; Dobbe, J G G; Streekstra, G J; Blankevoort, L; Schafroth, M U
2018-06-01
Malalignment of implants is a major source of failure during total knee arthroplasty. To achieve more accurate 3D planning and execution of the osteotomy cuts during surgery, the Signature (Biomet, Warsaw) patient-specific instrumentation (PSI) was used to produce pin guides for the positioning of the osteotomy blocks by means of computer-aided manufacture based on CT scan images. The research question of this study is: what is the transfer accuracy of osteotomy planes predicted by the Signature PSI system for preoperative 3D planning and intraoperative block-guided pin placement to perform total knee arthroplasty procedures? The transfer accuracy achieved by using the Signature PSI system was evaluated by comparing the osteotomy planes predicted preoperatively with the osteotomy planes seen intraoperatively in human cadaveric legs. Outcomes were measured in terms of translational and rotational errors (varus, valgus, flexion, extension and axial rotation) for both tibia and femur osteotomies. Average translational errors between the osteotomy planes predicted using the Signature system and the actual osteotomy planes achieved was 0.8 mm (± 0.5 mm) for the tibia and 0.7 mm (± 4.0 mm) for the femur. Average rotational errors in relation to predicted and achieved osteotomy planes were 0.1° (± 1.2°) of varus and 0.4° (± 1.7°) of anterior slope (extension) for the tibia, and 2.8° (± 2.0°) of varus and 0.9° (± 2.7°) of flexion and 1.4° (± 2.2°) of external rotation for the femur. The similarity between osteotomy planes predicted using the Signature system and osteotomy planes actually achieved was excellent for the tibia although some discrepancies were seen for the femur. The use of 3D system techniques in TKA surgery can provide accurate intraoperative guidance, especially for patients with deformed bone, tailored to individual patients and ensure better placement of the implant.
NASA Astrophysics Data System (ADS)
Li, Xiongwei; Wang, Zhe; Lui, Siu-Lung; Fu, Yangting; Li, Zheng; Liu, Jianming; Ni, Weidou
2013-10-01
A bottleneck of the wide commercial application of laser-induced breakdown spectroscopy (LIBS) technology is its relatively high measurement uncertainty. A partial least squares (PLS) based normalization method was proposed to improve pulse-to-pulse measurement precision for LIBS based on our previous spectrum standardization method. The proposed model utilized multi-line spectral information of the measured element and characterized the signal fluctuations due to the variation of plasma characteristic parameters (plasma temperature, electron number density, and total number density) for signal uncertainty reduction. The model was validated by the application of copper concentration prediction in 29 brass alloy samples. The results demonstrated an improvement on both measurement precision and accuracy over the generally applied normalization as well as our previously proposed simplified spectrum standardization method. The average relative standard deviation (RSD), average of the standard error (error bar), the coefficient of determination (R2), the root-mean-square error of prediction (RMSEP), and average value of the maximum relative error (MRE) were 1.80%, 0.23%, 0.992, 1.30%, and 5.23%, respectively, while those for the generally applied spectral area normalization were 3.72%, 0.71%, 0.973, 1.98%, and 14.92%, respectively.
Rochefort, Christian M; Verma, Aman D; Eguale, Tewodros; Lee, Todd C; Buckeridge, David L
2015-01-01
Background Venous thromboembolisms (VTEs), which include deep vein thrombosis (DVT) and pulmonary embolism (PE), are associated with significant mortality, morbidity, and cost in hospitalized patients. To evaluate the success of preventive measures, accurate and efficient methods for monitoring VTE rates are needed. Therefore, we sought to determine the accuracy of statistical natural language processing (NLP) for identifying DVT and PE from electronic health record data. Methods We randomly sampled 2000 narrative radiology reports from patients with a suspected DVT/PE in Montreal (Canada) between 2008 and 2012. We manually identified DVT/PE within each report, which served as our reference standard. Using a bag-of-words approach, we trained 10 alternative support vector machine (SVM) models predicting DVT, and 10 predicting PE. SVM training and testing was performed with nested 10-fold cross-validation, and the average accuracy of each model was measured and compared. Results On manual review, 324 (16.2%) reports were DVT-positive and 154 (7.7%) were PE-positive. The best DVT model achieved an average sensitivity of 0.80 (95% CI 0.76 to 0.85), specificity of 0.98 (98% CI 0.97 to 0.99), positive predictive value (PPV) of 0.89 (95% CI 0.85 to 0.93), and an area under the curve (AUC) of 0.98 (95% CI 0.97 to 0.99). The best PE model achieved sensitivity of 0.79 (95% CI 0.73 to 0.85), specificity of 0.99 (95% CI 0.98 to 0.99), PPV of 0.84 (95% CI 0.75 to 0.92), and AUC of 0.99 (95% CI 0.98 to 1.00). Conclusions Statistical NLP can accurately identify VTE from narrative radiology reports. PMID:25332356
Nichols, Jennifer A; Roach, Koren E; Fiorentino, Niccolo M; Anderson, Andrew E
2016-09-01
Evidence suggests that the tibiotalar and subtalar joints provide near six degree-of-freedom (DOF) motion. Yet, kinematic models frequently assume one DOF at each of these joints. In this study, we quantified the accuracy of kinematic models to predict joint angles at the tibiotalar and subtalar joints from skin-marker data. Models included 1 or 3 DOF at each joint. Ten asymptomatic subjects, screened for deformities, performed 1.0m/s treadmill walking and a balanced, single-leg heel-rise. Tibiotalar and subtalar joint angles calculated by inverse kinematics for the 1 and 3 DOF models were compared to those measured directly in vivo using dual-fluoroscopy. Results demonstrated that, for each activity, the average error in tibiotalar joint angles predicted by the 1 DOF model were significantly smaller than those predicted by the 3 DOF model for inversion/eversion and internal/external rotation. In contrast, neither model consistently demonstrated smaller errors when predicting subtalar joint angles. Additionally, neither model could accurately predict discrete angles for the tibiotalar and subtalar joints on a per-subject basis. Differences between model predictions and dual-fluoroscopy measurements were highly variable across subjects, with joint angle errors in at least one rotation direction surpassing 10° for 9 out of 10 subjects. Our results suggest that both the 1 and 3 DOF models can predict trends in tibiotalar joint angles on a limited basis. However, as currently implemented, neither model can predict discrete tibiotalar or subtalar joint angles for individual subjects. Inclusion of subject-specific attributes may improve the accuracy of these models. Copyright © 2016 Elsevier B.V. All rights reserved.
Snorkel: Rapid Training Data Creation with Weak Supervision
Ratner, Alexander; Bach, Stephen H.; Ehrenberg, Henry; Fries, Jason; Wu, Sen; Ré, Christopher
2018-01-01
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of- the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8× faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8× speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets. PMID:29770249
In-use activity, fuel use, and emissions of heavy-duty diesel roll-off refuse trucks.
Sandhu, Gurdas S; Frey, H Christopher; Bartelt-Hunt, Shannon; Jones, Elizabeth
2015-03-01
The objectives of this study were to quantify real-world activity, fuel use, and emissions for heavy duty diesel roll-off refuse trucks; evaluate the contribution of duty cycles and emissions controls to variability in cycle average fuel use and emission rates; quantify the effect of vehicle weight on fuel use and emission rates; and compare empirical cycle average emission rates with the U.S. Environmental Protection Agency's MOVES emission factor model predictions. Measurements were made at 1 Hz on six trucks of model years 2005 to 2012, using onboard systems. The trucks traveled 870 miles, had an average speed of 16 mph, and collected 165 tons of trash. The average fuel economy was 4.4 mpg, which is approximately twice previously reported values for residential trash collection trucks. On average, 50% of time is spent idling and about 58% of emissions occur in urban areas. Newer trucks with selective catalytic reduction and diesel particulate filter had NOx and PM cycle average emission rates that were 80% lower and 95% lower, respectively, compared to older trucks without. On average, the combined can and trash weight was about 55% of chassis weight. The marginal effect of vehicle weight on fuel use and emissions is highest at low loads and decreases as load increases. Among 36 cycle average rates (6 trucks×6 cycles), MOVES-predicted values and estimates based on real-world data have similar relative trends. MOVES-predicted CO2 emissions are similar to those of the real world, while NOx and PM emissions are, on average, 43% lower and 300% higher, respectively. The real-world data presented here can be used to estimate benefits of replacing old trucks with new trucks. Further, the data can be used to improve emission inventories and model predictions. In-use measurements of the real-world activity, fuel use, and emissions of heavy-duty diesel roll-off refuse trucks can be used to improve the accuracy of predictive models, such as MOVES, and emissions inventories. Further, the activity data from this study can be used to generate more representative duty cycles for more accurate chassis dynamometer testing. Comparisons of old and new model year diesel trucks are useful in analyzing the effect of fleet turnover. The analysis of effect of haul weight on fuel use can be used by fleet managers to optimize operations to reduce fuel cost.
Effect of genotyped cows in the reference population on the genomic evaluation of Holstein cattle.
Uemoto, Y; Osawa, T; Saburi, J
2017-03-01
This study evaluated the dependence of reliability and prediction bias on the prediction method, the contribution of including animals (bulls or cows), and the genetic relatedness, when including genotyped cows in the progeny-tested bull reference population. We performed genomic evaluation using a Japanese Holstein population, and assessed the accuracy of genomic enhanced breeding value (GEBV) for three production traits and 13 linear conformation traits. A total of 4564 animals for production traits and 4172 animals for conformation traits were genotyped using Illumina BovineSNP50 array. Single- and multi-step methods were compared for predicting GEBV in genotyped bull-only and genotyped bull-cow reference populations. No large differences in realized reliability and regression coefficient were found between the two reference populations; however, a slight difference was found between the two methods for production traits. The accuracy of GEBV determined by single-step method increased slightly when genotyped cows were included in the bull reference population, but decreased slightly by multi-step method. A validation study was used to evaluate the accuracy of GEBV when 800 additional genotyped bulls (POPbull) or cows (POPcow) were included in the base reference population composed of 2000 genotyped bulls. The realized reliabilities of POPbull were higher than those of POPcow for all traits. For the gain of realized reliability over the base reference population, the average ratios of POPbull gain to POPcow gain for production traits and conformation traits were 2.6 and 7.2, respectively, and the ratios depended on heritabilities of the traits. For regression coefficient, no large differences were found between the results for POPbull and POPcow. Another validation study was performed to investigate the effect of genetic relatedness between cows and bulls in the reference and test populations. The effect of genetic relationship among bulls in the reference population was also assessed. The results showed that it is important to account for relatedness among bulls in the reference population. Our studies indicate that the prediction method, the contribution ratio of including animals, and genetic relatedness could affect the prediction accuracy in genomic evaluation of Holstein cattle, when including genotyped cows in the reference population.
NASA Astrophysics Data System (ADS)
Ryzhenkov, V.; Ivashchenko, V.; Vinuesa, R.; Mullyadzhanov, R.
2016-10-01
We use the open-source code nek5000 to assess the accuracy of high-order spectral element large-eddy simulations (LES) of a turbulent channel flow depending on the spatial resolution compared to the direct numerical simulation (DNS). The Reynolds number Re = 6800 is considered based on the bulk velocity and half-width of the channel. The filtered governing equations are closed with the dynamic Smagorinsky model for subgrid stresses and heat flux. The results show very good agreement between LES and DNS for time-averaged velocity and temperature profiles and their fluctuations. Even the coarse LES grid which contains around 30 times less points than the DNS one provided predictions of the friction velocity within 2.0% accuracy interval.
Evaluation of anthropogenic influence in probabilistic forecasting of coastal change
NASA Astrophysics Data System (ADS)
Hapke, C. J.; Wilson, K.; Adams, P. N.
2014-12-01
Prediction of large scale coastal behavior is especially challenging in areas of pervasive human activity. Many coastal zones on the Gulf and Atlantic coasts are moderately to highly modified through the use of soft sediment and hard stabilization techniques. These practices have the potential to alter sediment transport and availability, as well as reshape the beach profile, ultimately transforming the natural evolution of the coastal system. We present the results of a series of probabilistic models, designed to predict the observed geomorphic response to high wave events at Fire Island, New York. The island comprises a variety of land use types, including inhabited communities with modified beaches, where beach nourishment and artificial dune construction (scraping) occur, unmodified zones, and protected national seashore. This variation in land use presents an opportunity for comparison of model accuracy across highly modified and rarely modified stretches of coastline. Eight models with basic and expanded structures were developed, resulting in sixteen models, informed with observational data from Fire Island. The basic model type does not include anthropogenic modification. The expanded model includes records of nourishment and scraping, designed to quantify the improved accuracy when anthropogenic activity is represented. Modification was included as frequency of occurrence divided by the time since the most recent event, to distinguish between recent and historic events. All but one model reported improved predictive accuracy from the basic to expanded form. The addition of nourishment and scraping parameters resulted in a maximum reduction in predictive error of 36%. The seven improved models reported an average 23% reduction in error. These results indicate that it is advantageous to incorporate the human forcing into a coastal hazards probability model framework.
KANAZAWA, Tomomi; SEKI, Motohide; ISHIYAMA, Keiki; ARASEKI, Masao; IZAIKE, Yoshiaki; TAKAHASHI, Toru
2017-01-01
This study assessed the effects of gonadotropin-releasing hormone (GnRH) treatment on Day 5 (Day 0 = estrus) on luteal blood flow and accuracy of pregnancy prediction in recipient cows. On Day 5, 120 lactating Holstein cows were randomly assigned to a control group (n = 63) or GnRH group treated with 100 μg of GnRH agonist (n = 57). On Days 3, 5, 7, and 14, each cow underwent ultrasound examination to measure the blood flow area (BFA) and time-averaged maximum velocity (TAMV) at the spiral arteries at the base of the corpus luteum using color Doppler ultrasonography. Cows with a corpus luteum diameter ≥ 20 mm (n = 120) received embryo transfers on Day 7. The BFA values in the GnRH group were significantly higher than those in the control group on Days 7 and 14. TAMV did not differ between these groups. According to receiver operating characteristic analyses to predict pregnancy, a BFA cutoff of 0.52 cm2 yielded the highest sensitivity (83.3%) and specificity (90.5%) on Day 7, and BFA and TAMV values of 0.94 cm2 and 44.93 cm/s, respectively, yielded the highest sensitivity (97.1%) and specificity (100%) on Day 14 in the GnRH group. The areas under the curve for the paired BFA and TAMV in the GnRH group were 0.058 higher than those in the control group (0.996 and 0.938, respectively; P < 0.05). In conclusion, GnRH treatment on Day 5 increased the luteal BFA in recipient cows on Days 7 and 14, and improved the accuracy of pregnancy prediction on Day 14. PMID:28552886
EKG-based detection of deep brain stimulation in fMRI studies.
Fiveland, Eric; Madhavan, Radhika; Prusik, Julia; Linton, Renee; Dimarzio, Marisa; Ashe, Jeffrey; Pilitsis, Julie; Hancu, Ileana
2018-04-01
To assess the impact of synchronization errors between the assumed functional MRI paradigm timing and the deep brain stimulation (DBS) on/off cycling using a custom electrocardiogram-based triggering system METHODS: A detector for measuring and predicting the on/off state of cycling deep brain stimulation was developed and tested in six patients in office visits. Three-electrode electrocardiogram measurements, amplified by a commercial bio-amplifier, were used as input for a custom electronics box (e-box). The e-box transformed the deep brain stimulation waveforms into transistor-transistor logic pulses, recorded their timing, and propagated it in time. The e-box was used to trigger task-based deep brain stimulation functional MRI scans in 5 additional subjects; the impact of timing accuracy on t-test values was investigated in a simulation study using the functional MRI data. Following locking to each patient's individual waveform, the e-box was shown to predict stimulation onset with an average absolute error of 112 ± 148 ms, 30 min after disconnecting from the patients. The subsecond accuracy of the e-box in predicting timing onset is more than adequate for our slow varying, 30-/30-s on/off stimulation paradigm. Conversely, the experimental deep brain stimulation onset prediction accuracy in the absence of the e-box, which could be off by as much as 4 to 6 s, could significantly decrease activation strength. Using this detector, stimulation can be accurately synchronized to functional MRI acquisitions, without adding any additional hardware in the MRI environment. Magn Reson Med 79:2432-2439, 2018. © 2017 International Society for Magnetic Resonance in Medicine. © 2017 International Society for Magnetic Resonance in Medicine.
Regional Seismic Travel-Time Prediction, Uncertainty, and Location Improvement in Western Eurasia
NASA Astrophysics Data System (ADS)
Flanagan, M. P.; Myers, S. C.
2004-12-01
We investigate our ability to improve regional travel-time prediction and seismic event location using an a priori, three-dimensional velocity model of Western Eurasia and North Africa: WENA1.0 [Pasyanos et al., 2004]. Our objective is to improve the accuracy of seismic location estimates and calculate representative location uncertainty estimates. As we focus on the geographic region of Western Eurasia, the Middle East, and North Africa, we develop, test, and validate 3D model-based travel-time prediction models for 30 stations in the study region. Three principal results are presented. First, the 3D WENA1.0 velocity model improves travel-time prediction over the iasp91 model, as measured by variance reduction, for regional Pg, Pn, and P phases recorded at the 30 stations. Second, a distance-dependent uncertainty model is developed and tested for the WENA1.0 model. Third, an end-to-end validation test based on 500 event relocations demonstrates improved location performance over the 1-dimensional iasp91 model. Validation of the 3D model is based on a comparison of approximately 11,000 Pg, Pn, and P travel-time predictions and empirical observations from ground truth (GT) events. Ray coverage for the validation dataset is chosen to provide representative, regional-distance sampling across Eurasia and North Africa. The WENA1.0 model markedly improves travel-time predictions for most stations with an average variance reduction of 25% for all ray paths. We find that improvement is station dependent, with some stations benefiting greatly from WENA1.0 predictions (52% at APA, 33% at BKR, and 32% at NIL), some stations showing moderate improvement (12% at KEV, 14% at BOM, and 12% at TAM), some benefiting only slightly (6% at MOX, and 4% at SVE), and some are degraded (-6% at MLR and -18% at QUE). We further test WENA1.0 by comparing location accuracy with results obtained using the iasp91 model. Again, relocation of these events is dependent on ray paths that evenly sample WENA1.0 and therefore provide an unbiased assessment of location performance. A statistically significant sample is achieved by generating 500 location realizations based on 5 events with location accuracy between 1 km and 5 km. Each realization is a randomly selected event with location determined by randomly selecting 5 stations from the available network. In 340 cases (68% of the instances), locations are improved, and average mislocation is reduced from 31 km to 26 km. Preliminary test of uncertainty estimates suggest that our uncertainty model produces location uncertainty ellipses that are representative of location accuracy. These results highlight the importance of accurate GT datasets in assessing regional travel-time models and demonstrate that an a priori 3D model can markedly improve our ability to locate small magnitude events in a regional monitoring context. This work was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48, Contribution UCRL-CONF-206386.
The Satellite Clock Bias Prediction Method Based on Takagi-Sugeno Fuzzy Neural Network
NASA Astrophysics Data System (ADS)
Cai, C. L.; Yu, H. G.; Wei, Z. C.; Pan, J. D.
2017-05-01
The continuous improvement of the prediction accuracy of Satellite Clock Bias (SCB) is the key problem of precision navigation. In order to improve the precision of SCB prediction and better reflect the change characteristics of SCB, this paper proposes an SCB prediction method based on the Takagi-Sugeno fuzzy neural network. Firstly, the SCB values are pre-treated based on their characteristics. Then, an accurate Takagi-Sugeno fuzzy neural network model is established based on the preprocessed data to predict SCB. This paper uses the precise SCB data with different sampling intervals provided by IGS (International Global Navigation Satellite System Service) to realize the short-time prediction experiment, and the results are compared with the ARIMA (Auto-Regressive Integrated Moving Average) model, GM(1,1) model, and the quadratic polynomial model. The results show that the Takagi-Sugeno fuzzy neural network model is feasible and effective for the SCB short-time prediction experiment, and performs well for different types of clocks. The prediction results for the proposed method are better than the conventional methods obviously.
NASA Astrophysics Data System (ADS)
Liang, Zhang; Yanqing, Hou; Jie, Wu
2016-12-01
The multi-antenna synchronized receiver (using a common clock) is widely applied in GNSS-based attitude determination (AD) or terrain deformations monitoring, and many other applications, since the high-accuracy single-differenced carrier phase can be used to improve the positioning or AD accuracy. Thus, the line bias (LB) parameter (fractional bias isolating) should be calibrated in the single-differenced phase equations. In the past decades, all researchers estimated the LB as a constant parameter in advance and compensated it in real time. However, the constant LB assumption is inappropriate in practical applications because of the physical length and permittivity changes of the cables, caused by the environmental temperature variation and the instability of receiver-self inner circuit transmitting delay. Considering the LB drift (or colored LB) in practical circumstances, this paper initiates a real-time estimator using auto regressive moving average-based (ARMA) prediction/whitening filter model or Moving average-based (MA) constant calibration model. In the ARMA-based filter model, four cases namely AR(1), ARMA(1, 1), AR(2) and ARMA(2, 1) are applied for the LB prediction. The real-time relative positioning model using the ARMA-based predicting LB is derived and it is theoretically proved that the positioning accuracy is better than the traditional double difference carrier phase (DDCP) model. The drifting LB is defined with a phase temperature changing rate integral function, which is a random walk process if the phase temperature changing rate is white noise, and is validated by the analysis of the AR model coefficient. The auto covariance function shows that the LB is indeed varying in time and estimating it as a constant is not safe, which is also demonstrated by the analysis on LB variation of each visible satellite during a zero and short baseline BDS/GPS experiment. Compared to the DDCP approach, in the zero-baseline experiment, the LB constant calibration (LBCC) and MA approaches improved the positioning accuracy of the vertical component, while slightly degrading the accuracy of the horizontal components. The ARMA(1, 0) model, however, improved the positioning accuracy of all three components, with 40 and 50 % improvement of the vertical component for BDS and GPS, respectively. In the short baseline experiment, compared to the DDCP approach, the LBCC approach yielded bad positioning solutions and degraded the AD accuracy; both MA and ARMA-based filter approaches improved the AD accuracy. Moreover, the ARMA(1, 0) and ARMA(1, 1) models have relatively better performance, improving to 55 % and 48 % the elevation angle in ARMA(1, 1) and MA model for GPS, respectively. Furthermore, the drifting LB variation is found to be continuous and slowly cumulative; the variation magnitudes in the unit of length are almost identical on different frequency carrier phases, so the LB variation does not show obvious correlation between different frequencies. Consequently, the wide-lane LB in the unit of cycle is very stable, while the narrow-lane LB varies largely in time. This reasoning probably also explains the phenomenon that the wide-lane LB originating in the satellites is stable, while the narrow-lane LB varies. The results of ARMA-based filters are better than the MA model, which probably implies that the modeling for drifting LB can further improve the precise point positioning accuracy.
Loading Intensity Prediction by Velocity and the OMNI-RES 0-10 Scale in Bench Press.
Naclerio, Fernando; Larumbe-Zabala, Eneko
2017-02-01
Naclerio, F and Larumbe-Zabala, E. Loading intensity prediction by velocity and the OMNI-RES 0-10 scale in bench press. J Strength Cond Res 32(1): 323-329, 2017-This study examined the possibility of using movement velocity and the perceived exertion as indicators of relative load in the bench press (BP) exercise. A total of 308 young, healthy, resistance trained athletes (242 men and 66 women) performed a progressive strength test up to the one repetition maximum for the individual determination of the full load-velocity and load-exertion relationships. Longitudinal regression models were used to predict the relative load from the average velocity (AV) and the OMNI-Resistance Exercise Scales (OMNI-RES 0-10 scale), considering sets as the time-related variable. Load associated with the AV and the OMNI-RES 0-10 scale value expressed after performing a set of 1-3 repetitions were used to construct 2 adjusted predictive equations: Relative load = 107.75 - 62.97 × average velocity; and Relative load = 29.03 + 7.26 × OMNI-RES 0-10 scale value. The 2 models were capable of estimating the relative load with an accuracy of 84 and 93%, respectively. These findings confirm the ability of the 2 calculated regression models, using load-velocity and load-exertion from the OMNI-RES 0-10 scale, to accurately predict strength performance in BP.
NASA Astrophysics Data System (ADS)
Rahmat, R. F.; Nasution, F. R.; Seniman; Syahputra, M. F.; Sitompul, O. S.
2018-02-01
Weather is condition of air in a certain region at a relatively short period of time, measured with various parameters such as; temperature, air preasure, wind velocity, humidity and another phenomenons in the atmosphere. In fact, extreme weather due to global warming would lead to drought, flood, hurricane and other forms of weather occasion, which directly affects social andeconomic activities. Hence, a forecasting technique is to predict weather with distinctive output, particullary mapping process based on GIS with information about current weather status in certain cordinates of each region with capability to forecast for seven days afterward. Data used in this research are retrieved in real time from the server openweathermap and BMKG. In order to obtain a low error rate and high accuracy of forecasting, the authors use Bayesian Model Averaging (BMA) method. The result shows that the BMA method has good accuracy. Forecasting error value is calculated by mean square error shows (MSE). The error value emerges at minumum temperature rated at 0.28 and maximum temperature rated at 0.15. Meanwhile, the error value of minimum humidity rates at 0.38 and the error value of maximum humidity rates at 0.04. Afterall, the forecasting error rate of wind speed is at 0.076. The lower the forecasting error rate, the more optimized the accuracy is.
A hidden markov model derived structural alphabet for proteins.
Camproux, A C; Gautier, R; Tufféry, P
2004-06-04
Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as series of overlapping fragments (states) of four residues length. This approach learns simultaneously the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain in 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well the previous knowledge related to protein architecture organisation and seems able to grab some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On an average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1A root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture in particular for protein structure prediction.
NASA Astrophysics Data System (ADS)
Acosta, Oscar; Dowling, Jason; Cazoulat, Guillaume; Simon, Antoine; Salvado, Olivier; de Crevoisier, Renaud; Haigron, Pascal
The prediction of toxicity is crucial to managing prostate cancer radiotherapy (RT). This prediction is classically organ wise and based on the dose volume histograms (DVH) computed during the planning step, and using for example the mathematical Lyman Normal Tissue Complication Probability (NTCP) model. However, these models lack spatial accuracy, do not take into account deformations and may be inappropiate to explain toxicity events related with the distribution of the delivered dose. Producing voxel wise statistical models of toxicity might help to explain the risks linked to the dose spatial distribution but is challenging due to the difficulties lying on the mapping of organs and dose in a common template. In this paper we investigate the use of atlas based methods to perform the non-rigid mapping and segmentation of the individuals' organs at risk (OAR) from CT scans. To build a labeled atlas, 19 CT scans were selected from a population of patients treated for prostate cancer by radiotherapy. The prostate and the OAR (Rectum, Bladder, Bones) were then manually delineated by an expert and constituted the training data. After a number of affine and non rigid registration iterations, an average image (template) representing the whole population was obtained. The amount of consensus between labels was used to generate probabilistic maps for each organ. We validated the accuracy of the approach by segmenting the organs using the training data in a leave one out scheme. The agreement between the volumes after deformable registration and the manually segmented organs was on average above 60% for the organs at risk. The proposed methodology provides a way to map the organs from a whole population on a single template and sets the stage to perform further voxel wise analysis. With this method new and accurate predictive models of toxicity will be built.
A genetic programming approach to oral cancer prognosis
Tan, Mei Sze; Tan, Jing Wei; Yap, Hwa Jen; Abdul Kareem, Sameem; Zain, Rosnah Binti
2016-01-01
Background The potential of genetic programming (GP) on various fields has been attained in recent years. In bio-medical field, many researches in GP are focused on the recognition of cancerous cells and also on gene expression profiling data. In this research, the aim is to study the performance of GP on the survival prediction of a small sample size of oral cancer prognosis dataset, which is the first study in the field of oral cancer prognosis. Method GP is applied on an oral cancer dataset that contains 31 cases collected from the Malaysia Oral Cancer Database and Tissue Bank System (MOCDTBS). The feature subsets that is automatically selected through GP were noted and the influences of this subset on the results of GP were recorded. In addition, a comparison between the GP performance and that of the Support Vector Machine (SVM) and logistic regression (LR) are also done in order to verify the predictive capabilities of the GP. Result The result shows that GP performed the best (average accuracy of 83.87% and average AUROC of 0.8341) when the features selected are smoking, drinking, chewing, histological differentiation of SCC, and oncogene p63. In addition, based on the comparison results, we found that the GP outperformed the SVM and LR in oral cancer prognosis. Discussion Some of the features in the dataset are found to be statistically co-related. This is because the accuracy of the GP prediction drops when one of the feature in the best feature subset is excluded. Thus, GP provides an automatic feature selection function, which chooses features that are highly correlated to the prognosis of oral cancer. This makes GP an ideal prediction model for cancer clinical and genomic data that can be used to aid physicians in their decision making stage of diagnosis or prognosis. PMID:27688975
Impact of mutations on the allosteric conformational equilibrium
Weinkam, Patrick; Chen, Yao Chi; Pons, Jaume; Sali, Andrej
2012-01-01
Allostery in a protein involves effector binding at an allosteric site that changes the structure and/or dynamics at a distant, functional site. In addition to the chemical equilibrium of ligand binding, allostery involves a conformational equilibrium between one protein substate that binds the effector and a second substate that less strongly binds the effector. We run molecular dynamics simulations using simple, smooth energy landscapes to sample specific ligand-induced conformational transitions, as defined by the effector-bound and unbound protein structures. These simulations can be performed using our web server: http://salilab.org/allosmod/. We then develop a set of features to analyze the simulations and capture the relevant thermodynamic properties of the allosteric conformational equilibrium. These features are based on molecular mechanics energy functions, stereochemical effects, and structural/dynamic coupling between sites. Using a machine-learning algorithm on a dataset of 10 proteins and 179 mutations, we predict both the magnitude and sign of the allosteric conformational equilibrium shift by the mutation; the impact of a large identifiable fraction of the mutations can be predicted with an average unsigned error of 1 kBT. With similar accuracy, we predict the mutation effects for an 11th protein that was omitted from the initial training and testing of the machine-learning algorithm. We also assess which calculated thermodynamic properties contribute most to the accuracy of the prediction. PMID:23228330
NASA Astrophysics Data System (ADS)
Ye, Liming; Yang, Guixia; Van Ranst, Eric; Tang, Huajun
2013-03-01
A generalized, structural, time series modeling framework was developed to analyze the monthly records of absolute surface temperature, one of the most important environmental parameters, using a deterministicstochastic combined (DSC) approach. Although the development of the framework was based on the characterization of the variation patterns of a global dataset, the methodology could be applied to any monthly absolute temperature record. Deterministic processes were used to characterize the variation patterns of the global trend and the cyclic oscillations of the temperature signal, involving polynomial functions and the Fourier method, respectively, while stochastic processes were employed to account for any remaining patterns in the temperature signal, involving seasonal autoregressive integrated moving average (SARIMA) models. A prediction of the monthly global surface temperature during the second decade of the 21st century using the DSC model shows that the global temperature will likely continue to rise at twice the average rate of the past 150 years. The evaluation of prediction accuracy shows that DSC models perform systematically well against selected models of other authors, suggesting that DSC models, when coupled with other ecoenvironmental models, can be used as a supplemental tool for short-term (˜10-year) environmental planning and decision making.
Comparison of theory and experiment for NAPL dissolution in porous media
NASA Astrophysics Data System (ADS)
Bahar, T.; Golfier, F.; Oltéan, C.; Lefevre, E.; Lorgeoux, C.
2018-04-01
Contamination of groundwater resources by an immiscible organic phase commonly called NAPL (Non Aqueous Phase Liquid) represents a major scientific challenge considering the residence time of such a pollutant. This contamination leads to the formation of NAPL blobs trapped in the soil and impact of this residual saturation cannot be ignored for correct predictions of the contaminant fate. In this paper, we present results of micromodel experiments on the dissolution of pure hydrocarbon phase (toluene). They were conducted for two values of the Péclet number. These experiments provide data for comparison and validation of a two-phase non-equilibrium theoretical model developed by Quintard and Whitaker (1994) using the volume averaging method. The model was directly upscaled from the averaged pore-scale mass balance equations. The effective properties of the macroscopic model were calculated over periodic unit cells designed from images of the experimental flow cell. Comparison of experimental and numerical results shows that the transport model predicts correctly - with no fitting parameters - the main mechanisms of NAPL mass transfer. The study highlights the crucial need of having a fair recovery of pore-scale characteristic lengths to predict the mass transfer coefficient with accuracy.
Vehicular traffic noise prediction using soft computing approach.
Singh, Daljeet; Nigam, S P; Agrawal, V P; Kumar, Maneek
2016-12-01
A new approach for the development of vehicular traffic noise prediction models is presented. Four different soft computing methods, namely, Generalized Linear Model, Decision Trees, Random Forests and Neural Networks, have been used to develop models to predict the hourly equivalent continuous sound pressure level, Leq, at different locations in the Patiala city in India. The input variables include the traffic volume per hour, percentage of heavy vehicles and average speed of vehicles. The performance of the four models is compared on the basis of performance criteria of coefficient of determination, mean square error and accuracy. 10-fold cross validation is done to check the stability of the Random Forest model, which gave the best results. A t-test is performed to check the fit of the model with the field data. Copyright © 2016 Elsevier Ltd. All rights reserved.
Assessment of Computational Fluid Dynamics (CFD) Models for Shock Boundary-Layer Interaction
NASA Technical Reports Server (NTRS)
DeBonis, James R.; Oberkampf, William L.; Wolf, Richard T.; Orkwis, Paul D.; Turner, Mark G.; Babinsky, Holger
2011-01-01
A workshop on the computational fluid dynamics (CFD) prediction of shock boundary-layer interactions (SBLIs) was held at the 48th AIAA Aerospace Sciences Meeting. As part of the workshop numerous CFD analysts submitted solutions to four experimentally measured SBLIs. This paper describes the assessment of the CFD predictions. The assessment includes an uncertainty analysis of the experimental data, the definition of an error metric and the application of that metric to the CFD solutions. The CFD solutions provided very similar levels of error and in general it was difficult to discern clear trends in the data. For the Reynolds Averaged Navier-Stokes methods the choice of turbulence model appeared to be the largest factor in solution accuracy. Large-eddy simulation methods produced error levels similar to RANS methods but provided superior predictions of normal stresses.
NASA Astrophysics Data System (ADS)
Lahmiri, Salim; Boukadoum, Mounir
2015-08-01
We present a new ensemble system for stock market returns prediction where continuous wavelet transform (CWT) is used to analyze return series and backpropagation neural networks (BPNNs) for processing CWT-based coefficients, determining the optimal ensemble weights, and providing final forecasts. Particle swarm optimization (PSO) is used for finding optimal weights and biases for each BPNN. To capture symmetry/asymmetry in the underlying data, three wavelet functions with different shapes are adopted. The proposed ensemble system was tested on three Asian stock markets: The Hang Seng, KOSPI, and Taiwan stock market data. Three statistical metrics were used to evaluate the forecasting accuracy; including, mean of absolute errors (MAE), root mean of squared errors (RMSE), and mean of absolute deviations (MADs). Experimental results showed that our proposed ensemble system outperformed the individual CWT-ANN models each with different wavelet function. In addition, the proposed ensemble system outperformed the conventional autoregressive moving average process. As a result, the proposed ensemble system is suitable to capture symmetry/asymmetry in financial data fluctuations for better prediction accuracy.
NASA Astrophysics Data System (ADS)
Wong, Pak-kin; Vong, Chi-man; Wong, Hang-cheong; Li, Ke
2010-05-01
Modern automotive spark-ignition (SI) power performance usually refers to output power and torque, and they are significantly affected by the setup of control parameters in the engine management system (EMS). EMS calibration is done empirically through tests on the dynamometer (dyno) because no exact mathematical engine model is yet available. With an emerging nonlinear function estimation technique of Least squares support vector machines (LS-SVM), the approximate power performance model of a SI engine can be determined by training the sample data acquired from the dyno. A novel incremental algorithm based on typical LS-SVM is also proposed in this paper, so the power performance models built from the incremental LS-SVM can be updated whenever new training data arrives. With updating the models, the model accuracies can be continuously increased. The predicted results using the estimated models from the incremental LS-SVM are good agreement with the actual test results and with the almost same average accuracy of retraining the models from scratch, but the incremental algorithm can significantly shorten the model construction time when new training data arrives.
Analysis of spatial distribution of land cover maps accuracy
NASA Astrophysics Data System (ADS)
Khatami, R.; Mountrakis, G.; Stehman, S. V.
2017-12-01
Land cover maps have become one of the most important products of remote sensing science. However, classification errors will exist in any classified map and affect the reliability of subsequent map usage. Moreover, classification accuracy often varies over different regions of a classified map. These variations of accuracy will affect the reliability of subsequent analyses of different regions based on the classified maps. The traditional approach of map accuracy assessment based on an error matrix does not capture the spatial variation in classification accuracy. Here, per-pixel accuracy prediction methods are proposed based on interpolating accuracy values from a test sample to produce wall-to-wall accuracy maps. Different accuracy prediction methods were developed based on four factors: predictive domain (spatial versus spectral), interpolation function (constant, linear, Gaussian, and logistic), incorporation of class information (interpolating each class separately versus grouping them together), and sample size. Incorporation of spectral domain as explanatory feature spaces of classification accuracy interpolation was done for the first time in this research. Performance of the prediction methods was evaluated using 26 test blocks, with 10 km × 10 km dimensions, dispersed throughout the United States. The performance of the predictions was evaluated using the area under the curve (AUC) of the receiver operating characteristic. Relative to existing accuracy prediction methods, our proposed methods resulted in improvements of AUC of 0.15 or greater. Evaluation of the four factors comprising the accuracy prediction methods demonstrated that: i) interpolations should be done separately for each class instead of grouping all classes together; ii) if an all-classes approach is used, the spectral domain will result in substantially greater AUC than the spatial domain; iii) for the smaller sample size and per-class predictions, the spectral and spatial domain yielded similar AUC; iv) for the larger sample size (i.e., very dense spatial sample) and per-class predictions, the spatial domain yielded larger AUC; v) increasing the sample size improved accuracy predictions with a greater benefit accruing to the spatial domain; and vi) the function used for interpolation had the smallest effect on AUC.
Mining the key predictors for event outbreaks in social networks
NASA Astrophysics Data System (ADS)
Yi, Chengqi; Bao, Yuanyuan; Xue, Yibo
2016-04-01
It will be beneficial to devise a method to predict a so-called event outbreak. Existing works mainly focus on exploring effective methods for improving the accuracy of predictions, while ignoring the underlying causes: What makes event go viral? What factors that significantly influence the prediction of an event outbreak in social networks? In this paper, we proposed a novel definition for an event outbreak, taking into account the structural changes to a network during the propagation of content. In addition, we investigated features that were sensitive to predicting an event outbreak. In order to investigate the universality of these features at different stages of an event, we split the entire lifecycle of an event into 20 equal segments according to the proportion of the propagation time. We extracted 44 features, including features related to content, users, structure, and time, from each segment of the event. Based on these features, we proposed a prediction method using supervised classification algorithms to predict event outbreaks. Experimental results indicate that, as time goes by, our method is highly accurate, with a precision rate ranging from 79% to 97% and a recall rate ranging from 74% to 97%. In addition, after applying a feature-selection algorithm, the top five selected features can considerably improve the accuracy of the prediction. Data-driven experimental results show that the entropy of the eigenvector centrality, the entropy of the PageRank, the standard deviation of the betweenness centrality, the proportion of re-shares without content, and the average path length are the key predictors for an event outbreak. Our findings are especially useful for further exploring the intrinsic characteristics of outbreak prediction.
Acosta-Pech, Rocío; Crossa, José; de Los Campos, Gustavo; Teyssèdre, Simon; Claustres, Bruno; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino
2017-07-01
A new genomic model that incorporates genotype × environment interaction gave increased prediction accuracy of untested hybrid response for traits such as percent starch content, percent dry matter content and silage yield of maize hybrids. The prediction of hybrid performance (HP) is very important in agricultural breeding programs. In plant breeding, multi-environment trials play an important role in the selection of important traits, such as stability across environments, grain yield and pest resistance. Environmental conditions modulate gene expression causing genotype × environment interaction (G × E), such that the estimated genetic correlations of the performance of individual lines across environments summarize the joint action of genes and environmental conditions. This article proposes a genomic statistical model that incorporates G × E for general and specific combining ability for predicting the performance of hybrids in environments. The proposed model can also be applied to any other hybrid species with distinct parental pools. In this study, we evaluated the predictive ability of two HP prediction models using a cross-validation approach applied in extensive maize hybrid data, comprising 2724 hybrids derived from 507 dent lines and 24 flint lines, which were evaluated for three traits in 58 environments over 12 years; analyses were performed for each year. On average, genomic models that include the interaction of general and specific combining ability with environments have greater predictive ability than genomic models without interaction with environments (ranging from 12 to 22%, depending on the trait). We concluded that including G × E in the prediction of untested maize hybrids increases the accuracy of genomic models.
2014-01-01
Background Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. Results MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Conclusions Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy. PMID:24731387
Cao, Renzhi; Wang, Zheng; Cheng, Jianlin
2014-04-15
Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy.
Ianniello, Stefania; Di Giacomo, Vincenza; Sessa, Barbara; Miele, Vittorio
2014-09-01
Combined clinical examination and supine chest radiography have shown low accuracy in the assessment of pneumothorax in unstable patients with major chest trauma during the primary survey in the emergency room. The aim of our study was to evaluate the diagnostic accuracy of extended-focused assessment with sonography in trauma (e-FAST), in the diagnosis of pneumothorax, compared with the results of multidetector computed tomography (MDCT) and of invasive interventions (thoracostomy tube placement). This was a retrospective case series involving 368 consecutive unstable adult patients (273 men and 95 women; average age, 25 years; range, 16-68 years) admitted to our hospital's emergency department between January 2011 and December 2012 for major trauma (Injury Severity Score ≥ 15). We evaluated the accuracy of thoracic ultrasound in the detection of pneumothorax compared with the results of MDCT and invasive interventions (thoracostomy tube placement). Institutional review board approval was obtained prior to commencement of this study. Among the 736 lung fields included in the study, 87 pneumothoraces were detected with thoracic CT scans (23.6%). e-FAST detected 67/87 and missed 20 pneumothoraces (17 mild, 3 moderate). The diagnostic performance of ultrasound was: sensitivity 77% (74% in 2011 and 80% in 2012), specificity 99.8%, positive predictive value 98.5%, negative predictive value 97%, accuracy 97.2% (67 true positive; 668 true negative; 1 false positive; 20 false negative); 17 missed mild pneumothoraces were not immediately life-threatening (thickness less than 5 mm). Thoracic ultrasound (e-FAST) is a rapid and accurate first-line, bedside diagnostic modality for the diagnosis of pneumothorax in unstable patients with major chest trauma during the primary survey in the emergency room.
Forecasting daily patient volumes in the emergency department.
Jones, Spencer S; Thomas, Alun; Evans, R Scott; Welch, Shari J; Haug, Peter J; Snow, Gregory L
2008-02-01
Shifts in the supply of and demand for emergency department (ED) resources make the efficient allocation of ED resources increasingly important. Forecasting is a vital activity that guides decision-making in many areas of economic, industrial, and scientific planning, but has gained little traction in the health care industry. There are few studies that explore the use of forecasting methods to predict patient volumes in the ED. The goals of this study are to explore and evaluate the use of several statistical forecasting methods to predict daily ED patient volumes at three diverse hospital EDs and to compare the accuracy of these methods to the accuracy of a previously proposed forecasting method. Daily patient arrivals at three hospital EDs were collected for the period January 1, 2005, through March 31, 2007. The authors evaluated the use of seasonal autoregressive integrated moving average, time series regression, exponential smoothing, and artificial neural network models to forecast daily patient volumes at each facility. Forecasts were made for horizons ranging from 1 to 30 days in advance. The forecast accuracy achieved by the various forecasting methods was compared to the forecast accuracy achieved when using a benchmark forecasting method already available in the emergency medicine literature. All time series methods considered in this analysis provided improved in-sample model goodness of fit. However, post-sample analysis revealed that time series regression models that augment linear regression models by accounting for serial autocorrelation offered only small improvements in terms of post-sample forecast accuracy, relative to multiple linear regression models, while seasonal autoregressive integrated moving average, exponential smoothing, and artificial neural network forecasting models did not provide consistently accurate forecasts of daily ED volumes. This study confirms the widely held belief that daily demand for ED services is characterized by seasonal and weekly patterns. The authors compared several time series forecasting methods to a benchmark multiple linear regression model. The results suggest that the existing methodology proposed in the literature, multiple linear regression based on calendar variables, is a reasonable approach to forecasting daily patient volumes in the ED. However, the authors conclude that regression-based models that incorporate calendar variables, account for site-specific special-day effects, and allow for residual autocorrelation provide a more appropriate, informative, and consistently accurate approach to forecasting daily ED patient volumes.
Collin, Blaise P.; Petti, David A.; Demkowicz, Paul A.; ...
2015-08-22
Here, the PARFUME (PARticle FUel ModEl) code was used to predict the release of fission products silver, cesium, and strontium from tristructural isotropic coated fuel particles and compacts during the first irradiation experiment (AGR-1) of the Advanced Gas Reactor Fuel Development and Qualification program. The PARFUME model for the AGR-1 experiment used the fuel compact volume average temperature for each of the 620 days of irradiation to calculate the release of silver, cesium, and strontium from a representative particle for a select number of AGR-1 compacts. Post-irradiation examination measurements provided data on release of these fission products from fuel compactsmore » and fuel particles, and retention of silver in the compacts outside of the silicon carbide (SiC) layer. PARFUME-predicted fractional release of silver, cesium, and strontium was determined and compared to the PIE measurements. For silver, comparisons show a trend of over-prediction at low burnup and under-prediction at high burnup. PARFUME has limitations in the modeling of the temporal and spatial distributions of the temperature and burnup across the compacts, which affects the accuracy of its predictions. Nevertheless, the comparisons on silver release lie in the same order of magnitude. Results show an overall over-prediction of the fractional release of cesium by PARFUME. For particles with failed SiC layers, the over-prediction is by a factor of up to 3, corresponding to a potential over-estimation of the diffusivity in uranium oxycarbide (UCO) by a factor of up to 250. For intact particles, whose release is much lower, the over-prediction is by a factor of up to 100, which could be attributed to an over-estimated diffusivity in SiC by about 40% on average. The release of strontium from intact particles is also over-predicted by PARFUME, which also points towards an over-estimated diffusivity of strontium in either SiC or UCO, or possibly both. The measured strontium fractional release from intact particles varied considerably from compact to compact, making it difficult to assess the effective over-estimation of the diffusivities. Moreover, the release of strontium from particles with failed SiC is difficult to observe experimentally due to the release from intact particles, preventing any conclusions to be made on the accuracy or validity of the PARFUME predictions and the modeled diffusivity of strontium in UCO.« less
NASA Astrophysics Data System (ADS)
Collin, Blaise P.; Petti, David A.; Demkowicz, Paul A.; Maki, John T.
2015-11-01
The PARFUME (PARticle FUel ModEl) code was used to predict the release of fission products silver, cesium, and strontium from tristructural isotropic coated fuel particles and compacts during the first irradiation experiment (AGR-1) of the Advanced Gas Reactor Fuel Development and Qualification program. The PARFUME model for the AGR-1 experiment used the fuel compact volume average temperature for each of the 620 days of irradiation to calculate the release of silver, cesium, and strontium from a representative particle for a select number of AGR-1 compacts. Post-irradiation examination (PIE) measurements provided data on release of these fission products from fuel compacts and fuel particles, and retention of silver in the compacts outside of the silicon carbide (SiC) layer. PARFUME-predicted fractional release of silver, cesium, and strontium was determined and compared to the PIE measurements. For silver, comparisons show a trend of over-prediction at low burnup and under-prediction at high burnup. PARFUME has limitations in the modeling of the temporal and spatial distributions of the temperature and burnup across the compacts, which affects the accuracy of its predictions. Nevertheless, the comparisons on silver release lie in the same order of magnitude. Results show an overall over-prediction of the fractional release of cesium by PARFUME. For particles with failed SiC layers, the over-prediction is by a factor of up to 3, corresponding to a potential over-estimation of the diffusivity in uranium oxycarbide (UCO) by a factor of up to 250. For intact particles, whose release is much lower, the over-prediction is by a factor of up to 100, which could be attributed to an over-estimated diffusivity in SiC by about 40% on average. The release of strontium from intact particles is also over-predicted by PARFUME, which also points towards an over-estimated diffusivity of strontium in either SiC or UCO, or possibly both. The measured strontium fractional release from intact particles varied considerably from compact to compact, making it difficult to assess the effective over-estimation of the diffusivities. Furthermore, the release of strontium from particles with failed SiC is difficult to observe experimentally due to the release from intact particles, preventing any conclusions to be made on the accuracy or validity of the PARFUME predictions and the modeled diffusivity of strontium in UCO.
Probability-based collaborative filtering model for predicting gene-disease associations.
Zeng, Xiangxiang; Ding, Ningxiang; Rodríguez-Patón, Alfonso; Zou, Quan
2017-12-28
Accurately predicting pathogenic human genes has been challenging in recent research. Considering extensive gene-disease data verified by biological experiments, we can apply computational methods to perform accurate predictions with reduced time and expenses. We propose a probability-based collaborative filtering model (PCFM) to predict pathogenic human genes. Several kinds of data sets, containing data of humans and data of other nonhuman species, are integrated in our model. Firstly, on the basis of a typical latent factorization model, we propose model I with an average heterogeneous regularization. Secondly, we develop modified model II with personal heterogeneous regularization to enhance the accuracy of aforementioned models. In this model, vector space similarity or Pearson correlation coefficient metrics and data on related species are also used. We compared the results of PCFM with the results of four state-of-arts approaches. The results show that PCFM performs better than other advanced approaches. PCFM model can be leveraged for predictions of disease genes, especially for new human genes or diseases with no known relationships.
Knowing what to expect, forecasting monthly emergency department visits: A time-series analysis.
Bergs, Jochen; Heerinckx, Philipe; Verelst, Sandra
2014-04-01
To evaluate an automatic forecasting algorithm in order to predict the number of monthly emergency department (ED) visits one year ahead. We collected retrospective data of the number of monthly visiting patients for a 6-year period (2005-2011) from 4 Belgian Hospitals. We used an automated exponential smoothing approach to predict monthly visits during the year 2011 based on the first 5 years of the dataset. Several in- and post-sample forecasting accuracy measures were calculated. The automatic forecasting algorithm was able to predict monthly visits with a mean absolute percentage error ranging from 2.64% to 4.8%, indicating an accurate prediction. The mean absolute scaled error ranged from 0.53 to 0.68 indicating that, on average, the forecast was better compared with in-sample one-step forecast from the naïve method. The applied automated exponential smoothing approach provided useful predictions of the number of monthly visits a year in advance. Copyright © 2013 Elsevier Ltd. All rights reserved.
McGinitie, Teague M; Ebrahimi-Najafabadi, Heshmatollah; Harynuk, James J
2014-02-21
A new method for calibrating thermodynamic data to be used in the prediction of analyte retention times is presented. The method allows thermodynamic data collected on one column to be used in making predictions across columns of the same stationary phase but with varying geometries. This calibration is essential as slight variances in the column inner diameter and stationary phase film thickness between columns or as a column ages will adversely affect the accuracy of predictions. The calibration technique uses a Grob standard mixture along with a Nelder-Mead simplex algorithm and a previously developed model of GC retention times based on a three-parameter thermodynamic model to estimate both inner diameter and stationary phase film thickness. The calibration method is highly successful with the predicted retention times for a set of alkanes, ketones and alcohols having an average error of 1.6s across three columns. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Rampidis, I.; Nikolopoulos, A.; Koukouzas, N.; Grammelis, P.; Kakaras, E.
2007-09-01
This work aims to present a pure 3-D CFD model, accurate and efficient, for the simulation of a pilot scale CFB hydrodynamics. The accuracy of the model was investigated as a function of the numerical parameters, in order to derive an optimum model setup with respect to computational cost. The necessity of the in depth examination of hydrodynamics emerges by the trend to scale up CFBCs. This scale up brings forward numerous design problems and uncertainties, which can be successfully elucidated by CFD techniques. Deriving guidelines for setting a computational efficient model is important as the scale of the CFBs grows fast, while computational power is limited. However, the optimum efficiency matter has not been investigated thoroughly in the literature as authors were more concerned for their models accuracy and validity. The objective of this work is to investigate the parameters that influence the efficiency and accuracy of CFB computational fluid dynamics models, find the optimum set of these parameters and thus establish this technique as a competitive method for the simulation and design of industrial, large scale beds, where the computational cost is otherwise prohibitive. During the tests that were performed in this work, the influence of turbulence modeling approach, time and space density and discretization schemes were investigated on a 1.2 MWth CFB test rig. Using Fourier analysis dominant frequencies were extracted in order to estimate the adequate time period for the averaging of all instantaneous values. The compliance with the experimental measurements was very good. The basic differences between the predictions that arose from the various model setups were pointed out and analyzed. The results showed that a model with high order space discretization schemes when applied on a coarse grid and averaging of the instantaneous scalar values for a 20 sec period, adequately described the transient hydrodynamic behaviour of a pilot CFB while the computational cost was kept low. Flow patterns inside the bed such as the core-annulus flow and the transportation of clusters were at least qualitatively captured.
ERIC Educational Resources Information Center
Scott, Katelyn C.; Skinner, Christopher H.; Moore, Tara C.; McCurdy, Merilee; Ciancio, Dennis; Cihak, David F.
2017-01-01
An adapted alternating treatments design was used to evaluate and compare the effects of two group contingency interventions on mathematics assignment accuracy in an intact first-grade classroom. Both an interdependent contingency with class-average criteria (16 students) and a dependent contingency with criteria based on the average of a smaller,…
A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction
NASA Astrophysics Data System (ADS)
Danandeh Mehr, Ali; Kahya, Ercan
2017-06-01
Genetic programming (GP) is able to systematically explore alternative model structures of different accuracy and complexity from observed input and output data. The effectiveness of GP in hydrological system identification has been recognized in recent studies. However, selecting a parsimonious (accurate and simple) model from such alternatives still remains a question. This paper proposes a Pareto-optimal moving average multigene genetic programming (MA-MGGP) approach to develop a parsimonious model for single-station streamflow prediction. The three main components of the approach that take us from observed data to a validated model are: (1) data pre-processing, (2) system identification and (3) system simplification. The data pre-processing ingredient uses a simple moving average filter to diminish the lagged prediction effect of stand-alone data-driven models. The multigene ingredient of the model tends to identify the underlying nonlinear system with expressions simpler than classical monolithic GP and, eventually simplification component exploits Pareto front plot to select a parsimonious model through an interactive complexity-efficiency trade-off. The approach was tested using the daily streamflow records from a station on Senoz Stream, Turkey. Comparing to the efficiency results of stand-alone GP, MGGP, and conventional multi linear regression prediction models as benchmarks, the proposed Pareto-optimal MA-MGGP model put forward a parsimonious solution, which has a noteworthy importance of being applied in practice. In addition, the approach allows the user to enter human insight into the problem to examine evolved models and pick the best performing programs out for further analysis.
Video and accelerometer-based motion analysis for automated surgical skills assessment.
Zia, Aneeq; Sharma, Yachna; Bettadapura, Vinay; Sarin, Eric L; Essa, Irfan
2018-03-01
Basic surgical skills of suturing and knot tying are an essential part of medical training. Having an automated system for surgical skills assessment could help save experts time and improve training efficiency. There have been some recent attempts at automated surgical skills assessment using either video analysis or acceleration data. In this paper, we present a novel approach for automated assessment of OSATS-like surgical skills and provide an analysis of different features on multi-modal data (video and accelerometer data). We conduct a large study for basic surgical skill assessment on a dataset that contained video and accelerometer data for suturing and knot-tying tasks. We introduce "entropy-based" features-approximate entropy and cross-approximate entropy, which quantify the amount of predictability and regularity of fluctuations in time series data. The proposed features are compared to existing methods of Sequential Motion Texture, Discrete Cosine Transform and Discrete Fourier Transform, for surgical skills assessment. We report average performance of different features across all applicable OSATS-like criteria for suturing and knot-tying tasks. Our analysis shows that the proposed entropy-based features outperform previous state-of-the-art methods using video data, achieving average classification accuracies of 95.1 and 92.2% for suturing and knot tying, respectively. For accelerometer data, our method performs better for suturing achieving 86.8% average accuracy. We also show that fusion of video and acceleration features can improve overall performance for skill assessment. Automated surgical skills assessment can be achieved with high accuracy using the proposed entropy features. Such a system can significantly improve the efficiency of surgical training in medical schools and teaching hospitals.
Genomic Prediction of Gene Bank Wheat Landraces.
Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J; Wenzl, Peter; Singh, Sukhwinder
2016-07-07
This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials. Copyright © 2016 Crossa et al.
Olayan, Rawan S; Ashoor, Haitham; Bajic, Vladimir B
2018-04-01
Finding computationally drug-target interactions (DTIs) is a convenient strategy to identify new DTIs at low cost with reasonable accuracy. However, the current DTI prediction methods suffer the high false positive prediction rate. We developed DDR, a novel method that improves the DTI prediction accuracy. DDR is based on the use of a heterogeneous graph that contains known DTIs with multiple similarities between drugs and multiple similarities between target proteins. DDR applies non-linear similarity fusion method to combine different similarities. Before fusion, DDR performs a pre-processing step where a subset of similarities is selected in a heuristic process to obtain an optimized combination of similarities. Then, DDR applies a random forest model using different graph-based features extracted from the DTI heterogeneous graph. Using 5-repeats of 10-fold cross-validation, three testing setups, and the weighted average of area under the precision-recall curve (AUPR) scores, we show that DDR significantly reduces the AUPR score error relative to the next best start-of-the-art method for predicting DTIs by 34% when the drugs are new, by 23% when targets are new and by 34% when the drugs and the targets are known but not all DTIs between them are not known. Using independent sources of evidence, we verify as correct 22 out of the top 25 DDR novel predictions. This suggests that DDR can be used as an efficient method to identify correct DTIs. The data and code are provided at https://bitbucket.org/RSO24/ddr/. vladimir.bajic@kaust.edu.sa. Supplementary data are available at Bioinformatics online.
A Unified Model of Performance for Predicting the Effects of Sleep and Caffeine.
Ramakrishnan, Sridhar; Wesensten, Nancy J; Kamimori, Gary H; Moon, James E; Balkin, Thomas J; Reifman, Jaques
2016-10-01
Existing mathematical models of neurobehavioral performance cannot predict the beneficial effects of caffeine across the spectrum of sleep loss conditions, limiting their practical utility. Here, we closed this research gap by integrating a model of caffeine effects with the recently validated unified model of performance (UMP) into a single, unified modeling framework. We then assessed the accuracy of this new UMP in predicting performance across multiple studies. We hypothesized that the pharmacodynamics of caffeine vary similarly during both wakefulness and sleep, and that caffeine has a multiplicative effect on performance. Accordingly, to represent the effects of caffeine in the UMP, we multiplied a dose-dependent caffeine factor (which accounts for the pharmacokinetics and pharmacodynamics of caffeine) to the performance estimated in the absence of caffeine. We assessed the UMP predictions in 14 distinct laboratory- and field-study conditions, including 7 different sleep-loss schedules (from 5 h of sleep per night to continuous sleep loss for 85 h) and 6 different caffeine doses (from placebo to repeated 200 mg doses to a single dose of 600 mg). The UMP accurately predicted group-average psychomotor vigilance task performance data across the different sleep loss and caffeine conditions (6% < error < 27%), yielding greater accuracy for mild and moderate sleep loss conditions than for more severe cases. Overall, accounting for the effects of caffeine resulted in improved predictions (after caffeine consumption) by up to 70%. The UMP provides the first comprehensive tool for accurate selection of combinations of sleep schedules and caffeine countermeasure strategies to optimize neurobehavioral performance. © 2016 Associated Professional Sleep Societies, LLC.
Examining speed versus selection in connectivity models using elk migration as an example
Brennan, Angela; Hanks, Ephraim M.; Merkle, Jerod A.; Cole, Eric K.; Dewey, Sarah R.; Courtemanch, Alyson B.; Cross, Paul C.
2018-01-01
ContextLandscape resistance is vital to connectivity modeling and frequently derived from resource selection functions (RSFs). RSFs estimate relative probability of use and tend to focus on understanding habitat preferences during slow, routine animal movements (e.g., foraging). Dispersal and migration, however, can produce rarer, faster movements, in which case models of movement speed rather than resource selection may be more realistic for identifying habitats that facilitate connectivity.ObjectiveTo compare two connectivity modeling approaches applied to resistance estimated from models of movement rate and resource selection.MethodsUsing movement data from migrating elk, we evaluated continuous time Markov chain (CTMC) and movement-based RSF models (i.e., step selection functions [SSFs]). We applied circuit theory and shortest random path (SRP) algorithms to CTMC, SSF and null (i.e., flat) resistance surfaces to predict corridors between elk seasonal ranges. We evaluated prediction accuracy by comparing model predictions to empirical elk movements.ResultsAll connectivity models predicted elk movements well, but models applied to CTMC resistance were more accurate than models applied to SSF and null resistance. Circuit theory models were more accurate on average than SRP models.ConclusionsCTMC can be more realistic than SSFs for estimating resistance for fast movements, though SSFs may demonstrate some predictive ability when animals also move slowly through corridors (e.g., stopover use during migration). High null model accuracy suggests seasonal range data may also be critical for predicting direct migration routes. For animals that migrate or disperse across large landscapes, we recommend incorporating CTMC into the connectivity modeling toolkit.
Comparison of Eight Equations That Predict Percent Body Fat Using Skinfolds in American Youth
Roberts, Amy; Cai, Jianwen; Berge, Jerica M.; Stevens, June
2016-01-01
Abstract Background: Skinfolds are often used in equations to predict percent body fat (PBF) in youth. Although there are numerous such equations published, there is limited information to help researchers determine which equation to use for their sample. Methods: Using data from the 1999–2006 National Health and Nutrition Examination Surveys (NHANES), we compared eight published equations for prediction of PBF. These published equations all included triceps and/or subscapular skinfold measurements. We examined the PBF equations in a nationally representative sample of American youth that was matched by age, sex, and race/ethnicity to the original equation development population and a full sample of 8- to 18-year-olds. We compared the equation-predicted PBF to the dual-emission X-ray absorptiometry (DXA)-measured PBF. The adjusted R2, root mean square error (RMSE), and mean signed difference (MSD) were compared. The MSDs were used to examine accuracy and differential bias by age, sex, and race/ethnicity. Results: When applied to the full range of 8- 18-year-old youth, the R2 values ranged from 0.495 to 0.738. The MSD between predicted and DXA-measured PBF indicated high average accuracy (MSD between −1.0 and 1.0) for only three equations (Bray subscapular equation and Dezenberg equations [with and without race/ethnicity]). The majority of the equations showed differential bias by sex, race/ethnicity, weight status, or age. Conclusions: These findings indicate that investigators should use caution in the selection of an equation to predict PBF in youth given that results may vary systematically in important subgroups. PMID:27045618
Examining speed versus selection in connectivity models using elk migration as an example
Brennan, Angela; Hanks, EM; Merkle, JA; Cole, EK; Dewey, SR; Courtemanch, AB; Cross, Paul C.
2018-01-01
Context: Landscape resistance is vital to connectivity modeling and frequently derived from resource selection functions (RSFs). RSFs estimate relative probability of use and tend to focus on understanding habitat preferences during slow, routine animal movements (e.g., foraging). Dispersal and migration, however, can produce rarer, faster movements, in which case models of movement speed rather than resource selection may be more realistic for identifying habitats that facilitate connectivity. Objective: To compare two connectivity modeling approaches applied to resistance estimated from models of movement rate and resource selection. Methods: Using movement data from migrating elk, we evaluated continuous time Markov chain (CTMC) and movement-based RSF models (i.e., step selection functions [SSFs]). We applied circuit theory and shortest random path (SRP) algorithms to CTMC, SSF and null (i.e., flat) resistance surfaces to predict corridors between elk seasonal ranges. We evaluated prediction accuracy by comparing model predictions to empirical elk movements. Results: All models predicted elk movements well, but models applied to CTMC resistance were more accurate than models applied to SSF and null resistance. Circuit theory models were more accurate on average than SRP algorithms. Conclusions: CTMC can be more realistic than SSFs for estimating resistance for fast movements, though SSFs may demonstrate some predictive ability when animals also move slowly through corridors (e.g., stopover use during migration). High null model accuracy suggests seasonal range data may also be critical for predicting direct migration routes. For animals that migrate or disperse across large landscapes, we recommend incorporating CTMC into the connectivity modeling toolkit.
Comparison of Eight Equations That Predict Percent Body Fat Using Skinfolds in American Youth.
Truesdale, Kimberly P; Roberts, Amy; Cai, Jianwen; Berge, Jerica M; Stevens, June
2016-08-01
Skinfolds are often used in equations to predict percent body fat (PBF) in youth. Although there are numerous such equations published, there is limited information to help researchers determine which equation to use for their sample. Using data from the 1999-2006 National Health and Nutrition Examination Surveys (NHANES), we compared eight published equations for prediction of PBF. These published equations all included triceps and/or subscapular skinfold measurements. We examined the PBF equations in a nationally representative sample of American youth that was matched by age, sex, and race/ethnicity to the original equation development population and a full sample of 8- to 18-year-olds. We compared the equation-predicted PBF to the dual-emission X-ray absorptiometry (DXA)-measured PBF. The adjusted R(2), root mean square error (RMSE), and mean signed difference (MSD) were compared. The MSDs were used to examine accuracy and differential bias by age, sex, and race/ethnicity. When applied to the full range of 8- 18-year-old youth, the R(2) values ranged from 0.495 to 0.738. The MSD between predicted and DXA-measured PBF indicated high average accuracy (MSD between -1.0 and 1.0) for only three equations (Bray subscapular equation and Dezenberg equations [with and without race/ethnicity]). The majority of the equations showed differential bias by sex, race/ethnicity, weight status, or age. These findings indicate that investigators should use caution in the selection of an equation to predict PBF in youth given that results may vary systematically in important subgroups.
Beyond Captions: Linking Figures with Abstract Sentences in Biomedical Articles
Bockhorst, Joseph P.; Conroy, John M.; Agarwal, Shashank; O’Leary, Dianne P.; Yu, Hong
2012-01-01
Although figures in scientific articles have high information content and concisely communicate many key research findings, they are currently under utilized by literature search and retrieval systems. Many systems ignore figures, and those that do not typically only consider caption text. This study describes and evaluates a fully automated approach for associating figures in the body of a biomedical article with sentences in its abstract. We use supervised methods to learn probabilistic language models, hidden Markov models, and conditional random fields for predicting associations between abstract sentences and figures. Three kinds of evidence are used: text in abstract sentences and figures, relative positions of sentences and figures, and the patterns of sentence/figure associations across an article. Each information source is shown to have predictive value, and models that use all kinds of evidence are more accurate than models that do not. Our most accurate method has an -score of 69% on a cross-validation experiment, is competitive with the accuracy of human experts, has significantly better predictive accuracy than state-of-the-art methods and enables users to access figures associated with an abstract sentence with an average of 1.82 fewer mouse clicks. A user evaluation shows that human users find our system beneficial. The system is available at http://FigureItOut.askHERMES.org. PMID:22815711
NASA Astrophysics Data System (ADS)
Richardson, Robert R.; Zhao, Shi; Howey, David A.
2016-09-01
Estimating the temperature distribution within Li-ion batteries during operation is critical for safety and control purposes. Although existing control-oriented thermal models - such as thermal equivalent circuits (TEC) - are computationally efficient, they only predict average temperatures, and are unable to predict the spatially resolved temperature distribution throughout the cell. We present a low-order 2D thermal model of a cylindrical battery based on a Chebyshev spectral-Galerkin (SG) method, capable of predicting the full temperature distribution with a similar efficiency to a TEC. The model accounts for transient heat generation, anisotropic heat conduction, and non-homogeneous convection boundary conditions. The accuracy of the model is validated through comparison with finite element simulations, which show that the 2-D temperature field (r, z) of a large format (64 mm diameter) cell can be accurately modelled with as few as 4 states. Furthermore, the performance of the model for a range of Biot numbers is investigated via frequency analysis. For larger cells or highly transient thermal dynamics, the model order can be increased for improved accuracy. The incorporation of this model in a state estimation scheme with experimental validation against thermocouple measurements is presented in the companion contribution (http://www.sciencedirect.com/science/article/pii/S0378775316308163)
Inanlouganji, Alireza; Reddy, T. Agami; Katipamula, Srinivas
2018-04-13
Forecasting solar irradiation has acquired immense importance in view of the exponential increase in the number of solar photovoltaic (PV) system installations. In this article, analyses results involving statistical and machine-learning techniques to predict solar irradiation for different forecasting horizons are reported. Yearlong typical meteorological year 3 (TMY3) datasets from three cities in the United States with different climatic conditions have been used in this analysis. A simple forecast approach that assumes consecutive days to be identical serves as a baseline model to compare forecasting alternatives. To account for seasonal variability and to capture short-term fluctuations, different variants of themore » lagged moving average (LMX) model with cloud cover as the input variable are evaluated. Finally, the proposed LMX model is evaluated against an artificial neural network (ANN) model. How the one-hour and 24-hour models can be used in conjunction to predict different short-term rolling horizons is discussed, and this joint application is illustrated for a four-hour rolling horizon forecast scheme. Lastly, the effect of using predicted cloud cover values, instead of measured ones, on the accuracy of the models is assessed. Results show that LMX models do not degrade in forecast accuracy if models are trained with the forecast cloud cover data.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Inanlouganji, Alireza; Reddy, T. Agami; Katipamula, Srinivas
Forecasting solar irradiation has acquired immense importance in view of the exponential increase in the number of solar photovoltaic (PV) system installations. In this article, analyses results involving statistical and machine-learning techniques to predict solar irradiation for different forecasting horizons are reported. Yearlong typical meteorological year 3 (TMY3) datasets from three cities in the United States with different climatic conditions have been used in this analysis. A simple forecast approach that assumes consecutive days to be identical serves as a baseline model to compare forecasting alternatives. To account for seasonal variability and to capture short-term fluctuations, different variants of themore » lagged moving average (LMX) model with cloud cover as the input variable are evaluated. Finally, the proposed LMX model is evaluated against an artificial neural network (ANN) model. How the one-hour and 24-hour models can be used in conjunction to predict different short-term rolling horizons is discussed, and this joint application is illustrated for a four-hour rolling horizon forecast scheme. Lastly, the effect of using predicted cloud cover values, instead of measured ones, on the accuracy of the models is assessed. Results show that LMX models do not degrade in forecast accuracy if models are trained with the forecast cloud cover data.« less
Interobserver variability of sonography for prediction of placenta accreta.
Bowman, Zachary S; Eller, Alexandra G; Kennedy, Anne M; Richards, Douglas S; Winter, Thomas C; Woodward, Paula J; Silver, Robert M
2014-12-01
The sensitivity of sonography to predict accreta has been reported as higher than 90%. However, most studies are from single expert investigators. Our objective was to analyze interobserver variability of sonography for prediction of placenta accreta. Patients with previa with and without accreta were ascertained, and images with placental views were collected, deidentified, and placed in random sequence. Three radiologists and 3 maternal-fetal medicine specialists interpreted each study for the presence of accreta and specific findings reported to be associated with its diagnosis. Investigator-specific sensitivity, specificity, and accuracy were calculated. κ statistics were used to assess variability between individuals and types of investigators. A total of 229 sonographic studies from 55 patients with accreta and 56 control patients were examined. Accuracy ranged from 55.9% to 76.4%. Of imaging studies yielding diagnoses, sensitivity ranged from 53.4% to 74.4%, and specificity ranged from 70.8% to 94.8%. Overall interobserver agreement was moderate (mean κ ± SD = 0.47 ± 0.12). κ values between pairs of investigators ranged from 0.32 (fair agreement) to 0.73 (substantial agreement). Average individual agreement ranged from fair (κ = 0.35) to moderate (κ = 0.53). Blinded from clinical data, sonography has significant interobserver variability for the diagnosis of placenta accreta. © 2013 by the American Institute of Ultrasound in Medicine.
Alemi, Farrokh; Torii, Manabu; Atherton, Martin J; Pattie, David C; Cox, Kenneth L
2012-01-01
This article aims to examine whether words listed in reasons for appointments could effectively predict laboratory-verified influenza cases in syndromic surveillance systems. Data were collected from the Armed Forces Health Longitudinal Technological Application medical record system. We used 2 algorithms to combine the impact of words within reasons for appointments: Dependent (DBSt) and Independent (IBSt) Bayesian System. We used receiver operating characteristic curves to compare the accuracy of these 2 methods of processing reasons for appointments against current and previous lists of diagnoses used in the Department of Defense's syndromic surveillance system. We examined 13,096 cases, where the results of influenza tests were available. Each reason for an appointment had an average of 3.5 words (standard deviation = 2.2 words). There was no difference in performance of the 2 algorithms. The area under the curve for IBSt was 0.58 and for DBSt was 0.56. The difference was not statistically significant (McNemar statistic = 0.0054; P = 0.07). These data suggest that reasons for appointments can improve the accuracy of lists of diagnoses in predicting laboratory-verified influenza cases. This study recommends further exploration of the DBSt algorithm and reasons for appointments in predicting likely influenza cases.
Multiscale weighted colored graphs for protein flexibility and rigidity analysis
NASA Astrophysics Data System (ADS)
Bramer, David; Wei, Guo-Wei
2018-02-01
Protein structural fluctuation, measured by Debye-Waller factors or B-factors, is known to correlate to protein flexibility and function. A variety of methods has been developed for protein Debye-Waller factor prediction and related applications to domain separation, docking pose ranking, entropy calculation, hinge detection, stability analysis, etc. Nevertheless, none of the current methodologies are able to deliver an accuracy of 0.7 in terms of the Pearson correlation coefficients averaged over a large set of proteins. In this work, we introduce a paradigm-shifting geometric graph model, multiscale weighted colored graph (MWCG), to provide a new generation of computational algorithms to significantly change the current status of protein structural fluctuation analysis. Our MWCG model divides a protein graph into multiple subgraphs based on interaction types between graph nodes and represents the protein rigidity by generalized centralities of subgraphs. MWCGs not only predict the B-factors of protein residues but also accurately analyze the flexibility of all atoms in a protein. The MWCG model is validated over a number of protein test sets and compared with many standard methods. An extensive numerical study indicates that the proposed MWCG offers an accuracy of over 0.8 and thus provides perhaps the first reliable method for estimating protein flexibility and B-factors. It also simultaneously predicts all-atom flexibility in a molecule.
Lomsadze, Alexandre; Gemayel, Karl; Tang, Shiyuyun; Borodovsky, Mark
2018-05-17
In a conventional view of the prokaryotic genome organization, promoters precede operons and ribosome binding sites (RBSs) with Shine-Dalgarno consensus precede genes. However, recent experimental research suggesting a more diverse view motivated us to develop an algorithm with improved gene-finding accuracy. We describe GeneMarkS-2, an ab initio algorithm that uses a model derived by self-training for finding species-specific (native) genes, along with an array of precomputed "heuristic" models designed to identify harder-to-detect genes (likely horizontally transferred). Importantly, we designed GeneMarkS-2 to identify several types of distinct sequence patterns (signals) involved in gene expression control, among them the patterns characteristic for leaderless transcription as well as noncanonical RBS patterns. To assess the accuracy of GeneMarkS-2, we used genes validated by COG (Clusters of Orthologous Groups) annotation, proteomics experiments, and N-terminal protein sequencing. We observed that GeneMarkS-2 performed better on average in all accuracy measures when compared with the current state-of-the-art gene prediction tools. Furthermore, the screening of ∼5000 representative prokaryotic genomes made by GeneMarkS-2 predicted frequent leaderless transcription in both archaea and bacteria. We also observed that the RBS sites in some species with leadered transcription did not necessarily exhibit the Shine-Dalgarno consensus. The modeling of different types of sequence motifs regulating gene expression prompted a division of prokaryotic genomes into five categories with distinct sequence patterns around the gene starts. © 2018 Lomsadze et al.; Published by Cold Spring Harbor Laboratory Press.
NASA Astrophysics Data System (ADS)
Wijesingha, J. S. J.; Deshapriya, N. L.; Samarakoon, L.
2015-04-01
Billions of people in the world depend on rice as a staple food and as an income-generating crop. Asia is the leader in rice cultivation and it is necessary to maintain an up-to-date rice-related database to ensure food security as well as economic development. This study investigates general applicability of high temporal resolution Moderate Resolution Imaging Spectroradiometer (MODIS) 250m gridded vegetation product for monitoring rice crop growth, mapping rice crop acreage and analyzing crop yield, at the province-level. The MODIS 250m Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) time series data, field data and crop calendar information were utilized in this research in Sa Kaeo Province, Thailand. The following methodology was used: (1) data pre-processing and rice plant growth analysis using Vegetation Indices (VI) (2) extraction of rice acreage and start-of-season dates from VI time series data (3) accuracy assessment, and (4) yield analysis with MODIS VI. The results show a direct relationship between rice plant height and MODIS VI. The crop calendar information and the smoothed NDVI time series with Whittaker Smoother gave high rice acreage estimation (with 86% area accuracy and 75% classification accuracy). Point level yield analysis showed that the MODIS EVI is highly correlated with rice yield and yield prediction using maximum EVI in the rice cycle predicted yield with an average prediction error 4.2%. This study shows the immense potential of MODIS gridded vegetation product for keeping an up-to-date Geographic Information System of rice cultivation.
NASA Astrophysics Data System (ADS)
Hu, Xiaogang; Rymer, William Z.; Suresh, Nina L.
2014-04-01
Objective. The aim of this study is to assess the accuracy of a surface electromyogram (sEMG) motor unit (MU) decomposition algorithm during low levels of muscle contraction. Approach. A two-source method was used to verify the accuracy of the sEMG decomposition system, by utilizing simultaneous intramuscular and surface EMG recordings from the human first dorsal interosseous muscle recorded during isometric trapezoidal force contractions. Spike trains from each recording type were decomposed independently utilizing two different algorithms, EMGlab and dEMG decomposition algorithms. The degree of agreement of the decomposed spike timings was assessed for three different segments of the EMG signals, corresponding to specified regions in the force task. A regression analysis was performed to examine whether certain properties of the sEMG and force signal can predict the decomposition accuracy. Main results. The average accuracy of successful decomposition among the 119 MUs that were common to both intramuscular and surface records was approximately 95%, and the accuracy was comparable between the different segments of the sEMG signals (i.e., force ramp-up versus steady state force versus combined). The regression function between the accuracy and properties of sEMG and force signals revealed that the signal-to-noise ratio of the action potential and stability in the action potential records were significant predictors of the surface decomposition accuracy. Significance. The outcomes of our study confirm the accuracy of the sEMG decomposition algorithm during low muscle contraction levels and provide confidence in the overall validity of the surface dEMG decomposition algorithm.
Prediction of Land use changes using CA in GIS Environment
NASA Astrophysics Data System (ADS)
Kiavarz Moghaddam, H.; Samadzadegan, F.
2009-04-01
Urban growth is a typical self-organized system that results from the interaction between three defined systems; developed urban system, natural non-urban system and planned urban system. Urban growth simulation for an artificial city is carried out first. It evaluates a number of urban sprawl parameters including the size and shape of neighborhood besides testing different types of constraints on urban growth simulation. The results indicate that circular-type neighborhood shows smoother but faster urban growth as compared to nine-cell Moore neighborhood. Cellular Automata is proved to be very efficient in simulating the urban growth simulation over time. The strength of this technology comes from the ability of urban modeler to implement the growth simulation model, evaluating the results and presenting the output simulation results in visual interpretable environment. Artificial city simulation model provides an excellent environment to test a number of simulation parameters such as neighborhood influence on growth results and constraints role in driving the urban growth .Also, CA rules definition is critical stage in simulating the urban growth pattern in a close manner to reality. CA urban growth simulation and prediction of Tehran over the last four decades succeeds to simulate specified tested growth years at a high accuracy level. Some real data layer have been used in the CA simulation training phase such as 1995 while others used for testing the prediction results such as 2002. Tuning the CA growth rules is important through comparing the simulated images with the real data to obtain feedback. An important notice is that CA rules need also to be modified over time to adapt to the urban growth pattern. The evaluation method used on region basis has its advantage in covering the spatial distribution component of the urban growth process. Next step includes running the developed CA simulation over classified raster data for three years in a developed ArcGIS extention. A set of crisp rules are defined and calibrated based on real urban growth pattern. Uncertainty analysis is performed to evaluate the accuracy of the simulated results as compared to the historical real data. Evaluation shows promising results represented by the high average accuracies achieved. The average accuracy for the predicted growth images 1964 and 2002 is over 80 %. Modifying CA growth rules over time to match the growth pattern changes is important to obtain accurate simulation. This modification is based on the urban growth relationship for Tehran over time as can be seen in the historical raster data. The feedback obtained from comparing the simulated and real data is crucial in identifying the optimal set of CA rules for reliable simulation and calibrating growth steps.
Fast depth decision for HEVC inter prediction based on spatial and temporal correlation
NASA Astrophysics Data System (ADS)
Chen, Gaoxing; Liu, Zhenyu; Ikenaga, Takeshi
2016-07-01
High efficiency video coding (HEVC) is a video compression standard that outperforms the predecessor H.264/AVC by doubling the compression efficiency. To enhance the compression accuracy, the partition sizes ranging is from 4x4 to 64x64 in HEVC. However, the manifold partition sizes dramatically increase the encoding complexity. This paper proposes a fast depth decision based on spatial and temporal correlation. Spatial correlation utilize the code tree unit (CTU) Splitting information and temporal correlation utilize the motion vector predictor represented CTU in inter prediction to determine the maximum depth in each CTU. Experimental results show that the proposed method saves about 29.1% of the original processing time with 0.9% of BD-bitrate increase on average.
Environmentally-driven ensemble forecasts of dengue fever
NASA Astrophysics Data System (ADS)
Yamana, T. K.; Shaman, J. L.
2017-12-01
Dengue fever is a mosquito-borne viral disease prevalent in the tropics and subtropics, with an estimated 2.5 billion people at risk of transmission. In many areas where dengue is found, disease transmission is seasonal but prone to high inter-annual variability with occasional severe epidemics. Predicting and preparing for periods of higher than average transmission remains a significant public health challenge. Recently, we developed a framework for forecasting dengue incidence using an dynamical model of disease transmission coupled with observational data of dengue cases using data-assimilation methods. Here, we investigate the use of environmental data to drive the disease transmission model. We produce retrospective forecasts of the timing and severity of dengue outbreaks, and quantify forecast predictive accuracy.
A water balance model to estimate flow through the Old and Middle River corridor
Andrews, Stephen W.; Gross, Edward S.; Hutton, Paul H.
2016-01-01
We applied a water balance model to predict tidally averaged (subtidal) flows through the Old River and Middle River corridor in the Sacramento–San Joaquin Delta. We reviewed the dynamics that govern subtidal flows and water levels and adopted a simplified representation. In this water balance approach, we estimated ungaged flows as linear functions of known (or specified) flows. We assumed that subtidal storage within the control volume varies because of fortnightly variation in subtidal water level, Delta inflow, and barometric pressure. The water balance model effectively predicts subtidal flows and approaches the accuracy of a 1–D Delta hydrodynamic model. We explore the potential to improve the approach by representing more complex dynamics and identify possible future improvements.
Determining GPS average performance metrics
NASA Technical Reports Server (NTRS)
Moore, G. V.
1995-01-01
Analytic and semi-analytic methods are used to show that users of the GPS constellation can expect performance variations based on their location. Specifically, performance is shown to be a function of both altitude and latitude. These results stem from the fact that the GPS constellation is itself non-uniform. For example, GPS satellites are over four times as likely to be directly over Tierra del Fuego than over Hawaii or Singapore. Inevitable performance variations due to user location occur for ground, sea, air and space GPS users. These performance variations can be studied in an average relative sense. A semi-analytic tool which symmetrically allocates GPS satellite latitude belt dwell times among longitude points is used to compute average performance metrics. These metrics include average number of GPS vehicles visible, relative average accuracies in the radial, intrack and crosstrack (or radial, north/south, east/west) directions, and relative average PDOP or GDOP. The tool can be quickly changed to incorporate various user antenna obscuration models and various GPS constellation designs. Among other applications, tool results can be used in studies to: predict locations and geometries of best/worst case performance, design GPS constellations, determine optimal user antenna location and understand performance trends among various users.
Pretreatment data is highly predictive of liver chemistry signals in clinical trials.
Cai, Zhaohui; Bresell, Anders; Steinberg, Mark H; Silberg, Debra G; Furlong, Stephen T
2012-01-01
The goal of this retrospective analysis was to assess how well predictive models could determine which patients would develop liver chemistry signals during clinical trials based on their pretreatment (baseline) information. Based on data from 24 late-stage clinical trials, classification models were developed to predict liver chemistry outcomes using baseline information, which included demographics, medical history, concomitant medications, and baseline laboratory results. Predictive models using baseline data predicted which patients would develop liver signals during the trials with average validation accuracy around 80%. Baseline levels of individual liver chemistry tests were most important for predicting their own elevations during the trials. High bilirubin levels at baseline were not uncommon and were associated with a high risk of developing biochemical Hy's law cases. Baseline γ-glutamyltransferase (GGT) level appeared to have some predictive value, but did not increase predictability beyond using established liver chemistry tests. It is possible to predict which patients are at a higher risk of developing liver chemistry signals using pretreatment (baseline) data. Derived knowledge from such predictions may allow proactive and targeted risk management, and the type of analysis described here could help determine whether new biomarkers offer improved performance over established ones.
A comparison of LOD and UT1-UTC forecasts by different combined prediction techniques
NASA Astrophysics Data System (ADS)
Kosek, W.; Kalarus, M.; Johnson, T. J.; Wooden, W. H.; McCarthy, D. D.; Popiński, W.
Stochastic prediction techniques including autocovariance, autoregressive, autoregressive moving average, and neural networks were applied to the UT1-UTC and Length of Day (LOD) International Earth Rotation and Reference Systems Servive (IERS) EOPC04 time series to evaluate the capabilities of each method. All known effects such as leap seconds and solid Earth zonal tides were first removed from the observed values of UT1-UTC and LOD. Two combination procedures were applied to predict the resulting LODR time series: 1) the combination of the least-squares (LS) extrapolation with a stochastic predition method, and 2) the combination of the discrete wavelet transform (DWT) filtering and a stochastic prediction method. The results of the combination of the LS extrapolation with different stochastic prediction techniques were compared with the results of the UT1-UTC prediction method currently used by the IERS Rapid Service/Prediction Centre (RS/PC). It was found that the prediction accuracy depends on the starting prediction epochs, and for the combined forecast methods, the mean prediction errors for 1 to about 70 days in the future are of the same order as those of the method used by the IERS RS/PC.
CONFOLD2: improved contact-driven ab initio protein structure modeling.
Adhikari, Badri; Cheng, Jianlin
2018-01-25
Contact-guided protein structure prediction methods are becoming more and more successful because of the latest advances in residue-residue contact prediction. To support contact-driven structure prediction, effective tools that can quickly build tertiary structural models of good quality from predicted contacts need to be developed. We develop an improved contact-driven protein modelling method, CONFOLD2, and study how it may be effectively used for ab initio protein structure prediction with predicted contacts as input. It builds models using various subsets of input contacts to explore the fold space under the guidance of a soft square energy function, and then clusters the models to obtain the top five models. CONFOLD2 obtains an average reconstruction accuracy of 0.57 TM-score for the 150 proteins in the PSICOV contact prediction dataset. When benchmarked on the CASP11 contacts predicted using CONSIP2 and CASP12 contacts predicted using Raptor-X, CONFOLD2 achieves a mean TM-score of 0.41 on both datasets. CONFOLD2 allows to quickly generate top five structural models for a protein sequence when its secondary structures and contacts predictions at hand. The source code of CONFOLD2 is publicly available at https://github.com/multicom-toolbox/CONFOLD2/ .
Prediction Models for Dynamic Demand Response
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aman, Saima; Frincu, Marc; Chelmis, Charalampos
2015-11-02
As Smart Grids move closer to dynamic curtailment programs, Demand Response (DR) events will become necessary not only on fixed time intervals and weekdays predetermined by static policies, but also during changing decision periods and weekends to react to real-time demand signals. Unique challenges arise in this context vis-a-vis demand prediction and curtailment estimation and the transformation of such tasks into an automated, efficient dynamic demand response (D 2R) process. While existing work has concentrated on increasing the accuracy of prediction models for DR, there is a lack of studies for prediction models for D 2R, which we address inmore » this paper. Our first contribution is the formal definition of D 2R, and the description of its challenges and requirements. Our second contribution is a feasibility analysis of very-short-term prediction of electricity consumption for D 2R over a diverse, large-scale dataset that includes both small residential customers and large buildings. Our third, and major contribution is a set of insights into the predictability of electricity consumption in the context of D 2R. Specifically, we focus on prediction models that can operate at a very small data granularity (here 15-min intervals), for both weekdays and weekends - all conditions that characterize scenarios for D 2R. We find that short-term time series and simple averaging models used by Independent Service Operators and utilities achieve superior prediction accuracy. We also observe that workdays are more predictable than weekends and holiday. Also, smaller customers have large variation in consumption and are less predictable than larger buildings. Key implications of our findings are that better models are required for small customers and for non-workdays, both of which are critical for D 2R. Also, prediction models require just few days’ worth of data indicating that small amounts of historical training data can be used to make reliable predictions, simplifying the complexity of big data challenge associated with D 2R.« less
Genomic Prediction of Gene Bank Wheat Landraces
Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J.; Wenzl, Peter; Singh, Sukhwinder
2016-01-01
This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials. PMID:27172218
Accuracy of Predicted Genomic Breeding Values in Purebred and Crossbred Pigs.
Hidalgo, André M; Bastiaansen, John W M; Lopes, Marcos S; Harlizius, Barbara; Groenen, Martien A M; de Koning, Dirk-Jan
2015-05-26
Genomic selection has been widely implemented in dairy cattle breeding when the aim is to improve performance of purebred animals. In pigs, however, the final product is a crossbred animal. This may affect the efficiency of methods that are currently implemented for dairy cattle. Therefore, the objective of this study was to determine the accuracy of predicted breeding values in crossbred pigs using purebred genomic and phenotypic data. A second objective was to compare the predictive ability of SNPs when training is done in either single or multiple populations for four traits: age at first insemination (AFI); total number of piglets born (TNB); litter birth weight (LBW); and litter variation (LVR). We performed marker-based and pedigree-based predictions. Within-population predictions for the four traits ranged from 0.21 to 0.72. Multi-population prediction yielded accuracies ranging from 0.18 to 0.67. Predictions across purebred populations as well as predicting genetic merit of crossbreds from their purebred parental lines for AFI performed poorly (not significantly different from zero). In contrast, accuracies of across-population predictions and accuracies of purebred to crossbred predictions for LBW and LVR ranged from 0.08 to 0.31 and 0.11 to 0.31, respectively. Accuracy for TNB was zero for across-population prediction, whereas for purebred to crossbred prediction it ranged from 0.08 to 0.22. In general, marker-based outperformed pedigree-based prediction across populations and traits. However, in some cases pedigree-based prediction performed similarly or outperformed marker-based prediction. There was predictive ability when purebred populations were used to predict crossbred genetic merit using an additive model in the populations studied. AFI was the only exception, indicating that predictive ability depends largely on the genetic correlation between PB and CB performance, which was 0.31 for AFI. Multi-population prediction was no better than within-population prediction for the purebred validation set. Accuracy of prediction was very trait-dependent. Copyright © 2015 Hidalgo et al.
The accuracy of Genomic Selection in Norwegian red cattle assessed by cross-validation.
Luan, Tu; Woolliams, John A; Lien, Sigbjørn; Kent, Matthew; Svendsen, Morten; Meuwissen, Theo H E
2009-11-01
Genomic Selection (GS) is a newly developed tool for the estimation of breeding values for quantitative traits through the use of dense markers covering the whole genome. For a successful application of GS, accuracy of the prediction of genomewide breeding value (GW-EBV) is a key issue to consider. Here we investigated the accuracy and possible bias of GW-EBV prediction, using real bovine SNP genotyping (18,991 SNPs) and phenotypic data of 500 Norwegian Red bulls. The study was performed on milk yield, fat yield, protein yield, first lactation mastitis traits, and calving ease. Three methods, best linear unbiased prediction (G-BLUP), Bayesian statistics (BayesB), and a mixture model approach (MIXTURE), were used to estimate marker effects, and their accuracy and bias were estimated by using cross-validation. The accuracies of the GW-EBV prediction were found to vary widely between 0.12 and 0.62. G-BLUP gave overall the highest accuracy. We observed a strong relationship between the accuracy of the prediction and the heritability of the trait. GW-EBV prediction for production traits with high heritability achieved higher accuracy and also lower bias than health traits with low heritability. To achieve a similar accuracy for the health traits probably more records will be needed.
Peak expiratory flow profiles delivered by pump systems. Limitations due to wave action.
Miller, M R; Jones, B; Xu, Y; Pedersen, O F; Quanjer, P H
2000-06-01
Pump systems are currently used to test the performance of both spirometers and peak expiratory flow (PEF) meters, but for certain flow profiles the input signal (i.e., requested profile) and the output profile can differ. We developed a mathematical model of wave action within a pump and compared the recorded flow profiles with both the input profiles and the output predicted by the model. Three American Thoracic Society (ATS) flow profiles and four artificial flow-versus-time profiles were delivered by a pump, first to a pneumotachograph (PT) on its own, then to the PT with a 32-cm upstream extension tube (which would favor wave action), and lastly with the PT in series with and immediately downstream to a mini-Wright peak flow meter. With the PT on its own, recorded flow for the seven profiles was 2.4 +/- 1.9% (mean +/- SD) higher than the pump's input flow, and similarly was 2.3 +/- 2.3% higher than the pump's output flow as predicted by the model. With the extension tube in place, the recorded flow was 6.6 +/- 6.4% higher than the input flow (range: 0.1 to 18.4%), but was only 1.2 +/- 2.5% higher than the output flow predicted by the model (range: -0.8 to 5.2%). With the mini-Wright meter in series, the flow recorded by the PT was on average 6.1 +/- 9.1% below the input flow (range: -23.8 to 2. 5%), but was only 0.6 +/- 3.3% above the pump's output flow predicted by the model (range: -5.5 to 3.9%). The mini-Wright meter's reading (corrected for its nonlinearity) was on average 1.3 +/- 3.6% below the model's predicted output flow (range: -9.0 to 1. 5%). The mini-Wright meter would be deemed outside ATS limits for accuracy for three of the seven profiles when compared with the pump's input PEF, but this would be true for only one profile when compared with the pump's output PEF as predicted by the model. Our study shows that the output flow from pump systems can differ from the input waveform depending on the operating configuration. This effect can be predicted with reasonable accuracy using a model based on nonsteady flow analysis that takes account of pressure wave reflections within pump systems.
NASA Astrophysics Data System (ADS)
Balachandran, Prasanna V.; Emery, Antoine A.; Gubernatis, James E.; Lookman, Turab; Wolverton, Chris; Zunger, Alex
2018-04-01
We apply machine learning (ML) methods to a database of 390 experimentally reported A B O3 compounds to construct two statistical models that predict possible new perovskite materials and possible new cubic perovskites. The first ML model classified the 390 compounds into 254 perovskites and 136 that are not perovskites with a 90% average cross-validation (CV) accuracy; the second ML model further classified the perovskites into 22 known cubic perovskites and 232 known noncubic perovskites with a 94% average CV accuracy. We find that the most effective chemical descriptors affecting our classification include largely geometric constructs such as the A and B Shannon ionic radii, the tolerance and octahedral factors, the A -O and B -O bond length, and the A and B Villars' Mendeleev numbers. We then construct an additional list of 625 A B O3 compounds assembled from charge conserving combinations of A and B atoms absent from our list of known compounds. Then, using the two ML models constructed on the known compounds, we predict that 235 of the 625 exist in a perovskite structure with a confidence greater than 50% and among them that 20 exist in the cubic structure (albeit, the latter with only ˜50 % confidence). We find that the new perovskites are most likely to occur when the A and B atoms are a lanthanide or actinide, when the A atom is an alkali, alkali earth, or late transition metal atom, or when the B atom is a p -block atom. We also compare the ML findings with the density functional theory calculations and convex hull analyses in the Open Quantum Materials Database (OQMD), which predicts the T =0 K ground-state stability of all the A B O3 compounds. We find that OQMD predicts 186 of 254 of the perovskites in the experimental database to be thermodynamically stable within 100 meV/atom of the convex hull and predicts 87 of the 235 ML-predicted perovskite compounds to be thermodynamically stable within 100 meV/atom of the convex hull, including 6 of these to be in cubic structures. We suggest these 87 as the most promising candidates for future experimental synthesis of novel perovskites.
NASA Technical Reports Server (NTRS)
Green, S.; Cochrane, D. L.; Truhlar, D. G.
1986-01-01
The utility of the energy-corrected sudden (ECS) scaling method is evaluated on the basis of how accurately it predicts the entire matrix of state-to-state rate constants, when the fundamental rate constants are independently known. It is shown for the case of Ar-CO collisions at 500 K that when a critical impact parameter is about 1.75-2.0 A, the ECS method yields excellent excited state rates on the average and has an rms error of less than 20 percent.
Lou, Wangchao; Wang, Xiaoqing; Chen, Fan; Chen, Yixiao; Jiang, Bo; Zhang, Hua
2014-01-01
Developing an efficient method for determination of the DNA-binding proteins, due to their vital roles in gene regulation, is becoming highly desired since it would be invaluable to advance our understanding of protein functions. In this study, we proposed a new method for the prediction of the DNA-binding proteins, by performing the feature rank using random forest and the wrapper-based feature selection using forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, the proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to the five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 by the proposed model trained on the entire PDB594 dataset and by other five existing methods (including iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader) were performed, resulting in that the proposed DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. The independent tests performed by the proposed DBPPred on completely a large non-DNA binding protein dataset and two RNA binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that majority of the selected features by the proposed method are statistically significantly different between the mean feature values of the DNA-binding and the non DNA-binding proteins. All of the experimental results indicate that the proposed DBPPred can be an alternative perspective predictor for large-scale determination of DNA-binding proteins. PMID:24475169
EVALUATING RISK-PREDICTION MODELS USING DATA FROM ELECTRONIC HEALTH RECORDS.
Wang, L E; Shaw, Pamela A; Mathelier, Hansie M; Kimmel, Stephen E; French, Benjamin
2016-03-01
The availability of data from electronic health records facilitates the development and evaluation of risk-prediction models, but estimation of prediction accuracy could be limited by outcome misclassification, which can arise if events are not captured. We evaluate the robustness of prediction accuracy summaries, obtained from receiver operating characteristic curves and risk-reclassification methods, if events are not captured (i.e., "false negatives"). We derive estimators for sensitivity and specificity if misclassification is independent of marker values. In simulation studies, we quantify the potential for bias in prediction accuracy summaries if misclassification depends on marker values. We compare the accuracy of alternative prognostic models for 30-day all-cause hospital readmission among 4548 patients discharged from the University of Pennsylvania Health System with a primary diagnosis of heart failure. Simulation studies indicate that if misclassification depends on marker values, then the estimated accuracy improvement is also biased, but the direction of the bias depends on the direction of the association between markers and the probability of misclassification. In our application, 29% of the 1143 readmitted patients were readmitted to a hospital elsewhere in Pennsylvania, which reduced prediction accuracy. Outcome misclassification can result in erroneous conclusions regarding the accuracy of risk-prediction models.
Uyar, Asli; Bener, Ayse; Ciray, H Nadir
2015-08-01
Multiple embryo transfers in in vitro fertilization (IVF) treatment increase the number of successful pregnancies while elevating the risk of multiple gestations. IVF-associated multiple pregnancies exhibit significant financial, social, and medical implications. Clinicians need to decide the number of embryos to be transferred considering the tradeoff between successful outcomes and multiple pregnancies. To predict implantation outcome of individual embryos in an IVF cycle with the aim of providing decision support on the number of embryos transferred. Retrospective cohort study. Electronic health records of one of the largest IVF clinics in Turkey. The study data set included 2453 embryos transferred at day 2 or day 3 after intracytoplasmic sperm injection (ICSI). Each embryo was represented with 18 clinical features and a class label, +1 or -1, indicating positive and negative implantation outcomes, respectively. For each classifier tested, a model was developed using two-thirds of the data set, and prediction performance was evaluated on the remaining one-third of the samples using receiver operating characteristic (ROC) analysis. The training-testing procedure was repeated 10 times on randomly split (two-thirds to one-third) data. The relative predictive values of clinical input characteristics were assessed using information gain feature weighting and forward feature selection methods. The naïve Bayes model provided 80.4% accuracy, 63.7% sensitivity, and 17.6% false alarm rate in embryo-based implantation prediction. Multiple embryo implantations were predicted at a 63.8% sensitivity level. Predictions using the proposed model resulted in higher accuracy compared with expert judgment alone (on average, 75.7% and 60.1%, respectively). A machine learning-based decision support system would be useful in improving the success rates of IVF treatment. © The Author(s) 2014.
A Unified Model of Performance for Predicting the Effects of Sleep and Caffeine
Ramakrishnan, Sridhar; Wesensten, Nancy J.; Kamimori, Gary H.; Moon, James E.; Balkin, Thomas J.; Reifman, Jaques
2016-01-01
Study Objectives: Existing mathematical models of neurobehavioral performance cannot predict the beneficial effects of caffeine across the spectrum of sleep loss conditions, limiting their practical utility. Here, we closed this research gap by integrating a model of caffeine effects with the recently validated unified model of performance (UMP) into a single, unified modeling framework. We then assessed the accuracy of this new UMP in predicting performance across multiple studies. Methods: We hypothesized that the pharmacodynamics of caffeine vary similarly during both wakefulness and sleep, and that caffeine has a multiplicative effect on performance. Accordingly, to represent the effects of caffeine in the UMP, we multiplied a dose-dependent caffeine factor (which accounts for the pharmacokinetics and pharmacodynamics of caffeine) to the performance estimated in the absence of caffeine. We assessed the UMP predictions in 14 distinct laboratory- and field-study conditions, including 7 different sleep-loss schedules (from 5 h of sleep per night to continuous sleep loss for 85 h) and 6 different caffeine doses (from placebo to repeated 200 mg doses to a single dose of 600 mg). Results: The UMP accurately predicted group-average psychomotor vigilance task performance data across the different sleep loss and caffeine conditions (6% < error < 27%), yielding greater accuracy for mild and moderate sleep loss conditions than for more severe cases. Overall, accounting for the effects of caffeine resulted in improved predictions (after caffeine consumption) by up to 70%. Conclusions: The UMP provides the first comprehensive tool for accurate selection of combinations of sleep schedules and caffeine countermeasure strategies to optimize neurobehavioral performance. Citation: Ramakrishnan S, Wesensten NJ, Kamimori GH, Moon JE, Balkin TJ, Reifman J. A unified model of performance for predicting the effects of sleep and caffeine. SLEEP 2016;39(10):1827–1841. PMID:27397562
NASA Astrophysics Data System (ADS)
Slater, L. J.; Villarini, G.; Bradley, A.
2015-12-01
Model predictions of precipitation and temperature are crucial to mitigate the impacts of major flood and drought events through informed planning and response. However, the potential value and applicability of these predictions is inescapably linked to their forecast quality. The North-American Multi-Model Ensemble (NMME) is a multi-agency supported forecasting system for intraseasonal to interannual (ISI) climate predictions. Retrospective forecasts and real-time information are provided by each agency free of charge to facilitate collaborative research efforts for predicting future climate conditions as well as extreme weather events such as floods and droughts. Using the PRISM climate mapping system as the reference data, we examine the skill of five General Circulation Models (GCMs) from the NMME project to forecast monthly and seasonal precipitation and temperature over seven sub-regions of the continental United States. For each model, we quantify the seasonal accuracy of the forecast relative to observed precipitation using the mean square error skill score. This score is decomposed to assess the accuracy of the forecast in the absence of biases (potential skill), and in the presence of conditional (slope reliability) and unconditional (standardized mean error) biases. The quantification of these biases allows us to diagnose each model's skill over a full range temporal and spatial scales. Finally, we test each model's forecasting skill by evaluating its ability to predict extended periods of extreme temperature and precipitation that were conducive to 'billion-dollar' historical flood and drought events in different regions of the continental USA. The forecasting skill of the individual climate models is summarized and presented along with a discussion of different multi-model averaging techniques for predicting such events.
Classification of ECG signal with Support Vector Machine Method for Arrhythmia Detection
NASA Astrophysics Data System (ADS)
Turnip, Arjon; Ilham Rizqywan, M.; Kusumandari, Dwi E.; Turnip, Mardi; Sihombing, Poltak
2018-03-01
An electrocardiogram is a potential bioelectric record that occurs as a result of cardiac activity. QRS Detection with zero crossing calculation is one method that can precisely determine peak R of QRS wave as part of arrhythmia detection. In this paper, two experimental scheme (2 minutes duration with different activities: relaxed and, typing) were conducted. From the two experiments it were obtained: accuracy, sensitivity, and positive predictivity about 100% each for the first experiment and about 79%, 93%, 83% for the second experiment, respectively. Furthermore, the feature set of MIT-BIH arrhythmia using the support vector machine (SVM) method on the WEKA software is evaluated. By combining the available attributes on the WEKA algorithm, the result is constant since all classes of SVM goes to the normal class with average 88.49% accuracy.
Plume interference with space shuttle range safety signals
NASA Technical Reports Server (NTRS)
Boynton, F. P.; Rajaseknar, P. S.
1979-01-01
The computational procedure for signal propagation in the presence of an exhaust plume is presented. Comparisons with well-known analytic diffraction solutions indicate that accuracy suffers when mesh spacing is inadequate to resolve the first unobstructed Fresnel zone at the plume edge. Revisions to the procedure to improve its accuracy without requiring very large arrays are discussed. Comparisons to field measurements during a shuttle solid rocket motor (SRM) test firing suggest that the plume is sharper edged than one would expect on the basis of time averaged electron density calculations. The effects, both of revisions to the computational procedure and of allowing for a sharper plume edge, are to raise the signal level near tail aspect. The attenuation levels then predicted are still high enough to be of concern near SRM burnout for northerly launches of the space shuttle.
Fuzzy Temporal Logic Based Railway Passenger Flow Forecast Model
Dou, Fei; Jia, Limin; Wang, Li; Xu, Jie; Huang, Yakun
2014-01-01
Passenger flow forecast is of essential importance to the organization of railway transportation and is one of the most important basics for the decision-making on transportation pattern and train operation planning. Passenger flow of high-speed railway features the quasi-periodic variations in a short time and complex nonlinear fluctuation because of existence of many influencing factors. In this study, a fuzzy temporal logic based passenger flow forecast model (FTLPFFM) is presented based on fuzzy logic relationship recognition techniques that predicts the short-term passenger flow for high-speed railway, and the forecast accuracy is also significantly improved. An applied case that uses the real-world data illustrates the precision and accuracy of FTLPFFM. For this applied case, the proposed model performs better than the k-nearest neighbor (KNN) and autoregressive integrated moving average (ARIMA) models. PMID:25431586
Kiang, Richard; Adimi, Farida; Soika, Valerii; Nigro, Joseph; Singhasivanon, Pratap; Sirichaisinthop, Jeeraphat; Leemingsawat, Somjai; Apiwathnasorn, Chamnarn; Looareesuwan, Sornchai
2006-11-01
In many malarious regions malaria transmission roughly coincides with rainy seasons, which provide for more abundant larval habitats. In addition to precipitation, other meteorological and environmental factors may also influence malaria transmission. These factors can be remotely sensed using earth observing environmental satellites and estimated with seasonal climate forecasts. The use of remote sensing usage as an early warning tool for malaria epidemics have been broadly studied in recent years, especially for Africa, where the majority of the world's malaria occurs. Although the Greater Mekong Subregion (GMS), which includes Thailand and the surrounding countries, is an epicenter of multidrug resistant falciparum malaria, the meteorological and environmental factors affecting malaria transmissions in the GMS have not been examined in detail. In this study, the parasitological data used consisted of the monthly malaria epidemiology data at the provincial level compiled by the Thai Ministry of Public Health. Precipitation, temperature, relative humidity, and vegetation index obtained from both climate time series and satellite measurements were used as independent variables to model malaria. We used neural network methods, an artificial-intelligence technique, to model the dependency of malaria transmission on these variables. The average training accuracy of the neural network analysis for three provinces (Kanchanaburi, Mae Hong Son, and Tak) which are among the provinces most endemic for malaria, is 72.8% and the average testing accuracy is 62.9% based on the 1994-1999 data. A more complex neural network architecture resulted in higher training accuracy but also lower testing accuracy. Taking into account of the uncertainty regarding reported malaria cases, we divided the malaria cases into bands (classes) to compute training accuracy. Using the same neural network architecture on the 19 most endemic provinces for years 1994 to 2000, the mean training accuracy weighted by provincial malaria cases was 73%. Prediction of malaria cases for 2001 using neural networks trained for 1994-2000 gave a weighted accuracy of 53%. Because there was a significant decrease (31%) in the number of malaria cases in the 19 provinces from 2000 to 2001, the networks overestimated malaria transmissions. The decrease in transmission was not due to climatic or environmental changes. Thailand is a country with long borders. Migrant populations from the neighboring countries enlarge the human malaria reservoir because these populations have more limited access to health care. This issue also confounds the complexity of modeling malaria based on meteorological and environmental variables alone. In spite of the relatively low resolution of the data and the impact of migrant populations, we have uncovered a reasonably clear dependency of malaria on meteorological and environmental remote sensing variables. When other contextual determinants do not vary significantly, using neural network analysis along with remote sensing variables to predict malaria endemicity should be feasible.
Improved method for predicting protein fold patterns with ensemble classifiers.
Chen, W; Liu, X; Huang, Y; Jiang, Y; Zou, Q; Lin, C
2012-01-27
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physical-chemical property of proteins and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. These methods were used on recent data, with 54.2% accuracy. Source codes and dataset, together with web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
Onukwugha, Eberechukwu; Qi, Ran; Jayasekera, Jinani; Zhou, Shujia
2016-02-01
Prognostic classification approaches are commonly used in clinical practice to predict health outcomes. However, there has been limited focus on use of the general approach for predicting costs. We applied a grouping algorithm designed for large-scale data sets and multiple prognostic factors to investigate whether it improves cost prediction among older Medicare beneficiaries diagnosed with prostate cancer. We analysed the linked Surveillance, Epidemiology and End Results (SEER)-Medicare data, which included data from 2000 through 2009 for men diagnosed with incident prostate cancer between 2000 and 2007. We split the survival data into two data sets (D0 and D1) of equal size. We trained the classifier of the Grouping Algorithm for Cancer Data (GACD) on D0 and tested it on D1. The prognostic factors included cancer stage, age, race and performance status proxies. We calculated the average difference between observed D1 costs and predicted D1 costs at 5 years post-diagnosis with and without the GACD. The sample included 110,843 men with prostate cancer. The median age of the sample was 74 years, and 10% were African American. The average difference (mean absolute error [MAE]) per person between the real and predicted total 5-year cost was US$41,525 (MAE US$41,790; 95% confidence interval [CI] US$41,421-42,158) with the GACD and US$43,113 (MAE US$43,639; 95% CI US$43,062-44,217) without the GACD. The 5-year cost prediction without grouping resulted in a sample overestimate of US$79,544,508. The grouping algorithm developed for complex, large-scale data improves the prediction of 5-year costs. The prediction accuracy could be improved by utilization of a richer set of prognostic factors and refinement of categorical specifications.
NASA Astrophysics Data System (ADS)
Sahu, Neelesh Kumar; Andhare, Atul B.; Andhale, Sandip; Raju Abraham, Roja
2018-04-01
Present work deals with prediction of surface roughness using cutting parameters along with in-process measured cutting force and tool vibration (acceleration) during turning of Ti-6Al-4V with cubic boron nitride (CBN) inserts. Full factorial design is used for design of experiments using cutting speed, feed rate and depth of cut as design variables. Prediction model for surface roughness is developed using response surface methodology with cutting speed, feed rate, depth of cut, resultant cutting force and acceleration as control variables. Analysis of variance (ANOVA) is performed to find out significant terms in the model. Insignificant terms are removed after performing statistical test using backward elimination approach. Effect of each control variables on surface roughness is also studied. Correlation coefficient (R2 pred) of 99.4% shows that model correctly explains the experiment results and it behaves well even when adjustment is made in factors or new factors are added or eliminated. Validation of model is done with five fresh experiments and measured forces and acceleration values. Average absolute error between RSM model and experimental measured surface roughness is found to be 10.2%. Additionally, an artificial neural network model is also developed for prediction of surface roughness. The prediction results of modified regression model are compared with ANN. It is found that RSM model and ANN (average absolute error 7.5%) are predicting roughness with more than 90% accuracy. From the results obtained it is found that including cutting force and vibration for prediction of surface roughness gives better prediction than considering only cutting parameters. Also, ANN gives better prediction over RSM models.
Improved Short-Term Clock Prediction Method for Real-Time Positioning.
Lv, Yifei; Dai, Zhiqiang; Zhao, Qile; Yang, Sheng; Zhou, Jinning; Liu, Jingnan
2017-06-06
The application of real-time precise point positioning (PPP) requires real-time precise orbit and clock products that should be predicted within a short time to compensate for the communication delay or data gap. Unlike orbit correction, clock correction is difficult to model and predict. The widely used linear model hardly fits long periodic trends with a small data set and exhibits significant accuracy degradation in real-time prediction when a large data set is used. This study proposes a new prediction model for maintaining short-term satellite clocks to meet the high-precision requirements of real-time clocks and provide clock extrapolation without interrupting the real-time data stream. Fast Fourier transform (FFT) is used to analyze the linear prediction residuals of real-time clocks. The periodic terms obtained through FFT are adopted in the sliding window prediction to achieve a significant improvement in short-term prediction accuracy. This study also analyzes and compares the accuracy of short-term forecasts (less than 3 h) by using different length observations. Experimental results obtained from International GNSS Service (IGS) final products and our own real-time clocks show that the 3-h prediction accuracy is better than 0.85 ns. The new model can replace IGS ultra-rapid products in the application of real-time PPP. It is also found that there is a positive correlation between the prediction accuracy and the short-term stability of on-board clocks. Compared with the accuracy of the traditional linear model, the accuracy of the static PPP using the new model of the 2-h prediction clock in N, E, and U directions is improved by about 50%. Furthermore, the static PPP accuracy of 2-h clock products is better than 0.1 m. When an interruption occurs in the real-time model, the accuracy of the kinematic PPP solution using 1-h clock prediction product is better than 0.2 m, without significant accuracy degradation. This model is of practical significance because it solves the problems of interruption and delay in data broadcast in real-time clock estimation and can meet the requirements of real-time PPP.
Genomic-Enabled Prediction in Maize Using Kernel Models with Genotype × Environment Interaction
Bandeira e Sousa, Massaine; Cuevas, Jaime; de Oliveira Couto, Evellyn Giselly; Pérez-Rodríguez, Paulino; Jarquín, Diego; Fritsche-Neto, Roberto; Burgueño, Juan; Crossa, Jose
2017-01-01
Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models were fitted using two kernel methods: a linear kernel Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear kernel Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP data sets), having different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and the MDs models fitted with the Gaussian kernel (MDe-GK, and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied. PMID:28455415
Genomic-Enabled Prediction in Maize Using Kernel Models with Genotype × Environment Interaction.
Bandeira E Sousa, Massaine; Cuevas, Jaime; de Oliveira Couto, Evellyn Giselly; Pérez-Rodríguez, Paulino; Jarquín, Diego; Fritsche-Neto, Roberto; Burgueño, Juan; Crossa, Jose
2017-06-07
Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models were fitted using two kernel methods: a linear kernel Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear kernel Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP data sets), having different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and the MDs models fitted with the Gaussian kernel (MDe-GK, and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied. Copyright © 2017 Bandeira e Sousa et al.
High fidelity chemistry and radiation modeling for oxy -- combustion scenarios
NASA Astrophysics Data System (ADS)
Abdul Sater, Hassan A.
To account for the thermal and chemical effects associated with the high CO2 concentrations in an oxy-combustion atmosphere, several refined gas-phase chemistry and radiative property models have been formulated for laminar to highly turbulent systems. This thesis examines the accuracies of several chemistry and radiative property models employed in computational fluid dynamic (CFD) simulations of laminar to transitional oxy-methane diffusion flames by comparing their predictions against experimental data. Literature review about chemistry and radiation modeling in oxy-combustion atmospheres considered turbulent systems where the predictions are impacted by the interplay and accuracies of the turbulence, radiation and chemistry models. Thus, by considering a laminar system we minimize the impact of turbulence and the uncertainties associated with turbulence models. In the first section of this thesis, an assessment and validation of gray and non-gray formulations of a recently proposed weighted-sum-of-gray gas model in oxy-combustion scenarios was undertaken. Predictions of gas, wall temperatures and flame lengths were in good agreement with experimental measurements. The temperature and flame length predictions were not sensitive to the radiative property model employed. However, there were significant variations between the gray and non-gray model radiant fraction predictions with the variations in general increasing with decrease in Reynolds numbers possibly attributed to shorter flames and steeper temperature gradients. The results of this section confirm that non-gray model predictions of radiative heat fluxes are more accurate than gray model predictions especially at steeper temperature gradients. In the second section, the accuracies of three gas-phase chemistry models were assessed by comparing their predictions against experimental measurements of temperature, species concentrations and flame lengths. The chemistry was modeled employing the Eddy Dissipation Concept (EDC) employing a 41-step detailed chemistry mechanism, the non-adiabatic extension of the equilibrium Probability Density Function (PDF) based mixture-fraction model and a two-step global finite rate chemistry model with modified rate constants proposed to work well in oxy-methane flames. Based on the results from this section, the equilibrium PDF model in conjunction with a high-fidelity non-gray model for the radiative properties of the gas-phase may be deemed as accurate to capture the major gas species concentrations, temperatures and flame lengths in oxy-methane flames. The third section examines the variations in radiative transfer predictions due to the choice of chemistry and gas-phase radiative property models. The radiative properties were estimated employing four weighted-sum-of-gray-gases models (WSGGM) that were formulated employing different spectroscopic/model databases. An average variation of 14 -- 17% in the wall incident radiative fluxes was observed between the EDC and equilibrium mixture fraction chemistry models, due to differences in their temperature predictions within the flame. One-dimensional, line-of-sight radiation calculations showed a 15 -- 25 % reduction in the directional radiative fluxes at lower axial locations as a result of ignoring radiation from CO and CH4. Under the constraints of fixed temperature and species distributions, the flame radiant power estimates and average wall incident radiative fluxes varied by nearly 60% and 11% respectively among the different WSGG models.
Assessing models of arsenic occurrence in drinking water from bedrock aquifers in New Hampshire
Andy, Caroline; Fahnestock, Maria Florencia; Lombard, Melissa; Hayes, Laura; Bryce, Julie; Ayotte, Joseph
2017-01-01
Three existing multivariate logistic regression models were assessed using new data to evaluate the capacity of the models to correctly predict the probability of groundwater arsenic concentrations exceeding the threshold values of 1, 5, and 10 micrograms per liter (µg/L) in New Hampshire, USA. A recently released testing dataset includes arsenic concentrations from groundwater samples collected in 2004–2005 from a mix of 367 public-supply and private domestic wells. The use of this dataset to test three existing logistic regression models demonstrated enhanced overall predictive accuracy for the 5 and 10 μg/L models. Overall accuracies of 54.8, 76.3, and 86.4 percent were reported for the 1, 5, and 10 μg/L models, respectively. The state was divided by counties into northwest and southeast regions. Regional differences in accuracy were identified; models had an average accuracy of 83.1 percent for the counties in the northwest and 63.7 percent in the southeast. This is most likely due to high model specificity in the northwest and regional differences in arsenic occurrence. Though these models have limitations, they allow for arsenic hazard assessment across the region. The introduction of well-type (public or private), well depth, and casing length as explanatory variables may be appropriate measures to improve model performance. Our findings indicate that the original models generalize to the testing dataset, and should continue to serve as an important vehicle of preventative public health that may be applied to other groundwater contaminants in New Hampshire.
Ley-Bosch, Carlos; Quintana-Suárez, Miguel A.
2018-01-01
Indoor localization estimation has become an attractive research topic due to growing interest in location-aware services. Many research works have proposed solving this problem by using wireless communication systems based on radiofrequency. Nevertheless, those approaches usually deliver an accuracy of up to two metres, since they are hindered by multipath propagation. On the other hand, in the last few years, the increasing use of light-emitting diodes in illumination systems has provided the emergence of Visible Light Communication technologies, in which data communication is performed by transmitting through the visible band of the electromagnetic spectrum. This brings a brand new approach to high accuracy indoor positioning because this kind of network is not affected by electromagnetic interferences and the received optical power is more stable than radio signals. Our research focus on to propose a fingerprinting indoor positioning estimation system based on neural networks to predict the device position in a 3D environment. Neural networks are an effective classification and predictive method. The localization system is built using a dataset of received signal strength coming from a grid of different points. From the these values, the position in Cartesian coordinates (x,y,z) is estimated. The use of three neural networks is proposed in this work, where each network is responsible for estimating the position by each axis. Experimental results indicate that the proposed system leads to substantial improvements to accuracy over the widely-used traditional fingerprinting methods, yielding an accuracy above 99% and an average error distance of 0.4 mm. PMID:29601525
NASA Astrophysics Data System (ADS)
Kisi, Ozgur; Parmar, Kulwinder Singh
2016-03-01
This study investigates the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5Tree) in modeling river water pollution. Various combinations of water quality parameters, Free Ammonia (AMM), Total Kjeldahl Nitrogen (TKN), Water Temperature (WT), Total Coliform (TC), Fecal Coliform (FC) and Potential of Hydrogen (pH) monitored at Nizamuddin, Delhi Yamuna River in India were used as inputs to the applied models. Results indicated that the LSSVM and MARS models had almost same accuracy and they performed better than the M5Tree model in modeling monthly chemical oxygen demand (COD). The average root mean square error (RMSE) of the LSSVM and M5Tree models was decreased by 1.47% and 19.1% using MARS model, respectively. Adding TC input to the models did not increase their accuracy in modeling COD while adding FC and pH inputs to the models generally decreased the accuracy. The overall results indicated that the MARS and LSSVM models could be successfully used in estimating monthly river water pollution level by using AMM, TKN and WT parameters as inputs.
NASA Astrophysics Data System (ADS)
Schöniger, Anneli; Wöhling, Thomas; Nowak, Wolfgang
2014-05-01
Bayesian model averaging ranks the predictive capabilities of alternative conceptual models based on Bayes' theorem. The individual models are weighted with their posterior probability to be the best one in the considered set of models. Finally, their predictions are combined into a robust weighted average and the predictive uncertainty can be quantified. This rigorous procedure does, however, not yet account for possible instabilities due to measurement noise in the calibration data set. This is a major drawback, since posterior model weights may suffer a lack of robustness related to the uncertainty in noisy data, which may compromise the reliability of model ranking. We present a new statistical concept to account for measurement noise as source of uncertainty for the weights in Bayesian model averaging. Our suggested upgrade reflects the limited information content of data for the purpose of model selection. It allows us to assess the significance of the determined posterior model weights, the confidence in model selection, and the accuracy of the quantified predictive uncertainty. Our approach rests on a brute-force Monte Carlo framework. We determine the robustness of model weights against measurement noise by repeatedly perturbing the observed data with random realizations of measurement error. Then, we analyze the induced variability in posterior model weights and introduce this "weighting variance" as an additional term into the overall prediction uncertainty analysis scheme. We further determine the theoretical upper limit in performance of the model set which is imposed by measurement noise. As an extension to the merely relative model ranking, this analysis provides a measure of absolute model performance. To finally decide, whether better data or longer time series are needed to ensure a robust basis for model selection, we resample the measurement time series and assess the convergence of model weights for increasing time series length. We illustrate our suggested approach with an application to model selection between different soil-plant models following up on a study by Wöhling et al. (2013). Results show that measurement noise compromises the reliability of model ranking and causes a significant amount of weighting uncertainty, if the calibration data time series is not long enough to compensate for its noisiness. This additional contribution to the overall predictive uncertainty is neglected without our approach. Thus, we strongly advertise to include our suggested upgrade in the Bayesian model averaging routine.
Outcome Prediction in Mathematical Models of Immune Response to Infection.
Mai, Manuel; Wang, Kun; Huber, Greg; Kirby, Michael; Shattuck, Mark D; O'Hern, Corey S
2015-01-01
Clinicians need to predict patient outcomes with high accuracy as early as possible after disease inception. In this manuscript, we show that patient-to-patient variability sets a fundamental limit on outcome prediction accuracy for a general class of mathematical models for the immune response to infection. However, accuracy can be increased at the expense of delayed prognosis. We investigate several systems of ordinary differential equations (ODEs) that model the host immune response to a pathogen load. Advantages of systems of ODEs for investigating the immune response to infection include the ability to collect data on large numbers of 'virtual patients', each with a given set of model parameters, and obtain many time points during the course of the infection. We implement patient-to-patient variability v in the ODE models by randomly selecting the model parameters from distributions with coefficients of variation v that are centered on physiological values. We use logistic regression with one-versus-all classification to predict the discrete steady-state outcomes of the system. We find that the prediction algorithm achieves near 100% accuracy for v = 0, and the accuracy decreases with increasing v for all ODE models studied. The fact that multiple steady-state outcomes can be obtained for a given initial condition, i.e. the basins of attraction overlap in the space of initial conditions, limits the prediction accuracy for v > 0. Increasing the elapsed time of the variables used to train and test the classifier, increases the prediction accuracy, while adding explicit external noise to the ODE models decreases the prediction accuracy. Our results quantify the competition between early prognosis and high prediction accuracy that is frequently encountered by clinicians.
Jamaludin, Amir; Lootus, Meelis; Kadir, Timor; Zisserman, Andrew; Urban, Jill; Battié, Michele C; Fairbank, Jeremy; McCall, Iain
2017-05-01
Investigation of the automation of radiological features from magnetic resonance images (MRIs) of the lumbar spine. To automate the process of grading lumbar intervertebral discs and vertebral bodies from MRIs. MR imaging is the most common imaging technique used in investigating low back pain (LBP). Various features of degradation, based on MRIs, are commonly recorded and graded, e.g., Modic change and Pfirrmann grading of intervertebral discs. Consistent scoring and grading is important for developing robust clinical systems and research. Automation facilitates this consistency and reduces the time of radiological analysis considerably and hence the expense. 12,018 intervertebral discs, from 2009 patients, were graded by a radiologist and were then used to train: (1) a system to detect and label vertebrae and discs in a given scan, and (2) a convolutional neural network (CNN) model that predicts several radiological gradings. The performance of the model, in terms of class average accuracy, was compared with the intra-observer class average accuracy of the radiologist. The detection system achieved 95.6% accuracy in terms of disc detection and labeling. The model is able to produce predictions of multiple pathological gradings that consistently matched those of the radiologist. The model identifies 'Evidence Hotspots' that are the voxels that most contribute to the degradation scores. Automation of radiological grading is now on par with human performance. The system can be beneficial in aiding clinical diagnoses in terms of objectivity of gradings and the speed of analysis. It can also draw the attention of a radiologist to regions of degradation. This objectivity and speed is an important stepping stone in the investigation of the relationship between MRIs and clinical diagnoses of back pain in large cohorts. Level 3.
Adjusted Clinical Groups: Predictive Accuracy for Medicaid Enrollees in Three States
Adams, E. Kathleen; Bronstein, Janet M.; Raskind-Hood, Cheryl
2002-01-01
Actuarial split-sample methods were used to assess predictive accuracy of adjusted clinical groups (ACGs) for Medicaid enrollees in Georgia, Mississippi (lagging in managed care penetration), and California. Accuracy for two non-random groups—high-cost and located in urban poor areas—was assessed. Measures for random groups were derived with and without short-term enrollees to assess the effect of turnover on predictive accuracy. ACGs improved predictive accuracy for high-cost conditions in all States, but did so only for those in Georgia's poorest urban areas. Higher and more unpredictable expenses of short-term enrollees moderated the predictive power of ACGs. This limitation was significant in Mississippi due in part, to that State's very high proportion of short-term enrollees. PMID:12545598
Singh, Jay P; Doll, Helen; Grann, Martin
2012-01-01
Objective To investigate the predictive validity of tools commonly used to assess the risk of violence, sexual, and criminal behaviour. Design Systematic review and tabular meta-analysis of replication studies following PRISMA guidelines. Data sources PsycINFO, Embase, Medline, and United States Criminal Justice Reference Service Abstracts. Review methods We included replication studies from 1 January 1995 to 1 January 2011 if they provided contingency data for the offending outcome that the tools were designed to predict. We calculated the diagnostic odds ratio, sensitivity, specificity, area under the curve, positive predictive value, negative predictive value, the number needed to detain to prevent one offence, as well as a novel performance indicator—the number safely discharged. We investigated potential sources of heterogeneity using metaregression and subgroup analyses. Results Risk assessments were conducted on 73 samples comprising 24 847 participants from 13 countries, of whom 5879 (23.7%) offended over an average of 49.6 months. When used to predict violent offending, risk assessment tools produced low to moderate positive predictive values (median 41%, interquartile range 27-60%) and higher negative predictive values (91%, 81-95%), and a corresponding median number needed to detain of 2 (2-4) and number safely discharged of 10 (4-18). Instruments designed to predict violent offending performed better than those aimed at predicting sexual or general crime. Conclusions Although risk assessment tools are widely used in clinical and criminal justice settings, their predictive accuracy varies depending on how they are used. They seem to identify low risk individuals with high levels of accuracy, but their use as sole determinants of detention, sentencing, and release is not supported by the current evidence. Further research is needed to examine their contribution to treatment and management. PMID:22833604
Fleischman, Ross J.; Lundquist, Mark; Jui, Jonathan; Newgard, Craig D.; Warden, Craig
2014-01-01
Objective To derive and validate a model that accurately predicts ambulance arrival time that could be implemented as a Google Maps web application. Methods This was a retrospective study of all scene transports in Multnomah County, Oregon, from January 1 through December 31, 2008. Scene and destination hospital addresses were converted to coordinates. ArcGIS Network Analyst was used to estimate transport times based on street network speed limits. We then created a linear regression model to improve the accuracy of these street network estimates using weather, patient characteristics, use of lights and sirens, daylight, and rush-hour intervals. The model was derived from a 50% sample and validated on the remainder. Significance of the covariates was determined by p < 0.05 for a t-test of the model coefficients. Accuracy was quantified by the proportion of estimates that were within 5 minutes of the actual transport times recorded by computer-aided dispatch. We then built a Google Maps-based web application to demonstrate application in real-world EMS operations. Results There were 48,308 included transports. Street network estimates of transport time were accurate within 5 minutes of actual transport time less than 16% of the time. Actual transport times were longer during daylight and rush-hour intervals and shorter with use of lights and sirens. Age under 18 years, gender, wet weather, and trauma system entry were not significant predictors of transport time. Our model predicted arrival time within 5 minutes 73% of the time. For lights and sirens transports, accuracy was within 5 minutes 77% of the time. Accuracy was identical in the validation dataset. Lights and sirens saved an average of 3.1 minutes for transports under 8.8 minutes, and 5.3 minutes for longer transports. Conclusions An estimate of transport time based only on a street network significantly underestimated transport times. A simple model incorporating few variables can predict ambulance time of arrival to the emergency department with good accuracy. This model could be linked to global positioning system data and an automated Google Maps web application to optimize emergency department resource use. Use of lights and sirens had a significant effect on transport times. PMID:23865736
Chen, L; Schenkel, F; Vinsky, M; Crews, D H; Li, C
2013-10-01
In beef cattle, phenotypic data that are difficult and/or costly to measure, such as feed efficiency, and DNA marker genotypes are usually available on a small number of animals of different breeds or populations. To achieve a maximal accuracy of genomic prediction using the phenotype and genotype data, strategies for forming a training population to predict genomic breeding values (GEBV) of the selection candidates need to be evaluated. In this study, we examined the accuracy of predicting GEBV for residual feed intake (RFI) based on 522 Angus and 395 Charolais steers genotyped on SNP with the Illumina Bovine SNP50 Beadchip for 3 training population forming strategies: within breed, across breed, and by pooling data from the 2 breeds (i.e., combined). Two other scenarios with the training and validation data split by birth year and by sire family within a breed were also investigated to assess the impact of genetic relationships on the accuracy of genomic prediction. Three statistical methods including the best linear unbiased prediction with the relationship matrix defined based on the pedigree (PBLUP), based on the SNP genotypes (GBLUP), and a Bayesian method (BayesB) were used to predict the GEBV. The results showed that the accuracy of the GEBV prediction was the highest when the prediction was within breed and when the validation population had greater genetic relationships with the training population, with a maximum of 0.58 for Angus and 0.64 for Charolais. The within-breed prediction accuracies dropped to 0.29 and 0.38, respectively, when the validation populations had a minimal pedigree link with the training population. When the training population of a different breed was used to predict the GEBV of the validation population, that is, across-breed genomic prediction, the accuracies were further reduced to 0.10 to 0.22, depending on the prediction method used. Pooling data from the 2 breeds to form the training population resulted in accuracies increased to 0.31 and 0.43, respectively, for the Angus and Charolais validation populations. The results suggested that the genetic relationship of selection candidates with the training population has a greater impact on the accuracy of GEBV using the Illumina Bovine SNP50 Beadchip. Pooling data from different breeds to form the training population will improve the accuracy of across breed genomic prediction for RFI in beef cattle.
3D tumor localization through real-time volumetric x-ray imaging for lung cancer radiotherapy.
Li, Ruijiang; Lewis, John H; Jia, Xun; Gu, Xuejun; Folkerts, Michael; Men, Chunhua; Song, William Y; Jiang, Steve B
2011-05-01
To evaluate an algorithm for real-time 3D tumor localization from a single x-ray projection image for lung cancer radiotherapy. Recently, we have developed an algorithm for reconstructing volumetric images and extracting 3D tumor motion information from a single x-ray projection [Li et al., Med. Phys. 37, 2822-2826 (2010)]. We have demonstrated its feasibility using a digital respiratory phantom with regular breathing patterns. In this work, we present a detailed description and a comprehensive evaluation of the improved algorithm. The algorithm was improved by incorporating respiratory motion prediction. The accuracy and efficiency of using this algorithm for 3D tumor localization were then evaluated on (1) a digital respiratory phantom, (2) a physical respiratory phantom, and (3) five lung cancer patients. These evaluation cases include both regular and irregular breathing patterns that are different from the training dataset. For the digital respiratory phantom with regular and irregular breathing, the average 3D tumor localization error is less than 1 mm which does not seem to be affected by amplitude change, period change, or baseline shift. On an NVIDIA Tesla C1060 graphic processing unit (GPU) card, the average computation time for 3D tumor localization from each projection ranges between 0.19 and 0.26 s, for both regular and irregular breathing, which is about a 10% improvement over previously reported results. For the physical respiratory phantom, an average tumor localization error below 1 mm was achieved with an average computation time of 0.13 and 0.16 s on the same graphic processing unit (GPU) card, for regular and irregular breathing, respectively. For the five lung cancer patients, the average tumor localization error is below 2 mm in both the axial and tangential directions. The average computation time on the same GPU card ranges between 0.26 and 0.34 s. Through a comprehensive evaluation of our algorithm, we have established its accuracy in 3D tumor localization to be on the order of 1 mm on average and 2 mm at 95 percentile for both digital and physical phantoms, and within 2 mm on average and 4 mm at 95 percentile for lung cancer patients. The results also indicate that the accuracy is not affected by the breathing pattern, be it regular or irregular. High computational efficiency can be achieved on GPU, requiring 0.1-0.3 s for each x-ray projection.
Predicting lettuce canopy photosynthesis with statistical and neural network models
NASA Technical Reports Server (NTRS)
Frick, J.; Precetti, C.; Mitchell, C. A.
1998-01-01
An artificial neural network (NN) and a statistical regression model were developed to predict canopy photosynthetic rates (Pn) for 'Waldman's Green' leaf lettuce (Latuca sativa L.). All data used to develop and test the models were collected for crop stands grown hydroponically and under controlled-environment conditions. In the NN and regression models, canopy Pn was predicted as a function of three independent variables: shootzone CO2 concentration (600 to 1500 micromoles mol-1), photosynthetic photon flux (PPF) (600 to 1100 micromoles m-2 s-1), and canopy age (10 to 20 days after planting). The models were used to determine the combinations of CO2 and PPF setpoints required each day to maintain maximum canopy Pn. The statistical model (a third-order polynomial) predicted Pn more accurately than the simple NN (a three-layer, fully connected net). Over an 11-day validation period, average percent difference between predicted and actual Pn was 12.3% and 24.6% for the statistical and NN models, respectively. Both models lost considerable accuracy when used to determine relatively long-range Pn predictions (> or = 6 days into the future).
Manoharan, Sujatha C; Ramakrishnan, Swaminathan
2009-10-01
In this work, prediction of forced expiratory volume in pulmonary function test, carried out using spirometry and neural networks is presented. The pulmonary function data were recorded from volunteers using commercial available flow volume spirometer in standard acquisition protocol. The Radial Basis Function neural networks were used to predict forced expiratory volume in 1 s (FEV1) from the recorded flow volume curves. The optimal centres of the hidden layer of radial basis function were determined by k-means clustering algorithm. The performance of the neural network model was evaluated by computing their prediction error statistics of average value, standard deviation, root mean square and their correlation with the true data for normal, restrictive and obstructive cases. Results show that the adopted neural networks are capable of predicting FEV1 in both normal and abnormal cases. Prediction accuracy was more in obstructive abnormality when compared to restrictive cases. It appears that this method of assessment is useful in diagnosing the pulmonary abnormalities with incomplete data and data with poor recording.
Link prediction in multiplex online social networks
NASA Astrophysics Data System (ADS)
Jalili, Mahdi; Orouskhani, Yasin; Asgari, Milad; Alipourfard, Nazanin; Perc, Matjaž
2017-02-01
Online social networks play a major role in modern societies, and they have shaped the way social relationships evolve. Link prediction in social networks has many potential applications such as recommending new items to users, friendship suggestion and discovering spurious connections. Many real social networks evolve the connections in multiple layers (e.g. multiple social networking platforms). In this article, we study the link prediction problem in multiplex networks. As an example, we consider a multiplex network of Twitter (as a microblogging service) and Foursquare (as a location-based social network). We consider social networks of the same users in these two platforms and develop a meta-path-based algorithm for predicting the links. The connectivity information of the two layers is used to predict the links in Foursquare network. Three classical classifiers (naive Bayes, support vector machines (SVM) and K-nearest neighbour) are used for the classification task. Although the networks are not highly correlated in the layers, our experiments show that including the cross-layer information significantly improves the prediction performance. The SVM classifier results in the best performance with an average accuracy of 89%.
Link prediction in multiplex online social networks.
Jalili, Mahdi; Orouskhani, Yasin; Asgari, Milad; Alipourfard, Nazanin; Perc, Matjaž
2017-02-01
Online social networks play a major role in modern societies, and they have shaped the way social relationships evolve. Link prediction in social networks has many potential applications such as recommending new items to users, friendship suggestion and discovering spurious connections. Many real social networks evolve the connections in multiple layers (e.g. multiple social networking platforms). In this article, we study the link prediction problem in multiplex networks. As an example, we consider a multiplex network of Twitter (as a microblogging service) and Foursquare (as a location-based social network). We consider social networks of the same users in these two platforms and develop a meta-path-based algorithm for predicting the links. The connectivity information of the two layers is used to predict the links in Foursquare network. Three classical classifiers (naive Bayes, support vector machines (SVM) and K-nearest neighbour) are used for the classification task. Although the networks are not highly correlated in the layers, our experiments show that including the cross-layer information significantly improves the prediction performance. The SVM classifier results in the best performance with an average accuracy of 89%.
Prediction of metabolites of epoxidation reaction in MetaTox.
Rudik, A V; Dmitriev, A V; Bezhentsev, V M; Lagunin, A A; Filimonov, D A; Poroikov, V V
2017-10-01
Biotransformation is a process of the chemical modifications which may lead to the reactive metabolites, in particular the epoxides. Epoxide reactive metabolites may cause the toxic effects. The prediction of such metabolites is important for drug development and ecotoxicology studies. Epoxides are formed by some oxidation reactions, usually catalysed by cytochromes P450, and represent a large class of three-membered cyclic ethers. Identification of molecules, which may be epoxidized, and indication of the specific location of epoxide functional group (which is called SOE - site of epoxidation) are important for prediction of epoxide metabolites. Datasets from 355 molecules and 615 reactions were created for training and validation. The prediction of SOE is based on a combination of LMNA (Labelled Multilevel Neighbourhood of Atom) descriptors and Bayesian-like algorithm implemented in PASS software and MetaTox web-service. The average invariant accuracy of prediction (AUC) calculated in leave-one-out and 20-fold cross-validation procedures is 0.9. Prediction of epoxide formation based on the created SAR model is included as the component of MetaTox web-service ( http://www.way2drug.com/mg ).
Research on Improved Depth Belief Network-Based Prediction of Cardiovascular Diseases
Zhang, Hongpo
2018-01-01
Quantitative analysis and prediction can help to reduce the risk of cardiovascular disease. Quantitative prediction based on traditional model has low accuracy. The variance of model prediction based on shallow neural network is larger. In this paper, cardiovascular disease prediction model based on improved deep belief network (DBN) is proposed. Using the reconstruction error, the network depth is determined independently, and unsupervised training and supervised optimization are combined. It ensures the accuracy of model prediction while guaranteeing stability. Thirty experiments were performed independently on the Statlog (Heart) and Heart Disease Database data sets in the UCI database. Experimental results showed that the mean of prediction accuracy was 91.26% and 89.78%, respectively. The variance of prediction accuracy was 5.78 and 4.46, respectively. PMID:29854369
The wire-mesh sensor as a two-phase flow meter
NASA Astrophysics Data System (ADS)
Shaban, H.; Tavoularis, S.
2015-01-01
A novel gas and liquid flow rate measurement method is proposed for use in vertical upward and downward gas-liquid pipe flows. This method is based on the analysis of the time history of area-averaged void fraction that is measured using a conductivity wire-mesh sensor (WMS). WMS measurements were collected in vertical upward and downward air-water flows in a pipe with an internal diameter of 32.5 mm at nearly atmospheric pressure. The relative frequencies and the power spectral density of area-averaged void fraction were calculated and used as representative properties. Independent features, extracted from these properties using Principal Component Analysis and Independent Component Analysis, were used as inputs to artificial neural networks, which were trained to give the gas and liquid flow rates as outputs. The present method was shown to be accurate for all four encountered flow regimes and for a wide range of flow conditions. Besides providing accurate predictions for steady flows, the method was also tested successfully in three flows with transient liquid flow rates. The method was augmented by the use of the cross-correlation function of area-averaged void fraction determined from the output of a dual WMS unit as an additional representative property, which was found to improve the accuracy of flow rate prediction.
Forecasting asthma-related hospital admissions in London using negative binomial models.
Soyiri, Ireneous N; Reidpath, Daniel D; Sarran, Christophe
2013-05-01
Health forecasting can improve health service provision and individual patient outcomes. Environmental factors are known to impact chronic respiratory conditions such as asthma, but little is known about the extent to which these factors can be used for forecasting. Using weather, air quality and hospital asthma admissions, in London (2005-2006), two related negative binomial models were developed and compared with a naive seasonal model. In the first approach, predictive forecasting models were fitted with 7-day averages of each potential predictor, and then a subsequent multivariable model is constructed. In the second strategy, an exhaustive search of the best fitting models between possible combinations of lags (0-14 days) of all the environmental effects on asthma admission was conducted. Three models were considered: a base model (seasonal effects), contrasted with a 7-day average model and a selected lags model (weather and air quality effects). Season is the best predictor of asthma admissions. The 7-day average and seasonal models were trivial to implement. The selected lags model was computationally intensive, but of no real value over much more easily implemented models. Seasonal factors can predict daily hospital asthma admissions in London, and there is a little evidence that additional weather and air quality information would add to forecast accuracy.
Zheng, Yang; Cai, Jing; Li, JianWen; Li, Bo; Lin, Runmao; Tian, Feng; Wang, XiaoLing; Wang, Jun
2010-01-01
A 10-fold BAC library for giant panda was constructed and nine BACs were selected to generate finish sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of giant panda newly generated by the Illumina GA sequencing technology. Complete sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs of a joint length 878 kb. Homologue search and de novo prediction methods were used to annotate genes and repeats. Twelve protein coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average exon number of 6 per gene. Besides, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using neighbor-join algorithm across five species, including giant panda, human, dog, cat and mouse, which reconfirms dog as the most related species to giant panda. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.
The accuracy of new wheelchair users' predictions about their future wheelchair use.
Hoenig, Helen; Griffiths, Patricia; Ganesh, Shanti; Caves, Kevin; Harris, Frances
2012-06-01
This study examined the accuracy of new wheelchair user predictions about their future wheelchair use. This was a prospective cohort study of 84 community-dwelling veterans provided a new manual wheelchair. The association between predicted and actual wheelchair use was strong at 3 mos (ϕ coefficient = 0.56), with 90% of those who anticipated using the wheelchair at 3 mos still using it (i.e., positive predictive value = 0.96) and 60% of those who anticipated not using it indeed no longer using the wheelchair (i.e., negative predictive value = 0.60, overall accuracy = 0.92). Predictive accuracy diminished over time, with overall accuracy declining from 0.92 at 3 mos to 0.66 at 6 mos. At all time points, and for all types of use, patients better predicted use as opposed to disuse, with correspondingly higher positive than negative predictive values. Accuracy of prediction of use in specific indoor and outdoor locations varied according to location. This study demonstrates the importance of better understanding the potential mismatch between the anticipated and actual patterns of wheelchair use. The findings suggest that users can be relied upon to accurately predict their basic wheelchair-related needs in the short-term. Further exploration is needed to identify characteristics that will aid users and their providers in more accurately predicting mobility needs for the long-term.
NASA Astrophysics Data System (ADS)
Chang, Jina; Tian, Zhen; Lu, Weiguo; Gu, Xuejun; Chen, Mingli; Jiang, Steve B.
2017-05-01
Multi-atlas segmentation (MAS) has been widely used to automate the delineation of organs at risk (OARs) for radiotherapy. Label fusion is a crucial step in MAS to cope with the segmentation variabilities among multiple atlases. However, most existing label fusion methods do not consider the potential dosimetric impact of the segmentation result. In this proof-of-concept study, we propose a novel geometry-dosimetry label fusion method for MAS-based OAR auto-contouring, which evaluates the segmentation performance in terms of both geometric accuracy and the dosimetric impact of the segmentation accuracy on the resulting treatment plan. Differently from the original selective and iterative method for performance level estimation (SIMPLE), we evaluated and rejected the atlases based on both Dice similarity coefficient and the predicted error of the dosimetric endpoints. The dosimetric error was predicted using our previously developed geometry-dosimetry model. We tested our method in MAS-based rectum auto-contouring on 20 prostate cancer patients. The accuracy in the rectum sub-volume close to the planning tumor volume (PTV), which was found to be a dosimetric sensitive region of the rectum, was greatly improved. The mean absolute distance between the obtained contour and the physician-drawn contour in the rectum sub-volume 2 mm away from PTV was reduced from 3.96 mm to 3.36 mm on average for the 20 patients, with the maximum decrease found to be from 9.22 mm to 3.75 mm. We also compared the dosimetric endpoints predicted for the obtained contours with those predicted for the physician-drawn contours. Our method led to smaller dosimetric endpoint errors than the SIMPLE method in 15 patients, comparable errors in 2 patients, and slightly larger errors in 3 patients. These results indicated the efficacy of our method in terms of considering both geometric accuracy and dosimetric impact during label fusion. Our algorithm can be applied to different tumor sites and radiation treatments, given a specifically trained geometry-dosimetry model.
Chang, Jina; Tian, Zhen; Lu, Weiguo; Gu, Xuejun; Chen, Mingli; Jiang, Steve B
2017-05-07
Multi-atlas segmentation (MAS) has been widely used to automate the delineation of organs at risk (OARs) for radiotherapy. Label fusion is a crucial step in MAS to cope with the segmentation variabilities among multiple atlases. However, most existing label fusion methods do not consider the potential dosimetric impact of the segmentation result. In this proof-of-concept study, we propose a novel geometry-dosimetry label fusion method for MAS-based OAR auto-contouring, which evaluates the segmentation performance in terms of both geometric accuracy and the dosimetric impact of the segmentation accuracy on the resulting treatment plan. Differently from the original selective and iterative method for performance level estimation (SIMPLE), we evaluated and rejected the atlases based on both Dice similarity coefficient and the predicted error of the dosimetric endpoints. The dosimetric error was predicted using our previously developed geometry-dosimetry model. We tested our method in MAS-based rectum auto-contouring on 20 prostate cancer patients. The accuracy in the rectum sub-volume close to the planning tumor volume (PTV), which was found to be a dosimetric sensitive region of the rectum, was greatly improved. The mean absolute distance between the obtained contour and the physician-drawn contour in the rectum sub-volume 2 mm away from PTV was reduced from 3.96 mm to 3.36 mm on average for the 20 patients, with the maximum decrease found to be from 9.22 mm to 3.75 mm. We also compared the dosimetric endpoints predicted for the obtained contours with those predicted for the physician-drawn contours. Our method led to smaller dosimetric endpoint errors than the SIMPLE method in 15 patients, comparable errors in 2 patients, and slightly larger errors in 3 patients. These results indicated the efficacy of our method in terms of considering both geometric accuracy and dosimetric impact during label fusion. Our algorithm can be applied to different tumor sites and radiation treatments, given a specifically trained geometry-dosimetry model.
Vanderhoof, Melanie; Distler, Hayley; Mendiola, Di Ana; Lang, Megan
2017-01-01
Natural variability in surface-water extent and associated characteristics presents a challenge to gathering timely, accurate information, particularly in environments that are dominated by small and/or forested wetlands. This study mapped inundation extent across the Upper Choptank River Watershed on the Delmarva Peninsula, occurring within both Maryland and Delaware. We integrated six quad-polarized Radarsat-2 images, Worldview-3 imagery, and an enhanced topographic wetness index in a random forest model. Output maps were filtered using light detection and ranging (lidar)-derived depressions to maximize the accuracy of forested inundation extent. Overall accuracy within the integrated and filtered model was 94.3%, with 5.5% and 6.0% errors of omission and commission for inundation, respectively. Accuracy of inundation maps obtained using Radarsat-2 alone were likely detrimentally affected by less than ideal angles of incidence and recent precipitation, but were likely improved by targeting the period between snowmelt and leaf-out for imagery collection. Across the six Radarsat-2 dates, filtering inundation outputs by lidar-derived depressions slightly elevated errors of omission for water (+1.0%), but decreased errors of commission (−7.8%), resulting in an average increase of 5.4% in overall accuracy. Depressions were derived from lidar datasets collected under both dry and average wetness conditions. Although antecedent wetness conditions influenced the abundance and total area mapped as depression, the two versions of the depression datasets showed a similar ability to reduce error in the inundation maps. Accurate mapping of surface water is critical to predicting and monitoring the effect of human-induced change and interannual variability on water quantity and quality.
Technical Report: TG-142 compliant and comprehensive quality assurance tests for respiratory gating
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woods, Kyle; Rong, Yi, E-mail: yrong@ucdavis.edu
2015-11-15
Purpose: To develop and establish a comprehensive gating commissioning and quality assurance procedure in compliance with TG-142. Methods: Eight Varian TrueBeam Linacs were used for this study. Gating commissioning included an end-to-end test and baseline establishment. The end-to-end test was performed using a CIRS dynamic thoracic phantom with a moving cylinder inside the lung, which was used for carrying both optically simulated luminescence detectors (OSLDs) and Gafchromic EBT2 films while the target is moving, for a point dose check and 2D profile check. In addition, baselines were established for beam-on temporal delay and calibration of the surrogate, for both megavoltagemore » (MV) and kilovoltage (kV) beams. A motion simulation device (MotionSim) was used to provide periodic motion on a platform, in synchronizing with a surrogate motion. The overall accuracy and uncertainties were analyzed and compared. Results: The OSLD readings were within 5% compared to the planned dose (within measurement uncertainty) for both phase and amplitude gated deliveries. Film results showed less than 3% agreement to the predicted dose with a standard sinusoid motion. The gate-on temporal accuracy was averaged at 139 ± 10 ms for MV beams and 92 ± 11 ms for kV beams. The temporal delay of the surrogate motion depends on the motion speed and was averaged at 54.6 ± 3.1 ms for slow, 24.9 ± 2.9 ms for intermediate, and 23.0 ± 20.1 ms for fast speed. Conclusions: A comprehensive gating commissioning procedure was introduced for verifying the output accuracy and establishing the temporal accuracy baselines with respiratory gating. The baselines are needed for routine quality assurance tests, as suggested by TG-142.« less
Estimating Terrestrial Carbon Exchange from Space: How Often and How Well?
NASA Technical Reports Server (NTRS)
Knox, Robert G.; Hall, Forrest G.; Huemmrich, Karl F.; Gervin, Janette C.
2003-01-01
Data from a new space mission measuring integrated light-use efficiency could provide a breakthrough in understanding of global carbon, water, and energy dynamics, and greatly improve the accuracy of model predictions for terrestrial carbon cycles and climate. Over the past decade, Gamon and others have shown that changes in photo-protective pigments are sensitive indicators of declines in light-use efficiency of plants and plant canopies. The requirements for integrated diurnal measurements from space need to be defined, before a space mission can be formulated successfully using this concept. We used towerbased CO2 flux data as idealized proxies for remote measurements, examining their sampling properties. Thousands of half-hourly CO2 flux measurements are needed before their average begins to converge on an average annual net CO2 exchange. Estimates of daily integrated fluxes (i.e., diurnal curves) are more statistically efficient, especially if the spacing between measured days is quasiregular, rather than random. Using a few measurements per day one can distinguish among days with different net CO2 exchanges. Fluxes sampled between mid-morning to mid-afternoon are more diagnostic than early morning or late afternoon measurements. Similar results (correlation >0.935) were obtained using 2 measurements per day with high accuracy ([:plusmn:]5%), 3 measurements per day with medium accuracy ([:plusmn:] 10%), or 5 measurements per day at lower accuracy ([:plusmn:]20%). An observatory in a geosynchronous or near-geosynchronous orbit could provide appropriate observations, as could a multi-satellite constellation in polar orbits, but there is a potential trade-off between the required number of observations per day and quality of each observation.
Technical Report: TG-142 compliant and comprehensive quality assurance tests for respiratory gating.
Woods, Kyle; Rong, Yi
2015-11-01
To develop and establish a comprehensive gating commissioning and quality assurance procedure in compliance with TG-142. Eight Varian TrueBeam Linacs were used for this study. Gating commissioning included an end-to-end test and baseline establishment. The end-to-end test was performed using a CIRS dynamic thoracic phantom with a moving cylinder inside the lung, which was used for carrying both optically simulated luminescence detectors (OSLDs) and Gafchromic EBT2 films while the target is moving, for a point dose check and 2D profile check. In addition, baselines were established for beam-on temporal delay and calibration of the surrogate, for both megavoltage (MV) and kilovoltage (kV) beams. A motion simulation device (MotionSim) was used to provide periodic motion on a platform, in synchronizing with a surrogate motion. The overall accuracy and uncertainties were analyzed and compared. The OSLD readings were within 5% compared to the planned dose (within measurement uncertainty) for both phase and amplitude gated deliveries. Film results showed less than 3% agreement to the predicted dose with a standard sinusoid motion. The gate-on temporal accuracy was averaged at 139±10 ms for MV beams and 92±11 ms for kV beams. The temporal delay of the surrogate motion depends on the motion speed and was averaged at 54.6±3.1 ms for slow, 24.9±2.9 ms for intermediate, and 23.0±20.1 ms for fast speed. A comprehensive gating commissioning procedure was introduced for verifying the output accuracy and establishing the temporal accuracy baselines with respiratory gating. The baselines are needed for routine quality assurance tests, as suggested by TG-142.
Recollection can be Weak and Familiarity can be Strong
Ingram, Katherine M.; Mickes, Laura; Wixted, John T.
2012-01-01
The Remember/Know procedure is widely used to investigate recollection and familiarity in recognition memory, but almost all of the results obtained using that procedure can be readily accommodated by a unidimensional model based on signal-detection theory. The unidimensional model holds that Remember judgments reflect strong memories (associated with high confidence, high accuracy, and fast reaction times), whereas Know judgments reflect weaker memories (associated with lower confidence, lower accuracy, and slower reaction times). Although this is invariably true on average, a new two-dimensional account (the Continuous Dual-Process model) suggests that Remember judgments made with low confidence should be associated with lower old/new accuracy, but higher source accuracy, than Know judgments made with high confidence. We tested this prediction – and found evidence to support it – using a modified Remember/Know procedure in which participants were first asked to indicate a degree of recollection-based or familiarity-based confidence for each word presented on a recognition test and were then asked to recollect the color (red or blue) and screen location (top or bottom) associated with the word at study. For familiarity-based decisions, old/new accuracy increased with old/new confidence, but source accuracy did not (suggesting that stronger old/new memory was supported by higher degrees of familiarity). For recollection-based decisions, both old/new accuracy and source accuracy increased with old/new confidence (suggesting that stronger old/new memory was supported by higher degrees of recollection). These findings suggest that recollection and familiarity are continuous processes and that participants can indicate which process mainly contributed to their recognition decisions. PMID:21967320
Beniczky, Sándor; Lantz, Göran; Rosenzweig, Ivana; Åkeson, Per; Pedersen, Birthe; Pinborg, Lars H; Ziebell, Morten; Jespersen, Bo; Fuglsang-Frederiksen, Anders
2013-10-01
Although precise identification of the seizure-onset zone is an essential element of presurgical evaluation, source localization of ictal electroencephalography (EEG) signals has received little attention. The aim of our study was to estimate the accuracy of source localization of rhythmic ictal EEG activity using a distributed source model. Source localization of rhythmic ictal scalp EEG activity was performed in 42 consecutive cases fulfilling inclusion criteria. The study was designed according to recommendations for studies on diagnostic accuracy (STARD). The initial ictal EEG signals were selected using a standardized method, based on frequency analysis and voltage distribution of the ictal activity. A distributed source model-local autoregressive average (LAURA)-was used for the source localization. Sensitivity, specificity, and measurement of agreement (kappa) were determined based on the reference standard-the consensus conclusion of the multidisciplinary epilepsy surgery team. Predictive values were calculated from the surgical outcome of the operated patients. To estimate the clinical value of the ictal source analysis, we compared the likelihood ratios of concordant and discordant results. Source localization was performed blinded to the clinical data, and before the surgical decision. Reference standard was available for 33 patients. The ictal source localization had a sensitivity of 70% and a specificity of 76%. The mean measurement of agreement (kappa) was 0.61, corresponding to substantial agreement (95% confidence interval (CI) 0.38-0.84). Twenty patients underwent resective surgery. The positive predictive value (PPV) for seizure freedom was 92% and the negative predictive value (NPV) was 43%. The likelihood ratio was nine times higher for the concordant results, as compared with the discordant ones. Source localization of rhythmic ictal activity using a distributed source model (LAURA) for the ictal EEG signals selected with a standardized method is feasible in clinical practice and has a good diagnostic accuracy. Our findings encourage clinical neurophysiologists assessing ictal EEGs to include this method in their armamentarium. Wiley Periodicals, Inc. © 2013 International League Against Epilepsy.
NASA Astrophysics Data System (ADS)
Brown, Richard J. C.; Butterfield, David M.; Goddard, Sharon L.; Hussain, Delwar; Quincey, Paul G.; Fuller, Gary W.
2016-02-01
Many monitoring stations used to assess ambient air concentrations of pollutants regulated by European air quality directives suffer from being expensive to establish and operate, and from their location being based on the results of macro-scale modelling exercises rather than measurement assessments in candidate locations. To address these issues for the monitoring of polycyclic aromatic hydrocarbons (PAHs), this study has used data from a combination of the ultraviolet and infrared channels of aethalometers (referred to as UV BC), operated as part of the UK Black Carbon Network, as a surrogate measurement. This has established a relationship between concentrations of the PAH regulated in Europe, benzo[a]pyrene (B[a]P), and the UV BC signal at locations where these measurements have been made together from 2008 to 2014. This relationship was observed to be non-linear. Relationships for individual site types were used to predict measured concentrations with, on average, 1.5% accuracy across all annual averages, and with only 1 in 36 of the predicted annual averages deviating from the measured annual average by more than the B[a]P data quality objective for uncertainty of 50% (at -65%, with the range excluding this value between + 38% and -37%). These relationships were then used to predict B[a]P concentrations at stations where UV BC measurement are made, but PAH measurements are not. This process produced results which reflected expectations based on knowledge of the pollution climate at these stations gained from the measurements of other air quality networks, or from nearby stations. The influence of domestic solid fuel heating was clear using this approach which highlighted Strabane in Northern Ireland as a station likely to be in excess of the air quality directive target value for B[a]P.
Wind power application research on the fusion of the determination and ensemble prediction
NASA Astrophysics Data System (ADS)
Lan, Shi; Lina, Xu; Yuzhu, Hao
2017-07-01
The fused product of wind speed for the wind farm is designed through the use of wind speed products of ensemble prediction from the European Centre for Medium-Range Weather Forecasts (ECMWF) and professional numerical model products on wind power based on Mesoscale Model5 (MM5) and Beijing Rapid Update Cycle (BJ-RUC), which are suitable for short-term wind power forecasting and electric dispatch. The single-valued forecast is formed by calculating the different ensemble statistics of the Bayesian probabilistic forecasting representing the uncertainty of ECMWF ensemble prediction. Using autoregressive integrated moving average (ARIMA) model to improve the time resolution of the single-valued forecast, and based on the Bayesian model averaging (BMA) and the deterministic numerical model prediction, the optimal wind speed forecasting curve and the confidence interval are provided. The result shows that the fusion forecast has made obvious improvement to the accuracy relative to the existing numerical forecasting products. Compared with the 0-24 h existing deterministic forecast in the validation period, the mean absolute error (MAE) is decreased by 24.3 % and the correlation coefficient (R) is increased by 12.5 %. In comparison with the ECMWF ensemble forecast, the MAE is reduced by 11.7 %, and R is increased 14.5 %. Additionally, MAE did not increase with the prolongation of the forecast ahead.
United3D: a protein model quality assessment program that uses two consensus based methods.
Terashi, Genki; Oosawa, Makoto; Nakamura, Yuuki; Kanou, Kazuhiko; Takeda-Shitaka, Mayuko
2012-01-01
In protein structure prediction, such as template-based modeling and free modeling (ab initio modeling), the step that assesses the quality of protein models is very important. We have developed a model quality assessment (QA) program United3D that uses an optimized clustering method and a simple Cα atom contact-based potential. United3D automatically estimates the quality scores (Qscore) of predicted protein models that are highly correlated with the actual quality (GDT_TS). The performance of United3D was tested in the ninth Critical Assessment of protein Structure Prediction (CASP9) experiment. In CASP9, United3D showed the lowest average loss of GDT_TS (5.3) among the QA methods participated in CASP9. This result indicates that the performance of United3D to identify the high quality models from the models predicted by CASP9 servers on 116 targets was best among the QA methods that were tested in CASP9. United3D also produced high average Pearson correlation coefficients (0.93) and acceptable Kendall rank correlation coefficients (0.68) between the Qscore and GDT_TS. This performance was competitive with the other top ranked QA methods that were tested in CASP9. These results indicate that United3D is a useful tool for selecting high quality models from many candidate model structures provided by various modeling methods. United3D will improve the accuracy of protein structure prediction.
NASA Astrophysics Data System (ADS)
Puc, Małgorzata
2012-03-01
Birch pollen is one of the main causes of allergy during spring and early summer in northern and central Europe. The aim of this study was to create a forecast model that can accurately predict daily average concentrations of Betula sp. pollen grains in the atmosphere of Szczecin, Poland. In order to achieve this, a novel data analysis technique—artificial neural networks (ANN)—was used. Sampling was carried out using a volumetric spore trap of the Hirst design in Szczecin during 2003-2009. Spearman's rank correlation analysis revealed that humidity had a strong negative correlation with Betula pollen concentrations. Significant positive correlations were observed for maximum temperature, average temperature, minimum temperature and precipitation. The ANN resulted in multilayer perceptrons 366 8: 2928-7-1:1, time series prediction was of quite high accuracy (SD Ratio between 0.3 and 0.5, R > 0.85). Direct comparison of the observed and calculated values confirmed good performance of the model and its ability to recreate most of the variation.
A Smoluchowski model of crystallization dynamics of small colloidal clusters
NASA Astrophysics Data System (ADS)
Beltran-Villegas, Daniel J.; Sehgal, Ray M.; Maroudas, Dimitrios; Ford, David M.; Bevan, Michael A.
2011-10-01
We investigate the dynamics of colloidal crystallization in a 32-particle system at a fixed value of interparticle depletion attraction that produces coexisting fluid and solid phases. Free energy landscapes (FELs) and diffusivity landscapes (DLs) are obtained as coefficients of 1D Smoluchowski equations using as order parameters either the radius of gyration or the average crystallinity. FELs and DLs are estimated by fitting the Smoluchowski equations to Brownian dynamics (BD) simulations using either linear fits to locally initiated trajectories or global fits to unbiased trajectories using Bayesian inference. The resulting FELs are compared to Monte Carlo Umbrella Sampling results. The accuracy of the FELs and DLs for modeling colloidal crystallization dynamics is evaluated by comparing mean first-passage times from BD simulations with analytical predictions using the FEL and DL models. While the 1D models accurately capture dynamics near the free energy minimum fluid and crystal configurations, predictions near the transition region are not quantitatively accurate. A preliminary investigation of ensemble averaged 2D order parameter trajectories suggests that 2D models are required to capture crystallization dynamics in the transition region.
Predicting Culex pipiens/restuans population dynamics by interval lagged weather data
2013-01-01
Background Culex pipiens/restuans mosquitoes are important vectors for a variety of arthropod borne viral infections. In this study, the associations between 20 years of mosquito capture data and the time lagged environmental quantities daytime length, temperature, precipitation, relative humidity and wind speed were used to generate a predictive model for the population dynamics of this vector species. Methods Mosquito population in the study area was represented by averaged time series of mosquitos counts captured at 6 sites in Cook County (Illinois, USA). Cross-correlation maps (CCMs) were compiled to investigate the association between mosquito abundances and environmental quantities. The results obtained from the CCMs were incorporated into a Poisson regression to generate a predictive model. To optimize the predictive model the time lags obtained from the CCMs were adjusted using a genetic algorithm. Results CCMs for weekly data showed a highly positive correlation of mosquito abundances with daytime length 4 to 5 weeks prior to capture (quantified by a Spearman rank order correlation of rS = 0.898) and with temperature during 2 weeks prior to capture (rS = 0.870). Maximal negative correlations were found for wind speed averaged over 3 week prior to capture (rS = −0.621). Cx. pipiens/restuans population dynamics was predicted by integrating the CCM results in Poisson regression models. They were used to simulate the average seasonal cycle of the mosquito abundance. Verification with observations resulted in a correlation of rS = 0.899 for daily and rS = 0.917 for weekly data. Applying the optimized models to the entire 20-years time series also resulted in a suitable fit with rS = 0.876 for daily and rS = 0.899 for weekly data. Conclusions The study demonstrates the application of interval lagged weather data to predict mosquito abundances with a feasible accuracy, especially when related to weekly Cx. pipiens/restuans populations. PMID:23634763
DOE Office of Scientific and Technical Information (OSTI.GOV)
Solovyev, V.V.; Salamov, A.A.; Lawrence, C.B.
1994-12-31
Discriminant analysis is applied to the problem of recognition 5`-, internal and 3`-exons in human DNA sequences. Specific recognition functions were developed for revealing exons of particular types. The method based on a splice site prediction algorithm that uses the linear Fisher discriminant to combine the information about significant triplet frequencies of various functional parts of splice site regions and preferences of oligonucleotide in protein coding and nation regions. The accuracy of our splice site recognition function is about 97%. A discriminant function for 5`-exon prediction includes hexanucleotide composition of upstream region, triplet composition around the ATG codon, ORF codingmore » potential, donor splice site potential and composition of downstream introit region. For internal exon prediction, we combine in a discriminant function the characteristics describing the 5`- intron region, donor splice site, coding region, acceptor splice site and Y-intron region for each open reading frame flanked by GT and AG base pairs. The accuracy of precise internal exon recognition on a test set of 451 exon and 246693 pseudoexon sequences is 77% with a specificity of 79% and a level of pseudoexon ORF prediction of 99.96%. The recognition quality computed at the level of individual nucleotides is 89%, for exon sequences and 98% for intron sequences. A discriminant function for 3`-exon prediction includes octanucleolide composition of upstream nation region, triplet composition around the stop codon, ORF coding potential, acceptor splice site potential and hexanucleotide composition of downstream region. We unite these three discriminant functions in exon predicting program FEX (find exons). FEX exactly predicts 70% of 1016 exons from the test of 181 complete genes with specificity 73%, and 89% exons are exactly or partially predicted. On the average, 85% of nucleotides were predicted accurately with specificity 91%.« less
Global skin colour prediction from DNA.
Walsh, Susan; Chaitanya, Lakshmi; Breslin, Krystal; Muralidharan, Charanya; Bronikowska, Agnieszka; Pospiech, Ewelina; Koller, Julia; Kovatsi, Leda; Wollstein, Andreas; Branicki, Wojciech; Liu, Fan; Kayser, Manfred
2017-07-01
Human skin colour is highly heritable and externally visible with relevance in medical, forensic, and anthropological genetics. Although eye and hair colour can already be predicted with high accuracies from small sets of carefully selected DNA markers, knowledge about the genetic predictability of skin colour is limited. Here, we investigate the skin colour predictive value of 77 single-nucleotide polymorphisms (SNPs) from 37 genetic loci previously associated with human pigmentation using 2025 individuals from 31 global populations. We identified a minimal set of 36 highly informative skin colour predictive SNPs and developed a statistical prediction model capable of skin colour prediction on a global scale. Average cross-validated prediction accuracies expressed as area under the receiver-operating characteristic curve (AUC) ± standard deviation were 0.97 ± 0.02 for Light, 0.83 ± 0.11 for Dark, and 0.96 ± 0.03 for Dark-Black. When using a 5-category, this resulted in 0.74 ± 0.05 for Very Pale, 0.72 ± 0.03 for Pale, 0.73 ± 0.03 for Intermediate, 0.87±0.1 for Dark, and 0.97 ± 0.03 for Dark-Black. A comparative analysis in 194 independent samples from 17 populations demonstrated that our model outperformed a previously proposed 10-SNP-classifier approach with AUCs rising from 0.79 to 0.82 for White, comparable at the intermediate level of 0.63 and 0.62, respectively, and a large increase from 0.64 to 0.92 for Black. Overall, this study demonstrates that the chosen DNA markers and prediction model, particularly the 5-category level; allow skin colour predictions within and between continental regions for the first time, which will serve as a valuable resource for future applications in forensic and anthropologic genetics.
Development and Validation of a New Air Carrier Block Time Prediction Model and Methodology
NASA Astrophysics Data System (ADS)
Litvay, Robyn Olson
Commercial airline operations rely on predicted block times as the foundation for critical, successive decisions that include fuel purchasing, crew scheduling, and airport facility usage planning. Small inaccuracies in the predicted block times have the potential to result in huge financial losses, and, with profit margins for airline operations currently almost nonexistent, potentially negate any possible profit. Although optimization techniques have resulted in many models targeting airline operations, the challenge of accurately predicting and quantifying variables months in advance remains elusive. The objective of this work is the development of an airline block time prediction model and methodology that is practical, easily implemented, and easily updated. Research was accomplished, and actual U.S., domestic, flight data from a major airline was utilized, to develop a model to predict airline block times with increased accuracy and smaller variance in the actual times from the predicted times. This reduction in variance represents tens of millions of dollars (U.S.) per year in operational cost savings for an individual airline. A new methodology for block time prediction is constructed using a regression model as the base, as it has both deterministic and probabilistic components, and historic block time distributions. The estimation of the block times for commercial, domestic, airline operations requires a probabilistic, general model that can be easily customized for a specific airline’s network. As individual block times vary by season, by day, and by time of day, the challenge is to make general, long-term estimations representing the average, actual block times while minimizing the variation. Predictions of block times for the third quarter months of July and August of 2011 were calculated using this new model. The resulting, actual block times were obtained from the Research and Innovative Technology Administration, Bureau of Transportation Statistics (Airline On-time Performance Data, 2008-2011) for comparison and analysis. Future block times are shown to be predicted with greater accuracy, without exception and network-wide, for a major, U.S., domestic airline.
Automatic gender detection of dream reports: A promising approach.
Wong, Christina; Amini, Reza; De Koninck, Joseph
2016-08-01
A computer program was developed in an attempt to differentiate the dreams of males from females. Hypothesized gender predictors were based on previous literature concerning both dream content and written language features. Dream reports from home-collected dream diaries of 100 male (144 dreams) and 100 female (144 dreams) adolescent Anglophones were matched for equal length. They were first scored with the Hall and Van de Castle (HVDC) scales and quantified using DreamSAT. Two male and two female undergraduate students were asked to read all dreams and predict the dreamer's gender. They averaged a pairwise percent correct gender prediction of 75.8% (κ=0.516), while the Automatic Analysis showed that the computer program's accuracy was 74.5% (κ=0.492), both of which were higher than chance of 50% (κ=0.00). The prediction levels were maintained when dreams containing obvious gender identifiers were eliminated and integration of HVDC scales did not improve prediction. Copyright © 2016 Elsevier Inc. All rights reserved.
Large eddy simulation of fine water sprays: comparative analysis of two models and computer codes
NASA Astrophysics Data System (ADS)
Tsoy, A. S.; Snegirev, A. Yu.
2015-09-01
The model and the computer code FDS, albeit widely used in engineering practice to predict fire development, is not sufficiently validated for fire suppression by fine water sprays. In this work, the effect of numerical resolution of the large scale turbulent pulsations on the accuracy of predicted time-averaged spray parameters is evaluated. Comparison of the simulation results obtained with the two versions of the model and code, as well as that of the predicted and measured radial distributions of the liquid flow rate revealed the need to apply monotonic and yet sufficiently accurate discrete approximations of the convective terms. Failure to do so delays jet break-up, otherwise induced by large turbulent eddies, thereby excessively focuses the predicted flow around its axis. The effect of the pressure drop in the spray nozzle is also examined, and its increase has shown to cause only weak increase of the evaporated fraction and vapor concentration despite the significant increase of flow velocity.
Koehlinger, Keegan; Oleson, Jacob; McCreery, Ryan; Moeller, Mary Pat
2015-01-01
Purpose Production accuracy of s-related morphemes was examined in 3-year-olds with mild-to-severe hearing loss, focusing on perceptibility, articulation, and input frequency. Method Morphemes with /s/, /z/, and /ɪz/ as allomorphs (plural, possessive, third-person singular –s, and auxiliary and copula “is”) were analyzed from language samples gathered from 51 children (ages: 2;10 [years;months] to 3;8) who are hard of hearing (HH), all of whom used amplification. Articulation was assessed via the Goldman-Fristoe Test of Articulation–Second Edition, and monomorphemic word final /s/ and /z/ production. Hearing was measured via better ear pure tone average, unaided Speech Intelligibility Index, and aided sensation level of speech at 4 kHz. Results Unlike results reported for children with normal hearing, the group of children who are HH correctly produced the /ɪz/ allomorph more than /s/ and /z/ allomorphs. Relative accuracy levels for morphemes and sentence positions paralleled those of children with normal hearing. The 4-kHz sensation level scores (but not the better ear pure tone average or Speech Intelligibility Index), the Goldman-Fristoe Test of Articulation–Second Edition, and word final s/z use all predicted accuracy. Conclusions Both better hearing and higher articulation scores are associated with improved morpheme production, and better aided audibility in the high frequencies and word final production of s/z are particularly critical for morpheme acquisition in children who are HH. PMID:25650750
NASA Astrophysics Data System (ADS)
Xing, Xuguang; Ma, Xiaoyi
2018-06-01
The maximum upward flux ( E max) is a control condition for the development of groundwater evaporation models, which can be predicted through the Gardner model. A high-precision E max prediction helps to improve irrigation practice. When using the Gardner model, it has widely been accepted to ignore parameter b (a soil-water constant) for model simplification. However, this may affect the prediction accuracy; therefore, how parameter b affects E max requires detailed investigation. An indoor one-dimensional soil-column evaporation experiment was conducted to observe E max in the presence of a water table of depth 50 cm. The study consisted of 13 treatments based on four solutes and three concentrations in groundwater: KCl, NaCl, CaCl2, and MgCl2, with concentrations of 5, 30, and 100 g/L (salty groundwater); distilled water was used as a control treatment. Results indicated that for the experimental homogeneous loam, the average E max for the treatments supplied by salty groundwater was larger than that supplied by distilled water. Furthermore, during the prediction of the Gardner-model-based E max, ignoring b and including b always led to an overestimate and underestimate, respectively, compared to the observed E max. However, the maximum upward flux calculated including b (i.e. E bmax) had higher accuracy than that ignoring b for E max prediction. Moreover, the impact of ignoring b on E max gradually weakened with increasing b value. This research helps to reveal the groundwater evaporation mechanism.
Chipps, S.R.; Einfalt, L.M.; Wahl, David H.
2000-01-01
We measured growth of age-0 tiger muskellunge as a function of ration size (25, 50, 75, and 100% C(max))and water temperature (7.5-25??C) and compared experimental results with those predicted from a bioenergetic model. Discrepancies between actual and predicted values varied appreciably with water temperature and growth rate. On average, model output overestimated winter consumption rates at 10 and 7.5??C by 113 to 328%, respectively, whereas model predictions in summer and autumn (20-25??C) were in better agreement with actual values (4 to 58%). We postulate that variation in model performance was related to seasonal changes in esocid metabolic rate, which were not accounted for in the bioenergetic model. Moreover, accuracy of model output varied with feeding and growth rate of tiger muskellunge. The model performed poorly for fish fed low rations compared with estimates based on fish fed ad libitum rations and was attributed, in part, to the influence of growth rate on the accuracy of bioenergetic predictions. Based on modeling simulations, we found that errors associated with bioenergetic parameters had more influence on model output when growth rate was low, which is consistent with our observations. In addition, reduced conversion efficiency at high ration levels may contribute to variable model performance, thereby implying that waste losses should be modeled as a function of ration size for esocids. Our findings support earlier field tests of the esocid bioenergetic model and indicate that food consumption is generally overestimated by the model, particularly in winter months and for fish exhibiting low feeding and growth rates.
DRREP: deep ridge regressed epitope predictor.
Sher, Gene; Zhi, Degui; Zhang, Shaojie
2017-10-03
The ability to predict epitopes plays an enormous role in vaccine development in terms of our ability to zero in on where to do a more thorough in-vivo analysis of the protein in question. Though for the past decade there have been numerous advancements and improvements in epitope prediction, on average the best benchmark prediction accuracies are still only around 60%. New machine learning algorithms have arisen within the domain of deep learning, text mining, and convolutional networks. This paper presents a novel analytically trained and string kernel using deep neural network, which is tailored for continuous epitope prediction, called: Deep Ridge Regressed Epitope Predictor (DRREP). DRREP was tested on long protein sequences from the following datasets: SARS, Pellequer, HIV, AntiJen, and SEQ194. DRREP was compared to numerous state of the art epitope predictors, including the most recently published predictors called LBtope and DMNLBE. Using area under ROC curve (AUC), DRREP achieved a performance improvement over the best performing predictors on SARS (13.7%), HIV (8.9%), Pellequer (1.5%), and SEQ194 (3.1%), with its performance being matched only on the AntiJen dataset, by the LBtope predictor, where both DRREP and LBtope achieved an AUC of 0.702. DRREP is an analytically trained deep neural network, thus capable of learning in a single step through regression. By combining the features of deep learning, string kernels, and convolutional networks, the system is able to perform residue-by-residue prediction of continues epitopes with higher accuracy than the current state of the art predictors.
Beauchet, O; Noublanche, F; Simon, R; Sekhon, H; Chabot, J; Levinoff, E J; Kabeshova, A; Launay, C P
2018-01-01
Identification of the risk of falls is important among older inpatients. This study aims to examine performance criteria (i.e.; sensitivity, specificity, positive predictive value, negative predictive value and accuracy) for fall prediction resulting from a nurse assessment and an artificial neural networks (ANNs) analysis in older inpatients hospitalized in acute care medical wards. A total of 848 older inpatients (mean age, 83.0±7.2 years; 41.8% female) admitted to acute care medical wards in Angers University hospital (France) were included in this study using an observational prospective cohort design. Within 24 hours after admission of older inpatients, nurses performed a bedside clinical assessment. Participants were separated into non-fallers and fallers (i.e.; ≥1 fall during hospitalization stay). The analysis was conducted using three feed forward ANNs (multilayer perceptron [MLP], averaged neural network, and neuroevolution of augmenting topologies [NEAT]). Seventy-three (8.6%) participants fell at least once during their hospital stay. ANNs showed a high specificity, regardless of which ANN was used, and the highest value reported was with MLP (99.8%). In contrast, sensitivity was lower, with values ranging between 98.4 to 14.8%. MLP had the highest accuracy (99.7). Performance criteria for fall prediction resulting from a bedside nursing assessment and an ANNs analysis was associated with a high specificity but a low sensitivity, suggesting that this combined approach should be used more as a diagnostic test than a screening test when considering older inpatients in acute care medical ward.
Xiao, Bo; Imel, Zac E; Georgiou, Panayiotis G; Atkins, David C; Narayanan, Shrikanth S
2015-01-01
The technology for evaluating patient-provider interactions in psychotherapy-observational coding-has not changed in 70 years. It is labor-intensive, error prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks including ASR (1200 transcripts from heterogeneous psychotherapy sessions and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings were evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracies, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.
Langevin, Scott M; Eliot, Melissa; Butler, Rondi A; Cheong, Agnes; Zhang, Xiang; McClean, Michael D; Koestler, Devin C; Kelsey, Karl T
2015-01-01
There are currently no screening tests in routine use for oral and pharyngeal cancer beyond visual inspection and palpation, which are provided on an opportunistic basis, indicating a need for development of novel methods for early detection, particularly in high-risk populations. We sought to address this need through comprehensive interrogation of CpG island methylation in oral rinse samples. We used the Infinium HumanMethylation450 BeadArray to interrogate DNA methylation in oral rinse samples collected from 154 patients with incident oral or pharyngeal carcinoma prior to treatment and 72 cancer-free control subjects. Subjects were randomly allocated to either a training or a testing set. For each subject, average methylation was calculated for each CpG island represented on the array. We applied a semi-supervised recursively partitioned mixture model to the CpG island methylation data to identify a classifier for prediction of case status in the training set. We then applied the resultant classifier to the testing set for validation and to assess the predictive accuracy. We identified a methylation classifier comprised of 22 CpG islands, which predicted oral and pharyngeal carcinoma with a high degree of accuracy (AUC = 0.92, 95 % CI 0.86, 0.98). This novel methylation panel is a strong predictor of oral and pharyngeal carcinoma case status in oral rinse samples and may have utility in early detection and post-treatment follow-up.
An Intelligent Ensemble Neural Network Model for Wind Speed Prediction in Renewable Energy Systems.
Ranganayaki, V; Deepa, S N
2016-01-01
Various criteria are proposed to select the number of hidden neurons in artificial neural network (ANN) models and based on the criterion evolved an intelligent ensemble neural network model is proposed to predict wind speed in renewable energy applications. The intelligent ensemble neural model based wind speed forecasting is designed by averaging the forecasted values from multiple neural network models which includes multilayer perceptron (MLP), multilayer adaptive linear neuron (Madaline), back propagation neural network (BPN), and probabilistic neural network (PNN) so as to obtain better accuracy in wind speed prediction with minimum error. The random selection of hidden neurons numbers in artificial neural network results in overfitting or underfitting problem. This paper aims to avoid the occurrence of overfitting and underfitting problems. The selection of number of hidden neurons is done in this paper employing 102 criteria; these evolved criteria are verified by the computed various error values. The proposed criteria for fixing hidden neurons are validated employing the convergence theorem. The proposed intelligent ensemble neural model is applied for wind speed prediction application considering the real time wind data collected from the nearby locations. The obtained simulation results substantiate that the proposed ensemble model reduces the error value to minimum and enhances the accuracy. The computed results prove the effectiveness of the proposed ensemble neural network (ENN) model with respect to the considered error factors in comparison with that of the earlier models available in the literature.
An Intelligent Ensemble Neural Network Model for Wind Speed Prediction in Renewable Energy Systems
Ranganayaki, V.; Deepa, S. N.
2016-01-01
Various criteria are proposed to select the number of hidden neurons in artificial neural network (ANN) models and based on the criterion evolved an intelligent ensemble neural network model is proposed to predict wind speed in renewable energy applications. The intelligent ensemble neural model based wind speed forecasting is designed by averaging the forecasted values from multiple neural network models which includes multilayer perceptron (MLP), multilayer adaptive linear neuron (Madaline), back propagation neural network (BPN), and probabilistic neural network (PNN) so as to obtain better accuracy in wind speed prediction with minimum error. The random selection of hidden neurons numbers in artificial neural network results in overfitting or underfitting problem. This paper aims to avoid the occurrence of overfitting and underfitting problems. The selection of number of hidden neurons is done in this paper employing 102 criteria; these evolved criteria are verified by the computed various error values. The proposed criteria for fixing hidden neurons are validated employing the convergence theorem. The proposed intelligent ensemble neural model is applied for wind speed prediction application considering the real time wind data collected from the nearby locations. The obtained simulation results substantiate that the proposed ensemble model reduces the error value to minimum and enhances the accuracy. The computed results prove the effectiveness of the proposed ensemble neural network (ENN) model with respect to the considered error factors in comparison with that of the earlier models available in the literature. PMID:27034973
NASA Astrophysics Data System (ADS)
Taha, Zahari; Muazu Musa, Rabiu; Majeed, A. P. P. Abdul; Razali Abdullah, Mohamad; Aizzat Zakaria, Muhammad; Muaz Alim, Muhammad; Arif Mat Jizat, Jessnor; Fauzi Ibrahim, Mohamad
2018-03-01
Support Vector Machine (SVM) has been revealed to be a powerful learning algorithm for classification and prediction. However, the use of SVM for prediction and classification in sport is at its inception. The present study classified and predicted high and low potential archers from a collection of psychological coping skills variables trained on different SVMs. 50 youth archers with the average age and standard deviation of (17.0 ±.056) gathered from various archery programmes completed a one end shooting score test. Psychological coping skills inventory which evaluates the archers level of related coping skills were filled out by the archers prior to their shooting tests. k-means cluster analysis was applied to cluster the archers based on their scores on variables assessed. SVM models, i.e. linear and fine radial basis function (RBF) kernel functions, were trained on the psychological variables. The k-means clustered the archers into high psychologically prepared archers (HPPA) and low psychologically prepared archers (LPPA), respectively. It was demonstrated that the linear SVM exhibited good accuracy and precision throughout the exercise with an accuracy of 92% and considerably fewer error rate for the prediction of the HPPA and the LPPA as compared to the fine RBF SVM. The findings of this investigation can be valuable to coaches and sports managers to recognise high potential athletes from the selected psychological coping skills variables examined which would consequently save time and energy during talent identification and development programme.
Optimization of multi-environment trials for genomic selection based on crop models.
Rincent, R; Kuhn, E; Monod, H; Oury, F-X; Rousset, M; Allard, V; Le Gouis, J
2017-08-01
We propose a statistical criterion to optimize multi-environment trials to predict genotype × environment interactions more efficiently, by combining crop growth models and genomic selection models. Genotype × environment interactions (GEI) are common in plant multi-environment trials (METs). In this context, models developed for genomic selection (GS) that refers to the use of genome-wide information for predicting breeding values of selection candidates need to be adapted. One promising way to increase prediction accuracy in various environments is to combine ecophysiological and genetic modelling thanks to crop growth models (CGM) incorporating genetic parameters. The efficiency of this approach relies on the quality of the parameter estimates, which depends on the environments composing this MET used for calibration. The objective of this study was to determine a method to optimize the set of environments composing the MET for estimating genetic parameters in this context. A criterion called OptiMET was defined to this aim, and was evaluated on simulated and real data, with the example of wheat phenology. The MET defined with OptiMET allowed estimating the genetic parameters with lower error, leading to higher QTL detection power and higher prediction accuracies. MET defined with OptiMET was on average more efficient than random MET composed of twice as many environments, in terms of quality of the parameter estimates. OptiMET is thus a valuable tool to determine optimal experimental conditions to best exploit MET and the phenotyping tools that are currently developed.
Morgante, Fabio; Huang, Wen; Maltecca, Christian; Mackay, Trudy F C
2018-06-01
Predicting complex phenotypes from genomic data is a fundamental aim of animal and plant breeding, where we wish to predict genetic merits of selection candidates; and of human genetics, where we wish to predict disease risk. While genomic prediction models work well with populations of related individuals and high linkage disequilibrium (LD) (e.g., livestock), comparable models perform poorly for populations of unrelated individuals and low LD (e.g., humans). We hypothesized that low prediction accuracies in the latter situation may occur when the genetics architecture of the trait departs from the infinitesimal and additive architecture assumed by most prediction models. We used simulated data for 10,000 lines based on sequence data from a population of unrelated, inbred Drosophila melanogaster lines to evaluate this hypothesis. We show that, even in very simplified scenarios meant as a stress test of the commonly used Genomic Best Linear Unbiased Predictor (G-BLUP) method, using all common variants yields low prediction accuracy regardless of the trait genetic architecture. However, prediction accuracy increases when predictions are informed by the genetic architecture inferred from mapping the top variants affecting main effects and interactions in the training data, provided there is sufficient power for mapping. When the true genetic architecture is largely or partially due to epistatic interactions, the additive model may not perform well, while models that account explicitly for interactions generally increase prediction accuracy. Our results indicate that accounting for genetic architecture can improve prediction accuracy for quantitative traits.
Electrophysiological evidence for preserved primacy of lexical prediction in aging.
Dave, Shruti; Brothers, Trevor A; Traxler, Matthew J; Ferreira, Fernanda; Henderson, John M; Swaab, Tamara Y
2018-05-28
Young adults show consistent neural benefits of predictable contexts when processing upcoming words, but these benefits are less clear-cut in older adults. Here we disentangle the neural correlates of prediction accuracy and contextual support during word processing, in order to test current theories that suggest that neural mechanisms underlying predictive processing are specifically impaired in older adults. During a sentence comprehension task, older and younger readers were asked to predict passage-final words and report the accuracy of these predictions. Age-related reductions were observed for N250 and N400 effects of prediction accuracy, as well as for N400 effects of contextual support independent of prediction accuracy. Furthermore, temporal primacy of predictive processing (i.e., earlier facilitation for successful predictions) was preserved across the lifespan, suggesting that predictive mechanisms are unlikely to be uniquely impaired in older adults. In addition, older adults showed prediction effects on frontal post-N400 positivities (PNPs) that were similar in amplitude to PNPs in young adults. Previous research has shown correlations between verbal fluency and lexical prediction in older adult readers, suggesting that the production system may be linked to capacity for lexical prediction, especially in aging. The current study suggests that verbal fluency modulates PNP effects of contextual support, but not prediction accuracy. Taken together, our findings suggest that aging does not result in specific declines in lexical prediction. Copyright © 2018 Elsevier Ltd. All rights reserved.
Carvajal, Thaddeus M; Viacrusis, Katherine M; Hernandez, Lara Fides T; Ho, Howell T; Amalin, Divina M; Watanabe, Kozo
2018-04-17
Several studies have applied ecological factors such as meteorological variables to develop models and accurately predict the temporal pattern of dengue incidence or occurrence. With the vast amount of studies that investigated this premise, the modeling approaches differ from each study and only use a single statistical technique. It raises the question of whether which technique would be robust and reliable. Hence, our study aims to compare the predictive accuracy of the temporal pattern of Dengue incidence in Metropolitan Manila as influenced by meteorological factors from four modeling techniques, (a) General Additive Modeling, (b) Seasonal Autoregressive Integrated Moving Average with exogenous variables (c) Random Forest and (d) Gradient Boosting. Dengue incidence and meteorological data (flood, precipitation, temperature, southern oscillation index, relative humidity, wind speed and direction) of Metropolitan Manila from January 1, 2009 - December 31, 2013 were obtained from respective government agencies. Two types of datasets were used in the analysis; observed meteorological factors (MF) and its corresponding delayed or lagged effect (LG). After which, these datasets were subjected to the four modeling techniques. The predictive accuracy and variable importance of each modeling technique were calculated and evaluated. Among the statistical modeling techniques, Random Forest showed the best predictive accuracy. Moreover, the delayed or lag effects of the meteorological variables was shown to be the best dataset to use for such purpose. Thus, the model of Random Forest with delayed meteorological effects (RF-LG) was deemed the best among all assessed models. Relative humidity was shown to be the top-most important meteorological factor in the best model. The study exhibited that there are indeed different predictive outcomes generated from each statistical modeling technique and it further revealed that the Random forest model with delayed meteorological effects to be the best in predicting the temporal pattern of Dengue incidence in Metropolitan Manila. It is also noteworthy that the study also identified relative humidity as an important meteorological factor along with rainfall and temperature that can influence this temporal pattern.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baer, E; Royle, G; Lalonde, A
Purpose: Dual energy CT can predict stopping power ratios (SPR) for ion therapy treatment planning. Several approaches have been proposed recently, however accuracy and practicability in a clinical workflow are unaddressed. The aim of this work is to provide a fair comparison of available approaches in a human-like phantom to find the optimal method for tissue characterization in a clinical situation. Methods: The SPR determination accuracy is investigated using simulated DECT images. A virtual human-like phantom is created containing 14 different standard human tissues. SECT (120 kV) and DECT images (100 kV and 140 kV Sn) are simulated using themore » software ImaSim. The single energy CT (SECT) stoichiometric calibration method and four recently published calibration-based DECT methods are implemented and used to predict the SPRs from simulated images. The difference between SPR predictions and theoretical SPR are compared pixelwize. Mean, standard deviation and skewness of the SPR difference distributions are used as measures for bias, dispersion and symmetry. Results: The average SPR differences and standard deviations are (0.22 ± 1.27)% for SECT, and A) (−0.26 ± 1.30)%, B) (0.08 ± 1.12)%, C) (0.06 ± 1.15)% and D) (−0.05 ± 1.05)% for the four DECT methods. While SPR prediction using SECT is showing a systematic error on SPR, the DECT methods B, C and D are unbiased. The skewness of the SECT distribution is 0.57%, and A) −0.19%, B) −0.56%, C) −0.29% and D) −0.07% for DECT methods respectively. Conclusion: The here presented DECT methods B, C and D outperform the commonly used SECT stoichiometric calibration. These methods predict SPR accurately without a bias and within ± 1.2% (68th percentile). This indicates that DECT potentially improves accuracy of range predictions in proton therapy. A validation of these findings using clinical CT images of real tissues is necessary.« less
Feinstein, Wei P; Brylinski, Michal
2015-01-01
Computational approaches have emerged as an instrumental methodology in modern research. For example, virtual screening by molecular docking is routinely used in computer-aided drug discovery. One of the critical parameters for ligand docking is the size of a search space used to identify low-energy binding poses of drug candidates. Currently available docking packages often come with a default protocol for calculating the box size, however, many of these procedures have not been systematically evaluated. In this study, we investigate how the docking accuracy of AutoDock Vina is affected by the selection of a search space. We propose a new procedure for calculating the optimal docking box size that maximizes the accuracy of binding pose prediction against a non-redundant and representative dataset of 3,659 protein-ligand complexes selected from the Protein Data Bank. Subsequently, we use the Directory of Useful Decoys, Enhanced to demonstrate that the optimized docking box size also yields an improved ranking in virtual screening. Binding pockets in both datasets are derived from the experimental complex structures and, additionally, predicted by eFindSite. A systematic analysis of ligand binding poses generated by AutoDock Vina shows that the highest accuracy is achieved when the dimensions of the search space are 2.9 times larger than the radius of gyration of a docking compound. Subsequent virtual screening benchmarks demonstrate that this optimized docking box size also improves compound ranking. For instance, using predicted ligand binding sites, the average enrichment factor calculated for the top 1 % (10 %) of the screening library is 8.20 (3.28) for the optimized protocol, compared to 7.67 (3.19) for the default procedure. Depending on the evaluation metric, the optimal docking box size gives better ranking in virtual screening for about two-thirds of target proteins. This fully automated procedure can be used to optimize docking protocols in order to improve the ranking accuracy in production virtual screening simulations. Importantly, the optimized search space systematically yields better results than the default method not only for experimental pockets, but also for those predicted from protein structures. A script for calculating the optimal docking box size is freely available at www.brylinski.org/content/docking-box-size. Graphical AbstractWe developed a procedure to optimize the box size in molecular docking calculations. Left panel shows the predicted binding pose of NADP (green sticks) compared to the experimental complex structure of human aldose reductase (blue sticks) using a default protocol. Right panel shows the docking accuracy using an optimized box size.
Guest, J F; Vowden, K; Vowden, P
2017-06-02
To estimate the patterns of care and related resource use attributable to managing acute and chronic wounds among a catchment population of a typical clinical commissioning group (CCG)/health board and corresponding National Health Service (NHS) costs in the UK. This was a sub-analysis of a retrospective cohort analysis of the records of 2000 patients in The Health Improvement Network (THIN) database. Patients' characteristics, wound-related health outcomes and health-care resource use were quantified for an average CCG/health board with a catchment population of 250,000 adults ≥18 years of age, and the corresponding NHS cost of patient management was estimated at 2013/2014 prices. An average CCG/health board was estimated to be managing 11,200 wounds in 2012/2013. Of these, 40% were considered to be acute wounds, 48% chronic and 12% lacking any specific diagnosis. The prevalence of acute, chronic and unspecified wounds was estimated to be growing at the rate of 9%, 12% and 13% per annum respectively. Our analysis indicated that the current rate of wound healing must increase by an average of at least 1% per annum across all wound types in order to slow down the increasing prevalence. Otherwise, an average CCG/health board is predicted to manage ~23,200 wounds per annum by 2019/2020 and is predicted to spend a discounted (the process of determining the present value of a payment that is to be received in the future) £50 million on managing these wounds and associated comorbidities. Real-world evidence highlights the substantial burden that acute and chronic wounds impose on an average CCG/health board. Strategies are required to improve the accuracy of diagnosis and healing rates.
Sequence Based Prediction of Antioxidant Proteins Using a Classifier Selection Strategy
Zhang, Lina; Zhang, Chengjin; Gao, Rui; Yang, Runtao; Song, Qing
2016-01-01
Antioxidant proteins perform significant functions in maintaining oxidation/antioxidation balance and have potential therapies for some diseases. Accurate identification of antioxidant proteins could contribute to revealing physiological processes of oxidation/antioxidation balance and developing novel antioxidation-based drugs. In this study, an ensemble method is presented to predict antioxidant proteins with hybrid features, incorporating SSI (Secondary Structure Information), PSSM (Position Specific Scoring Matrix), RSA (Relative Solvent Accessibility), and CTD (Composition, Transition, Distribution). The prediction results of the ensemble predictor are determined by an average of prediction results of multiple base classifiers. Based on a classifier selection strategy, we obtain an optimal ensemble classifier composed of RF (Random Forest), SMO (Sequential Minimal Optimization), NNA (Nearest Neighbor Algorithm), and J48 with an accuracy of 0.925. A Relief combined with IFS (Incremental Feature Selection) method is adopted to obtain optimal features from hybrid features. With the optimal features, the ensemble method achieves improved performance with a sensitivity of 0.95, a specificity of 0.93, an accuracy of 0.94, and an MCC (Matthew’s Correlation Coefficient) of 0.880, far better than the existing method. To evaluate the prediction performance objectively, the proposed method is compared with existing methods on the same independent testing dataset. Encouragingly, our method performs better than previous studies. In addition, our method achieves more balanced performance with a sensitivity of 0.878 and a specificity of 0.860. These results suggest that the proposed ensemble method can be a potential candidate for antioxidant protein prediction. For public access, we develop a user-friendly web server for antioxidant protein identification that is freely accessible at http://antioxidant.weka.cc. PMID:27662651
Heslot, Nicolas; Akdemir, Deniz; Sorrells, Mark E; Jannink, Jean-Luc
2014-02-01
Development of models to predict genotype by environment interactions, in unobserved environments, using environmental covariates, a crop model and genomic selection. Application to a large winter wheat dataset. Genotype by environment interaction (G*E) is one of the key issues when analyzing phenotypes. The use of environment data to model G*E has long been a subject of interest but is limited by the same problems as those addressed by genomic selection methods: a large number of correlated predictors each explaining a small amount of the total variance. In addition, non-linear responses of genotypes to stresses are expected to further complicate the analysis. Using a crop model to derive stress covariates from daily weather data for predicted crop development stages, we propose an extension of the factorial regression model to genomic selection. This model is further extended to the marker level, enabling the modeling of quantitative trait loci (QTL) by environment interaction (Q*E), on a genome-wide scale. A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTL to stresses. The method is tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11.1% on average and the variability in prediction accuracy decreased by 10.8%. By leveraging agronomic knowledge and the large historical datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.
The SIST-M: Predictive validity of a brief structured Clinical Dementia Rating interview
Okereke, Olivia I.; Pantoja-Galicia, Norberto; Copeland, Maura; Hyman, Bradley T.; Wanggaard, Taylor; Albert, Marilyn S.; Betensky, Rebecca A.; Blacker, Deborah
2011-01-01
Background We previously established reliability and cross-sectional validity of the SIST-M (Structured Interview and Scoring Tool–Massachusetts Alzheimer's Disease Research Center), a shortened version of an instrument shown to predict progression to Alzheimer disease (AD), even among persons with very mild cognitive impairment (vMCI). Objective To test predictive validity of the SIST-M. Methods Participants were 342 community-dwelling, non-demented older adults in a longitudinal study. Baseline Clinical Dementia Rating (CDR) ratings were determined by either: 1) clinician interviews or 2) a previously developed computer algorithm based on 60 questions (of a possible 131) extracted from clinician interviews. We developed age+gender+education-adjusted Cox proportional hazards models using CDR-sum-of-boxes (CDR-SB) as the predictor, where CDR-SB was determined by either clinician interview or algorithm; models were run for the full sample (n=342) and among those jointly classified as vMCI using clinician- and algorithm-based CDR ratings (n=156). We directly compared predictive accuracy using time-dependent Receiver Operating Characteristic (ROC) curves. Results AD hazard ratios (HRs) were similar for clinician-based and algorithm-based CDR-SB: for a 1-point increment in CDR-SB, respective HRs (95% CI)=3.1 (2.5,3.9) and 2.8 (2.2,3.5); among those with vMCI, respective HRs (95% CI) were 2.2 (1.6,3.2) and 2.1 (1.5,3.0). Similarly high predictive accuracy was achieved: the concordance probability (weighted average of the area-under-the-ROC curves) over follow-up was 0.78 vs. 0.76 using clinician-based vs. algorithm-based CDR-SB. Conclusion CDR scores based on items from this shortened interview had high predictive ability for AD – comparable to that using a lengthy clinical interview. PMID:21986342
NASA Astrophysics Data System (ADS)
Mukhopadhyay, Saumyadip; Abraham, John
2012-07-01
The unsteady flamelet progress variable (UFPV) model has been proposed by Pitsch and Ihme ["An unsteady/flamelet progress variable method for LES of nonpremixed turbulent combustion," AIAA Paper No. 2005-557, 2005] for modeling the averaged/filtered chemistry source terms in Reynolds averaged simulations and large eddy simulations of reacting non-premixed combustion. In the UFPV model, a look-up table of source terms is generated as a function of mixture fraction Z, scalar dissipation rate χ, and progress variable C by solving the unsteady flamelet equations. The assumption is that the unsteady flamelet represents the evolution of the reacting mixing layer in the non-premixed flame. We assess the accuracy of the model in predicting autoignition and flame development in compositionally stratified n-heptane/air mixtures using direct numerical simulations (DNS). The focus in this work is primarily on the assessment of accuracy of the probability density functions (PDFs) employed for obtaining averaged source terms. The performance of commonly employed presumed functions, such as the dirac-delta distribution function, the β distribution function, and statistically most likely distribution (SMLD) approach in approximating the shapes of the PDFs of the reactive and the conserved scalars is evaluated. For unimodal distributions, it is observed that functions that need two-moment information, e.g., the β distribution function and the SMLD approach with two-moment closure, are able to reasonably approximate the actual PDF. As the distribution becomes multimodal, higher moment information is required. Differences are observed between the ignition trends obtained from DNS and those predicted by the look-up table, especially for smaller gradients where the flamelet assumption becomes less applicable. The formulation assumes that the shape of the χ(Z) profile can be modeled by an error function which remains unchanged in the presence of heat release. We show that this assumption is not accurate.
Prediction of reacting atoms for the major biotransformation reactions of organic xenobiotics.
Rudik, Anastasia V; Dmitriev, Alexander V; Lagunin, Alexey A; Filimonov, Dmitry A; Poroikov, Vladimir V
2016-01-01
The knowledge of drug metabolite structures is essential at the early stage of drug discovery to understand the potential liabilities and risks connected with biotransformation. The determination of the site of a molecule at which a particular metabolic reaction occurs could be used as a starting point for metabolite identification. The prediction of the site of metabolism does not always correspond to the particular atom that is modified by the enzyme but rather is often associated with a group of atoms. To overcome this problem, we propose to operate with the term "reacting atom", corresponding to a single atom in the substrate that is modified during the biotransformation reaction. The prediction of the reacting atom(s) in a molecule for the major classes of biotransformation reactions is necessary to generate drug metabolites. Substrates of the major human cytochromes P450 and UDP-glucuronosyltransferases from the Biovia Metabolite database were divided into nine groups according to their reaction classes, which are aliphatic and aromatic hydroxylation, N- and O-glucuronidation, N-, S- and C-oxidation, and N- and O-dealkylation. Each training set consists of positive and negative examples of structures with one labelled atom. In the positive examples, the labelled atom is the reacting atom of a particular reaction that changed adjacency. Negative examples represent non-reacting atoms of a particular reaction. We used Labelled Multilevel Neighbourhoods of Atoms descriptors for the designation of reacting atoms. A Bayesian-like algorithm was applied to estimate the structure-activity relationships. The average invariant accuracy of prediction obtained in leave-one-out and 20-fold cross-validation procedures for five human isoforms of cytochrome P450 and all isoforms of UDP-glucuronosyltransferase varies from 0.86 to 0.99 (0.96 on average). We report that reacting atoms may be predicted with reasonable accuracy for the major classes of metabolic reactions-aliphatic and aromatic hydroxylation, N- and O-glucuronidation, N-, S- and C-oxidation, and N- and O-dealkylation. The proposed method is implemented as a freely available web service at http://www.way2drug.com/RA and may be used for the prediction of the most probable biotransformation reaction(s) and the appropriate reacting atoms in drug-like compounds.Graphical abstract.
Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications.
Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R; Taylor, Jeremy F; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart
2016-01-01
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNP selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal.
Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications
Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R.; Taylor, Jeremy F.; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart
2016-01-01
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNP selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal. PMID:27583971
Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I
2017-11-01
Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques, showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes to the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average dailyland surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.
Road traffic accidents prediction modelling: An analysis of Anambra State, Nigeria.
Ihueze, Chukwutoo C; Onwurah, Uchendu O
2018-03-01
One of the major problems in the world today is the rate of road traffic crashes and deaths on our roads. Majority of these deaths occur in low-and-middle income countries including Nigeria. This study analyzed road traffic crashes in Anambra State, Nigeria with the intention of developing accurate predictive models for forecasting crash frequency in the State using autoregressive integrated moving average (ARIMA) and autoregressive integrated moving average with explanatory variables (ARIMAX) modelling techniques. The result showed that ARIMAX model outperformed the ARIMA (1,1,1) model generated when their performances were compared using the lower Bayesian information criterion, mean absolute percentage error, root mean square error; and higher coefficient of determination (R-Squared) as accuracy measures. The findings of this study reveal that incorporating human, vehicle and environmental related factors in time series analysis of crash dataset produces a more robust predictive model than solely using aggregated crash count. This study contributes to the body of knowledge on road traffic safety and provides an approach to forecasting using many human, vehicle and environmental factors. The recommendations made in this study if applied will help in reducing the number of road traffic crashes in Nigeria. Copyright © 2017 Elsevier Ltd. All rights reserved.
Hydraulic geometry of river cross sections; theory of minimum variance
Williams, Garnett P.
1978-01-01
This study deals with the rates at which mean velocity, mean depth, and water-surface width increase with water discharge at a cross section on an alluvial stream. Such relations often follow power laws, the exponents in which are called hydraulic exponents. The Langbein (1964) minimum-variance theory is examined in regard to its validity and its ability to predict observed hydraulic exponents. The variables used with the theory were velocity, depth, width, bed shear stress, friction factor, slope (energy gradient), and stream power. Slope is often constant, in which case only velocity, depth, width, shear and friction factor need be considered. The theory was tested against a wide range of field data from various geographic areas of the United States. The original theory was intended to produce only the average hydraulic exponents for a group of cross sections in a similar type of geologic or hydraulic environment. The theory does predict these average exponents with a reasonable degree of accuracy. An attempt to forecast the exponents at any selected cross section was moderately successful. Empirical equations are more accurate than the minimum variance, Gauckler-Manning, or Chezy methods. Predictions of the exponent of width are most reliable, the exponent of depth fair, and the exponent of mean velocity poor. (Woodard-USGS)
NASA Astrophysics Data System (ADS)
González, C.; Segurado, J.; LLorca, J.
2004-07-01
The deformation of a composite made up of a random and homogeneous dispersion of elastic spheres in an elasto-plastic matrix was simulated by the finite element analysis of three-dimensional multiparticle cubic cells with periodic boundary conditions. "Exact" results (to a few percent) in tension and shear were determined by averaging 12 stress-strain curves obtained from cells containing 30 spheres, and they were compared with the predictions of secant homogenization models. In addition, the numerical simulations supplied detailed information of the stress microfields, which was used to ascertain the accuracy and the limitations of the homogenization models to include the nonlinear deformation of the matrix. It was found that secant approximations based on the volume-averaged second-order moment of the matrix stress tensor, combined with a highly accurate linear homogenization model, provided excellent predictions of the composite response when the matrix strain hardening rate was high. This was not the case, however, in composites which exhibited marked plastic strain localization in the matrix. The analysis of the evolution of the matrix stresses revealed that better predictions of the composite behavior can be obtained with new homogenization models which capture the essential differences in the stress carried by the elastic and plastic regions in the matrix at the onset of plastic deformation.
CPHmodels-3.0--remote homology modeling using structure-guided sequence profiles.
Nielsen, Morten; Lundegaard, Claus; Lund, Ole; Petersen, Thomas Nordahl
2010-07-01
CPHmodels-3.0 is a web server predicting protein 3D structure by use of single template homology modeling. The server employs a hybrid of the scoring functions of CPHmodels-2.0 and a novel remote homology-modeling algorithm. A query sequence is first attempted modeled using the fast CPHmodels-2.0 profile-profile scoring function suitable for close homology modeling. The new computational costly remote homology-modeling algorithm is only engaged provided that no suitable PDB template is identified in the initial search. CPHmodels-3.0 was benchmarked in the CASP8 competition and produced models for 94% of the targets (117 out of 128), 74% were predicted as high reliability models (87 out of 117). These achieved an average RMSD of 4.6 A when superimposed to the 3D structure. The remaining 26% low reliably models (30 out of 117) could superimpose to the true 3D structure with an average RMSD of 9.3 A. These performance values place the CPHmodels-3.0 method in the group of high performing 3D prediction tools. Beside its accuracy, one of the important features of the method is its speed. For most queries, the response time of the server is <20 min. The web server is available at http://www.cbs.dtu.dk/services/CPHmodels/.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Siyuan; Hwang, Youngdeok; Khabibrakhmanov, Ildar
With increasing penetration of solar and wind energy to the total energy supply mix, the pressing need for accurate energy forecasting has become well-recognized. Here we report the development of a machine-learning based model blending approach for statistically combining multiple meteorological models for improving the accuracy of solar/wind power forecast. Importantly, we demonstrate that in addition to parameters to be predicted (such as solar irradiance and power), including additional atmospheric state parameters which collectively define weather situations as machine learning input provides further enhanced accuracy for the blended result. Functional analysis of variance shows that the error of individual modelmore » has substantial dependence on the weather situation. The machine-learning approach effectively reduces such situation dependent error thus produces more accurate results compared to conventional multi-model ensemble approaches based on simplistic equally or unequally weighted model averaging. Validation over an extended period of time results show over 30% improvement in solar irradiance/power forecast accuracy compared to forecasts based on the best individual model.« less
Riley, Richard D; Ahmed, Ikhlaaq; Debray, Thomas P A; Willis, Brian H; Noordzij, J Pieter; Higgins, Julian P T; Deeks, Jonathan J
2015-06-15
Following a meta-analysis of test accuracy studies, the translation of summary results into clinical practice is potentially problematic. The sensitivity, specificity and positive (PPV) and negative (NPV) predictive values of a test may differ substantially from the average meta-analysis findings, because of heterogeneity. Clinicians thus need more guidance: given the meta-analysis, is a test likely to be useful in new populations, and if so, how should test results inform the probability of existing disease (for a diagnostic test) or future adverse outcome (for a prognostic test)? We propose ways to address this. Firstly, following a meta-analysis, we suggest deriving prediction intervals and probability statements about the potential accuracy of a test in a new population. Secondly, we suggest strategies on how clinicians should derive post-test probabilities (PPV and NPV) in a new population based on existing meta-analysis results and propose a cross-validation approach for examining and comparing their calibration performance. Application is made to two clinical examples. In the first example, the joint probability that both sensitivity and specificity will be >80% in a new population is just 0.19, because of a low sensitivity. However, the summary PPV of 0.97 is high and calibrates well in new populations, with a probability of 0.78 that the true PPV will be at least 0.95. In the second example, post-test probabilities calibrate better when tailored to the prevalence in the new population, with cross-validation revealing a probability of 0.97 that the observed NPV will be within 10% of the predicted NPV. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Improving the baking quality of bread wheat by genomic selection in early generations.
Michel, Sebastian; Kummer, Christian; Gallee, Martin; Hellinger, Jakob; Ametz, Christian; Akgöl, Batuhan; Epure, Doru; Güngör, Huseyin; Löschenberger, Franziska; Buerstmayr, Hermann
2018-02-01
Genomic selection shows great promise for pre-selecting lines with superior bread baking quality in early generations, 3 years ahead of labour-intensive, time-consuming, and costly quality analysis. The genetic improvement of baking quality is one of the grand challenges in wheat breeding as the assessment of the associated traits often involves time-consuming, labour-intensive, and costly testing forcing breeders to postpone sophisticated quality tests to the very last phases of variety development. The prospect of genomic selection for complex traits like grain yield has been shown in numerous studies, and might thus be also an interesting method to select for baking quality traits. Hence, we focused in this study on the accuracy of genomic selection for laborious and expensive to phenotype quality traits as well as its selection response in comparison with phenotypic selection. More than 400 genotyped wheat lines were, therefore, phenotyped for protein content, dough viscoelastic and mixing properties related to baking quality in multi-environment trials 2009-2016. The average prediction accuracy across three independent validation populations was r = 0.39 and could be increased to r = 0.47 by modelling major QTL as fixed effects as well as employing multi-trait prediction models, which resulted in an acceptable prediction accuracy for all dough rheological traits (r = 0.38-0.63). Genomic selection can furthermore be applied 2-3 years earlier than direct phenotypic selection, and the estimated selection response was nearly twice as high in comparison with indirect selection by protein content for baking quality related traits. This considerable advantage of genomic selection could accordingly support breeders in their selection decisions and aid in efficiently combining superior baking quality with grain yield in newly developed wheat varieties.