Sample records for prediction method applied

  1. Influenza detection and prediction algorithms: comparative accuracy trial in Östergötland county, Sweden, 2008-2012.

    PubMed

    Spreco, A; Eriksson, O; Dahlström, Ö; Timpka, T

    2017-07-01

    Methods for the detection of influenza epidemics and prediction of their progress have seldom been comparatively evaluated using prospective designs. This study aimed to perform a prospective comparative trial of algorithms for the detection and prediction of increased local influenza activity. Data on clinical influenza diagnoses recorded by physicians and syndromic data from a telenursing service were used. Five detection and three prediction algorithms previously evaluated in public health settings were calibrated and then evaluated over 3 years. When applied to diagnostic data, only detection using the Serfling regression method and prediction using the non-adaptive log-linear regression method showed acceptable performance during winter influenza seasons. For the syndromic data, none of the detection algorithms displayed satisfactory performance, while non-adaptive log-linear regression was the best performing prediction method. We conclude that evidence was found that available algorithms for influenza detection and prediction display satisfactory performance when applied to local diagnostic data during winter influenza seasons. When applied to local syndromic data, the evaluated algorithms did not display consistent performance. Further evaluations and research on combining methods of these types in public health information infrastructures for 'nowcasting' (integrated detection and prediction) of influenza activity are warranted.
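    For context on the detection method singled out above, a minimal sketch of a Serfling-style epidemic threshold follows: a harmonic-regression baseline on weekly counts plus an alert margin. The general form is standard for this family of methods, but the function name, parameter values, and alert rule are illustrative assumptions, not taken from the paper.

    ```python
    import numpy as np

    def serfling_threshold(counts, weeks_per_year=52, z=1.96):
        """Serfling-style epidemic threshold for a weekly count series.

        The baseline is a linear trend plus one annual harmonic fitted by
        least squares; an alert is raised when an observed count exceeds
        the baseline by z residual standard deviations."""
        counts = np.asarray(counts, dtype=float)
        t = np.arange(len(counts), dtype=float)
        omega = 2.0 * np.pi / weeks_per_year
        # Design matrix: intercept, trend, annual sine/cosine pair.
        X = np.column_stack([np.ones_like(t), t,
                             np.sin(omega * t), np.cos(omega * t)])
        beta, *_ = np.linalg.lstsq(X, counts, rcond=None)
        baseline = X @ beta
        sigma = (counts - baseline).std()
        return baseline + z * sigma

    # Usage: alerts = counts > serfling_threshold(counts)
    ```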

  2. Predicting hepatitis B monthly incidence rates using weighted Markov chains and time series methods.

    PubMed

    Shahdoust, Maryam; Sadeghifar, Majid; Poorolajal, Jalal; Javanrooh, Niloofar; Amini, Payam

    2015-01-01

    Hepatitis B (HB) is a major cause of mortality worldwide. Accurately predicting the trend of the disease can provide an appropriate basis for health policy on disease prevention. This paper aimed to apply three different methods to predict monthly incidence rates of HB. This historical cohort study was conducted on the HB incidence data of Hamadan Province, in the west of Iran, from 2004 to 2012. The Weighted Markov Chain (WMC) method, based on Markov chain theory, and two time series models, Holt Exponential Smoothing (HES) and SARIMA, were applied to the data. The results of the applied methods were compared using the percentage of correctly predicted incidence rates. The monthly incidence rates were clustered into two clusters serving as the states of the Markov chain. The percentages of correct predictions for the first and second clusters were (100, 0), (84, 67) and (79, 47) for the WMC, HES and SARIMA methods, respectively. The overall incidence rate of HBV is estimated to decrease over time. The comparison of the results of the three models indicated that, given the seasonality and non-stationarity of the data, the HES gave the most accurate prediction of the incidence rates.
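    As a rough illustration of how the two time series baselines named above could be fitted, the sketch below uses statsmodels; the series is a random placeholder and the SARIMA orders are illustrative assumptions, not the ones selected in the study.

    ```python
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.holtwinters import ExponentialSmoothing
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Monthly incidence rates for 2004-2012; values are placeholders.
    y = pd.Series(np.random.rand(108),
                  index=pd.date_range("2004-01", periods=108, freq="MS"))

    # Holt exponential smoothing: level plus trend, no explicit seasonality.
    hes = ExponentialSmoothing(y, trend="add").fit()
    hes_forecast = hes.forecast(12)

    # SARIMA with an annual seasonal component; the (p,d,q)(P,D,Q,s)
    # orders here are illustrative, not those identified in the study.
    sarima = SARIMAX(y, order=(1, 0, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
    sarima_forecast = sarima.forecast(12)
    ```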

  3. Nonequilibrium radiative heating prediction method for aeroassist flowfields with coupling to flowfield solvers. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Hartung, Lin C.

    1991-01-01

    A method for predicting radiation absorption and emission coefficients in thermochemical nonequilibrium flows is developed. The method is called the Langley optimized radiative nonequilibrium code (LORAN). It applies the smeared band approximation for molecular radiation to produce moderately detailed results and is intended to fill the gap between detailed but costly prediction methods and very fast but highly approximate methods. The optimization of the method to provide efficient solutions allowing coupling to flowfield solvers is discussed. Representative results are obtained and compared to previous nonequilibrium radiation methods, as well as to ground- and flight-measured data. Reasonable agreement is found in all cases. A multidimensional radiative transport method is also developed for axisymmetric flows. Its predictions for wall radiative flux are 20 to 25 percent lower than those of the tangent slab transport method, as expected, though additional investigation of the symmetry and outflow boundary conditions is indicated. The method was applied to the peak heating condition of the aeroassist flight experiment (AFE) trajectory, with results comparable to predictions from other methods. The LORAN method was also applied in conjunction with the computational fluid dynamics (CFD) code LAURA to study the sensitivity of the radiative heating prediction to various models used in nonequilibrium CFD. This study suggests that radiation measurements can provide diagnostic information about the detailed processes occurring in a nonequilibrium flowfield because radiation phenomena are very sensitive to these processes.

  4. A comparison of LOD and UT1-UTC forecasts by different combined prediction techniques

    NASA Astrophysics Data System (ADS)

    Kosek, W.; Kalarus, M.; Johnson, T. J.; Wooden, W. H.; McCarthy, D. D.; Popiński, W.

    Stochastic prediction techniques including autocovariance, autoregressive, autoregressive moving average, and neural networks were applied to the UT1-UTC and Length of Day (LOD) International Earth Rotation and Reference Systems Service (IERS) EOPC04 time series to evaluate the capabilities of each method. All known effects such as leap seconds and solid Earth zonal tides were first removed from the observed values of UT1-UTC and LOD. Two combination procedures were applied to predict the resulting LODR time series: 1) the combination of the least-squares (LS) extrapolation with a stochastic prediction method, and 2) the combination of the discrete wavelet transform (DWT) filtering and a stochastic prediction method. The results of the combination of the LS extrapolation with different stochastic prediction techniques were compared with the results of the UT1-UTC prediction method currently used by the IERS Rapid Service/Prediction Centre (RS/PC). It was found that the prediction accuracy depends on the starting prediction epochs, and for the combined forecast methods, the mean prediction errors for 1 to about 70 days in the future are of the same order as those of the method used by the IERS RS/PC.

  5. Connecting clinical and actuarial prediction with rule-based methods.

    PubMed

    Fokkema, Marjolein; Smits, Niels; Kelderman, Henk; Penninx, Brenda W J H

    2015-06-01

    Meta-analyses comparing the accuracy of clinical versus actuarial prediction have shown actuarial methods to outperform clinical methods, on average. However, actuarial methods are still not widely used in clinical practice, and there has been a call for the development of actuarial prediction methods for clinical practice. We argue that rule-based methods may be more useful than the linear main effect models usually employed in prediction studies, from a data and decision analytic as well as a practical perspective. In addition, decision rules derived with rule-based methods can be represented as fast and frugal trees, which, unlike main effects models, can be used in a sequential fashion, reducing the number of cues that have to be evaluated before making a prediction. We illustrate the usability of rule-based methods by applying RuleFit, an algorithm for deriving decision rules for classification and regression problems, to a dataset on prediction of the course of depressive and anxiety disorders from Penninx et al. (2011). The RuleFit algorithm provided a model consisting of 2 simple decision rules, requiring evaluation of only 2 to 4 cues. Predictive accuracy of the 2-rule model was very similar to that of a logistic regression model incorporating 20 predictor variables, originally applied to the dataset. In addition, the 2-rule model required, on average, evaluation of only 3 cues. Therefore, the RuleFit algorithm appears to be a promising method for creating decision tools that are less time consuming and easier to apply in psychological practice, and with accuracy comparable to traditional actuarial methods. (c) 2015 APA, all rights reserved.
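    To make the fast-and-frugal idea concrete, the sketch below shows what a sequentially applied two-rule decision list looks like in code. The cue names and cutoffs are invented for illustration; they are not the rules RuleFit derived in the study.

    ```python
    def predict_persistent_course(severity_score, duration_months,
                                  severity_cutoff=20, duration_cutoff=6):
        """Hypothetical two-rule fast-and-frugal decision list.

        Cues are evaluated sequentially, so a prediction may be reached
        after checking a single cue; the cues and cutoffs here are
        illustrative placeholders, not those derived in the study."""
        if severity_score >= severity_cutoff:    # rule 1: high baseline severity
            return "persistent"
        if duration_months >= duration_cutoff:   # rule 2: long prior duration
            return "persistent"
        return "remitting"                       # default exit
    ```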

  6. Prediction of sound fields in acoustical cavities using the boundary element method. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Kipp, C. R.; Bernhard, R. J.

    1985-01-01

    A method was developed to predict sound fields in acoustical cavities. The method is based on the indirect boundary element method. An isoparametric quadratic boundary element is incorporated. Pressure, velocity and/or impedance boundary conditions may be applied to a cavity by using this method. The capability to include acoustic point sources within the cavity is implemented. The method is applied to the prediction of sound fields in spherical and rectangular cavities. All three boundary condition types are verified. Cases with a point source within the cavity domain are also studied. Numerically determined cavity pressure distributions and responses are presented. The numerical results correlate well with available analytical results.

  7. Guidelines for reporting and using prediction tools for genetic variation analysis.

    PubMed

    Vihinen, Mauno

    2013-02-01

    Computational prediction methods are widely used for the analysis of human genome sequence variants and their effects on gene/protein function, splice site aberration, pathogenicity, and disease risk. New methods are frequently developed. We believe that guidelines are essential for those writing articles about new prediction methods, as well as for those applying these tools in their research, so that the necessary details are reported. This will enable readers to gain the full picture of technical information, performance, and interpretation of results, and to facilitate comparisons of related methods. Here, we provide instructions on how to describe new methods, report datasets, and assess the performance of predictive tools. We also discuss what details of predictor implementation are essential for authors to understand. Similarly, these guidelines for the use of predictors provide instructions on what needs to be delineated in the text, as well as how researchers can avoid unwarranted conclusions. They are applicable to most prediction methods currently utilized. By applying these guidelines, authors will help reviewers, editors, and readers to more fully comprehend prediction methods and their use. © 2012 Wiley Periodicals, Inc.

  8. Unsteady Fast Random Particle Mesh method for efficient prediction of tonal and broadband noises of a centrifugal fan unit

    NASA Astrophysics Data System (ADS)

    Heo, Seung; Cheong, Cheolung; Kim, Taehoon

    2015-09-01

    In this study, an efficient numerical method is proposed for predicting the tonal and broadband noises of a centrifugal fan unit. The proposed method is based on Hybrid Computational Aero-Acoustic (H-CAA) techniques combined with the Unsteady Fast Random Particle Mesh (U-FRPM) method. The U-FRPM method is developed by extending the FRPM method proposed by Ewert et al. and is utilized to synthesize the turbulent flow field from unsteady RANS solutions. The H-CAA technique combined with the U-FRPM method is applied to predict broadband as well as tonal noises of a centrifugal fan unit in a household refrigerator. First, the unsteady flow field driven by the rotating fan is computed by solving the RANS equations with Computational Fluid Dynamics (CFD) techniques. The main source regions around the rotating fan are identified by examining the computed flow fields. Then, turbulent flow fields in the main source regions are synthesized by applying the U-FRPM method. An acoustic analogy is applied to model the acoustic sources in the main source regions. Finally, the centrifugal fan noise is predicted by feeding the modeled acoustic sources into an acoustic solver based on the Boundary Element Method (BEM). The sound spectral levels predicted using the current numerical method show good agreement with the measured spectra at the Blade Pass Frequencies (BPFs) as well as in the high frequency range. Moreover, the present method enables quantitative assessment of the relative contributions of the identified source regions to the sound field by comparing the predicted sound pressure spectra due to the modeled sources.

  9. Improved nonlinear prediction method

    NASA Astrophysics Data System (ADS)

    Adenan, Nur Hamiza; Md Noorani, Mohd Salmi

    2014-06-01

    The analysis and prediction of time series data have been widely studied. Many techniques have been developed for application in various areas, such as weather forecasting, financial markets and hydrological phenomena, involving data that are contaminated by noise. Given the importance of such analysis and of prediction accuracy, a study was undertaken to test the effectiveness of an improved nonlinear prediction method for data that contain noise. The improved method involves the formation of composite serial data based on the successive differences of the time series. Phase space reconstruction is then performed on the (one-dimensional) composite data to reconstruct a number of space dimensions. Finally, the local linear approximation method is employed to make a prediction based on the phase space. The improved method was tested with logistic map data series containing 0%, 5%, 10%, 20% and 30% noise. The results show that predictions made with the improved method are in close agreement with the observed values, and the correlation coefficient was close to one when the method was applied to data with up to 10% noise. Thus, an improved method for analyzing and predicting noisy time series data was introduced that does not require any separate noise reduction step.
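    A generic sketch of the prediction core described here, namely time-delay embedding followed by a local linear fit over nearest neighbours in the reconstructed phase space, is given below. It omits the paper's composite-series preprocessing step, and all names and parameter values are illustrative.

    ```python
    import numpy as np

    def delay_embed(x, dim, tau):
        """Time-delay embedding of a 1-D series into `dim` dimensions."""
        n = len(x) - (dim - 1) * tau
        return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

    def local_linear_predict(x, dim=3, tau=1, k=10):
        """Predict the next value of x from a local linear fit over the
        k nearest neighbours of the current state in phase space."""
        x = np.asarray(x, dtype=float)
        E = delay_embed(x, dim, tau)
        query, history = E[-1], E[:-1]
        targets = x[(dim - 1) * tau + 1:]   # one-step-ahead image of each state
        idx = np.argsort(np.linalg.norm(history - query, axis=1))[:k]
        A = np.column_stack([np.ones(k), history[idx]])
        coef, *_ = np.linalg.lstsq(A, targets[idx], rcond=None)
        return np.concatenate([[1.0], query]) @ coef
    ```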

  10. Predicting Peak Flows following Forest Fires

    NASA Astrophysics Data System (ADS)

    Elliot, William J.; Miller, Mary Ellen; Dobre, Mariana

    2016-04-01

    Following forest fires, peak flows in perennial and ephemeral streams often increase by a factor of 10 or more. This increase in peak flow rate may overwhelm existing downstream structures, such as road culverts, causing serious damage to road fills at stream crossings. In order to predict peak flow rates following wildfires, we have applied two different tools. One is based on the USDA Natural Resources Conservation Service Curve Number (CN) method, and the other applies the Water Erosion Prediction Project (WEPP) model to the watershed. In our presentation, we will describe the science behind the two methods and present the main variables for each model. We will then provide an example comparing the two methods on a fire-prone watershed upstream of the City of Flagstaff, Arizona, USA, where a fire spread model was applied for current fuel loads and for likely fuel loads following a fuel reduction treatment. When applying the curve number method, determining the time to peak flow can be problematic for low severity fires because the runoff flow paths are both over the surface and through shallow lateral flow. The WEPP watershed version incorporates shallow lateral flow into stream channels. However, the version of the WEPP model used for this study did not have channel routing capabilities, but rather relied on regression relationships to estimate peak flows from individual hillslope polygon peak runoff rates. We found that the two methods gave similar results if applied correctly, with the WEPP predictions somewhat greater than the CN predictions. Later releases of the WEPP model have incorporated alternative methods for routing peak flows that need to be evaluated.
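    The curve number half of this comparison rests on a single closed-form relation, sketched below in the method's customary English units; the example storm depth and CN value are hypothetical.

    ```python
    def scs_runoff_depth(precip_in, curve_number):
        """SCS Curve Number runoff depth (inches) for a storm of depth
        `precip_in` (inches): potential retention S = 1000/CN - 10 and
        initial abstraction Ia = 0.2 * S."""
        S = 1000.0 / curve_number - 10.0
        Ia = 0.2 * S
        if precip_in <= Ia:
            return 0.0                  # all rainfall abstracted, no runoff
        return (precip_in - Ia) ** 2 / (precip_in - Ia + S)

    # e.g. a 3-inch storm on a burned hillslope with CN = 90:
    # scs_runoff_depth(3.0, 90) -> about 1.98 inches of runoff
    ```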

  11. Predictive modeling and reducing cyclic variability in autoignition engines

    DOEpatents

    Hellstrom, Erik; Stefanopoulou, Anna; Jiang, Li; Larimore, Jacob

    2016-08-30

    Methods and systems are provided for controlling a vehicle engine to reduce cycle-to-cycle combustion variation. A predictive model is applied to predict cycle-to-cycle combustion behavior of an engine based on observed engine performance variables. Conditions are identified, based on the predicted cycle-to-cycle combustion behavior, that indicate high cycle-to-cycle combustion variation. Corrective measures are then applied to prevent the predicted high cycle-to-cycle combustion variation.

  12. Applying Mondrian Cross-Conformal Prediction To Estimate Prediction Confidence on Large Imbalanced Bioactivity Data Sets.

    PubMed

    Sun, Jiangming; Carlsson, Lars; Ahlberg, Ernst; Norinder, Ulf; Engkvist, Ola; Chen, Hongming

    2017-07-24

    Conformal prediction has been proposed as a more rigorous way to define prediction confidence than other applicability domain concepts that have earlier been used for QSAR modeling. One main advantage of such a method is that it provides a prediction region, potentially with multiple predicted labels, which contrasts with the single-valued (regression) or single-label (classification) output predictions of standard QSAR modeling algorithms. Standard conformal prediction might not be suitable for imbalanced data sets. Therefore, Mondrian cross-conformal prediction (MCCP), which combines Mondrian inductive conformal prediction with cross-fold calibration sets, has been introduced. In this study, the MCCP method was applied to 18 publicly available data sets with imbalance levels varying from 1:10 to 1:1000 (ratio of active to inactive compounds). Our results show that MCCP in general performed well on bioactivity data sets with various imbalance levels. More importantly, the method not only provides confidence of prediction and prediction regions, compared to standard machine learning methods, but also produces valid predictions for the minority class. In addition, a compound-similarity-based nonconformity measure was investigated. Our results demonstrate that although it gives valid predictions, its efficiency is much worse than that of model-dependent metrics.
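    A minimal sketch of the Mondrian (class-wise) conformal machinery described above follows; it shows the per-class p-value and the resulting prediction region, which may contain zero, one, or both labels. The cross-fold calibration averaging that distinguishes MCCP is omitted, and this is not the authors' implementation.

    ```python
    import numpy as np

    def mondrian_pvalue(cal_scores_for_class, test_score):
        """Conformal p-value for one test compound under one hypothesised
        class label. Calibration scores come only from compounds that
        truly belong to that class (the Mondrian condition), which keeps
        validity for the minority class. Higher score = less conforming."""
        cal = np.asarray(cal_scores_for_class)
        return (np.sum(cal >= test_score) + 1) / (len(cal) + 1)

    def prediction_set(cal_scores_by_class, test_score_by_class, eps=0.2):
        """Prediction region at significance eps: every class whose
        p-value exceeds eps is included in the predicted label set."""
        return {cls: p for cls, scores in cal_scores_by_class.items()
                if (p := mondrian_pvalue(scores, test_score_by_class[cls])) > eps}
    ```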

  13. A note on evaluating VAN earthquake predictions

    NASA Astrophysics Data System (ADS)

    Tselentis, G.-Akis; Melis, Nicos S.

    The evaluation of the success level of an earthquake prediction method should not be based on approaches that apply generalized strict statistical laws and disregard the specific nature of the earthquake phenomenon. Fault rupture processes cannot be compared to gambling processes. The outcome of the present note is that even an ideal earthquake prediction method is still shown to be a matter of a “chancy” association between precursors and earthquakes if we apply the same procedure proposed by Mulargia and Gasperini [1992] in evaluating VAN earthquake predictions. Each individual VAN prediction has to be evaluated separately, always taking into account the specific circumstances and information available. The success level of epicenter prediction should depend on the earthquake magnitude, and magnitude and time predictions may depend on earthquake clustering and the tectonic regime, respectively.

  14. A Novel Calibration-Minimum Method for Prediction of Mole Fraction in Non-Ideal Mixture.

    PubMed

    Shibayama, Shojiro; Kaneko, Hiromasa; Funatsu, Kimito

    2017-04-01

    This article proposes a novel concentration prediction model that requires little training data and is useful for rapid process understanding. Process analytical technology is currently popular, especially in the pharmaceutical industry, for enhancement of process understanding and process control. A calibration-free method, iterative optimization technology (IOT), was proposed to predict pure component concentrations, because calibration methods, such as partial least squares, require a large number of training samples, leading to high costs. However, IOT cannot be applied to concentration prediction in non-ideal mixtures because its basic equation is derived from the Beer-Lambert law, which does not hold for non-ideal mixtures. We propose a novel method that realizes prediction of pure component concentrations in mixtures from a small number of training samples, assuming that spectral changes arising from molecular interactions can be expressed as a function of concentration. The proposed method is named IOT with virtual molecular interaction spectra (IOT-VIS) because it takes the spectral change into account as a virtual spectrum x_nonlin,i. It was confirmed through two case studies that the predictive accuracy of IOT-VIS was the highest among the existing IOT methods.
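    As a point of reference for the ideal-mixture baseline that IOT starts from, the sketch below solves the Beer-Lambert unmixing problem by nonnegative least squares; the concentration-dependent interaction spectrum that defines IOT-VIS is deliberately omitted, and the function name is illustrative.

    ```python
    import numpy as np
    from scipy.optimize import nnls

    def ideal_mixture_unmix(pure_spectra, mixture_spectrum):
        """Estimate mole fractions under the Beer-Lambert assumption that
        the mixture spectrum is a nonnegative linear combination of the
        pure-component spectra. `pure_spectra` has one column per pure
        component and one row per wavelength."""
        coeffs, _ = nnls(pure_spectra, mixture_spectrum)
        return coeffs / coeffs.sum()    # normalise to mole fractions
    ```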

  15. Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies.

    PubMed

    Hansen, Katja; Montavon, Grégoire; Biegler, Franziska; Fazli, Siamac; Rupp, Matthias; Scheffler, Matthias; von Lilienfeld, O Anatole; Tkatchenko, Alexandre; Müller, Klaus-Robert

    2013-08-13

    The accurate and reliable prediction of properties of molecules typically requires computationally intensive quantum-chemical calculations. Recently, machine learning techniques applied to ab initio calculations have been proposed as an efficient approach for describing the energies of molecules in their given ground-state structure throughout chemical compound space (Rupp et al. Phys. Rev. Lett. 2012, 108, 058301). In this paper we outline a number of established machine learning techniques and investigate the influence of the molecular representation on the methods' performance. The best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules. Rationales for this performance improvement are given, together with pitfalls and challenges encountered when applying machine learning approaches to the prediction of quantum-mechanical observables.
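    A minimal sketch of the model family evaluated in this line of work, kernel ridge regression with a Laplacian kernel over a vectorised molecular representation, is shown below; the data, descriptor dimensionality and hyperparameter values are placeholders, not those of the paper.

    ```python
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    # X would hold a vectorised molecular representation (e.g. sorted
    # Coulomb-matrix entries) and y the reference atomization energies;
    # random placeholders are used here.
    X = np.random.rand(200, 30)     # 200 molecules, 30 descriptor values
    y = np.random.rand(200)         # atomization energies (e.g. kcal/mol)

    model = KernelRidge(kernel="laplacian", alpha=1e-6, gamma=1e-2)
    model.fit(X[:150], y[:150])
    mae = np.abs(model.predict(X[150:]) - y[150:]).mean()  # held-out error
    ```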

  16. Calibration of decadal ensemble predictions

    NASA Astrophysics Data System (ADS)

    Pasternack, Alexander; Rust, Henning W.; Bhend, Jonas; Liniger, Mark; Grieger, Jens; Müller, Wolfgang; Ulbrich, Uwe

    2017-04-01

    Decadal climate predictions are of great socio-economic interest due to the corresponding planning horizons of several political and economic decisions. Because of the uncertainties of weather and climate forecasts (e.g. initial-condition uncertainty), they are issued in a probabilistic way. One issue frequently observed for probabilistic forecasts is that they tend not to be reliable, i.e. the forecasted probabilities are not consistent with the relative frequencies of the associated observed events. Thus, these kinds of forecasts need to be re-calibrated. While re-calibration methods for seasonal time scales are available and frequently applied, these methods still have to be adapted for decadal time scales and their characteristic problems, such as the climate trend and lead-time-dependent bias. We therefore propose a method to re-calibrate decadal ensemble predictions that takes the above-mentioned characteristics into account. Finally, this method is applied to, and validated on, decadal forecasts from the MiKlip system (Germany's initiative for decadal prediction).
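    To illustrate the lead-time-dependent bias mentioned above, the sketch below removes a per-lead-time mean bias from an ensemble hindcast set. It is a deliberately simplified stand-in: the method proposed here additionally accounts for the climate trend and adjusts ensemble spread.

    ```python
    import numpy as np

    def lead_time_bias_correction(hindcasts, observations):
        """Remove a lead-time-dependent mean bias from an ensemble
        hindcast set. `hindcasts` has shape (n_starts, n_leads,
        n_members); `observations` has shape (n_starts, n_leads)."""
        ens_mean = hindcasts.mean(axis=2)               # (n_starts, n_leads)
        bias = (ens_mean - observations).mean(axis=0)   # one bias per lead time
        return hindcasts - bias[None, :, None]          # corrected ensemble
    ```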

  17. Uncertainty Estimation using Bootstrapped Kriging Predictions for Precipitation Isoscapes

    NASA Astrophysics Data System (ADS)

    Ma, C.; Bowen, G. J.; Vander Zanden, H.; Wunder, M.

    2017-12-01

    Isoscapes are spatial models representing the distribution of stable isotope values across landscapes. Isoscapes of hydrogen and oxygen in precipitation are now widely used in a diversity of fields, including geology, biology, hydrology, and atmospheric science. To generate isoscapes, geostatistical methods are typically applied to extend predictions from limited data measurements. Kriging is a popular method in isoscape modeling, but quantifying the uncertainty associated with the resulting isoscapes is challenging. Applications that use precipitation isoscapes to determine sample origin require estimation of uncertainty. Here we present a simple bootstrap method (SBM) to estimate the mean and uncertainty of the kriged isoscape and compare these results with a generalized bootstrap method (GBM) applied in previous studies. We used hydrogen isotopic data from IsoMAP to explore these two approaches for estimating uncertainty. We conducted 10 simulations for each bootstrap method and found that SBM yielded kriging predictions in more simulations (9/10) than GBM (4/10). The prediction from SBM was closer to the original prediction generated without bootstrapping and had less variance than that from GBM. SBM was tested on IsoMAP datasets with different numbers of observation sites. We determined that predictions from datasets with fewer than 40 observation sites using SBM were more variable than the original prediction. The approaches we used for estimating uncertainty will be compiled in an R package that is under development. We expect that these robust estimates of precipitation isoscape uncertainty can be applied in diagnosing the origin of samples ranging from various types of water to migratory animals, food products, and humans.

  18. Decision curve analysis: a novel method for evaluating prediction models.

    PubMed

    Vickers, Andrew J; Elkin, Elena B

    2006-01-01

    Diagnostic and prognostic models are typically evaluated with measures of accuracy that do not address clinical consequences. Decision-analytic techniques allow assessment of clinical outcomes but often require collection of additional information and may be cumbersome to apply to models that yield a continuous result. The authors sought a method for evaluating and comparing prediction models that incorporates clinical consequences, requires only the data set on which the models are tested, and can be applied to models that have either continuous or dichotomous results. The authors describe decision curve analysis, a simple, novel method of evaluating predictive models. They start by assuming that the threshold probability of a disease or event at which a patient would opt for treatment is informative of how the patient weighs the relative harms of a false-positive and a false-negative prediction. This theoretical relationship is then used to derive the net benefit of the model across different threshold probabilities. Plotting net benefit against threshold probability yields the "decision curve." The authors apply the method to models for the prediction of seminal vesicle invasion in prostate cancer patients. Decision curve analysis identified the range of threshold probabilities in which a model was of value, the magnitude of benefit, and which of several models was optimal. Decision curve analysis is a suitable method for evaluating alternative diagnostic and prognostic strategies that has advantages over other commonly used measures and techniques.
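    The quantity at the heart of the method can be stated in a few lines; the sketch below computes net benefit at a given threshold probability, following the paper's formula TP/n - (FP/n) * pt/(1 - pt), and the decision curve is this quantity plotted over a range of thresholds.

    ```python
    import numpy as np

    def net_benefit(y_true, risk, threshold):
        """Net benefit of treating patients whose predicted risk is at
        least `threshold`: TP/n - FP/n * pt/(1 - pt)."""
        y_true, risk = np.asarray(y_true), np.asarray(risk)
        n = len(y_true)
        treat = risk >= threshold
        tp = np.sum(treat & (y_true == 1))
        fp = np.sum(treat & (y_true == 0))
        return tp / n - fp / n * threshold / (1 - threshold)

    # Decision curve over the clinically relevant threshold range:
    # curve = [net_benefit(y, model_risk, t) for t in np.linspace(0.05, 0.5, 46)]
    ```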

  19. Probabilistic Analysis of Aircraft Gas Turbine Disk Life and Reliability

    NASA Technical Reports Server (NTRS)

    Melis, Matthew E.; Zaretsky, Erwin V.; August, Richard

    1999-01-01

    Two series of low cycle fatigue (LCF) test data for two groups of different aircraft gas turbine engine compressor disk geometries were reanalyzed and compared using Weibull statistics. Both groups of disks were manufactured from titanium (Ti-6Al-4V) alloy. A probabilistic computer code, Probable Cause, developed at the NASA Glenn Research Center, was used to predict disk life and reliability. A material-life factor A was determined for titanium (Ti-6Al-4V) alloy based upon fatigue disk data and successfully applied to predict the life of the disks as a function of speed. A comparison was made with the currently used life prediction method based upon crack growth rate. Applying an endurance limit to the computer code did not significantly affect the predicted lives under engine operating conditions. Failure location predictions correlate with those experimentally observed in the LCF tests. A reasonable correlation was obtained between the predicted disk lives using the Probable Cause code and a modified crack growth method for life prediction. Both methods slightly overpredict life for one disk group and significantly underpredict it for the other.
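    For readers unfamiliar with the Weibull machinery used here, the sketch below fits a two-parameter Weibull distribution to a set of fatigue lives and extracts an L10 life; the data are invented placeholders, and this is not the Probable Cause code.

    ```python
    import numpy as np
    from scipy import stats

    lives = np.array([8.1e3, 1.2e4, 1.9e4, 2.3e4, 3.0e4, 4.4e4])  # cycles (placeholder)

    # floc=0 pins the location at zero, giving the usual two-parameter
    # form: c is the Weibull slope (shape), scale the characteristic life.
    c, loc, scale = stats.weibull_min.fit(lives, floc=0)
    l10 = scale * (-np.log(0.9)) ** (1.0 / c)   # life at 10% failure probability
    ```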

  20. Applications of information theory, genetic algorithms, and neural models to predict oil flow

    NASA Astrophysics Data System (ADS)

    Ludwig, Oswaldo; Nunes, Urbano; Araújo, Rui; Schnitman, Leizer; Lepikson, Herman Augusto

    2009-07-01

    This work introduces a new information-theoretic methodology for choosing variables and their time lags in a prediction setting, particularly when neural networks are used in non-linear modeling. The first contribution of this work is the Cross Entropy Function (XEF), proposed to select input variables and their lags in order to compose the input vector of black-box prediction models. The proposed XEF method is more appropriate than the usually applied Cross Correlation Function (XCF) when the relationship between the input and output signals comes from a non-linear dynamic system. The second contribution is a method that minimizes the Joint Conditional Entropy (JCE) between the input and output variables by means of a Genetic Algorithm (GA). The aim is to take into account the dependence among the input variables when selecting the most appropriate set of inputs for a prediction problem. In short, these methods can be used to assist the selection of input training data that have the necessary information to predict the target data. The proposed methods are applied to a petroleum engineering problem: predicting oil production. Experimental results obtained with a real-world dataset are presented, demonstrating the feasibility and effectiveness of the method.
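    The sketch below illustrates the flavour of the first contribution, ranking candidate input lags by a nonlinear dependence measure. Mutual information is used as a stand-in for the paper's cross entropy function (XEF), and the function name is illustrative.

    ```python
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    def rank_lags(x, y, max_lag=20):
        """Rank candidate lags of input signal x by their mutual
        information with target y; both inputs are 1-D arrays of equal
        length. Like the XEF, this captures nonlinear dependence that a
        cross correlation function misses."""
        x, y = np.asarray(x), np.asarray(y)
        lags = np.arange(1, max_lag + 1)
        X = np.column_stack([x[max_lag - k : len(x) - k] for k in lags])
        mi = mutual_info_regression(X, y[max_lag:])
        return sorted(zip(lags, mi), key=lambda t: -t[1])
    ```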

  1. Applicability of a gene expression based prediction method to SD and Wistar rats: an example of CARCINOscreen®.

    PubMed

    Matsumoto, Hiroshi; Saito, Fumiyo; Takeyoshi, Masahiro

    2015-12-01

    Recently, the development of several gene expression-based prediction methods has been attempted in the field of toxicology. CARCINOscreen® is a gene expression-based screening method to predict, with high accuracy, the carcinogenicity of chemicals that target the liver. In this study, we investigated the applicability of this gene expression-based screening method, originally developed with F344 rats, to SD and Wistar rats, using two carcinogens, 2,4-diaminotoluene and thioacetamide, and two non-carcinogens, 2,6-diaminotoluene and sodium benzoate. After a 28-day repeated-dose test was conducted with each chemical in SD and Wistar rats, microarray analysis was performed using total RNA extracted from each liver, and the resulting gene expression data were applied to CARCINOscreen®. Predictive scores obtained by CARCINOscreen® for the known carcinogens were > 2 in all strains of rats, while the non-carcinogens gave prediction scores below 0.5. These results suggest that the gene expression-based screening method CARCINOscreen® can be applied to SD and Wistar rats, strains widely used in toxicological studies, by setting an appropriate boundary prediction score to classify chemicals into carcinogens and non-carcinogens.

  2. Method and apparatus for sensor fusion

    NASA Technical Reports Server (NTRS)

    Krishen, Kumar (Inventor); Shaw, Scott (Inventor); Defigueiredo, Rui J. P. (Inventor)

    1991-01-01

    A method and apparatus for the fusion of data from optical and radar sensors by an error minimization procedure are presented. The method was applied to the problem of shape reconstruction of an unknown surface at a distance. The method involves deriving an incomplete surface model from an optical sensor. The unknown characteristics of the surface are represented by some parameter. The correct value of the parameter is computed by iteratively generating theoretical predictions of the radar cross sections (RCS) of the surface, comparing the predicted and observed values of the RCS, and improving the surface model from the results of the comparison. The theoretical RCS may be computed from the surface model in several ways; one RCS prediction technique is the method of moments. The method of moments can be applied to an unknown surface only if some shape information is available from an independent source, and the optical image provides that independent information.

  3. Study of Uncertainties of Predicting Space Shuttle Thermal Environment. [impact of heating rate prediction errors on weight of thermal protection system

    NASA Technical Reports Server (NTRS)

    Fehrman, A. L.; Masek, R. V.

    1972-01-01

    Quantitative estimates of the uncertainty in predicting aerodynamic heating rates for a fully reusable space shuttle system are developed and the impact of these uncertainties on Thermal Protection System (TPS) weight are discussed. The study approach consisted of statistical evaluations of the scatter of heating data on shuttle configurations about state-of-the-art heating prediction methods to define the uncertainty in these heating predictions. The uncertainties were then applied as heating rate increments to the nominal predicted heating rate to define the uncertainty in TPS weight. Separate evaluations were made for the booster and orbiter, for trajectories which included boost through reentry and touchdown. For purposes of analysis, the vehicle configuration is divided into areas in which a given prediction method is expected to apply, and separate uncertainty factors and corresponding uncertainty in TPS weight derived for each area.

  4. Streamflow Prediction based on Chaos Theory

    NASA Astrophysics Data System (ADS)

    Li, X.; Wang, X.; Babovic, V. M.

    2015-12-01

    Chaos theory is a popular method in hydrologic time series prediction. The local model (LM) based on this theory uses time-delay embedding to reconstruct the phase-space diagram. The efficacy of this method depends on the embedding parameters, i.e. the embedding dimension, time lag, and number of nearest neighbors, so optimal estimation of these parameters is critical to the application of the local model. However, these embedding parameters are conventionally estimated using Average Mutual Information (AMI) and False Nearest Neighbors (FNN) separately. This may lead to local optima and thus limit prediction accuracy. Considering these limitations, this paper applies a local model combined with simulated annealing (SA) to find globally optimal embedding parameters. It is also compared with another global optimization approach, the Genetic Algorithm (GA). These proposed hybrid methods are applied to daily and monthly streamflow time series for examination. The results show that global optimization can help the local model provide more accurate prediction results than local optimization. The LM combined with SA shows additional advantages in terms of computational efficiency. The proposed scheme can also be applied to other fields such as prediction of hydro-climatic time series, error correction, etc.

  5. Using the surface panel method to predict the steady performance of ducted propellers

    NASA Astrophysics Data System (ADS)

    Cai, Hao-Peng; Su, Yu-Min; Li, Xin; Shen, Hai-Long

    2009-12-01

    A new numerical method was developed for predicting the steady hydrodynamic performance of ducted propellers. A potential based surface panel method was applied both to the duct and the propeller, and the interaction between them was solved by an induced velocity potential iterative method. Compared with the induced velocity iterative method, the method presented can save programming and calculating time. Numerical results for a JD simplified ducted propeller series showed that the method presented is effective for predicting the steady hydrodynamic performance of ducted propellers.

  6. Life Prediction/Reliability Data of Glass-Ceramic Material Determined for Radome Applications

    NASA Technical Reports Server (NTRS)

    Choi, Sung R.; Gyekenyesi, John P.

    2002-01-01

    Brittle materials such as ceramics are candidates for a variety of structural applications over a wide range of temperatures. However, the process of slow crack growth, occurring in any loading configuration, limits the service life of structural components. Therefore, it is important to accurately determine the slow crack growth parameters required for component life prediction using an appropriate test methodology. This test methodology should also be useful in determining the influence of component processing and composition variables on the slow crack growth behavior of newly developed or existing materials, thereby allowing the component processing and composition to be tailored and optimized to specific needs. Through the American Society for Testing and Materials (ASTM), the authors recently developed two test methods to determine the life prediction parameters of ceramics. The two test standards, ASTM C 1368 for room temperature and ASTM C 1465 for elevated temperatures, were published in the 2001 Annual Book of ASTM Standards, Vol. 15.01. Briefly, the test method employs constant stress-rate (or dynamic fatigue) testing to determine flexural strengths as a function of the applied stress rate. The merit of this test method lies in its simplicity: strengths are measured in a routine manner in flexure at four or more applied stress rates with an appropriate number of test specimens at each applied stress rate. The slow crack growth parameters necessary for life prediction are then determined from a simple relationship between the strength and the applied stress rate. Extensive life prediction testing was conducted at the NASA Glenn Research Center using the developed ASTM C 1368 test method to determine the life prediction parameters of a glass-ceramic material that the Navy will use for radome applications.
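    The parameter-estimation step that the standards prescribe reduces to a log-log linear fit; a sketch follows, using the constant stress-rate relation log(strength) = [1/(n+1)] log(stress rate) + log D on hypothetical example data.

    ```python
    import numpy as np

    def slow_crack_growth_n(stress_rates, strengths):
        """Recover the slow crack growth parameter n (and intercept D)
        from constant stress-rate (dynamic fatigue) data: n comes from
        the slope of log(strength) versus log(stress rate)."""
        slope, intercept = np.polyfit(np.log10(stress_rates),
                                      np.log10(strengths), 1)
        return 1.0 / slope - 1.0, 10.0 ** intercept

    # e.g. flexural strengths (MPa) at four stress rates (MPa/s):
    # n, D = slow_crack_growth_n([0.03, 0.3, 3.0, 30.0], [105, 118, 131, 148])
    ```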

  7. Building proteins from C alpha coordinates using the dihedral probability grid Monte Carlo method.

    PubMed Central

    Mathiowetz, A. M.; Goddard, W. A.

    1995-01-01

    Dihedral probability grid Monte Carlo (DPG-MC) is a general-purpose method of conformational sampling that can be applied to many problems in peptide and protein modeling. Here we present the DPG-MC method and apply it to predicting complete protein structures from C alpha coordinates. This is useful in such endeavors as homology modeling, protein structure prediction from lattice simulations, or fitting protein structures to X-ray crystallographic data. It also serves as an example of how DPG-MC can be applied to systems with geometric constraints. The conformational propensities for individual residues are used to guide conformational searches as the protein is built from the amino-terminus to the carboxyl-terminus. Results for a number of proteins show that both the backbone and side chain can be accurately modeled using DPG-MC. Backbone atoms are generally predicted with RMS errors of about 0.5 A (compared to X-ray crystal structure coordinates) and all atoms are predicted to an RMS error of 1.7 A or better. PMID:7549885

  8. Comparison of theoretically predicted lateral-directional aerodynamic characteristics with full-scale wind tunnel data on the ATLIT airplane

    NASA Technical Reports Server (NTRS)

    Griswold, M.; Roskam, J.

    1980-01-01

    An analytical method is presented for predicting the lateral-directional aerodynamic characteristics of light twin-engine propeller-driven airplanes. This method is applied to the Advanced Technology Light Twin (ATLIT) airplane, and the calculated characteristics are correlated against full-scale wind tunnel data. The method predicts the sideslip derivatives fairly well, although variations with angle of attack are not well predicted. Spoiler performance was somewhat overpredicted but still reasonable. The rudder derivatives were not well predicted, in particular the effect of angle of attack. The predicted dynamic derivatives could not be correlated due to a lack of experimental data.

  9. Probabilistic framework for product design optimization and risk management

    NASA Astrophysics Data System (ADS)

    Keski-Rahkonen, J. K.

    2018-05-01

    Probabilistic methods have gradually gained ground within engineering practice, but it is still the industry standard to use deterministic safety-margin approaches for dimensioning components and qualitative methods to manage product risks. These methods are suitable for baseline design work, but quantitative risk management and product reliability optimization require more advanced predictive approaches. Ample research has been published on how to predict failure probabilities for mechanical components and, furthermore, on how to optimize reliability through life cycle cost analysis. This paper reviews the literature for existing methods and tries to harness their best features and simplify the process to be applicable in practical engineering work. The recommended process applies the Monte Carlo method on top of load-resistance models to estimate failure probabilities. Furthermore, it adds to the existing literature by introducing a practical framework for using probabilistic models in quantitative risk management and product life cycle cost optimization. The main focus is on mechanical failure modes, due to the well-developed methods available for predicting these types of failures, but the same framework can be applied to any type of failure mode as long as predictive models can be developed.
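    A minimal sketch of the recommended core step, a Monte Carlo estimate of failure probability from a load-resistance model, is given below; the distributions and their parameters are illustrative placeholders, not a calibrated model.

    ```python
    import numpy as np

    def failure_probability(n_samples=1_000_000, seed=0):
        """Monte Carlo estimate of P(load > resistance) for a simple
        load-resistance model with placeholder distributions; in practice
        both distributions come from measured scatter and the component's
        resistance model."""
        rng = np.random.default_rng(seed)
        load = rng.lognormal(mean=5.0, sigma=0.3, size=n_samples)       # e.g. stress
        resistance = rng.normal(loc=300.0, scale=30.0, size=n_samples)  # e.g. strength
        return np.mean(load > resistance)
    ```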

  10. Comparison of RCS prediction techniques, computations and measurements

    NASA Astrophysics Data System (ADS)

    Brand, M. G. E.; Vanewijk, L. J.; Klinker, F.; Schippers, H.

    1992-07-01

    Three calculation methods to predict radar cross sections (RCS) of three dimensional objects are evaluated by computing the radar cross sections of a generic wing inlet configuration. The following methods are applied: a three dimensional high frequency method, a three dimensional boundary element method, and a two dimensional finite difference time domain method. The results of the computations are compared with the data of measurements.

  11. Application of ride quality technology to predict ride satisfaction for commuter-type aircraft

    NASA Technical Reports Server (NTRS)

    Jacobson, I. D.; Kuhlthau, A. R.; Richards, L. G.

    1975-01-01

    A method was developed to predict passenger satisfaction with the ride environment of a transportation vehicle. This general method was applied to a commuter-type aircraft for illustrative purposes, and the effects of terrain, altitude and seat location were examined. The method predicts the variation in the proportion of passengers satisfied for any set of flight conditions. In addition, several noncommuter aircraft were analyzed for comparison, and other uses of the model are described. The method has advantages for design, evaluation, and operating decisions.

  12. Prediction of shear critical behavior of high-strength reinforced concrete columns using finite element methods

    NASA Astrophysics Data System (ADS)

    Alrasyid, Harun; Safi, Fahrudin; Iranata, Data; Chen-Ou, Yu

    2017-11-01

    This research presents predictions of the shear-critical behavior of high-strength reinforced concrete columns using the finite element method. Experimental data for nine half-scale high-strength reinforced concrete columns were selected. These columns used a specified concrete compressive strength of 70 MPa and specified yield strengths of the longitudinal and transverse reinforcement of 685 and 785 MPa, respectively. The VecTor2 finite element software was used to simulate the shear-critical behavior of these columns. A combination of axial compression and monotonic loading was applied in the prediction. It is demonstrated that the VecTor2 finite element software provides accurate load-deflection predictions up to the peak applied load and similar behavior at post-peak load. The shear strength predictions provided by VecTor2 are slightly conservative compared to the test results.

  13. Identification of informative features for predicting proinflammatory potentials of engine exhausts.

    PubMed

    Wang, Chia-Chi; Lin, Ying-Chi; Lin, Yuan-Chung; Jhang, Syu-Ruei; Tung, Chun-Wei

    2017-08-18

    The immunotoxicity of engine exhausts is of high concern to human health due to the increasing prevalence of immune-related diseases. However, the evaluation of the immunotoxicity of engine exhausts is currently based on expensive and time-consuming experiments, so it is desirable to develop efficient methods for immunotoxicity assessment. To accelerate the development of safe alternative fuels, this study proposed a computational method for identifying informative features for predicting proinflammatory potentials of engine exhausts. A principal component regression (PCR) algorithm was applied to develop prediction models, and the informative features were identified by a sequential backward feature elimination (SBFE) algorithm. A total of 19 informative chemical and biological features were successfully identified by the SBFE algorithm. The informative features were utilized to develop a computational method named FS-CBM for predicting proinflammatory potentials of engine exhausts. The FS-CBM model achieved high performance, with correlation coefficient values of 0.997 and 0.943 obtained from the training and independent test sets, respectively. The FS-CBM model was developed for predicting proinflammatory potentials of engine exhausts with a large improvement in prediction performance compared with our previous CBM model. The proposed method could be further applied to construct models for bioactivities of mixtures.
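    A generic sketch of the PCR-plus-SBFE combination described above follows: a principal component regression pipeline wrapped in a backward elimination loop driven by cross-validated R². It is a stand-in under stated assumptions, not the authors' FS-CBM code.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    def backward_eliminate(X, y, n_components=5):
        """Sequential backward feature elimination around a PCR model:
        repeatedly drop the feature whose removal most improves the
        cross-validated score, stopping when no removal helps."""
        features = list(range(X.shape[1]))
        pcr = make_pipeline(PCA(n_components=n_components), LinearRegression())
        while len(features) > n_components + 1:
            base = cross_val_score(pcr, X[:, features], y, cv=5).mean()
            scores = {f: cross_val_score(
                          pcr, X[:, [g for g in features if g != f]], y, cv=5).mean()
                      for f in features}
            worst = max(scores, key=scores.get)
            if scores[worst] < base:
                break                    # no single removal improves the model
            features.remove(worst)
        return features
    ```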

  14. Data-driven forecasting algorithms for building energy consumption

    NASA Astrophysics Data System (ADS)

    Noh, Hae Young; Rajagopal, Ram

    2013-04-01

    This paper introduces two forecasting methods for building energy consumption data recorded by smart meters at high resolution. For utility companies, it is important to reliably forecast the aggregate consumption profile to determine energy supply for the next day and prevent any crisis. The proposed methods forecast individual loads on the basis of their measurement history and weather data, without using complicated models of the building systems. The first method is most efficient for very short-term prediction, such as a prediction period of one hour, and uses a simple adaptive time-series model. For longer-term prediction, a nonparametric Gaussian process is applied to forecast load profiles and their uncertainty bounds a day ahead. These methods are computationally simple and adaptive and thus suitable for analyzing large sets of data whose patterns change over time. The forecasting methods are applied to several sets of building energy consumption data for lighting and heating-ventilation-air-conditioning (HVAC) systems collected from a campus building at Stanford University. The measurements are collected every minute, and corresponding weather data are provided hourly. The results show that the proposed algorithms can predict energy consumption with high accuracy.
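    A rough sketch of the second, longer-horizon method, a Gaussian process regression over time-of-day and weather features with predictive uncertainty, is shown below; the feature choice, kernel and synthetic data are illustrative assumptions, not the paper's configuration.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Two weeks of synthetic hourly data: hour of day, outdoor temperature
    # and a load with a daytime component (placeholders for smart-meter data).
    hours = np.tile(np.arange(24), 14)
    temps = 15 + 8 * np.sin(2 * np.pi * hours / 24) + np.random.randn(len(hours))
    load = 50 + 2 * temps + 10 * ((8 <= hours) & (hours <= 18)) + np.random.randn(len(hours))

    X = np.column_stack([hours, temps])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=[4.0, 2.0]) + WhiteKernel(),
                                  normalize_y=True).fit(X, load)

    # Day-ahead forecast with uncertainty bounds.
    X_next = np.column_stack([np.arange(24), np.full(24, temps[-24:].mean())])
    mean, std = gp.predict(X_next, return_std=True)
    ```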

  15. Ultra-Short-Term Wind Power Prediction Using a Hybrid Model

    NASA Astrophysics Data System (ADS)

    Mohammed, E.; Wang, S.; Yu, J.

    2017-05-01

    This paper aims to develop and apply a hybrid model of two data-analytical methods, multiple linear regression and least squares (MLR&LS), for ultra-short-term wind power prediction (WPP), taking electricity demand in Northeast China as an example. The data were obtained from historical records of wind power from an offshore region and from a wind farm of the wind power plant in the area. The WPP is achieved in two stages: first, the ratios of wind power are forecasted using the proposed hybrid method, and then these ratios are transformed to obtain the forecasted wind power values. The hybrid model combines the persistence method, MLR and LS. The proposed method includes two prediction types, multi-point prediction and single-point prediction. WPP is tested by applying different models such as autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) models. Comparing the results of the above models confirms the validity of the proposed hybrid model in terms of error and correlation coefficient, and shows that the proposed method works effectively. Additionally, forecasting errors were computed and compared to improve understanding of how to depict highly variable WPP and the correlations between actual and predicted wind power.

  16. ADOT state-specific crash prediction models : an Arizona needs study.

    DOT National Transportation Integrated Search

    2016-12-01

    The predictive method in the Highway Safety Manual (HSM) includes a safety performance function (SPF), : crash modification factors (CMFs), and a local calibration factor (C), if available. Two alternatives exist for : applying the HSM prediction met...

  17. A Regionalized Flow Duration Curve Method to Predict Streamflow for Ungauged Basins: A Case Study of the Rappahannock Watershed in Virginia, USA

    EPA Science Inventory

    A method to predict streamflow for ungauged basins of the Mid-Atlantic Region, USA was applied to the Rappahannock watershed in Virginia, USA. The method separates streamflow time series into magnitude and time sequence components. It uses the regionalized flow duration curve (RF...

  18. Paroxysmal atrial fibrillation prediction method with shorter HRV sequences.

    PubMed

    Boon, K H; Khalil-Hani, M; Malarvili, M B; Sia, C W

    2016-10-01

    This paper proposes a method that predicts the onset of paroxysmal atrial fibrillation (PAF), using heart rate variability (HRV) segments that are shorter than those applied in existing methods, while maintaining good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinically important because it increases the possibility of electrically stabilizing the heart and preventing the onset of atrial arrhythmias with different pacing techniques. We investigate the effect of HRV features extracted from different lengths of HRV segments prior to PAF onset with the proposed PAF prediction method. The pre-processing stage of the predictor includes QRS detection, HRV quantification and ectopic beat correction. Time-domain, frequency-domain, non-linear and bispectrum features are then extracted from the quantified HRV. In the feature selection, the HRV feature set and classifier parameters are optimized simultaneously using an optimization procedure based on a genetic algorithm (GA). Both the full feature set and a statistically significant feature subset are optimized by the GA. For the statistically significant feature subset, the Mann-Whitney U test is used to filter out features that are not statistically significant at the 20% significance level. The final stage of our predictor is a classifier based on a support vector machine (SVM). A 10-fold cross-validation is applied in performance evaluation, and the proposed method achieves 79.3% prediction accuracy using 15-minute HRV segments. This accuracy is comparable to that achieved by existing methods that use 30-minute HRV segments, most of which achieve accuracy of around 80%. More importantly, our method significantly outperforms those that applied segments shorter than 30 minutes. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  19. Accurate prediction of secondary metabolite gene clusters in filamentous fungi.

    PubMed

    Andersen, Mikael R; Nielsen, Jakob B; Klitgaard, Andreas; Petersen, Lene M; Zachariasen, Mia; Hansen, Tilde J; Blicher, Lene H; Gotfredsen, Charlotte H; Larsen, Thomas O; Nielsen, Kristian F; Mortensen, Uffe H

    2013-01-02

    Biosynthetic pathways of secondary metabolites from fungi are currently subject to an intense effort to elucidate the genetic basis for these compounds due to their large potential within pharmaceutics and synthetic biochemistry. The preferred method is methodical gene deletions to identify supporting enzymes for key synthases one cluster at a time. In this study, we design and apply a DNA expression array for Aspergillus nidulans in combination with legacy data to form a comprehensive gene expression compendium. We apply a guilt-by-association-based analysis to predict the extent of the biosynthetic clusters for the 58 synthases active in our set of experimental conditions. A comparison with legacy data shows the method to be accurate in 13 of 16 known clusters and nearly accurate for the remaining 3 clusters. Furthermore, we apply a data clustering approach, which identifies cross-chemistry between physically separate gene clusters (superclusters), and validate this both with legacy data and experimentally by prediction and verification of a supercluster consisting of the synthase AN1242 and the prenyltransferase AN11080, as well as identification of the product compound nidulanin A. We have used A. nidulans for our method development and validation due to the wealth of available biochemical data, but the method can be applied to any fungus with a sequenced and assembled genome, thus supporting further secondary metabolite pathway elucidation in the fungal kingdom.

  20. Prediction in Health Domain Using Bayesian Networks Optimization Based on Induction Learning Techniques

    NASA Astrophysics Data System (ADS)

    Felgaer, Pablo; Britos, Paola; García-Martínez, Ramón

    A Bayesian network is a directed acyclic graph in which each node represents a variable and each arc a probabilistic dependency; such networks provide a compact form for representing knowledge and flexible methods of reasoning. Obtaining a network from data is a learning process that is divided into two steps: structural learning and parametric learning. In this paper we define an automatic learning method that optimizes Bayesian networks applied to classification, using a hybrid learning method that combines the advantages of the induction techniques of decision trees (TDIDT-C4.5) with those of Bayesian networks. The resulting method is applied to prediction in the health domain.

  1. The predictive validity of selection for entry into postgraduate training in general practice: evidence from three longitudinal studies

    PubMed Central

    Patterson, Fiona; Lievens, Filip; Kerrin, Máire; Munro, Neil; Irish, Bill

    2013-01-01

    Background: The selection methodology for UK general practice is designed to accommodate several thousand applicants per year and targets six core attributes identified in a multi-method job-analysis study. Aim: To evaluate the predictive validity of selection methods for entry into postgraduate training, comprising a clinical problem-solving test, a situational judgement test, and a selection centre. Design and setting: A three-part longitudinal predictive validity study of selection into training for UK general practice. Method: In sample 1, participants were junior doctors applying for training in general practice (n = 6824). In sample 2, participants were GP registrars 1 year into training (n = 196). In sample 3, participants were GP registrars sitting the licensing examination after 3 years, at the end of training (n = 2292). The outcome measures include: assessor ratings of performance in a selection centre comprising job simulation exercises (sample 1); supervisor ratings of trainee job performance 1 year into training (sample 2); and licensing examination results, including an applied knowledge examination and a 12-station clinical skills objective structured clinical examination (OSCE; sample 3). Results: Performance ratings at selection predicted subsequent supervisor ratings of job performance 1 year later. Selection results also significantly predicted performance on both the clinical skills OSCE and applied knowledge examination for licensing at the end of training. Conclusion: In combination, these longitudinal findings provide good evidence of the predictive validity of the selection methods, and are the first reported for entry into postgraduate training. Results show that the best predictor of work performance and training outcomes is a combination of a clinical problem-solving test, a situational judgement test, and a selection centre. Implications for selection methods for all postgraduate specialties are considered. PMID:24267856

  2. The Prediction of the Gas Utilization Ratio Based on TS Fuzzy Neural Network and Particle Swarm Optimization

    PubMed Central

    Jiang, Haihe; Yin, Yixin; Xiao, Wendong; Zhao, Baoyong

    2018-01-01

    Gas utilization ratio (GUR) is an important indicator used to evaluate the energy consumption of blast furnaces (BFs). Existing methods cannot predict the GUR accurately. In this paper, we present a novel data-driven model for predicting the GUR that combines a TS fuzzy neural network (TS-FNN) with particle swarm optimization (PSO). PSO is applied to optimize the parameters of the TS-FNN and thereby reduce the error caused by inaccurate initial parameters. We also apply the box-plot method to eliminate abnormal values from the raw data during preprocessing; this step handles data that do not follow a normal distribution, as is common in complex industrial environments. The prediction results demonstrate that the PSO-optimized TS-FNN achieves higher prediction accuracy than the plain TS-FNN and SVM models, and that the proposed approach can accurately predict the GUR of the blast furnace, providing an effective way for on-line blast furnace distribution control. PMID:29461469
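
    The box-plot screen mentioned above is the interquartile-range rule; a minimal sketch, with illustrative GUR readings and the conventional cutoff factor k = 1.5 (not necessarily the paper's choice):

```python
# Minimal sketch of a box-plot (IQR) outlier screen for a raw GUR series.
import numpy as np

def boxplot_filter(x, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; no normality assumed."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mask = (x >= q1 - k * iqr) & (x <= q3 + k * iqr)
    return x[mask], mask

raw_gur = np.array([47.2, 46.8, 48.1, 12.0, 47.5, 49.0, 95.3, 47.9])
clean, kept = boxplot_filter(raw_gur)
print(clean)   # the implausible 12.0 and 95.3 readings are removed
```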

  3. The Prediction of the Gas Utilization Ratio based on TS Fuzzy Neural Network and Particle Swarm Optimization.

    PubMed

    Zhang, Sen; Jiang, Haihe; Yin, Yixin; Xiao, Wendong; Zhao, Baoyong

    2018-02-20

    Gas utilization ratio (GUR) is an important indicator used to evaluate the energy consumption of blast furnaces (BFs). Existing methods cannot predict the GUR accurately. In this paper, we present a novel data-driven model for predicting the GUR that combines a TS fuzzy neural network (TS-FNN) with particle swarm optimization (PSO). PSO is applied to optimize the parameters of the TS-FNN and thereby reduce the error caused by inaccurate initial parameters. We also apply the box-plot method to eliminate abnormal values from the raw data during preprocessing; this step handles data that do not follow a normal distribution, as is common in complex industrial environments. The prediction results demonstrate that the PSO-optimized TS-FNN achieves higher prediction accuracy than the plain TS-FNN and SVM models, and that the proposed approach can accurately predict the GUR of the blast furnace, providing an effective way for on-line blast furnace distribution control.

  4. A fast and accurate method to predict 2D and 3D aerodynamic boundary layer flows

    NASA Astrophysics Data System (ADS)

    Bijleveld, H. A.; Veldman, A. E. P.

    2014-12-01

    A quasi-simultaneous interaction method is applied to predict 2D and 3D aerodynamic flows. The method is suitable for offshore wind turbine design software, as it is accurate and computationally cheap. This study shows results for a NACA 0012 airfoil: the two applied solvers converge to the experimental values as the grid is refined. We also show that in separation the eigenvalues remain positive, thus avoiding the Goldstein singularity at separation. In 3D, we show a flow over a dent in which separation occurs, and a rotating flat plate is used to show the applicability of the method to rotating flows. These capabilities indicate that the quasi-simultaneous interaction method is suitable for design methods for offshore wind turbine blades.

  5. A Hybrid Supervised/Unsupervised Machine Learning Approach to Solar Flare Prediction

    NASA Astrophysics Data System (ADS)

    Benvenuto, Federico; Piana, Michele; Campi, Cristina; Massone, Anna Maria

    2018-01-01

    This paper introduces a novel method for flare forecasting, combining prediction accuracy with the ability to identify the most relevant predictive variables. This result is obtained by means of a two-step approach: first, a supervised regularization method for regression, namely LASSO, is applied, where a sparsity-enhancing penalty term allows the identification of the significance with which each data feature contributes to the prediction; then, an unsupervised fuzzy clustering technique for classification, namely Fuzzy C-Means, is applied, where the regression outcome is partitioned through the minimization of a cost function, without focusing on the optimization of a specific skill score. This approach is therefore hybrid, since it combines supervised and unsupervised learning; it realizes classification in an automatic, skill-score-independent way and provides effective prediction performance even for imbalanced data sets. Its prediction power is verified against NOAA Space Weather Prediction Center data, using data from August 1996 to December 2010 as the test set and data from December 1988 to June 1996 as the training set. To validate the method, we computed several skill scores typically utilized in flare prediction and compared the values provided by the hybrid approach with those provided by several standard (non-hybrid) machine learning methods. The results showed that the hybrid approach performs classification better than all other supervised methods, with an effectiveness comparable to that of clustering methods; in addition, it provides a reliable ranking of the weights with which the data properties contribute to the forecast.
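
    A minimal sketch of the two-step hybrid on synthetic stand-ins for magnetogram-derived features: LASSO scores events and exposes the relevant features, then a tiny hand-rolled fuzzy c-means partitions the regression output into two classes. The library choices, data, and the value of alpha are assumptions.

```python
# Step 1: LASSO regression; step 2: fuzzy c-means on the regression output.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # 10 candidate predictive features
y = (X[:, 0] + 0.5 * X[:, 3] > 0.8).astype(float)  # synthetic flare label

lasso = Lasso(alpha=0.05).fit(X, y)     # sparsity reveals relevant features
print("nonzero feature weights:", np.flatnonzero(lasso.coef_))
scores = lasso.predict(X).reshape(-1, 1)

def fuzzy_cmeans(Z, c=2, m=2.0, iters=100):
    """Tiny fuzzy c-means: returns cluster centers and membership matrix."""
    U = rng.dirichlet(np.ones(c), size=len(Z))        # random memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ Z) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(Z[:, None, :] - centers[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

centers, U = fuzzy_cmeans(scores)
labels = U.argmax(axis=1)               # crisp flare / no-flare assignment
```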

  6. Comprehensive Approach to Verification and Validation of CFD Simulations Applied to Backward Facing Step-Application of CFD Uncertainty Analysis

    NASA Technical Reports Server (NTRS)

    Groves, Curtis E.; LLie, Marcel; Shallhorn, Paul A.

    2012-01-01

    There are inherent uncertainties and errors associated with using Computational Fluid Dynamics (CFD) to predict the flow field, and there is no standard method for evaluating uncertainty in the CFD community. This paper describes an approach to validate the uncertainty in using CFD. The method uses state-of-the-art uncertainty analysis, applying different turbulence models, and draws conclusions on which models provide the least uncertainty and which models most accurately predict the flow of a backward-facing step.

  7. Negative Example Selection for Protein Function Prediction: The NoGO Database

    PubMed Central

    Youngs, Noah; Penfold-Brown, Duncan; Bonneau, Richard; Shasha, Dennis

    2014-01-01

    Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text-classification, which are the current state of the art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html). PMID:24922051

  8. A predictive estimation method for carbon dioxide transport by data-driven modeling with a physically-based data model

    NASA Astrophysics Data System (ADS)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young; Jun, Seong-Chun; Choung, Sungwook; Yun, Seong-Taek; Oh, Junho; Kim, Hyun-Jun

    2017-11-01

    In this study, a data-driven method for predicting CO2 leaks and associated concentrations from geological CO2 sequestration is developed. Several candidate models are compared based on their reproducibility and predictive capability for CO2 concentration measurements from the Environment Impact Evaluation Test (EIT) site in Korea. Based on the data mining results, a one-dimensional solution of the advective-dispersive equation for steady flow (i.e., Ogata-Banks solution) is found to be most representative for the test data, and this model is adopted as the data model for the developed method. In the validation step, the method is applied to estimate future CO2 concentrations with the reference estimation by the Ogata-Banks solution, where a part of earlier data is used as the training dataset. From the analysis, it is found that the ensemble mean of multiple estimations based on the developed method shows high prediction accuracy relative to the reference estimation. In addition, the majority of the data to be predicted are included in the proposed quantile interval, which suggests adequate representation of the uncertainty by the developed method. Therefore, the incorporation of a reasonable physically-based data model enhances the prediction capability of the data-driven model. The proposed method is not confined to estimations of CO2 concentration and may be applied to various real-time monitoring data from subsurface sites to develop automated control, management or decision-making systems.
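
    The Ogata-Banks data model named above has a closed form; a sketch with scipy, where the velocity, dispersion coefficient, and distances are illustrative values rather than EIT-site parameters:

```python
# Ogata-Banks solution: 1-D advection-dispersion under steady flow with a
# continuous source; returns relative concentration c/c0 at (x, t).
import numpy as np
from scipy.special import erfc

def ogata_banks(x, t, v, D, c0=1.0):
    """x: distance, t: time, v: seepage velocity, D: dispersion coeff."""
    a = erfc((x - v * t) / (2.0 * np.sqrt(D * t)))
    b = np.exp(v * x / D) * erfc((x + v * t) / (2.0 * np.sqrt(D * t)))
    return 0.5 * c0 * (a + b)

t = np.linspace(1.0, 200.0, 200)           # e.g. days (illustrative)
print(ogata_banks(x=5.0, t=t, v=0.1, D=0.5)[:5])
```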

  9. Application of XGBoost algorithm in hourly PM2.5 concentration prediction

    NASA Astrophysics Data System (ADS)

    Pan, Bingyue

    2018-02-01

    In view of prediction techniques for hourly PM2.5 concentration in China, this paper applied the XGBoost (Extreme Gradient Boosting) algorithm to predict hourly PM2.5 concentration. Air-quality monitoring data from the city of Tianjin were analyzed using the XGBoost algorithm. The prediction performance of the XGBoost method is evaluated by comparing observed and predicted PM2.5 concentrations using three measures of forecast accuracy. The XGBoost method is also compared with random forest, multiple linear regression, decision tree regression, and support vector regression models. The results demonstrate that the XGBoost algorithm outperforms the other data mining methods.
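
    A hedged sketch of such an XGBoost regressor; the six features are hypothetical stand-ins for the Tianjin monitoring variables (e.g., co-pollutants and meteorology), and the synthetic target merely exercises the API:

```python
# XGBoost regression for an hourly PM2.5-like target on synthetic features.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 6))      # e.g. SO2, NO2, CO, O3, temp, wind speed
y = 30 + 12 * X[:, 0] - 5 * X[:, 4] + rng.normal(scale=3, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = xgb.XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.05)
model.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(f"test RMSE: {rmse:.2f} ug/m3")
```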

  10. A comparison of methods to predict historical daily streamflow time series in the southeastern United States

    USGS Publications Warehouse

    Farmer, William H.; Archfield, Stacey A.; Over, Thomas M.; Hay, Lauren E.; LaFontaine, Jacob H.; Kiang, Julie E.

    2015-01-01

    Effective and responsible management of water resources relies on a thorough understanding of the quantity and quality of available water. Streamgages cannot be installed at every location where streamflow information is needed. As part of its National Water Census, the U.S. Geological Survey is planning to provide streamflow predictions for ungaged locations. In order to predict streamflow at a useful spatial and temporal resolution throughout the Nation, efficient methods need to be selected. This report examines several methods used for streamflow prediction in ungaged basins to determine the best methods for regional and national implementation. A pilot area in the southeastern United States was selected to apply 19 different streamflow prediction methods and evaluate each method by a wide set of performance metrics. Through these comparisons, two methods emerged as the most generally accurate streamflow prediction methods: the nearest-neighbor implementations of nonlinear spatial interpolation using flow duration curves (NN-QPPQ) and standardizing logarithms of streamflow by monthly means and standard deviations (NN-SMS12L). It was nearly impossible to distinguish between these two methods in terms of performance. Furthermore, neither of these methods requires significantly more parameterization in order to be applied: NN-SMS12L requires 24 regional regressions—12 for monthly means and 12 for monthly standard deviations. NN-QPPQ, in the application described in this study, required 27 regressions of particular quantiles along the flow duration curve. Despite this finding, the results suggest that an optimal streamflow prediction method depends on the intended application. Some methods are stronger overall, while some methods may be better at predicting particular statistics. The methods of analysis presented here reflect a possible framework for continued analysis and comprehensive multiple comparisons of methods of prediction in ungaged basins (PUB). Additional metrics of comparison can easily be incorporated into this type of analysis. By considering such a multifaceted approach, the top-performing models can easily be identified and considered for further research. The top-performing models can then provide a basis for future applications and explorations by scientists, engineers, managers, and practitioners to suit their own needs.
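
    As a rough illustration of the NN-SMS12L idea: log flows at the nearest-neighbor donor gage are standardized by that gage's monthly means and standard deviations, and the standardized series is rescaled with the ungaged site's regression-estimated monthly moments. All data and moments below are synthetic; the USGS implementation's regressions and neighbor selection are more involved.

```python
# Standardize donor log-flows by month, then rescale for the ungaged site.
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", "2001-12-31", freq="D")
rng = np.random.default_rng(0)
donor = pd.Series(np.exp(rng.normal(2.0, 0.5, len(idx))), index=idx)

logq = np.log(donor)
mu_d = logq.groupby(logq.index.month).transform("mean")
sd_d = logq.groupby(logq.index.month).transform("std")
z = (logq - mu_d) / sd_d                  # standardized donor series

# Regression-estimated monthly moments at the ungaged site (hypothetical):
mu_u = pd.Series(np.linspace(1.5, 2.5, 12), index=range(1, 13))
sd_u = pd.Series(0.4, index=range(1, 13))
months = z.index.month
pred = np.exp(mu_u.loc[months].to_numpy()
              + z.to_numpy() * sd_u.loc[months].to_numpy())
print(pred[:5])                           # predicted daily flows
```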

  11. Accurate Simulation of MPPT Methods Performance When Applied to Commercial Photovoltaic Panels

    PubMed Central

    2015-01-01

    A new, simple, and quick-calculation methodology to obtain a solar panel model from the manufacturer's datasheet, for use in MPPT simulations, is described. The method takes into account variations in the ambient conditions (sun irradiation and solar cell temperature) and allows fast comparison of MPPT methods, or prediction of their performance when applied to a particular solar panel. The feasibility of the described methodology is checked with four different MPPT methods applied to a commercial solar panel, within a day and under realistic ambient conditions. PMID:25874262
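
    The record does not name the four MPPT methods compared; perturb-and-observe (P&O) is one classic algorithm such comparisons typically include. A minimal sketch on a toy power-voltage curve (the curve shape and step size are illustrative):

```python
# Perturb-and-observe MPPT: step the voltage, keep the direction while
# power rises, reverse it when power falls.
def panel_power(v):
    """Toy P-V curve with a single maximum near v = 30 V."""
    return max(0.0, -0.5 * (v - 30.0) ** 2 + 180.0)

def perturb_and_observe(v=20.0, dv=0.5, steps=100):
    p_prev = panel_power(v)
    for _ in range(steps):
        v += dv
        p = panel_power(v)
        if p < p_prev:          # power fell: reverse perturbation direction
            dv = -dv
        p_prev = p
    return v, p_prev

v_mpp, p_mpp = perturb_and_observe()
print(f"operating point: {v_mpp:.1f} V, {p_mpp:.1f} W")
```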

  12. Accurate simulation of MPPT methods performance when applied to commercial photovoltaic panels.

    PubMed

    Cubas, Javier; Pindado, Santiago; Sanz-Andrés, Ángel

    2015-01-01

    A new, simple, and quick-calculation methodology to obtain a solar panel model from the manufacturer's datasheet, for use in MPPT simulations, is described. The method takes into account variations in the ambient conditions (sun irradiation and solar cell temperature) and allows fast comparison of MPPT methods, or prediction of their performance when applied to a particular solar panel. The feasibility of the described methodology is checked with four different MPPT methods applied to a commercial solar panel, within a day and under realistic ambient conditions.

  13. A Predictive Model of Daily Seismic Activity Induced by Mining, Developed with Data Mining Methods

    NASA Astrophysics Data System (ADS)

    Jakubowski, Jacek

    2014-12-01

    The article presents the development and evaluation of a predictive classification model of daily seismic energy emissions induced by longwall mining in sector XVI of the Piast coal mine in Poland. The model uses data on tremor energy, basic characteristics of the longwall face and mined output in this sector over the period from July 1987 to March 2011. The predicted binary variable is the occurrence of a daily sum of tremor seismic energies in a longwall that is greater than or equal to the threshold value of 10^5 J. Three data mining analytical methods were applied: logistic regression, neural networks, and stochastic gradient boosted trees. The boosted trees model was chosen as the best for the purposes of the prediction. The validation sample results showed its good predictive capability, taking the complex nature of the phenomenon into account. This may indicate the applied model's suitability for a sequential, short-term prediction of mining induced seismic activity.

  14. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction.

    PubMed

    Yin, Changchuan

    2015-04-01

    To apply digital signal processing (DSP) methods to the analysis of DNA sequences, the sequences first must be mapped into numerical sequences. Effective numerical mappings of DNA sequences are therefore key to the performance of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC), in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied to exon and intron prediction using the Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
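
    The 3-periodicity SNR that the GCC optimization maximizes can be computed from any numerical mapping; this sketch uses arbitrary placeholder values rather than the optimized codon-context values, and a repetitive toy sequence in place of a real exon:

```python
# Period-3 SNR of a numerically mapped DNA sequence: power at k = N/3
# relative to the average spectral power.
import numpy as np

mapping = {"A": 0.1, "C": 0.8, "G": -0.3, "T": 0.5}   # placeholder values

def period3_snr(seq):
    x = np.array([mapping[b] for b in seq], dtype=float)
    spectrum = np.abs(np.fft.fft(x - x.mean())) ** 2
    n = len(x)
    return spectrum[n // 3] / spectrum[1:n // 2].mean()

exon_like = "ATGGCC" * 60          # strong codon periodicity by construction
print(f"SNR: {period3_snr(exon_like):.1f}")
```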

  15. Incorporation of local structure into kriging models for the prediction of atomistic properties in the water decamer.

    PubMed

    Davie, Stuart J; Di Pasquale, Nicodemo; Popelier, Paul L A

    2016-10-15

    Machine learning algorithms have been demonstrated to predict atomistic properties approaching the accuracy of quantum chemical calculations at significantly less computational cost. Difficulties arise, however, when attempting to apply these techniques to large systems, or systems possessing excessive conformational freedom. In this article, the machine learning method kriging is applied to predict both the intra-atomic and interatomic energies, as well as the electrostatic multipole moments, of the atoms of a water molecule at the center of a 10 water molecule (decamer) cluster. Unlike previous work, where the properties of small water clusters were predicted using a molecular local frame, and where training set inputs (features) were based on atomic index, a variety of feature definitions and coordinate frames are considered here to increase prediction accuracy. It is shown that, for a water molecule at the center of a decamer, no single method of defining features or coordinate schemes is optimal for every property. However, explicitly accounting for the structure of the first solvation shell in the definition of the features of the kriging training set, and centring the coordinate frame on the atom-of-interest will, in general, return better predictions than models that apply the standard methods of feature definition, or a molecular coordinate frame. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
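
    Kriging is closely related to Gaussian-process regression, so a hedged sketch with scikit-learn conveys the mechanics; the three input descriptors and the energies are synthetic stand-ins, and nothing about the kernel choice comes from the paper:

```python
# Gaussian-process regression (kriging) on toy "descriptor -> energy" data,
# returning predictions with pointwise uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(80, 3))        # e.g. solvation-shell descriptors
y = np.sin(X[:, 0]) + 0.2 * X[:, 1] + rng.normal(scale=0.01, size=80)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(1e-4),
                              normalize_y=True).fit(X, y)
mean, std = gp.predict(rng.uniform(0, 5, size=(5, 3)), return_std=True)
print(mean, std)    # predictive mean and standard deviation, as in kriging
```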

  16. The Next Era: Deep Learning in Pharmaceutical Research.

    PubMed

    Ekins, Sean

    2016-11-01

    Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use, from internet searches, voice recognition, and social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but also to predict a molecule's properties and behavior in future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernible edge in predictive performance. The time has come for a balanced review of this technique, but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing, such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique.

  17. Multispectral code excited linear prediction coding and its application in magnetic resonance images.

    PubMed

    Hu, J H; Wang, Y; Cahill, P T

    1997-01-01

    This paper reports a multispectral code excited linear prediction (MCELP) method for the compression of multispectral images. Different linear prediction models and adaptation schemes have been compared. The method that uses a forward adaptive autoregressive (AR) model has been proven to achieve a good compromise between performance, complexity, and robustness. This approach is referred to as the MFCELP method. Given a set of multispectral images, the linear predictive coefficients are updated over nonoverlapping three-dimensional (3-D) macroblocks. Each macroblock is further divided into several 3-D microblocks, and the best excitation signal for each microblock is determined through an analysis-by-synthesis procedure. The MFCELP method has been applied to multispectral magnetic resonance (MR) images. To satisfy the high quality requirement for medical images, the error between the original image set and the synthesized one is further specified using a vector quantizer. This method has been applied to images from 26 clinical MR neuro studies (20 slices/study, three spectral bands/slice, 256x256 pixels/band, 12 b/pixel). The MFCELP method provides a significant visual improvement over the discrete cosine transform (DCT) based Joint Photographic Experts Group (JPEG) method, the wavelet transform based embedded zero-tree wavelet (EZW) coding method, and the vector tree (VT) coding method, as well as the multispectral segmented autoregressive moving average (MSARMA) method we developed previously.

  18. Augmented classical least squares multivariate spectral analysis

    DOEpatents

    Haaland, David M.; Melgaard, David K.

    2004-02-03

    A method of multivariate spectral analysis, termed augmented classical least squares (ACLS), provides an improved CLS calibration model when unmodeled sources of spectral variation are contained in a calibration sample set. The ACLS methods use information derived from component or spectral residuals during the CLS calibration to provide an improved calibration-augmented CLS model. The ACLS methods are based on CLS so that they retain the qualitative benefits of CLS, yet they have the flexibility of PLS and other hybrid techniques in that they can define a prediction model even with unmodeled sources of spectral variation that are not explicitly included in the calibration model. The unmodeled sources of spectral variation may be unknown constituents, constituents with unknown concentrations, nonlinear responses, non-uniform and correlated errors, or other sources of spectral variation that are present in the calibration sample spectra. Also, since the various ACLS methods are based on CLS, they can incorporate the new prediction-augmented CLS (PACLS) method of updating the prediction model for new sources of spectral variation contained in the prediction sample set without having to return to the calibration process. The ACLS methods can also be applied to alternating least squares models. The ACLS methods can be applied to all types of multivariate data.
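
    For orientation, the bare classical least squares step that ACLS augments looks like this on synthetic spectra; ACLS would additionally append residual-derived spectral shapes to the estimated pure-component matrix, which is omitted here:

```python
# CLS: calibrate pure-component spectra from known concentrations, then
# invert them to predict concentrations of new mixtures. Synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n_wavelengths, n_components = 50, 3
K_true = rng.uniform(0, 1, size=(n_components, n_wavelengths))

C_cal = rng.uniform(0, 1, size=(20, n_components))     # known concentrations
A_cal = C_cal @ K_true + rng.normal(scale=0.001, size=(20, n_wavelengths))

K_hat = np.linalg.pinv(C_cal) @ A_cal     # calibration: pure-spectra estimate
C_new = rng.uniform(0, 1, size=(5, n_components))
A_new = C_new @ K_true
C_pred = A_new @ np.linalg.pinv(K_hat)    # prediction step
print(np.abs(C_pred - C_new).max())       # small residual on clean data
```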

  19. Augmented Classical Least Squares Multivariate Spectral Analysis

    DOEpatents

    Haaland, David M.; Melgaard, David K.

    2005-07-26

    A method of multivariate spectral analysis, termed augmented classical least squares (ACLS), provides an improved CLS calibration model when unmodeled sources of spectral variation are contained in a calibration sample set. The ACLS methods use information derived from component or spectral residuals during the CLS calibration to provide an improved calibration-augmented CLS model. The ACLS methods are based on CLS so that they retain the qualitative benefits of CLS, yet they have the flexibility of PLS and other hybrid techniques in that they can define a prediction model even with unmodeled sources of spectral variation that are not explicitly included in the calibration model. The unmodeled sources of spectral variation may be unknown constituents, constituents with unknown concentrations, nonlinear responses, non-uniform and correlated errors, or other sources of spectral variation that are present in the calibration sample spectra. Also, since the various ACLS methods are based on CLS, they can incorporate the new prediction-augmented CLS (PACLS) method of updating the prediction model for new sources of spectral variation contained in the prediction sample set without having to return to the calibration process. The ACLS methods can also be applied to alternating least squares models. The ACLS methods can be applied to all types of multivariate data.

  20. Augmented Classical Least Squares Multivariate Spectral Analysis

    DOEpatents

    Haaland, David M.; Melgaard, David K.

    2005-01-11

    A method of multivariate spectral analysis, termed augmented classical least squares (ACLS), provides an improved CLS calibration model when unmodeled sources of spectral variation are contained in a calibration sample set. The ACLS methods use information derived from component or spectral residuals during the CLS calibration to provide an improved calibration-augmented CLS model. The ACLS methods are based on CLS so that they retain the qualitative benefits of CLS, yet they have the flexibility of PLS and other hybrid techniques in that they can define a prediction model even with unmodeled sources of spectral variation that are not explicitly included in the calibration model. The unmodeled sources of spectral variation may be unknown constituents, constituents with unknown concentrations, nonlinear responses, non-uniform and correlated errors, or other sources of spectral variation that are present in the calibration sample spectra. Also, since the various ACLS methods are based on CLS, they can incorporate the new prediction-augmented CLS (PACLS) method of updating the prediction model for new sources of spectral variation contained in the prediction sample set without having to return to the calibration process. The ACLS methods can also be applied to alternating least squares models. The ACLS methods can be applied to all types of multivariate data.

  1. Combining classifiers to predict gene function in Arabidopsis thaliana using large-scale gene expression measurements.

    PubMed

    Lan, Hui; Carson, Rachel; Provart, Nicholas J; Bonner, Anthony J

    2007-09-21

    Arabidopsis thaliana is the model species of current plant genomic research, with a genome size of 125 Mb and approximately 28,000 genes. The function of half of these genes is currently unknown. The purpose of this study is to infer gene function in Arabidopsis using machine-learning algorithms applied to large-scale gene expression data sets, with the goal of identifying genes that are potentially involved in plant response to abiotic stress. Using in-house and publicly available data, we assembled a large set of gene expression measurements for A. thaliana. Using those genes of known function, we first evaluated and compared the ability of basic machine-learning algorithms to predict which genes respond to stress. Predictive accuracy was measured using ROC50 and precision curves derived through cross validation. To improve accuracy, we developed a method for combining these classifiers using a weighted-voting scheme. The combined classifier was then trained on genes of known function and applied to genes of unknown function, identifying genes that potentially respond to stress. Visual evidence corroborating the predictions was obtained using electronic Northern analysis. Three of the predicted genes were chosen for biological validation. Gene knockout experiments confirmed that all three are involved in a variety of stress responses. The biological analysis of one of these genes (At1g16850) is presented here, where it is shown to be necessary for the normal response to temperature and NaCl. Supervised learning methods applied to large-scale gene expression measurements can be used to predict gene function. However, the ability of basic learning methods to predict stress response varies widely and depends heavily on how much dimensionality reduction is used. Our method of combining classifiers can improve the accuracy of such predictions (in this case, predictions of genes involved in stress response in plants), and it effectively chooses the appropriate amount of dimensionality reduction automatically. The method provides a useful means of identifying genes in A. thaliana that potentially respond to stress, and we expect it would be useful in other organisms and for other gene functions.
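
    A minimal weighted-voting combination in the spirit described above: each classifier's vote is weighted by its cross-validated accuracy. The data, the three base classifiers, and the exact weighting rule are illustrative; the paper's scheme may differ in detail.

```python
# Weighted-voting ensemble: votes weighted by cross-validated accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000), GaussianNB(),
          KNeighborsClassifier()]
weights = [cross_val_score(m, X_tr, y_tr, cv=5).mean() for m in models]

votes = np.zeros(len(X_te))
for m, w in zip(models, weights):
    votes += w * m.fit(X_tr, y_tr).predict(X_te)   # weighted 0/1 votes
combined = (votes >= 0.5 * sum(weights)).astype(int)
print("ensemble accuracy:", (combined == y_te).mean())
```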

  2. A Thermodynamically-consistent FBA-based Approach to Biogeochemical Reaction Modeling

    NASA Astrophysics Data System (ADS)

    Shapiro, B.; Jin, Q.

    2015-12-01

    Microbial rates are critical to understanding biogeochemical processes in natural environments. Recently, flux balance analysis (FBA) has been applied to predict microbial rates in aquifers and other settings. FBA is a genome-scale constraint-based modeling approach that computes metabolic rates and other phenotypes of microorganisms. This approach requires prior knowledge of substrate uptake rates, which is not available for most natural microbes. Here we propose to constrain substrate uptake rates on the basis of microbial kinetics. Specifically, we calculate rates of respiration (and fermentation) using a revised Monod equation; this equation accounts for both the kinetics and thermodynamics of microbial catabolism. Substrate uptake rates are then computed from the rates of respiration, and applied to FBA to predict rates of microbial growth. We implemented this method by linking two software tools, PHREEQC and the COBRA Toolbox. We applied this method to acetotrophic methanogenesis by Methanosarcina barkeri, and compared the simulation results to previous laboratory observations. The new method constrains acetate uptake by accounting for the kinetics and thermodynamics of methanogenesis, and predicted the observations of previous experiments well. In comparison, traditional dynamic-FBA methods constrain acetate uptake on the basis of enzyme kinetics, and failed to reproduce the experimental results. These results show that microbial rate laws may provide a better constraint than enzyme kinetics for applying FBA to biogeochemical reaction modeling.
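
    A common thermodynamically consistent form of such a revised Monod rate law (following the Jin-Bethke formulation; the symbols are generic and the paper's exact expression may differ) is:

```latex
% r: respiration rate, k: rate constant, X: biomass, [S]: substrate,
% K_S: half-saturation constant, \Delta G_r: catabolic reaction energy,
% m: ATP yield per reaction, \chi: average stoichiometric number.
r \;=\; k \, X \, \frac{[S]}{K_S + [S]}
      \left[ 1 - \exp\!\left( \frac{\Delta G_r + m\,\Delta G_{\mathrm{ATP}}}
                                   {\chi R T} \right) \right]
```

    The bracketed factor vanishes as the catabolic reaction approaches equilibrium, which is what shuts the rate off thermodynamically; far from equilibrium it tends to 1 and the familiar Monod kinetics are recovered.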

  3. A deep learning-based multi-model ensemble method for cancer prediction.

    PubMed

    Xiao, Yawen; Wu, Jun; Lin, Zongli; Zhao, Xiaodong

    2018-01-01

    Cancer is a complex worldwide health problem associated with high mortality. With the rapid development of the high-throughput sequencing technology and the application of various machine learning methods that have emerged in recent years, progress in cancer prediction has been increasingly made based on gene expression, providing insight into effective and accurate treatment decision making. Thus, developing machine learning methods, which can successfully distinguish cancer patients from healthy persons, is of great current interest. However, among the classification methods applied to cancer prediction so far, no one method outperforms all the others. In this paper, we demonstrate a new strategy, which applies deep learning to an ensemble approach that incorporates multiple different machine learning models. We supply informative gene data selected by differential gene expression analysis to five different classification models. Then, a deep learning method is employed to ensemble the outputs of the five classifiers. The proposed deep learning-based multi-model ensemble method was tested on three public RNA-seq data sets of three kinds of cancers, Lung Adenocarcinoma, Stomach Adenocarcinoma and Breast Invasive Carcinoma. The test results indicate that it increases the prediction accuracy of cancer for all the tested RNA-seq data sets as compared to using a single classifier or the majority voting algorithm. By taking full advantage of different classifiers, the proposed deep learning-based multi-model ensemble method is shown to be accurate and effective for cancer prediction. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Predicting missing links via correlation between nodes

    NASA Astrophysics Data System (ADS)

    Liao, Hao; Zeng, An; Zhang, Yi-Cheng

    2015-10-01

    As a fundamental problem in many different fields, link prediction aims to estimate the likelihood of an existing link between two nodes based on the observed information. Since this problem is related to many applications ranging from uncovering missing data to predicting the evolution of networks, link prediction has been intensively investigated recently and many methods have been proposed so far. The essential challenge of link prediction is to estimate the similarity between nodes. Most of the existing methods are based on the common neighbor index and its variants. In this paper, we propose to calculate the similarity between nodes by the Pearson correlation coefficient. This method is found to be very effective when applied to calculate similarity based on high order paths. We finally fuse the correlation-based method with the resource allocation method, and find that the combined method can substantially outperform the existing methods, especially in sparse networks.
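
    A sketch of the correlation-based similarity on a toy graph: each non-adjacent node pair is scored by the Pearson correlation of the corresponding rows of a high-order path-count matrix, as the abstract suggests. The graph and the path order are illustrative.

```python
# Pearson-correlation link prediction on third-order path counts.
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)

M = A @ A @ A                      # third-order path counts (high-order info)
S = np.corrcoef(M)                 # Pearson correlation between node rows
candidates = [(i, j) for i in range(5) for j in range(i + 1, 5)
              if A[i, j] == 0]
ranked = sorted(candidates, key=lambda e: S[e], reverse=True)
print("most likely missing link:", ranked[0])
```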

  5. Prediction of Protein-Protein Interaction Sites with Machine-Learning-Based Data-Cleaning and Post-Filtering Procedures.

    PubMed

    Liu, Guang-Hui; Shen, Hong-Bin; Yu, Dong-Jun

    2016-04-01

    Accurately predicting protein-protein interaction sites (PPIs) is currently a hot topic because it has been demonstrated to be very useful for understanding disease mechanisms and designing drugs. Machine-learning-based computational approaches have been broadly utilized and demonstrated to be useful for PPI prediction. However, directly applying traditional machine learning algorithms, which often assume that samples in different classes are balanced, often leads to poor performance because of the severe class imbalance that exists in the PPI prediction problem. In this study, we propose a novel method for improving PPI prediction performance by relieving the severity of class imbalance using a data-cleaning procedure and reducing predicted false positives with a post-filtering procedure: First, a machine-learning-based data-cleaning procedure is applied to remove those marginal targets, which may potentially have a negative effect on training a model with a clear classification boundary, from the majority samples to relieve the severity of class imbalance in the original training dataset; then, a prediction model is trained on the cleaned dataset; finally, an effective post-filtering procedure is further used to reduce potential false positive predictions. Stringent cross-validation and independent validation tests on benchmark datasets demonstrated the efficacy of the proposed method, which exhibits highly competitive performance compared with existing state-of-the-art sequence-based PPIs predictors and should supplement existing PPI prediction methods.

  6. Ensemble ecosystem modeling for predicting ecosystem response to predator reintroduction.

    PubMed

    Baker, Christopher M; Gordon, Ascelin; Bode, Michael

    2017-04-01

    Introducing a new or extirpated species to an ecosystem is risky, and managers need quantitative methods that can predict the consequences for the recipient ecosystem. Proponents of keystone predator reintroductions commonly argue that the presence of the predator will restore ecosystem function, but this has not always been the case, and mathematical modeling has an important role to play in predicting how reintroductions will likely play out. We devised an ensemble modeling method that integrates species interaction networks and dynamic community simulations and used it to describe the range of plausible consequences of 2 keystone-predator reintroductions: wolves (Canis lupus) to Yellowstone National Park and dingoes (Canis dingo) to a national park in Australia. Although previous methods for predicting ecosystem responses to such interventions focused on predicting changes around a given equilibrium, we used Lotka-Volterra equations to predict changing abundances through time. We applied our method to interaction networks for wolves in Yellowstone National Park and for dingoes in Australia. Our model replicated the observed dynamics in Yellowstone National Park and produced a larger range of potential outcomes for the dingo network. However, we also found that changes in small vertebrates or invertebrates gave a good indication about the potential future state of the system. Our method allowed us to predict when the systems were far from equilibrium. Our results showed that the method can also be used to predict which species may increase or decrease following a reintroduction and can identify species that are important to monitor (i.e., species whose changes in abundance give extra insight into broad changes in the system). Ensemble ecosystem modeling can also be applied to assess the ecosystem-wide implications of other types of interventions including assisted migration, biocontrol, and invasive species eradication. © 2016 Society for Conservation Biology.
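
    A hedged sketch of the ensemble idea: draw many generalized Lotka-Volterra interaction matrices consistent with a signed food-web structure, integrate each through time from a post-reintroduction state, and inspect the spread of outcomes. The three-species web and all rates are invented for illustration.

```python
# Ensemble of generalized Lotka-Volterra simulations over random but
# sign-constrained interaction matrices.
import numpy as np
from scipy.integrate import solve_ivp

sign = np.array([[ 0,  1,  0],    # top predator gains from mesopredator
                 [-1,  0,  1],    # mesopredator: eaten by predator, eats prey
                 [ 0, -1,  0]])   # prey harmed by mesopredator
r = np.array([-0.1, -0.05, 0.5])  # predators decline without food; prey grows

def glv(t, n, A, r):
    return n * (r + A @ n)

rng = np.random.default_rng(0)
finals = []
for _ in range(200):                       # the ensemble
    A = sign * rng.uniform(0.01, 0.2, size=(3, 3))
    np.fill_diagonal(A, -rng.uniform(0.05, 0.2, size=3))
    sol = solve_ivp(glv, (0, 200), [0.1, 1.0, 2.0], args=(A, r))
    finals.append(sol.y[:, -1])
finals = np.asarray(finals)
print("range of prey outcomes:", finals[:, 2].min(), finals[:, 2].max())
```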

  7. The predictive validity of selection for entry into postgraduate training in general practice: evidence from three longitudinal studies.

    PubMed

    Patterson, Fiona; Lievens, Filip; Kerrin, Máire; Munro, Neil; Irish, Bill

    2013-11-01

    The selection methodology for UK general practice is designed to accommodate several thousand applicants per year and targets six core attributes identified in a multi-method job-analysis study. The aim was to evaluate the predictive validity of selection methods for entry into postgraduate training, comprising a clinical problem-solving test, a situational judgement test, and a selection centre, in a three-part longitudinal predictive validity study of selection into training for UK general practice. In sample 1, participants were junior doctors applying for training in general practice (n = 6824). In sample 2, participants were GP registrars 1 year into training (n = 196). In sample 3, participants were GP registrars sitting the licensing examination after 3 years, at the end of training (n = 2292). The outcome measures include: assessor ratings of performance in a selection centre comprising job simulation exercises (sample 1); supervisor ratings of trainee job performance 1 year into training (sample 2); and licensing examination results, including an applied knowledge examination and a 12-station clinical skills objective structured clinical examination (OSCE; sample 3). Performance ratings at selection predicted subsequent supervisor ratings of job performance 1 year later. Selection results also significantly predicted performance on both the clinical skills OSCE and applied knowledge examination for licensing at the end of training. In combination, these longitudinal findings provide good evidence of the predictive validity of the selection methods, and are the first reported for entry into postgraduate training. Results show that the best predictor of work performance and training outcomes is a combination of a clinical problem-solving test, a situational judgement test, and a selection centre. Implications for selection methods for all postgraduate specialties are considered.

  8. Predicting the helix packing of globular proteins by self-correcting distance geometry.

    PubMed

    Mumenthaler, C; Braun, W

    1995-05-01

    A new self-correcting distance geometry method for predicting the three-dimensional structure of small globular proteins was assessed with a test set of 8 helical proteins. With the knowledge of the amino acid sequence and the helical segments, our completely automated method calculated the correct backbone topology of six proteins. The accuracy of the predicted structures ranged from 2.3 Å to 3.1 Å for the helical segments compared to the experimentally determined structures. For two proteins, the predicted constraints were not restrictive enough to yield a conclusive prediction. The method can be applied to all small globular proteins, provided the secondary structure is known from NMR analysis or can be predicted with high reliability.

  9. An Ensemble Approach for Drug Side Effect Prediction

    PubMed Central

    Jahid, Md Jamiul; Ruan, Jianhua

    2014-01-01

    In silico prediction of drug side-effects in the early stages of drug development is becoming more popular nowadays; it not only reduces the time for drug design but also reduces drug development costs. In this article we propose an ensemble approach to predict the side-effects of drug molecules based on their chemical structure. Our idea originates from the observation that similar drugs have similar side-effects. Based on this observation, we design an ensemble approach that combines the results from different classification models, where each model is generated by a different set of similar drugs. We applied our approach to 1385 side-effects in the SIDER database for 888 drugs. Results show that our approach outperformed previously published approaches and standard classifiers. Furthermore, we applied our method to a number of uncharacterized drug molecules in the DrugBank database and predicted their side-effect profiles for future usage. Results from various sources confirm that our method is able to predict the side-effects of uncharacterized drugs and, more importantly, to predict rare side-effects, which are often ignored by other approaches. The method described in this article can be useful for predicting side-effects in an early stage of drug design, to reduce experimental cost and time. PMID:25327524

  10. Efficient depth intraprediction method for H.264/AVC-based three-dimensional video coding

    NASA Astrophysics Data System (ADS)

    Oh, Kwan-Jung; Oh, Byung Tae

    2015-04-01

    We present an intracoding method that is applicable to depth map coding in multiview-plus-depth systems. Our approach combines skip prediction and plane-segmentation-based prediction. The proposed depth intraskip prediction uses the estimated direction at both the encoder and decoder, and does not need to encode residual data. Our plane-segmentation-based intraprediction divides the current block into two regions and applies a different prediction scheme to each segmented region. This avoids incorrect estimations across different regions, resulting in higher prediction accuracy. Simulation results demonstrate that the proposed scheme is superior to H.264/Advanced Video Coding intraprediction and can improve the subjective rendering quality.

  11. A predictive estimation method for carbon dioxide transport by data-driven modeling with a physically-based data model.

    PubMed

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young; Jun, Seong-Chun; Choung, Sungwook; Yun, Seong-Taek; Oh, Junho; Kim, Hyun-Jun

    2017-11-01

    In this study, a data-driven method for predicting CO2 leaks and associated concentrations from geological CO2 sequestration is developed. Several candidate models are compared based on their reproducibility and predictive capability for CO2 concentration measurements from the Environment Impact Evaluation Test (EIT) site in Korea. Based on the data mining results, a one-dimensional solution of the advective-dispersive equation for steady flow (i.e., Ogata-Banks solution) is found to be most representative for the test data, and this model is adopted as the data model for the developed method. In the validation step, the method is applied to estimate future CO2 concentrations with the reference estimation by the Ogata-Banks solution, where a part of earlier data is used as the training dataset. From the analysis, it is found that the ensemble mean of multiple estimations based on the developed method shows high prediction accuracy relative to the reference estimation. In addition, the majority of the data to be predicted are included in the proposed quantile interval, which suggests adequate representation of the uncertainty by the developed method. Therefore, the incorporation of a reasonable physically-based data model enhances the prediction capability of the data-driven model. The proposed method is not confined to estimations of CO2 concentration and may be applied to various real-time monitoring data from subsurface sites to develop automated control, management or decision-making systems. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Uncertainty Quantification in Alchemical Free Energy Methods.

    PubMed

    Bhati, Agastya P; Wan, Shunzhou; Hu, Yuan; Sherborne, Brad; Coveney, Peter V

    2018-06-12

    Alchemical free energy methods have gained much importance recently from several reports of improved ligand-protein binding affinity predictions based on their implementation using molecular dynamics simulations. A large number of variants of such methods implementing different accelerated sampling techniques and free energy estimators are available, each claimed to be better than the others in its own way. However, the key features of reproducibility and quantification of associated uncertainties in such methods have barely been discussed. Here, we apply a systematic protocol for uncertainty quantification to a number of popular alchemical free energy methods, covering both absolute and relative free energy predictions. We show that a reliable measure of error estimation is provided by ensemble simulation (an ensemble of independent MD simulations), which applies irrespective of the free energy method. The need to use ensemble methods is fundamental and holds regardless of the duration of the molecular dynamics simulations performed.

  13. Real-time implementation of model predictive control on Maricopa-Stanfield irrigation and drainage district's WM canal

    USDA-ARS?s Scientific Manuscript database

    Water resources are limited in many agricultural areas. One method to improve the effective use of water is to improve delivery service from irrigation canals. This can be done by applying automatic control methods that control the gates in an irrigation canal. The model predictive control (MPC) is ...

  14. Virtual World Currency Value Fluctuation Prediction System Based on User Sentiment Analysis.

    PubMed

    Kim, Young Bin; Lee, Sang Hyeok; Kang, Shin Jin; Choi, Myung Jin; Lee, Jung; Kim, Chang Hun

    2015-01-01

    In this paper, we present a method for predicting the value of virtual currencies used in virtual gaming environments that support multiple users, such as massively multiplayer online role-playing games (MMORPGs). Predicting virtual currency values in a virtual gaming environment has rarely been explored; it is difficult to apply real-world methods for predicting fluctuating currency values or shares to the virtual gaming world on account of differences in domains between the two worlds. To address this issue, we herein predict virtual currency value fluctuations by collecting user opinion data from a virtual community and analyzing user sentiments or emotions from the opinion data. The proposed method is straightforward and applicable to predicting virtual currencies as well as to gaming environments, including MMORPGs. We test the proposed method using large-scale MMORPGs and demonstrate that virtual currencies can be effectively and efficiently predicted with it.

  15. Virtual World Currency Value Fluctuation Prediction System Based on User Sentiment Analysis

    PubMed Central

    Kim, Young Bin; Lee, Sang Hyeok; Kang, Shin Jin; Choi, Myung Jin; Lee, Jung; Kim, Chang Hun

    2015-01-01

    In this paper, we present a method for predicting the value of virtual currencies used in virtual gaming environments that support multiple users, such as massively multiplayer online role-playing games (MMORPGs). Predicting virtual currency values in a virtual gaming environment has rarely been explored; it is difficult to apply real-world methods for predicting fluctuating currency values or shares to the virtual gaming world on account of differences in domains between the two worlds. To address this issue, we herein predict virtual currency value fluctuations by collecting user opinion data from a virtual community and analyzing user sentiments or emotions from the opinion data. The proposed method is straightforward and applicable to predicting virtual currencies as well as to gaming environments, including MMORPGs. We test the proposed method using large-scale MMORPGs and demonstrate that virtual currencies can be effectively and efficiently predicted with it. PMID:26241496

  16. Lung nodule malignancy prediction using multi-task convolutional neural network

    NASA Astrophysics Data System (ADS)

    Li, Xiuli; Kao, Yueying; Shen, Wei; Li, Xiang; Xie, Guotong

    2017-03-01

    In this paper, we investigated the problem of diagnostic lung nodule malignancy prediction using thoracic Computed Tomography (CT) screening. Unlike most existing studies, which classify nodules into two types (benign and malignant), we interpreted nodule malignancy prediction as a regression problem and predicted a continuous malignancy level. We proposed a joint multi-task learning algorithm using a Convolutional Neural Network (CNN) to capture nodule heterogeneity by extracting discriminative features from alternatingly stacked layers. We trained a CNN regression model to predict nodule malignancy, and designed a multi-task learning mechanism to simultaneously share knowledge among 9 different nodule characteristics (Subtlety, Calcification, Sphericity, Margin, Lobulation, Spiculation, Texture, Diameter and Malignancy), improving the final prediction result. Each CNN generates characteristic-specific feature representations, and multi-task learning is then applied to the features to predict the corresponding likelihood for each characteristic. We evaluated the proposed method on 2620 nodule CT scans from the LIDC-IDRI dataset with a 5-fold cross-validation strategy. The multi-task CNN regression achieved an RMSE of 0.830 and a mapped classification accuracy of 83.03%, compared with an RMSE of 0.894 and a mapped classification accuracy of 74.9% for single-task regression. Experiments show that the proposed method can predict lung nodule malignancy likelihood effectively and outperforms the state-of-the-art methods. The learning framework could easily be applied to other anomaly likelihood prediction problems, such as skin cancer and breast cancer, and it demonstrates the potential of our method to assist radiologists in nodule staging assessment and individual therapeutic planning.

  17. Whole genome prediction and heritability of childhood asthma phenotypes.

    PubMed

    McGeachie, Michael J; Clemmer, George L; Croteau-Chonka, Damien C; Castaldi, Peter J; Cho, Michael H; Sordillo, Joanne E; Lasky-Su, Jessica A; Raby, Benjamin A; Tantisira, Kelan G; Weiss, Scott T

    2016-12-01

    While whole genome prediction (WGP) methods have recently demonstrated successes in the prediction of complex genetic diseases, they have not yet been applied to asthma and related phenotypes. Longitudinal patterns of lung function differ between asthmatics, but these phenotypes have not been assessed for heritability or predictive ability. Herein, we assess the heritability and genetic predictability of asthma-related phenotypes. We applied several WGP methods to a well-phenotyped cohort of 832 children with mild-to-moderate asthma from CAMP. We assessed narrow-sense heritability and predictability for airway hyperresponsiveness, serum immunoglobulin E, blood eosinophil count, pre- and post-bronchodilator forced expiratory volume in 1 sec (FEV1), bronchodilator response, steroid responsiveness, and longitudinal patterns of lung function (normal growth, reduced growth, early decline, and their combinations). Prediction accuracy was evaluated using a training/testing set split of the cohort. We found that longitudinal lung function phenotypes demonstrated significant narrow-sense heritability (reduced growth, 95%; normal growth with early decline, 55%). These same phenotypes also showed significant polygenic prediction (areas under the curve [AUCs] 56% to 62%). Including additional demographic covariates in the models increased prediction 4-8%, with reduced growth increasing from 62% to 66% AUC. We found that prediction with a genomic relatedness matrix was improved by filtering available SNPs based on chromatin evidence, and this result extended across cohorts. Longitudinal reduced lung function growth displayed extremely high heritability. All phenotypes with significant heritability showed significant polygenic prediction. Using SNP-prioritization increased prediction across cohorts. WGP methods show promise in predicting asthma-related heritable traits.

  18. A Transfer Learning Approach for Applying Matrix Factorization to Small ITS Datasets

    ERIC Educational Resources Information Center

    Voß, Lydia; Schatten, Carlotta; Mazziotti, Claudia; Schmidt-Thieme, Lars

    2015-01-01

    Machine Learning methods for Performance Prediction in Intelligent Tutoring Systems (ITS) have proven their efficacy; specific methods such as Matrix Factorization (MF), however, suffer from a lack of available information about new tasks or new students. In this paper we show how this problem could be solved by applying Transfer Learning (TL),…
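
    As a rough sketch of the base MF technique the record names: latent student and task factors fit by stochastic gradient descent on observed correctness values. In the transfer-learning setting, factors learned on a source ITS dataset would initialize U and V here; all data and hyperparameters below are synthetic.

```python
# Matrix factorization for performance prediction via SGD.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_tasks, k = 30, 40, 4
obs = [(rng.integers(n_students), rng.integers(n_tasks), rng.integers(2))
       for _ in range(500)]                       # (student, task, correct)

U = rng.normal(scale=0.1, size=(n_students, k))   # student latent factors
V = rng.normal(scale=0.1, size=(n_tasks, k))      # task latent factors
lr, reg = 0.05, 0.02
for epoch in range(50):
    for s, t, c in obs:
        err = c - U[s] @ V[t]
        # Both right-hand sides use the pre-update factors.
        U[s], V[t] = U[s] + lr * (err * V[t] - reg * U[s]), \
                     V[t] + lr * (err * U[s] - reg * V[t])

print("P(correct) estimate:", float(U[0] @ V[0]))  # unbounded score, sketch only
```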

  19. Prediction of ground effects on aircraft noise

    NASA Technical Reports Server (NTRS)

    Pao, S. P.; Wenzel, A. R.; Oncley, P. B.

    1978-01-01

    A unified method is recommended for predicting ground effects on noise. This method may be used in flyover noise predictions and in correcting static test-stand data to free-field conditions. The recommendation is based on a review of recent progress in the theory of ground effects and of the experimental evidence which supports this theory. It is shown that a surface wave must sometimes be included in the prediction method. Prediction equations are collected conveniently in a single section of the paper. Methods of measuring ground impedance and the resulting ground-impedance data are also reviewed, because the recommended method is based on a locally reactive impedance boundary model. Current practice in estimating ground effects is reviewed, and consideration is given to practical problems in applying the recommended method. These problems include finite frequency-band filters, finite source dimension, wind and temperature gradients, and signal incoherence.

  20. Archaeology Through Computational Linguistics: Inscription Statistics Predict Excavation Sites of Indus Valley Artifacts.

    PubMed

    Recchia, Gabriel L; Louwerse, Max M

    2016-11-01

    Computational techniques comparing co-occurrences of city names in texts allow the relative longitudes and latitudes of cities to be estimated algorithmically. However, these techniques have not been applied to estimate the provenance of artifacts with unknown origins. Here, we estimate the geographic origin of artifacts from the Indus Valley Civilization, applying methods commonly used in cognitive science to the Indus script. We show that these methods can accurately predict the relative locations of archeological sites on the basis of artifacts of known provenance, and we further apply these techniques to determine the most probable excavation sites of four sealings of unknown provenance. These findings suggest that inscription statistics reflect historical interactions among locations in the Indus Valley region, and they illustrate how computational methods can help localize inscribed archeological artifacts of unknown origin. The success of this method offers opportunities for the cognitive sciences in general and for computational anthropology specifically. Copyright © 2015 Cognitive Science Society, Inc.

  1. Five-equation and robust three-equation methods for solution verification of large eddy simulation

    NASA Astrophysics Data System (ADS)

    Dutta, Rabijit; Xing, Tao

    2018-02-01

    This study evaluates the recently developed general framework for solution verification methods for large eddy simulation (LES) using implicitly filtered LES of periodic channel flows at a friction Reynolds number of 395 on eight systematically refined grids. The seven-equation method shows that the coupling error based on Hypothesis I is much smaller than the numerical and modeling errors and can therefore be neglected. The authors recommend the five-equation method based on Hypothesis II, which shows monotonic convergence of the predicted numerical benchmark (S_C) and provides realistic error estimates without the need to fix the orders of accuracy for either numerical or modeling errors. Based on the results from the seven-equation and five-equation methods, less expensive three- and four-equation methods for practical LES applications were derived. It was found that the new three-equation method is robust, as it can be applied to any convergence type and reasonably predicts the error trends. It was also observed that the numerical and modeling errors usually have opposite signs, which suggests that error cancellation plays an essential role in LES. When a Reynolds-averaged Navier-Stokes (RANS) based error estimation method is applied, it shows significant error in the prediction of S_C on coarse meshes. However, it predicts S_C reasonably when the grids resolve at least 80% of the total turbulent kinetic energy.
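
For orientation, the classical single-term building block that such multi-equation methods generalize is the observed-order/Richardson estimate from three systematically refined grids. The sketch below shows that estimate only; the five-equation method additionally separates numerical from modeling errors, which this does not attempt. The sample values are invented.

```python
import numpy as np

def observed_order(f_fine, f_med, f_coarse, r):
    """Observed order of accuracy from solutions on three grids with a
    constant refinement ratio r (classical single-term error expansion)."""
    return np.log((f_coarse - f_med) / (f_med - f_fine)) / np.log(r)

def richardson_estimate(f_fine, f_med, r, p):
    """Estimated grid-independent value and fine-grid numerical error."""
    err = (f_med - f_fine) / (r**p - 1.0)
    return f_fine - err, err

# toy usage: some LES output functional computed on three grids
p = observed_order(f_fine=0.1030, f_med=0.1120, f_coarse=0.1310, r=2.0)
s_c, err = richardson_estimate(0.1030, 0.1120, 2.0, p)
print(f"order ~ {p:.2f}, benchmark S_C ~ {s_c:.4f}, fine-grid error ~ {err:.4f}")
```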

  2. Power maximization of a point absorber wave energy converter using improved model predictive control

    NASA Astrophysics Data System (ADS)

    Milani, Farideh; Moghaddam, Reihaneh Kardehi

    2017-08-01

    This paper considers controlling and maximizing the absorbed power of wave energy converters for irregular waves. With respect to the physical constraints of the system, model predictive control is applied. The behavior of the irregular waves is predicted by a Kalman filter. Owing to the great influence of the controller parameters on the absorbed power, these parameters are optimized by the imperialist competitive algorithm. The results illustrate the method's efficiency in maximizing the extracted power in the presence of an unknown excitation force, which is predicted by the Kalman filter.
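
A minimal sketch of the prediction ingredient follows: a linear Kalman filter tracking a wave excitation force modeled as a lightly damped harmonic state. The wave model, noise levels, and frequencies are illustrative assumptions, not the paper's; in the full scheme the one-step-ahead force prediction would feed the MPC optimizer.

```python
import numpy as np

dt, omega = 0.1, 1.2                       # time step (s), dominant wave freq (rad/s)
A = np.array([[np.cos(omega*dt), np.sin(omega*dt)/omega],
              [-omega*np.sin(omega*dt), np.cos(omega*dt)]])   # harmonic dynamics
H = np.array([[1.0, 0.0]])                 # we measure the force itself
Q = 1e-3 * np.eye(2)                       # process noise covariance
Rm = 0.05                                  # measurement noise variance

x, P = np.zeros(2), np.eye(2)
rng = np.random.default_rng(1)
t = np.arange(0, 20, dt)
truth = np.sin(omega * t) + 0.2 * np.sin(2.3 * t)   # "irregular" two-component wave
for z in truth + rng.normal(0, np.sqrt(Rm), t.size):
    x, P = A @ x, A @ P @ A.T + Q                   # predict
    y = z - H @ x                                   # innovation
    S = H @ P @ H.T + Rm
    K = P @ H.T / S                                 # Kalman gain
    x = x + (K * y).ravel()                         # update state
    P = (np.eye(2) - K @ H) @ P
print("one-step force prediction:", (A @ x)[0], "truth:", truth[-1])
```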

  3. Quantifying the predictive consequences of model error with linear subspace analysis

    USGS Publications Warehouse

    White, Jeremy T.; Doherty, John E.; Hughes, Joseph D.

    2014-01-01

    All computer models are simplified and imperfect simulators of complex natural systems. The discrepancy arising from simplification induces bias in model predictions, which may be amplified by the process of model calibration. This paper presents a new method to identify and quantify the predictive consequences of calibrating a simplified computer model. The method is based on linear theory, and it scales efficiently to the large numbers of parameters and observations characteristic of groundwater and petroleum reservoir models. The method is applied to a range of predictions made with a synthetic integrated surface-water/groundwater model with thousands of parameters. Several different observation processing strategies and parameterization/regularization approaches are examined in detail, including use of the Karhunen-Loève parameter transformation. Predictive bias arising from model error is shown to be prediction specific and often invisible to the modeler. The amount of calibration-induced bias is influenced by several factors, including how expert knowledge is applied in the design of parameterization schemes, the number of parameters adjusted during calibration, how observations and model-generated counterparts are processed, and the level of fit with observations achieved through calibration. Failure to properly implement any of these factors in a prediction-specific manner may increase the potential for predictive bias in ways that are not visible to the calibration and uncertainty analysis process.

  4. Human Splice-Site Prediction with Deep Neural Networks.

    PubMed

    Naito, Tatsuhiko

    2018-04-18

    Accurate splice-site prediction is essential to delineate gene structures from sequence data. Several computational techniques have been applied to create systems that predict canonical splice sites. For classification tasks, deep neural networks (DNNs) have achieved record-breaking results and often outperformed other supervised learning techniques. In this study, a new method of splice-site prediction using DNNs was proposed. The proposed system receives an input sequence and returns an answer as to whether it is a splice site. The length of the input is 140 nucleotides, with the consensus sequence (i.e., "GT" and "AG" for the donor and acceptor sites, respectively) in the middle. Each input sequence is applied to a pretrained DNN model that determines the probability that the input is a splice site. The model consists of convolutional layers and bidirectional long short-term memory network layers. The pretraining and validation were conducted using the data set tested in previously reported methods. The performance evaluation results showed that the proposed method can outperform the previous methods. In addition, the patterns learned by the DNNs were visualized as position frequency matrices (PFMs). Some of the PFMs were very similar to the consensus sequence. The trained DNN model and the brief source code for the prediction system are uploaded. Further improvement will be achieved following the further development of DNNs.
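
A minimal Keras sketch in the spirit of the architecture described above (convolution followed by a bidirectional LSTM over a one-hot 140-nt window). Layer counts and sizes are illustrative guesses, not the paper's exact model.

```python
import numpy as np
import tensorflow as tf

def build_model(seq_len=140, n_bases=4):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(seq_len, n_bases)),      # one-hot DNA
        tf.keras.layers.Conv1D(64, kernel_size=9, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
        tf.keras.layers.Dense(1, activation="sigmoid"),       # P(splice site)
    ])

def one_hot(seq):
    """One-hot encode a DNA string as a (len, 4) float array."""
    table = {"A": 0, "C": 1, "G": 2, "T": 3}
    x = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq):
        x[i, table[base]] = 1.0
    return x

model = build_model()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.summary()
```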

  5. Prediction of Process-Induced Distortions in L-Shaped Composite Profiles Using Path-Dependent Constitutive Law

    NASA Astrophysics Data System (ADS)

    Ding, Anxin; Li, Shuxin; Wang, Jihui; Ni, Aiqing; Sun, Liangliang; Chang, Lei

    2016-10-01

    In this paper, the corner spring-in angles of AS4/8552 L-shaped composite profiles with different thicknesses are predicted using a path-dependent constitutive law that accounts for the variation of material properties due to phase change during curing. The prediction accuracy mainly depends on the properties in the rubbery and glassy states, which are obtained by homogenization methods rather than by experimental measurement. Both analytical and finite element (FE) homogenization methods are applied to predict the overall properties of the AS4/8552 composite. The effect of fiber volume fraction on the properties is investigated for both rubbery and glassy states using both methods, and the predicted results are compared with experimental measurements for the glassy state. Good agreement is achieved between the predicted results and available experimental data, showing the reliability of the homogenization method. Furthermore, the corner spring-in angles of L-shaped composite profiles are measured experimentally, validating the path-dependent constitutive law as well as the properties predicted by the FE homogenization method.
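
The simplest analytical homogenization estimates, Voigt (rule of mixtures) and Reuss bounds, illustrate how composite stiffness depends on fiber volume fraction and on the matrix state. The property values below are generic illustrative numbers, not AS4/8552 data, and the paper's homogenization is more elaborate than these bounds.

```python
def voigt(Ef, Em, Vf):
    """Longitudinal modulus estimate: fibers and matrix strain equally."""
    return Vf * Ef + (1.0 - Vf) * Em

def reuss(Ef, Em, Vf):
    """Transverse modulus estimate: fibers and matrix carry equal stress."""
    return 1.0 / (Vf / Ef + (1.0 - Vf) / Em)

Ef = 231e9                             # typical carbon-fiber modulus (Pa)
for Em, state in [(4.7e9, "glassy"), (0.03e9, "rubbery")]:
    for Vf in (0.50, 0.575, 0.65):
        print(f"{state:7s} Vf={Vf:.3f}  E1={voigt(Ef, Em, Vf)/1e9:6.1f} GPa"
              f"  E2={reuss(Ef, Em, Vf)/1e9:6.2f} GPa")
```

The orders-of-magnitude drop in matrix stiffness between the glassy and rubbery states is exactly why state-dependent properties drive the spring-in prediction.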

  6. EPA'S TOXCAST PROGRAM FOR PREDICTING HAZARD AND PRIORITIZING TOXICITY TESTING OF ENVIRONMENTAL CHEMICALS.

    EPA Science Inventory

    EPA's National Center for Computational Toxicology is developing methods that apply computational chemistry, high-throughput screening (HTS) and genomic technologies to predict potential toxicity and to prioritize the use of limited testing resources.

  7. Machine learning approaches for estimation of prediction interval for the model output.

    PubMed

    Shrestha, Durga L; Solomatine, Dimitri P

    2006-03-01

    A novel method for estimating prediction uncertainty using machine learning techniques is presented. Uncertainty is expressed in the form of the two quantiles (constituting the prediction interval) of the underlying distribution of prediction errors. The idea is to partition the input space into different zones or clusters having similar model errors using fuzzy c-means clustering. The prediction interval is constructed for each cluster on the basis of the empirical distribution of the errors associated with all instances belonging to the cluster under consideration, and is propagated from each cluster to the individual examples according to their membership grades in each cluster. Then a regression model is built for in-sample data using the computed prediction limits as targets, and finally, this model is applied to estimate the prediction intervals (limits) for out-of-sample data. The method was tested on artificial and real hydrologic data sets using various machine learning techniques. Preliminary results show that the method is superior to other methods for estimating prediction intervals. A new method for evaluating the performance of prediction interval estimators is proposed as well.
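
The first two stages can be sketched compactly: fuzzy c-means clustering of the inputs, per-cluster empirical error quantiles, and membership-weighted propagation to each example. The data, error model, and quantile levels below are toy assumptions; the final stage (regressing on the computed limits) is omitted.

```python
import numpy as np

def fuzzy_cmeans(X, c=3, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means; returns cluster centers and membership grades."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))              # memberships (n x c)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1))
                   * np.sum(d ** (-2 / (m - 1)), axis=1, keepdims=True))
    return centers, U

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 2))                       # model inputs
errors = rng.normal(0, 0.2 + 0.1 * X[:, 0], size=500)       # toy model errors
centers, U = fuzzy_cmeans(X, c=3)
labels = U.argmax(axis=1)
lo = np.array([np.quantile(errors[labels == j], 0.05) for j in range(3)])
hi = np.array([np.quantile(errors[labels == j], 0.95) for j in range(3)])
pi_lo, pi_hi = U @ lo, U @ hi       # membership-weighted interval per example
print(pi_lo[:3], pi_hi[:3])
```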

  8. DDR: efficient computational method to predict drug-target interactions using graph mining and machine learning approaches.

    PubMed

    Olayan, Rawan S; Ashoor, Haitham; Bajic, Vladimir B

    2018-04-01

    Computationally finding drug-target interactions (DTIs) is a convenient strategy to identify new DTIs at low cost with reasonable accuracy. However, current DTI prediction methods suffer from high false-positive prediction rates. We developed DDR, a novel method that improves DTI prediction accuracy. DDR is based on the use of a heterogeneous graph that contains known DTIs with multiple similarities between drugs and multiple similarities between target proteins. DDR applies a non-linear similarity fusion method to combine different similarities. Before fusion, DDR performs a pre-processing step where a subset of similarities is selected in a heuristic process to obtain an optimized combination of similarities. Then, DDR applies a random forest model using different graph-based features extracted from the DTI heterogeneous graph. Using 5-repeats of 10-fold cross-validation, three testing setups, and the weighted average of area under the precision-recall curve (AUPR) scores, we show that DDR significantly reduces the AUPR score error relative to the next best state-of-the-art method for predicting DTIs by 34% when the drugs are new, by 23% when the targets are new, and by 34% when the drugs and the targets are known but not all DTIs between them are known. Using independent sources of evidence, we verify as correct 22 out of the top 25 DDR novel predictions. This suggests that DDR can be used as an efficient method to identify correct DTIs. The data and code are provided at https://bitbucket.org/RSO24/ddr/. vladimir.bajic@kaust.edu.sa. Supplementary data are available at Bioinformatics online.

  9. The Next Era: Deep Learning in Pharmaceutical Research

    PubMed Central

    Ekins, Sean

    2016-01-01

    Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use, from internet searches, voice recognition and social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery not only enables us to learn from the past but also to predict a molecule's properties and behavior in the future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernible edge in predictive performance. The time has come for a balanced review of this technique, but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing, such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation, etc. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has been carried out rarely to date) in order to convince skeptics that there will be benefits from investing in this technique. PMID:27599991

  10. Predicting the losses in sawtimber volume and quality from fires in oak-hickory forests.

    Treesearch

    Robert M. Loomis

    1974-01-01

    Presents a method for predicting future sawtimber losses due to fire-caused wounds. Losses are in terms of: (1) lumber value in dollars, (2) volume in board feet, (3) length of defect in feet, and (4) cross sectional area of defect in square inches. The methods apply to northern red, black, scarlet, white and chestnut oaks.

  11. Inductive matrix completion for predicting gene-disease associations.

    PubMed

    Natarajan, Nagarajan; Dhillon, Inderjit S

    2014-06-15

    Most existing methods for predicting causal disease genes rely on specific types of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best), which has a <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes that are previously not linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature. Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease. © The Author 2014. Published by Oxford University Press.
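
The core bilinear model can be sketched briefly: score(gene i, disease j) = x_i' W H' y_j, fit on observed associations by gradient descent. This is a simplified illustration of the inductive idea, not the paper's optimization; regularization and step sizes are arbitrary.

```python
import numpy as np

def imc_fit(A, mask, X, Y, rank=5, lr=0.05, reg=0.1, epochs=500, seed=0):
    """Fit A ~= X @ W @ H.T @ Y.T on observed entries (mask == 1).
    X: gene features (n x p), Y: disease features (m x q)."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(epochs):
        E = mask * (X @ W @ H.T @ Y.T - A)     # error on observed entries only
        W -= lr * (X.T @ E @ Y @ H + reg * W)  # gradient step on gene side
        H -= lr * (Y.T @ E.T @ X @ W + reg * H)  # gradient step on disease side
    return W, H

# Being inductive, the model scores a brand-new disease (never seen during
# training) from its feature vector alone:
#   score(i, new) = X[i] @ W @ H.T @ y_new
```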

  12. Penalized Multi-Way Partial Least Squares for Smooth Trajectory Decoding from Electrocorticographic (ECoG) Recording

    PubMed Central

    Eliseyev, Andrey; Aksenova, Tetiana

    2016-01-01

    In the current paper, decoding algorithms for continuous upper limb trajectory prediction in motor-related BCI systems are considered. Two methods for smooth prediction, namely Sobolev and Polynomial Penalized Multi-Way Partial Least Squares (PLS) regressions, are proposed. The methods are compared to the Multi-Way Partial Least Squares and Kalman Filter approaches. The comparison demonstrated that the proposed methods combined the prediction accuracy of the algorithms of the PLS family with the trajectory smoothness of the Kalman Filter. In addition, the prediction delay is significantly lower for the proposed algorithms than for the Kalman Filter approach. The proposed methods could be applied in a wide range of applications beyond neuroscience. PMID:27196417

  13. Strong ground motion prediction using virtual earthquakes.

    PubMed

    Denolle, M A; Dunham, E M; Prieto, G A; Beroza, G C

    2014-01-24

    Sedimentary basins increase the damaging effects of earthquakes by trapping and amplifying seismic waves. Simulations of seismic wave propagation in sedimentary basins capture this effect; however, there exists no method to validate these results for earthquakes that have not yet occurred. We present a new approach for ground motion prediction that uses the ambient seismic field. We apply our method to a suite of magnitude 7 scenario earthquakes on the southern San Andreas fault and compare our ground motion predictions with simulations. Both methods find strong amplification and coupling of source and structure effects, but they predict substantially different shaking patterns across the Los Angeles Basin. The virtual earthquake approach thus provides a new avenue for predicting long-period strong ground motion.

  14. Modified Displacement Transfer Functions for Deformed Shape Predictions of Slender Curved Structures with Varying Curvatures

    NASA Technical Reports Server (NTRS)

    Ko, William L.; Fleischer, Van Tran

    2014-01-01

    To eliminate the need for finite-element modeling in structure shape predictions, a new method was invented. This method uses Displacement Transfer Functions to transform measured surface strains into deflections for mapping out overall structural deformed shapes. The Displacement Transfer Functions are expressed in terms of rectilinearly distributed surface strains and contain no material properties. This report applies the patented method to the shape predictions of non-symmetrically loaded slender curved structures with different curvatures up to a full circle. Because measured surface strains were not available, finite-element analysis was used to analytically generate the surface strains. Previously formulated straight-beam Displacement Transfer Functions were modified by introducing curvature-effect correction terms. Through single-point or dual-point collocations with finite-element-generated deflection curves, functional forms of the curvature-effect correction terms were empirically established. The resulting modified Displacement Transfer Functions can then provide quite accurate shape predictions. Also, the uniform straight-beam Displacement Transfer Function was applied to the shape predictions of a section-cut of a generic capsule (GC) outer curved sandwich wall. The resulting GC shape predictions are quite accurate in regions where the radius of curvature does not change sharply.
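    
The underlying strain-to-deflection relation is classical Euler-Bernoulli kinematics: surface strain at distance c from the neutral axis gives curvature eps/c, and integrating twice yields deflection. The sketch below shows only this straight-beam core with cantilever boundary conditions; the patented Displacement Transfer Functions are closed-form piecewise formulas, and the curvature-effect correction terms discussed above are not included.

```python
import numpy as np

def deflection_from_strain(x, eps, c):
    """Integrate bending strain twice to get deflection (cantilever root:
    zero slope and zero deflection at x[0]), via trapezoidal cumulation."""
    kappa = eps / c                                   # curvature along the span
    slope = np.concatenate(
        ([0.0], np.cumsum(0.5 * (kappa[1:] + kappa[:-1]) * np.diff(x))))
    w = np.concatenate(
        ([0.0], np.cumsum(0.5 * (slope[1:] + slope[:-1]) * np.diff(x))))
    return w

# sanity check: uniform curvature k0 should give the parabola w = k0*x^2/2
x = np.linspace(0.0, 1.0, 101)
w = deflection_from_strain(x, eps=np.full(101, 1e-3), c=0.01)   # k0 = 0.1 1/m
print(w[-1], 0.1 * 1.0**2 / 2)                                   # 0.05 vs 0.05
```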

  15. Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers

    PubMed Central

    Vickers, Andrew J; Cronin, Angel M; Elkin, Elena B; Gonen, Mithat

    2008-01-01

    Background: Decision curve analysis is a novel method for evaluating diagnostic tests, prediction models and molecular markers. It combines the mathematical simplicity of accuracy measures, such as sensitivity and specificity, with the clinical applicability of decision analytic approaches. Most critically, decision curve analysis can be applied directly to a data set, and does not require the sort of external data on costs, benefits and preferences typically required by traditional decision analytic techniques. Methods: In this paper we present several extensions to decision curve analysis including correction for overfit, confidence intervals, application to censored data (including competing risk) and calculation of decision curves directly from predicted probabilities. All of these extensions are based on straightforward methods that have previously been described in the literature for application to analogous statistical techniques. Results: Simulation studies showed that repeated 10-fold crossvalidation provided the best method for correcting a decision curve for overfit. The method for applying decision curves to censored data had little bias and coverage was excellent; for competing risk, decision curves were appropriately affected by the incidence of the competing risk and the association between the competing risk and the predictor of interest. Calculation of decision curves directly from predicted probabilities led to a smoothing of the decision curve. Conclusion: Decision curve analysis can be easily extended to many of the applications common to performance measures for prediction models. Software to implement decision curve analysis is provided. PMID:19036144
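
At its core, a decision curve plots the standard net benefit of a model across threshold probabilities, compared against the treat-all and treat-none policies. The sketch below computes that quantity on a tiny invented dataset; the extensions described above (overfit correction, censoring, competing risks) are not implemented here.

```python
import numpy as np

def net_benefit(y, p_hat, pt):
    """Net benefit at threshold probability pt:
    TP/n - (FP/n) * pt / (1 - pt)."""
    treat = p_hat >= pt
    n = len(y)
    tp = np.sum(treat & (y == 1)) / n
    fp = np.sum(treat & (y == 0)) / n
    return tp - fp * pt / (1.0 - pt)

# toy decision curve: model vs. treat-all vs. treat-none
y = np.array([0, 0, 1, 0, 1, 1, 0, 1, 0, 0])
p_hat = np.array([.1, .3, .8, .2, .6, .9, .4, .7, .2, .5])
for pt in (0.1, 0.2, 0.3, 0.4):
    nb_model = net_benefit(y, p_hat, pt)
    nb_all = y.mean() - (1 - y.mean()) * pt / (1 - pt)   # treat everyone
    print(f"pt={pt:.1f}  model={nb_model:+.3f}  treat-all={nb_all:+.3f}  treat-none=0")
```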

  16. Prediction of Solvent Physical Properties using the Hierarchical Clustering Method

    EPA Science Inventory

    Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to estimate solvent physical properties including sur...

  17. Specialized CFD Grid Generation Methods for Near-Field Sonic Boom Prediction

    NASA Technical Reports Server (NTRS)

    Park, Michael A.; Campbell, Richard L.; Elmiligui, Alaa; Cliff, Susan E.; Nayani, Sudheer N.

    2014-01-01

    Ongoing interest in analysis and design of low sonic boom supersonic transports requires accurate and efficient Computational Fluid Dynamics (CFD) tools. Specialized grid generation techniques are employed to predict near-field acoustic signatures of these configurations. A fundamental examination of grid properties is performed, including grid alignment with flow characteristics and element type. The issues affecting the robustness of cylindrical surface extrusion are illustrated. This study will compare three methods in the extrusion family of grid generation methods that produce grids aligned with the freestream Mach angle. These methods are applied to configurations from the First AIAA Sonic Boom Prediction Workshop.

  18. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness

    PubMed Central

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models. PMID:26890307

  19. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness.

    PubMed

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models.
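
A miniature version of the averaged-variable-importance (AVI) idea in the two records above: average impurity-based importances over repeated RF fits, then drop the least important predictor while cross-validated accuracy keeps improving. This simplified loop on synthetic data stands in for the paper's AVI/KIAVI procedures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=15, n_informative=5,
                           random_state=0)
kept = list(range(X.shape[1]))
best = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
while len(kept) > 1:
    # averaged variable importance over several RF fits
    imp = np.mean([RandomForestClassifier(random_state=s).fit(X[:, kept], y)
                   .feature_importances_ for s in range(5)], axis=0)
    # drop the least important predictor and re-score
    trial = [f for f, _ in sorted(zip(kept, imp), key=lambda t: -t[1])][:-1]
    score = cross_val_score(RandomForestClassifier(random_state=0),
                            X[:, trial], y, cv=5).mean()
    if score < best:
        break
    kept, best = trial, score
print("selected predictors:", kept, "accuracy:", round(best, 3))
```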

  20. Chemically intuited, large-scale screening of MOFs by machine learning techniques

    NASA Astrophysics Data System (ADS)

    Borboudakis, Giorgos; Stergiannakos, Taxiarchis; Frysali, Maria; Klontzas, Emmanuel; Tsamardinos, Ioannis; Froudakis, George E.

    2017-10-01

    A novel computational methodology for large-scale screening of MOFs is applied to gas storage with the use of machine learning technologies. This approach is a promising trade-off between the accuracy of ab initio methods and the speed of classical approaches, strategically combined with chemical intuition. The results demonstrate that the chemical properties of MOFs are indeed predictable (stochastically, not deterministically) using machine learning methods and automated analysis protocols, with the accuracy of predictions increasing with sample size. Our initial results indicate that this methodology is promising not only for gas storage in MOFs but also for many other materials science projects.

  1. Firmness prediction in Prunus persica 'Calrico' peaches by visible/short-wave near infrared spectroscopy and acoustic measurements using optimised linear and non-linear chemometric models.

    PubMed

    Lafuente, Victoria; Herrera, Luis J; Pérez, María del Mar; Val, Jesús; Negueruela, Ignacio

    2015-08-15

    In this work, near infrared spectroscopy (NIR) and an acoustic measure (AWETA), two non-destructive methods, were applied to Prunus persica 'Calrico' fruit (n = 260) to predict Magness-Taylor (MT) firmness. Separate and combined use of these measures was evaluated and compared using partial least squares (PLS) and least squares support vector machine (LS-SVM) regression methods. Also, a mutual-information-based variable selection method, seeking to find the most significant variables to produce optimal accuracy of the regression models, was applied to a joint set of variables (NIR wavelengths and the AWETA measure). The newly proposed combined NIR-AWETA model gave good values of the determination coefficient (R(2)) for the PLS and LS-SVM methods (0.77 and 0.78, respectively), improving the reliability of MT firmness prediction in comparison with separate NIR and AWETA predictions. The three variables selected by the variable selection method (the AWETA measure plus NIR wavelengths 675 and 697 nm) achieved R(2) values of 0.76 and 0.77 for PLS and LS-SVM, respectively. These results indicated that the proposed mutual-information-based variable selection algorithm is a powerful tool for selecting the most relevant variables. © 2014 Society of Chemical Industry.
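
The combined-model idea, concatenating spectra with an acoustic index and fitting a single PLS regression, can be sketched as follows. The data here are simulated stand-ins for the NIR and AWETA measurements; the mutual-information variable selection step is omitted.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 260
nir = rng.normal(size=(n, 100))               # stand-in NIR absorbances
aweta = rng.normal(size=(n, 1))               # stand-in acoustic measure
firmness = (nir[:, 40] - 0.5 * nir[:, 60]     # toy dependence on two bands
            + 1.5 * aweta[:, 0] + rng.normal(0, 0.3, n))
X = np.hstack([nir, aweta])                   # joint NIR-AWETA variable set
X_tr, X_te, y_tr, y_te = train_test_split(X, firmness, random_state=0)
pls = PLSRegression(n_components=5).fit(X_tr, y_tr)
print("R2:", round(r2_score(y_te, pls.predict(X_te).ravel()), 3))
```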

  2. Kernel-based whole-genome prediction of complex traits: a review.

    PubMed

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
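
One concrete instance of the RKHS methods reviewed above is Gaussian-kernel ridge regression on markers, which can absorb some non-additive (e.g. epistatic) signal that a purely additive model misses. The sketch below uses simulated genotypes; bandwidth and regularization are arbitrary choices, not recommendations from the review.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(300, 500)).astype(float)      # SNPs coded 0/1/2
y = (M[:, 0] * M[:, 1]                                     # an epistatic pair
     + M[:, 2:10].sum(axis=1)                              # plus additive SNPs
     + rng.normal(0, 1.0, 300))
train, test = np.arange(250), np.arange(250, 300)
model = KernelRidge(kernel="rbf", gamma=1.0 / M.shape[1], alpha=1.0)
model.fit(M[train], y[train])
print("predictive corr:",
      round(np.corrcoef(model.predict(M[test]), y[test])[0, 1], 3))
```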

  3. An analytical method to predict efficiency of aircraft gearboxes

    NASA Technical Reports Server (NTRS)

    Anderson, N. E.; Loewenthal, S. H.; Black, J. D.

    1984-01-01

    A spur gear efficiency prediction method previously developed by the authors was extended to include the power loss of planetary gearsets. A friction coefficient model was developed for MIL-L-7808 oil based on disc machine data. This, combined with the recent capability of predicting losses in spur gears of nonstandard proportions, allows the calculation of power loss for complete aircraft gearboxes that utilize spur gears. The method was applied to the T56/501 turboprop gearbox and compared with measured test data. Bearing losses were calculated with large-scale computer programs. Breakdowns of the gearbox losses point out areas for possible improvement.

  4. A study of the limitations of linear theory methods as applied to sonic boom calculations

    NASA Technical Reports Server (NTRS)

    Darden, Christine M.

    1990-01-01

    Current sonic boom minimization theories have been reviewed to emphasize the capabilities and flexibilities of the methods. Flexibility is important because it is necessary for the designer to meet optimized area constraints while reducing the impact on vehicle aerodynamic performance. Preliminary comparisons of sonic booms predicted for two Mach 3 concepts illustrate the benefits of shaping. Finally, for very simple bodies of revolution, sonic boom predictions were made using two methods - a modified linear theory method and a nonlinear method - for signature shapes which were both farfield N-waves and midfield waves. Preliminary analysis on these simple bodies verified that current modified linear theory prediction methods become inadequate for predicting midfield signatures at Mach numbers above 3. The importance of impulse in the sonic boom disturbance and the importance of three-dimensional effects, which could not be simulated with the bodies of revolution, will determine the validity of current modified linear theory methods in predicting midfield signatures at lower Mach numbers.

  5. Protein Structure Prediction with Evolutionary Algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hart, W.E.; Krasnogor, N.; Pelta, D.A.

    1999-02-08

    Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms (GAs) that have been applied to a simple protein structure prediction (PSP) problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation, and the way in which infeasible conformations are penalized. Further, we empirically evaluate the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for GAs as well as other heuristic methods for solving PSP on the HP model.
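
Two of the design factors named above, the conformational representation and an energy formulation that penalizes infeasible folds, can be shown in a 2D HP-lattice toy. The representation here is a string of absolute moves and the penalty weight is an arbitrary choice; this is a didactic sketch, not the paper's GA.

```python
import random

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def energy(seq, fold, overlap_penalty=10.0):
    """Lower is better: -1 per non-consecutive H-H lattice contact,
    +penalty per overlapping residue (an infeasible conformation)."""
    coords = [(0, 0)]
    for m in fold:
        dx, dy = MOVES[m]
        coords.append((coords[-1][0] + dx, coords[-1][1] + dy))
    overlaps = len(coords) - len(set(coords))
    pos = {c: i for i, c in enumerate(coords)}   # approximate if overlapping
    contacts = sum(1 for i, (x, y) in enumerate(coords) if seq[i] == "H"
                   for dx, dy in MOVES.values()
                   if pos.get((x + dx, y + dy), -1) > i + 1
                   and seq[pos[(x + dx, y + dy)]] == "H")
    return -contacts + overlap_penalty * overlaps

def mutate(fold, rate=0.1):
    """Point mutation on the move string -- one simple GA operator."""
    return "".join(random.choice("UDLR") if random.random() < rate else m
                   for m in fold)

seq = "HPHPPHHPHPPHPHHPPHPH"
fold = "".join(random.choice("UDLR") for _ in range(len(seq) - 1))
print(energy(seq, fold), energy(seq, mutate(fold)))
```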

  6. Prediction Analysis for Measles Epidemics

    NASA Astrophysics Data System (ADS)

    Sumi, Ayako; Ohtomo, Norio; Tanaka, Yukio; Sawamura, Sadashi; Olsen, Lars Folke; Kobayashi, Nobumichi

    2003-12-01

    A newly devised procedure of prediction analysis, a linearized version of the nonlinear least squares method combined with the maximum entropy spectral analysis method, is proposed. This method was applied to time series data of measles case notifications in several communities in the UK, USA and Denmark. The dominant spectral lines observed in each power spectral density (PSD) can be safely assigned as fundamental periods. The optimum least squares fitting (LSF) curve calculated using these fundamental periods can essentially reproduce the underlying variation of the measles data. An extension of the LSF curve can be used to predict measles case notifications quantitatively. Some discussion, including the predictability of chaotic time series, is presented.
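
Once the fundamental periods are fixed (in the paper, from MEM spectral peaks), the LSF step is linear: fit a mean plus sine/cosine pairs at those periods and extrapolate. The sketch below assumes the periods are already known; the MEM spectral analysis itself is not implemented, and the data are simulated monthly counts.

```python
import numpy as np

def lsf_fit(t, y, periods):
    """Least squares fit of a mean plus sine/cosine pairs at fixed
    fundamental periods; returns a callable for extrapolation."""
    cols = [np.ones_like(t)]
    for T in periods:
        cols += [np.sin(2 * np.pi * t / T), np.cos(2 * np.pi * t / T)]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda t_new: np.column_stack(
        [np.ones_like(t_new)] + [f(2 * np.pi * t_new / T)
                                 for T in periods
                                 for f in (np.sin, np.cos)]) @ coef

# toy usage: fit 8 years of monthly counts, then predict one year ahead
t = np.arange(96.0)
y = (50 + 30 * np.sin(2 * np.pi * t / 12) + 10 * np.sin(2 * np.pi * t / 24)
     + np.random.default_rng(0).normal(0, 3, t.size))
model = lsf_fit(t, y, periods=[12.0, 24.0])
print(model(np.arange(96.0, 108.0)))
```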

  7. Sea surface temperature predictions using a multi-ocean analysis ensemble scheme

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Zhu, Jieshun; Li, Zhongxian; Chen, Haishan; Zeng, Gang

    2017-08-01

    This study examined global sea surface temperature (SST) predictions by a so-called multiple-ocean analysis ensemble (MAE) initialization method, which was applied in the National Centers for Environmental Prediction (NCEP) Climate Forecast System Version 2 (CFSv2). Different from most operational climate prediction practices, which are initialized by a specific ocean analysis system, the MAE method is based on multiple ocean analyses. In the paper, the MAE method was first justified by analyzing the ocean temperature variability in four ocean analyses, all of which are or were applied for operational climate predictions either at the European Centre for Medium-Range Weather Forecasts or at NCEP. It was found that these systems exhibit substantial uncertainties in estimating the ocean states, especially at the deep layers. Further, a set of MAE hindcasts was conducted based on the four ocean analyses with CFSv2, starting from each April during 1982-2007. The MAE hindcasts were verified against a subset of hindcasts from the NCEP CFS Reanalysis and Reforecast (CFSRR) Project. Comparisons suggested that MAE shows better SST predictions than CFSRR over most regions where ocean dynamics plays a vital role in SST evolution, such as the El Niño and Atlantic Niño regions. Furthermore, significant improvements were also found in summer precipitation predictions over the equatorial eastern Pacific and Atlantic oceans, for which the local SST prediction improvements are likely responsible. The prediction improvements by MAE imply a problem for most current climate predictions, which are based on a specific ocean analysis system: their predictions drift towards states biased by errors inherent in their ocean initialization system, and thus have large prediction errors. In contrast, MAE arguably has an advantage by sampling such structural uncertainties, and could efficiently cancel these errors out in its predictions.

  8. Assessment of data quality needs for use in transportation applications.

    DOT National Transportation Integrated Search

    2013-02-01

    The objective of this project is to investigate data quality measures and how they are applied to travel time prediction. This project showcases a short-term travel time prediction method that takes into account the data needs of the realtim...

  9. Prediction of turbulent shear layers in turbomachines

    NASA Technical Reports Server (NTRS)

    Bradshaw, P.

    1974-01-01

    The characteristics of turbulent shear layers in turbomachines are compared with the turbulent boundary layers on airfoils. Seven different aspects are examined. The limits of boundary layer theory are investigated. Boundary layer prediction methods are applied to analysis of the flow in turbomachines.

  10. Space vehicle engine and heat shield environment review. Volume 1: Engineering analysis

    NASA Technical Reports Server (NTRS)

    Mcanelly, W. B.; Young, C. T. K.

    1973-01-01

    Methods for predicting the base heating characteristics of a multiple rocket engine installation are discussed. The environmental data are applied to the design of an adequate protection system for the engine components. The methods for predicting the base region thermal environment are categorized as: (1) scale model testing, (2) extrapolation of previous and related flight test results, and (3) semiempirical analytical techniques.

  11. Realizing drug repositioning by adapting a recommendation system to handle the process.

    PubMed

    Ozsoy, Makbule Guclin; Özyer, Tansel; Polat, Faruk; Alhajj, Reda

    2018-04-12

    Drug repositioning is the process of identifying new targets for known drugs. It can be used to overcome problems associated with traditional drug discovery by adapting existing drugs to treat newly discovered diseases. Thus, it may reduce the risk, cost and time required to identify and verify new drugs. Nowadays, drug repositioning receives considerable attention from industry and academia. To tackle this problem, researchers have applied many different computational methods and have used various features of drugs and diseases. In this study, we contribute to the ongoing research efforts by combining multiple features, namely chemical structures, protein interactions and side-effects, to predict new indications of target drugs. To achieve our target, we realize drug repositioning as a recommendation process, and this leads to a new perspective in tackling the problem. The utilized recommendation method is based on Pareto dominance and collaborative filtering. It can also integrate multiple data-sources and multiple features. For the computation part, we applied several settings and compared their performance. Evaluation results show that the proposed method can achieve more concentrated predictions with high precision, where nearly half of the predictions are true. Compared to other state-of-the-art methods described in the literature, the proposed method is better at making right predictions by having higher precision. The reported results demonstrate the applicability and effectiveness of recommendation methods for drug repositioning.

  12. Hybrid Clustering-GWO-NARX neural network technique in predicting stock price

    NASA Astrophysics Data System (ADS)

    Das, Debashish; Safa Sadiq, Ali; Mirjalili, Seyedali; Noraziah, A.

    2017-09-01

    Prediction of stock price is one of the most challenging tasks due to the nonlinear nature of stock data. Though numerous attempts have been made to predict stock prices by applying various techniques, the predicted price is not always accurate and the error rate remains high to some extent. Consequently, this paper endeavours to determine an efficient stock prediction strategy by implementing a combinatorial method of Grey Wolf Optimizer (GWO), Clustering and Nonlinear Autoregressive Exogenous (NARX) techniques. The study uses stock data from prominent stock markets, i.e. the New York Stock Exchange (NYSE) and NASDAQ, and emerging stock markets, i.e. the Malaysian Stock Market (Bursa Malaysia) and the Dhaka Stock Exchange (DSE). It applies the K-means clustering algorithm to determine the most promising cluster, then MGWO is used to determine the classification rate, and finally the stock price is predicted by applying the NARX neural network algorithm. The prediction performance gained through experimentation is compared and assessed to guide investors in making investment decisions. The results of this technique are indeed promising, showing nearly precise prediction and an improved error rate. We have applied the hybrid Clustering-GWO-NARX neural network technique to predicting stock price. We intend to study the effect of various factors on stock price movement and the selection of parameters. We will further investigate the influence of positive or negative company news on stock price movement. We would also be interested in predicting stock indices.
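
The NARX component reduces, in its simplest form, to a nonlinear regressor over lagged prices plus an exogenous input. The sketch below shows that one-step-ahead core on a simulated series; the clustering and GWO stages of the paper's pipeline, and any real market data, are omitted.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_lagged(price, exog, lags=5):
    """Rows: [price(t), ..., price(t+lags-1), exog(t+lags-1)] -> price(t+lags)."""
    X = np.column_stack(
        [price[i:len(price) - lags + i] for i in range(lags)]
        + [exog[lags - 1:-1]])
    y = price[lags:]
    return X, y

rng = np.random.default_rng(0)
n = 600
exog = rng.normal(size=n)                      # e.g. an index or volume proxy
price = np.cumsum(rng.normal(0, 1, n)) + 0.5 * exog
X, y = make_lagged(price, exog)
split = int(0.8 * len(y))
model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                     random_state=0).fit(X[:split], y[:split])
rmse = np.sqrt(np.mean((model.predict(X[split:]) - y[split:]) ** 2))
print("test RMSE:", round(rmse, 3))
```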

  13. ComplexContact: a web server for inter-protein contact prediction using deep learning.

    PubMed

    Zeng, Hong; Wang, Sheng; Zhou, Tianming; Zhao, Feifeng; Li, Xiufeng; Wu, Qing; Xu, Jinbo

    2018-05-22

    ComplexContact (http://raptorx2.uchicago.edu/ComplexContact/) is a web server for sequence-based interfacial residue-residue contact prediction of a putative protein complex. Interfacial residue-residue contacts are critical for understanding how proteins form complexes and interact at the residue level. When receiving a pair of protein sequences, ComplexContact first searches for their sequence homologs and builds two paired multiple sequence alignments (MSA), then it applies co-evolution analysis and a CASP-winning deep learning (DL) method to predict interfacial contacts from the paired MSAs and visualizes the prediction as an image. The DL method was originally developed for intra-protein contact prediction and performed the best in CASP12. Our large-scale experimental test further shows that ComplexContact greatly outperforms pure co-evolution methods for inter-protein contact prediction, regardless of the species.

  14. Modeling of BN Lifetime Prediction of a System Based on Integrated Multi-Level Information

    PubMed Central

    Wang, Xiaohong; Wang, Lizhi

    2017-01-01

    Predicting system lifetime is important to ensure safe and reliable operation of products, which requires integrated modeling based on multi-level, multi-sensor information. However, lifetime characteristics of equipment in a system are different and failure mechanisms are inter-coupled, which leads to complex logical correlations and the lack of a uniform lifetime measure. Based on a Bayesian network (BN), a lifetime prediction method for systems that combine multi-level sensor information is proposed. The method considers the correlation between accidental failures and degradation failure mechanisms, and achieves system modeling and lifetime prediction under complex logic correlations. This method is applied in the lifetime prediction of a multi-level solar-powered unmanned system, and the predicted results can provide guidance for the improvement of system reliability and for the maintenance and protection of the system. PMID:28926930

  15. Modeling of BN Lifetime Prediction of a System Based on Integrated Multi-Level Information.

    PubMed

    Wang, Jingbin; Wang, Xiaohong; Wang, Lizhi

    2017-09-15

    Predicting system lifetime is important to ensure safe and reliable operation of products, which requires integrated modeling based on multi-level, multi-sensor information. However, lifetime characteristics of equipment in a system are different and failure mechanisms are inter-coupled, which leads to complex logical correlations and the lack of a uniform lifetime measure. Based on a Bayesian network (BN), a lifetime prediction method for systems that combine multi-level sensor information is proposed. The method considers the correlation between accidental failures and degradation failure mechanisms, and achieves system modeling and lifetime prediction under complex logic correlations. This method is applied in the lifetime prediction of a multi-level solar-powered unmanned system, and the predicted results can provide guidance for the improvement of system reliability and for the maintenance and protection of the system.
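
The BN mechanics behind the two records above can be shown in miniature: an accidental-failure node and a degradation node feed a system node whose conditional table couples the two mechanisms, and enumeration gives both the failure probability and a posterior diagnosis. All probabilities below are illustrative, not from the paper.

```python
import itertools

P_A = {True: 0.05, False: 0.95}                  # accidental failure by time t
P_D = {True: 0.20, False: 0.80}                  # degradation failure by time t
P_S = {  # P(system fails | A, D): coupled mechanisms, not a pure OR-gate
    (True, True): 0.99, (True, False): 0.90,
    (False, True): 0.70, (False, False): 0.01,
}
p_fail = sum(P_A[a] * P_D[d] * P_S[(a, d)]
             for a, d in itertools.product([True, False], repeat=2))
# posterior diagnosis: was degradation the likely mechanism, given a failure?
p_deg_given_fail = sum(P_A[a] * P_D[True] * P_S[(a, True)]
                       for a in [True, False]) / p_fail
print(f"P(system fails) = {p_fail:.3f}, "
      f"P(degradation | failure) = {p_deg_given_fail:.3f}")
```

Sweeping the node probabilities over time would turn this point calculation into a lifetime curve, which is the role the BN plays in the method above.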

  16. CALCULATION OF NONLINEAR CONFIDENCE AND PREDICTION INTERVALS FOR GROUND-WATER FLOW MODELS.

    USGS Publications Warehouse

    Cooley, Richard L.; Vecchia, Aldo V.

    1987-01-01

    A method is derived to efficiently compute nonlinear confidence and prediction intervals on any function of parameters derived as output from a mathematical model of a physical system. The method is applied to the problem of obtaining confidence and prediction intervals for manually-calibrated ground-water flow models. To obtain confidence and prediction intervals resulting from uncertainties in parameters, the calibrated model and information on extreme ranges and ordering of the model parameters within one or more independent groups are required. If random errors in the dependent variable are present in addition to uncertainties in parameters, then calculation of prediction intervals also requires information on the extreme range of error expected. A simple Monte Carlo method is used to compute the quantiles necessary to establish probability levels for the confidence and prediction intervals. Application of the method to a hypothetical example showed that inclusion of random errors in the dependent variable in addition to uncertainties in parameters can considerably widen the prediction intervals.
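
The Monte Carlo step described above can be sketched directly: sample parameters within their extreme ranges (respecting any ordering constraints), run the model, optionally add random error in the dependent variable, and read prediction-interval quantiles off the ensemble. The model function and ranges below are hypothetical stand-ins, not a calibrated groundwater model.

```python
import numpy as np

def model_prediction(k1, k2, recharge):
    """Hypothetical head output of a simplified flow model."""
    return recharge * 100.0 / np.sqrt(k1 * k2)

rng = np.random.default_rng(0)
n = 20000
k1 = rng.uniform(1.0, 50.0, n)                 # extreme range, zone 1
k2 = rng.uniform(0.1, 10.0, n)                 # extreme range, zone 2
keep = k1 > k2                                 # ordering: zone 1 is coarser
pred = model_prediction(k1[keep], k2[keep], recharge=0.3)
pred += rng.normal(0.0, 1.0, keep.sum())       # random error -> prediction interval
lo, hi = np.quantile(pred, [0.025, 0.975])
print(f"95% prediction interval: [{lo:.1f}, {hi:.1f}]")
```

Dropping the added random-error line yields a confidence interval (parameter uncertainty only), mirroring the distinction drawn in the abstract.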

  17. Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yan, Shiju; Qian, Wei; Guan, Yubao

    2016-06-15

    Purpose: This study aims to investigate the potential to improve lung cancer recurrence risk prediction performance for stage I NSCLC patients by integrating oversampling, feature selection, and score fusion techniques, and to develop an optimal prediction model. Methods: A dataset involving 94 early stage lung cancer patients was retrospectively assembled, which includes CT images, nine clinical and biological (CB) markers, and the outcome of 3-yr disease-free survival (DFS) after surgery. Among the 94 patients, 74 remained DFS and 20 had cancer recurrence. Applying a computer-aided detection scheme, tumors were segmented from the CT images and 35 quantitative image (QI) features were initially computed. Two normalized Gaussian radial basis function network (RBFN) based classifiers were built based on QI features and CB markers separately. To improve prediction performance, the authors applied a synthetic minority oversampling technique (SMOTE) and a BestFirst based feature selection method to optimize the classifiers and also tested fusion methods to combine QI and CB based prediction results. Results: Using a leave-one-case-out cross-validation (K-fold cross-validation) method, the computed areas under a receiver operating characteristic curve (AUCs) were 0.716 ± 0.071 and 0.642 ± 0.061 when using the QI and CB based classifiers, respectively. By fusion of the scores generated by the two classifiers, the AUC significantly increased to 0.859 ± 0.052 (p < 0.05) with an overall prediction accuracy of 89.4%. Conclusions: This study demonstrated the feasibility of improving prediction performance by integrating SMOTE, feature selection, and score fusion techniques. Combining QI features and CB markers and performing SMOTE prior to feature selection in classifier training enabled the RBFN based classifier to yield improved prediction accuracy.
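
The pipeline shape, oversampling the training split per modality and fusing classifier scores, can be sketched as below. The feature blocks stand in for the QI features and CB markers, logistic models stand in for the paper's RBF networks, and the equal fusion weights are an assumption.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, weights=[0.8, 0.2],
                           random_state=0)
X_qi, X_cb = X[:, :15], X[:, 15:]                  # two modality blocks
idx_tr, idx_te = train_test_split(np.arange(len(y)), stratify=y, random_state=0)
scores = []
for block in (X_qi, X_cb):
    # oversample the minority class on the training split only
    Xb, yb = SMOTE(random_state=0).fit_resample(block[idx_tr], y[idx_tr])
    clf = LogisticRegression(max_iter=1000).fit(Xb, yb)
    scores.append(clf.predict_proba(block[idx_te])[:, 1])
fused = 0.5 * scores[0] + 0.5 * scores[1]          # equal-weight score fusion
print("fused AUC:", round(roc_auc_score(y[idx_te], fused), 3))
```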

  18. How can machine-learning methods assist in virtual screening for hyperuricemia? A healthcare machine-learning approach.

    PubMed

    Ichikawa, Daisuke; Saito, Toki; Ujita, Waka; Oyama, Hiroshi

    2016-12-01

    Our purpose was to develop a new machine-learning approach (a virtual health check-up) for identifying those at high risk of hyperuricemia. Applying the system to general health check-ups is expected to reduce medical costs compared with administering an additional test. Data were collected during annual health check-ups performed in Japan between 2011 and 2013 (inclusive). We prepared training and test datasets from the health check-up data to build prediction models; these were composed of 43,524 and 17,789 persons, respectively. Gradient-boosting decision tree (GBDT), random forest (RF), and logistic regression (LR) approaches were trained using the training dataset and were then used to predict hyperuricemia in the test dataset. Undersampling was applied in building the prediction models to deal with the imbalanced class dataset. The results showed that the RF and GBDT approaches afforded the best performances in terms of sensitivity and specificity, respectively. The area under the curve (AUC) values of the models, which reflected the total discriminative ability of the classification, were 0.796 [95% confidence interval (CI): 0.766-0.825] for the GBDT, 0.784 [95% CI: 0.752-0.815] for the RF, and 0.785 [95% CI: 0.752-0.819] for the LR approaches. No significant differences were observed between any pair of approaches. Only small changes occurred in the AUCs after applying undersampling to build the models. We developed a virtual health check-up that predicted the development of hyperuricemia using machine-learning methods. The GBDT, RF, and LR methods had similar predictive capability. Undersampling did not remarkably improve predictive power. Copyright © 2016 Elsevier Inc. All rights reserved.
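
The comparison design, undersample the majority class in training, then fit GBDT, RF, and LR and compare test AUCs, looks like this on synthetic data (sklearn classifiers as generic stand-ins for the study's implementations):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
rng = np.random.default_rng(0)
pos = np.flatnonzero(y_tr == 1)
neg = rng.choice(np.flatnonzero(y_tr == 0), size=len(pos), replace=False)
bal = np.concatenate([pos, neg])                  # undersampled training set
for name, clf in [("GBDT", GradientBoostingClassifier(random_state=0)),
                  ("RF", RandomForestClassifier(random_state=0)),
                  ("LR", LogisticRegression(max_iter=1000))]:
    clf.fit(X_tr[bal], y_tr[bal])
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```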

  19. Yeast Prions and Human Prion-like Proteins: Sequence Features and Prediction Methods

    PubMed Central

    Cascarina, Sean; Ross, Eric D.

    2014-01-01

    Prions are self-propagating infectious protein isoforms. A growing number of prions have been identified in yeast, each resulting from the conversion of soluble proteins into an insoluble amyloid form. These yeast prions have served as a powerful model system for studying the causes and consequences of prion aggregation. Remarkably, a number of human proteins containing prion-like domains, defined as domains with compositional similarity to yeast prion domains, have recently been linked to various human degenerative diseases, including amyotrophic lateral sclerosis (ALS). This suggests that the lessons learned from yeast prions may help in understanding these human diseases. In this review, we examine what has been learned about the amino acid sequence basis for prion aggregation in yeast, and how this information has been used to develop methods to predict aggregation propensity. We then discuss how this information is being applied to understand human disease, and the challenges involved in applying yeast prediction methods to higher organisms. PMID:24390581

  20. Yeast prions and human prion-like proteins: sequence features and prediction methods.

    PubMed

    Cascarina, Sean M; Ross, Eric D

    2014-06-01

    Prions are self-propagating infectious protein isoforms. A growing number of prions have been identified in yeast, each resulting from the conversion of soluble proteins into an insoluble amyloid form. These yeast prions have served as a powerful model system for studying the causes and consequences of prion aggregation. Remarkably, a number of human proteins containing prion-like domains, defined as domains with compositional similarity to yeast prion domains, have recently been linked to various human degenerative diseases, including amyotrophic lateral sclerosis. This suggests that the lessons learned from yeast prions may help in understanding these human diseases. In this review, we examine what has been learned about the amino acid sequence basis for prion aggregation in yeast, and how this information has been used to develop methods to predict aggregation propensity. We then discuss how this information is being applied to understand human disease, and the challenges involved in applying yeast prediction methods to higher organisms.

  1. Ordinal convolutional neural networks for predicting RDoC positive valence psychiatric symptom severity scores.

    PubMed

    Rios, Anthony; Kavuluru, Ramakanth

    2017-11-01

    The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task. Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data, where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task in predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes. We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model which is learned end-to-end. Given that interpretability is an important concern with nonlinear models, we apply a recent approach called locally interpretable model-agnostic explanation (LIME) to identify important words that lead to instance-specific predictions. Our best model entered into the shared task placed third among 24 teams and scored a macro mean absolute error (MMAE) based normalized score (100·(1-MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance-specific prediction interpretation by identifying words that led to a particular decision. In this paper, we present a method that successfully uses wide features and an ordinal loss function applied to convolutional neural networks for ordinal text classification, specifically in predicting psychiatric symptom severity scores. Our approach leads to excellent performance on the N-GRID shared task and is also amenable to interpretability using existing model-agnostic approaches. Copyright © 2017 Elsevier Inc. All rights reserved.
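
One standard way to make a neural classifier ordinal-aware, shown below, is cumulative target encoding: severity level k becomes k ones followed by zeros, trained against sigmoid outputs, so distant severities incur larger loss. This illustrates the idea of an ordinal formulation, not the paper's exact loss function.

```python
import numpy as np

LEVELS = ["absent", "mild", "moderate", "severe"]

def encode(level):
    """Severity k -> k ones then zeros: 'moderate' -> [1, 1, 0]."""
    k = LEVELS.index(level)
    return np.array([1.0] * k + [0.0] * (len(LEVELS) - 1 - k))

def decode(sigmoid_outputs, threshold=0.5):
    """Count thresholded cumulative outputs to recover the severity level."""
    return LEVELS[int(np.sum(np.asarray(sigmoid_outputs) > threshold))]

print(encode("moderate"))            # [1. 1. 0.]
print(decode([0.9, 0.7, 0.2]))       # moderate
print(decode([0.9, 0.2, 0.1]))       # mild
```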

  2. Bootstrap Prediction Intervals in Non-Parametric Regression with Applications to Anomaly Detection

    NASA Technical Reports Server (NTRS)

    Kumar, Sricharan; Srivastava, Ashok N.

    2012-01-01

    Prediction intervals provide a measure of the probable interval in which the outputs of a regression model can be expected to occur. Subsequently, these prediction intervals can be used to determine if the observed output is anomalous or not, conditioned on the input. In this paper, a procedure for determining prediction intervals for outputs of nonparametric regression models using bootstrap methods is proposed. Bootstrap methods allow for a non-parametric approach to computing prediction intervals with no specific assumptions about the sampling distribution of the noise or the data. The asymptotic fidelity of the proposed prediction intervals is theoretically proved. Subsequently, the validity of the bootstrap based prediction intervals is illustrated via simulations. Finally, the bootstrap prediction intervals are applied to the problem of anomaly detection on aviation data.
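
A generic pairs-bootstrap sketch of the idea: refit a nonparametric regressor on resampled data, add resampled residual noise to its predictions, and take percentiles. The regressor choice (k-nearest neighbors) and interval levels are assumptions for illustration, not the paper's procedure; an observed output falling outside the resulting interval would be flagged as anomalous.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)[:, None]
y = np.sin(x[:, 0]) + rng.normal(0, 0.2, 300)
x_new = np.linspace(0, 10, 50)[:, None]
preds = []
for _ in range(500):
    idx = rng.integers(0, len(y), len(y))            # resample (x, y) pairs
    m = KNeighborsRegressor(n_neighbors=15).fit(x[idx], y[idx])
    resid = y[idx] - m.predict(x[idx])
    preds.append(m.predict(x_new) + rng.choice(resid, len(x_new)))
lo, hi = np.percentile(preds, [2.5, 97.5], axis=0)   # 95% prediction band
print("mean interval width ~", round(float(np.mean(hi - lo)), 3))
```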

  3. Prediction of the Possibility a Right-Turn Driving Behavior at Intersection Leads to an Accident by Detecting Deviation of the Situation from Usual when the Behavior is Observed

    NASA Astrophysics Data System (ADS)

    Hayashi, Toshinori; Yamada, Keiichi

    Deviation of driving behavior from the usual could be a sign of human error that increases the risk of traffic accidents. This paper proposes a novel method for predicting the possibility that a driving behavior leads to an accident from information on the driving behavior and the situation. In a previous work, a method was proposed that predicts this possibility by detecting the deviation of the driving behavior from the usual behavior in that situation. In contrast, the method proposed in this paper predicts the possibility by detecting the deviation of the situation from the usual one when the behavior is observed. An advantage of the proposed method is that the number of required models is independent of the variety of situations. The method was applied to the problem of predicting accidents caused by right-turn driving behavior at an intersection, and its performance was evaluated by experiments on a driving simulator.

  4. Prediction-Correction Algorithms for Time-Varying Constrained Optimization

    DOE PAGES

    Simonetto, Andrea; Dall'Anese, Emiliano

    2017-07-26

    This article develops online algorithms to track solutions of time-varying constrained optimization problems. Particularly, resembling workhorse Kalman filtering-based approaches for dynamical systems, the proposed methods involve prediction-correction steps to provably track the trajectory of the optimal solutions of time-varying convex problems. The merits of existing prediction-correction methods have been shown for unconstrained problems and for setups where computing the inverse of the Hessian of the cost function is computationally affordable. This paper addresses the limitations of existing methods by tackling constrained problems and by designing first-order prediction steps that rely on the Hessian of the cost function (and do not require the computation of its inverse). In addition, the proposed methods are shown to improve the convergence speed of existing prediction-correction methods when applied to unconstrained problems. Numerical simulations corroborate the analytical results and showcase performance and benefits of the proposed algorithms. A realistic application of the proposed method to real-time control of energy resources is presented.
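
    A toy sketch of the prediction-correction idea on a scalar, unconstrained problem, assuming f(x; t) = (x - sin t)^2: the prediction step extrapolates the optimizer using the cross-derivative of the cost (here written with an explicit Hessian inverse, which this paper's first-order steps specifically avoid), and the correction step is one gradient step at the new time.

    ```python
    import numpy as np

    # Track argmin_x f(x; t) = (x - sin t)^2 while t advances in steps of h.
    h, alpha = 0.1, 0.2
    x_pc, x_c, gap_pc, gap_c = 0.0, 0.0, [], []
    for k in range(200):
        t, t_next = k * h, (k + 1) * h
        # prediction: x <- x - h * Hxx^{-1} Hxt   (here Hxx = 2, Hxt = -2 cos t)
        x_pc += h * np.cos(t)
        # correction: one gradient step at the new time instant
        x_pc -= alpha * 2 * (x_pc - np.sin(t_next))
        x_c  -= alpha * 2 * (x_c  - np.sin(t_next))   # correction-only baseline
        gap_pc.append(abs(x_pc - np.sin(t_next)))
        gap_c.append(abs(x_c - np.sin(t_next)))
    print(f"tracking error  pred-corr: {max(gap_pc[20:]):.4f}"
          f"  corr-only: {max(gap_c[20:]):.4f}")
    ```

    With these settings the prediction-correction iterate tracks roughly an order of magnitude tighter than correction alone, which is the qualitative benefit the article quantifies.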

  5. Extensions to decision curve analysis, a novel method for evaluating diagnostic tests, prediction models and molecular markers.

    PubMed

    Vickers, Andrew J; Cronin, Angel M; Elkin, Elena B; Gonen, Mithat

    2008-11-26

    Decision curve analysis is a novel method for evaluating diagnostic tests, prediction models and molecular markers. It combines the mathematical simplicity of accuracy measures, such as sensitivity and specificity, with the clinical applicability of decision analytic approaches. Most critically, decision curve analysis can be applied directly to a data set, and does not require the sort of external data on costs, benefits and preferences typically required by traditional decision analytic techniques. In this paper we present several extensions to decision curve analysis including correction for overfit, confidence intervals, application to censored data (including competing risk) and calculation of decision curves directly from predicted probabilities. All of these extensions are based on straightforward methods that have previously been described in the literature for application to analogous statistical techniques. Simulation studies showed that repeated 10-fold cross-validation provided the best method for correcting a decision curve for overfit. The method for applying decision curves to censored data had little bias and coverage was excellent; for competing risk, decision curves were appropriately affected by the incidence of the competing risk and the association between the competing risk and the predictor of interest. Calculation of decision curves directly from predicted probabilities led to a smoothing of the decision curve. Decision curve analysis can be easily extended to many of the applications common to performance measures for prediction models. Software to implement decision curve analysis is provided.
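
    The quantity plotted by a decision curve is the net benefit at each threshold probability, NB = TP/n - (FP/n)·pt/(1-pt), compared against "treat all" and "treat none" strategies; that formula is standard for this method. The outcomes and predicted risks below are toy data.

    ```python
    import numpy as np

    def net_benefit(y, p, pt):
        """Net benefit of treating when predicted risk p exceeds threshold pt."""
        treat = p >= pt
        tp = np.sum(treat & (y == 1)) / len(y)
        fp = np.sum(treat & (y == 0)) / len(y)
        return tp - fp * pt / (1 - pt)

    rng = np.random.default_rng(1)
    y = rng.integers(0, 2, 1000)                             # outcomes (toy)
    p = np.clip(y * 0.3 + rng.uniform(0, 0.7, 1000), 0, 1)   # predicted risks

    for pt in (0.1, 0.2, 0.3):
        nb_model = net_benefit(y, p, pt)
        nb_all = net_benefit(y, np.ones_like(p), pt)         # "treat everyone"
        print(f"pt={pt:.1f}  model={nb_model:.3f}  treat-all={nb_all:.3f}  treat-none=0")
    ```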

  6. A Data Driven Model for Predicting RNA-Protein Interactions based on Gradient Boosting Machine.

    PubMed

    Jain, Dharm Skandh; Gupte, Sanket Rajan; Aduri, Raviprasad

    2018-06-22

    RNA protein interactions (RPI) play a pivotal role in the regulation of various biological processes. Experimental validation of RPI has been time-consuming, paving the way for computational prediction methods. The major limiting factor of these methods has been the accuracy and confidence of the predictions, and our in-house experiments show that they fail to accurately predict RPI involving short RNA sequences such as TERRA RNA. Here, we present a data-driven model for RPI prediction using a gradient boosting classifier. Amino acids and nucleotides are classified based on the high-resolution structural data of RNA protein complexes. The minimum structural unit consisting of five residues is used as the descriptor. Comparative analysis of existing methods shows the consistently higher performance of our method irrespective of the length of RNA present in the RPI. The method has been successfully applied to map RPI networks involving both long noncoding RNA as well as TERRA RNA. The method is also shown to successfully predict RNA and protein hubs present in RPI networks of four different organisms. The robustness of this method will provide a way for predicting RPI networks of yet unknown interactions for both long noncoding RNA and microRNA.
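
    A workflow sketch using the classifier family named in the abstract; the feature vectors below are random stand-ins for the paper's five-residue structural-unit descriptors, so only the gradient-boosting mechanics are illustrated.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(7)
    X = rng.normal(size=(2000, 50))          # hypothetical RNA-protein pair descriptors
    y = (X[:, :5].sum(axis=1) + rng.normal(0, 1, 2000) > 0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                     max_depth=3).fit(X_tr, y_tr)
    print("AUC:", roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1]))
    ```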

  7. NAPR: a Cloud-Based Framework for Neuroanatomical Age Prediction.

    PubMed

    Pardoe, Heath R; Kuzniecky, Ruben

    2018-01-01

    The availability of cloud computing services has enabled the widespread adoption of the "software as a service" (SaaS) approach for software distribution, which utilizes network-based access to applications running on centralized servers. In this paper we apply the SaaS approach to neuroimaging-based age prediction. Our system, named "NAPR" (Neuroanatomical Age Prediction using R), provides access to predictive modeling software running on a persistent cloud-based Amazon Web Services (AWS) compute instance. The NAPR framework allows external users to estimate the age of individual subjects using cortical thickness maps derived from their own locally processed T1-weighted whole brain MRI scans. As a demonstration of the NAPR approach, we have developed two age prediction models that were trained using healthy control data from the ABIDE, CoRR, DLBS and NKI Rockland neuroimaging datasets (total N = 2367, age range 6-89 years). The provided age prediction models were trained using (i) relevance vector machines and (ii) Gaussian processes machine learning methods applied to cortical thickness surfaces obtained using Freesurfer v5.3. We believe that this transparent approach to out-of-sample evaluation and comparison of neuroimaging age prediction models will facilitate the development of improved age prediction models and allow for robust evaluation of the clinical utility of these methods.

  8. Statistical Prediction in Proprietary Rehabilitation.

    ERIC Educational Resources Information Center

    Johnson, Kurt L.; And Others

    1987-01-01

    Applied statistical methods to predict case expenditures for low back pain rehabilitation cases in proprietary rehabilitation. Extracted predictor variables from case records of 175 workers compensation claimants with some degree of permanent disability due to back injury. Performed several multiple regression analyses resulting in a formula that…

  9. A Hierarchical Model Predictive Tracking Control for Independent Four-Wheel Driving/Steering Vehicles with Coaxial Steering Mechanism

    NASA Astrophysics Data System (ADS)

    Itoh, Masato; Hagimori, Yuki; Nonaka, Kenichiro; Sekiguchi, Kazuma

    2016-09-01

    In this study, we apply hierarchical model predictive control to an omni-directional mobile vehicle and improve its tracking performance. We deal with an independent four-wheel driving/steering vehicle (IFWDS) equipped with four coaxial steering mechanisms (CSM). The coaxial steering mechanism is a special mechanism composed of two steering joints on the same axis. In our previous study on IFWDS with ideal steering, we proposed a model predictive tracking control. However, this method did not consider the constraints of the coaxial steering mechanism, which cause steering delay. We also proposed a model predictive steering control considering the constraints of this mechanism. In this study, we propose a hierarchical system combining the above two control methods for IFWDS. An upper controller, which deals with vehicle kinematics, runs model predictive tracking control, and a lower controller, which considers the constraints of the coaxial steering mechanism, runs model predictive steering control that tracks the predicted steering angle optimized by the upper controller. We verify the superiority of this method by comparing it with the previous method.

  10. Thermal expansion coefficient prediction of fuel-cell seal materials from silica sand

    NASA Astrophysics Data System (ADS)

    Hidayat, Nurul; Triwikantoro; Baqiya, Malik A.; Pratapa, Suminar

    2013-09-01

    This study is focused on the prediction of the coefficient of thermal expansion (CTE) of silica-sand-based fuel-cell seal materials (FcSMs), which in principle require a CTE value in the range of 9.5-12 ppm/°C. A semi-quantitative theoretical method to predict the CTE value is proposed, applying the phase compositions analyzed from XRD data and the characterized density-porosity behavior. A typical silica sand was milled at 150 rpm for 1 hour, followed by heating at 1000 °C for another hour. The sand and heated samples were characterized by means of XRD to perceive the phase composition correlation between them. Rietveld refinement was executed to investigate the weight fractions of the phases contained in the samples, which were then converted to volume fractions for composite CTE calculations. The result was applied to predict the potential physical properties for FcSM. Porosity, directly measured by the Archimedes method, was taken into account in the calculation.
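
    The abstract does not state the exact composite model, so the sketch below assumes the simplest volume-fraction-weighted (rule-of-mixtures) estimate; all phase names, fractions and CTE values are hypothetical placeholders for the Rietveld-derived quantities.

    ```python
    # Volume-fraction-weighted (rule-of-mixtures) CTE estimate for a multiphase
    # ceramic; the phase names, fractions, and CTE values are hypothetical.
    phases = {                 # volume fraction, CTE (ppm/degC)
        "cristobalite": (0.55, 12.0),
        "quartz":       (0.30, 11.0),
        "amorphous":    (0.15,  0.5),
    }
    assert abs(sum(v for v, _ in phases.values()) - 1.0) < 1e-9
    cte = sum(v * a for v, a in phases.values())
    print(f"composite CTE ~ {cte:.2f} ppm/degC (target window: 9.5-12 ppm/degC)")
    ```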

  11. Short wavelength Raman spectroscopy applied to the discrimination and characterization of three cultivars of extra virgin olive oils in different maturation stages.

    PubMed

    Gouvinhas, Irene; Machado, Nelson; Carvalho, Teresa; de Almeida, José M M M; Barros, Ana I R N A

    2015-01-01

    Extra virgin olive oils produced from three cultivars at different maturation stages were characterized using Raman spectroscopy. Chemometric methods (principal component analysis, discriminant analysis, principal component regression and partial least squares regression) applied to the Raman spectral data were used to evaluate and quantify the statistical differences between cultivars and their ripening process. The models for predicting the peroxide value and free acidity of olive oils showed good calibration and prediction values and presented high coefficients of determination (>0.933). Both the R(2) and the correlation equations between the measured chemical parameters and the values predicted by each approach are presented; these comprise both PCR and PLS, used to assess SNV-normalized Raman data as well as first and second derivatives of the spectra. This study demonstrates that a combination of Raman spectroscopy with multivariate analysis methods can be useful for rapidly predicting olive oil chemical characteristics during the maturation process. Copyright © 2014 Elsevier B.V. All rights reserved.

  12. Neural networks for dimensionality reduction of fluorescence spectra and prediction of drinking water disinfection by-products.

    PubMed

    Peleato, Nicolas M; Legge, Raymond L; Andrews, Robert C

    2018-06-01

    The use of fluorescence data coupled with neural networks for improved predictability of drinking water disinfection by-products (DBPs) was investigated. Novel application of autoencoders to process high-dimensional fluorescence data was related to common dimensionality reduction techniques of parallel factors analysis (PARAFAC) and principal component analysis (PCA). The proposed method was assessed based on component interpretability as well as for prediction of organic matter reactivity to formation of DBPs. Optimal prediction accuracies on a validation dataset were observed with an autoencoder-neural network approach or by utilizing the full spectrum without pre-processing. Latent representation by an autoencoder appeared to mitigate overfitting when compared to other methods. Although DBP prediction error was minimized by other pre-processing techniques, PARAFAC yielded interpretable components which resemble fluorescence expected from individual organic fluorophores. Through analysis of the network weights, fluorescence regions associated with DBP formation can be identified, representing a potential method to distinguish reactivity between fluorophore groupings. However, distinct results due to the applied dimensionality reduction approaches were observed, dictating a need for considering the role of data pre-processing in the interpretability of the results. In comparison to common organic measures currently used for DBP formation prediction, fluorescence was shown to improve prediction accuracies, with improvements to DBP prediction best realized when appropriate pre-processing and regression techniques were applied. The results of this study show promise for the potential application of neural networks to best utilize fluorescence EEM data for prediction of organic matter reactivity. Copyright © 2018 Elsevier Ltd. All rights reserved.

  13. Applied Distributed Model Predictive Control for Energy Efficient Buildings and Ramp Metering

    NASA Astrophysics Data System (ADS)

    Koehler, Sarah Muraoka

    Industrial large-scale control problems present an interesting algorithmic design challenge. A number of controllers must cooperate in real-time on a network of embedded hardware with limited computing power in order to maximize system efficiency while respecting constraints and despite communication delays. Model predictive control (MPC) can automatically synthesize a centralized controller which optimizes an objective function subject to a system model, constraints, and predictions of disturbance. Unfortunately, the computations required by model predictive controllers for large-scale systems often limit its industrial implementation only to medium-scale slow processes. Distributed model predictive control (DMPC) enters the picture as a way to decentralize a large-scale model predictive control problem. The main idea of DMPC is to split the computations required by the MPC problem amongst distributed processors that can compute in parallel and communicate iteratively to find a solution. Some popularly proposed solutions are distributed optimization algorithms such as dual decomposition and the alternating direction method of multipliers (ADMM). However, these algorithms ignore two practical challenges: substantial communication delays present in control systems and also problem non-convexity. This thesis presents two novel and practically effective DMPC algorithms. The first DMPC algorithm is based on a primal-dual active-set method which achieves fast convergence, making it suitable for large-scale control applications which have a large communication delay across its communication network. In particular, this algorithm is suited for MPC problems with a quadratic cost, linear dynamics, forecasted demand, and box constraints. We measure the performance of this algorithm and show that it significantly outperforms both dual decomposition and ADMM in the presence of communication delay. The second DMPC algorithm is based on an inexact interior point method which is suited for nonlinear optimization problems. The parallel computation of the algorithm exploits iterative linear algebra methods for the main linear algebra computations in the algorithm. We show that the splitting of the algorithm is flexible and can thus be applied to various distributed platform configurations. The two proposed algorithms are applied to two main energy and transportation control problems. The first application is energy efficient building control. Buildings represent 40% of energy consumption in the United States. Thus, it is significant to improve the energy efficiency of buildings. The goal is to minimize energy consumption subject to the physics of the building (e.g. heat transfer laws), the constraints of the actuators as well as the desired operating constraints (thermal comfort of the occupants), and heat load on the system. In this thesis, we describe the control systems of forced air building systems in practice. We discuss the "Trim and Respond" algorithm which is a distributed control algorithm that is used in practice, and show that it performs similarly to a one-step explicit DMPC algorithm. Then, we apply the novel distributed primal-dual active-set method and provide extensive numerical results for the building MPC problem. The second main application is the control of ramp metering signals to optimize traffic flow through a freeway system. This application is particularly important since urban congestion has more than doubled in the past few decades. 
The ramp metering problem is to maximize freeway throughput subject to freeway dynamics (derived from mass conservation), actuation constraints, freeway capacity constraints, and predicted traffic demand. In this thesis, we develop a hybrid model predictive controller for ramp metering that is guaranteed to be persistently feasible and stable. This contrasts to previous work on MPC for ramp metering where such guarantees are absent. We apply a smoothing method to the hybrid model predictive controller and apply the inexact interior point method to this nonlinear non-convex ramp metering problem.
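
    The DMPC theme of the thesis rests on splitting one large optimization across agents that compute in parallel and coordinate iteratively. A toy dual-decomposition sketch of that generic splitting idea follows, with two quadratic agents coupled by one resource constraint; it is not the thesis's primal-dual active-set or inexact interior point algorithm, and the cost functions are assumed for illustration.

    ```python
    # Two agents minimize f1(x1) + f2(x2) subject to x1 + x2 = b. Each agent
    # solves its own subproblem in parallel; a dual (price) update coordinates.
    b = 4.0
    lam, step = 0.0, 0.2                  # dual variable (price), dual step size
    for k in range(100):
        # agent subproblems with f1 = (x-1)^2 and f2 = 2(x-2)^2, solved in
        # closed form: argmin_x fi(x) + lam * x
        x1 = 1.0 - lam / 2.0
        x2 = 2.0 - lam / 4.0
        lam += step * (x1 + x2 - b)       # price update from constraint violation
    print(f"x1={x1:.3f}, x2={x2:.3f}, x1+x2={x1+x2:.3f} (target {b})")
    ```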

  14. Railroad Tie Responses to Directly Applied Rail Seat Loading in Ballasted Tracks : A Computational Study.

    DOT National Transportation Integrated Search

    2012-04-17

    This paper describes work in progress that applies the finite element (FE) method in predicting the responses of individual railroad crossties to rail seat pressure loading in a ballasted track. Both wood and prestressed concrete crossties ar...

  15. Applying Sigma Metrics to Reduce Outliers.

    PubMed

    Litten, Joseph

    2017-03-01

    Sigma metrics can be used to predict assay quality, allowing easy comparison of instrument quality and predicting which tests will require minimal quality control (QC) rules to monitor the performance of the method. A Six Sigma QC program can result in fewer controls and fewer QC failures for methods with a sigma metric of 5 or better. The higher the number of methods with a sigma metric of 5 or better, the lower the costs for reagents, supplies, and control material required to monitor the performance of the methods. Copyright © 2016 Elsevier Inc. All rights reserved.
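
    The standard laboratory sigma metric is (TEa - |bias|) / CV, with allowable total error, bias, and imprecision all expressed in percent; the assay numbers in the example are hypothetical.

    ```python
    def sigma_metric(tea_pct, bias_pct, cv_pct):
        """Laboratory sigma metric: (TEa - |bias|) / CV, all in percent."""
        return (tea_pct - abs(bias_pct)) / cv_pct

    # hypothetical assay: allowable total error 10%, bias 1.5%, CV 1.6%
    print(f"sigma = {sigma_metric(10, 1.5, 1.6):.1f}")   # ~5.3 -> minimal QC rules
    ```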

  16. Integration of a supersonic unsteady aerodynamic code into the NASA FASTEX system

    NASA Technical Reports Server (NTRS)

    Appa, Kari; Smith, Michael J. C.

    1987-01-01

    A supersonic unsteady aerodynamic loads prediction method based on the constant pressure method was integrated into the NASA FASTEX system. The updated FASTEX code can be employed for aeroelastic analyses in subsonic and supersonic flow regimes. A brief description of the supersonic constant pressure panel method, as applied to lifting surfaces and body configurations, is followed by a documentation of updates required to incorporate this method in the FASTEX code. Test cases showing correlations of predicted pressure distributions, flutter solutions, and stability derivatives with available data are reported.

  17. LAI inversion from optical reflectance using a neural network trained with a multiple scattering model

    NASA Technical Reports Server (NTRS)

    Smith, James A.

    1992-01-01

    The inversion of the leaf area index (LAI) canopy parameter from optical spectral reflectance measurements is obtained using a backpropagation artificial neural network trained using input-output pairs generated by a multiple scattering reflectance model. The problem of LAI estimation over sparse canopies (LAI < 1.0) with varying soil reflectance backgrounds is particularly difficult. Standard multiple regression methods applied to canopies within a single homogeneous soil type yield good results but perform unacceptably when applied across soil boundaries, resulting in absolute percentage errors of >1000 percent for low LAI. Minimization methods applied to merit functions constructed from differences between measured reflectances and predicted reflectances using multiple-scattering models are unacceptably sensitive to a good initial guess for the desired parameter. In contrast, the neural network reported generally yields absolute percentage errors of <30 percent when weighting coefficients trained on one soil type were applied to predicted canopy reflectance at a different soil background.

  18. Study on soluble solids content measurement of grape juice beverage based on Vis/NIRS and chemometrics

    NASA Astrophysics Data System (ADS)

    Wu, Di; He, Yong

    2007-11-01

    The aim of this study is to investigate the potential of the visible and near infrared spectroscopy (Vis/NIRS) technique for non-destructive measurement of soluble solids content (SSC) in grape juice beverage. 380 samples were studied in this paper. Savitzky-Golay smoothing and the standard normal variate were applied for pre-processing of the spectral data. Least-squares support vector machines (LS-SVM) with the RBF kernel function were applied to develop the SSC prediction model based on the Vis/NIRS absorbance data. The determination coefficient for prediction (Rp2) of the results predicted by the LS-SVM model was 0.962 and the root mean square error of prediction (RMSEP) was 0.434137. It is concluded that the Vis/NIRS technique can quantify the SSC of grape juice beverage rapidly and non-destructively. At the same time, the LS-SVM model was compared with PLS and back propagation neural network (BP-NN) methods. The results showed that LS-SVM was superior to the conventional linear and non-linear methods in predicting the SSC of grape juice beverage. In this study, the generalization ability of the LS-SVM, PLS and BP-NN models was also investigated. It is concluded that the LS-SVM regression method is a promising technique for chemometrics in quantitative prediction.

  19. Identification of driving network of cellular differentiation from single sample time course gene expression data

    NASA Astrophysics Data System (ADS)

    Chen, Ye; Wolanyk, Nathaniel; Ilker, Tunc; Gao, Shouguo; Wang, Xujing

    Methods developed based on bifurcation theory have demonstrated their potential in driving network identification for complex human diseases, including the work by Chen et al. Recently, bifurcation theory has been successfully applied to model cellular differentiation. However, one often faces a technical challenge in driving network prediction: time course cellular differentiation studies often contain only one sample at each time point, while driving network prediction typically requires multiple samples at each time point to infer the variation and interaction structures of candidate genes for the driving network. In this study, we investigate several methods to identify both the critical time point and the driving network through examination of how each time point affects the autocorrelation and phase locking. We apply these methods to a high-throughput sequencing (RNA-Seq) dataset of 42 subsets of thymocytes and mature peripheral T cells at multiple time points during their differentiation (GSE48138 from GEO). We compare the predicted driving genes with known transcription regulators of cellular differentiation. We will discuss the advantages and limitations of our proposed methods, as well as potential further improvements of our methods.
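
    One common way to probe for an approaching bifurcation is rising lag-1 autocorrelation along the trajectory; a minimal sketch follows, with a simulated expression trajectory standing in for the single-sample time course and a sliding window pooling statistics across time (an assumed, simplified version of the autocorrelation examination described above).

    ```python
    import numpy as np

    def lag1_autocorr(x):
        """Lag-1 autocorrelation; rising values can flag an approaching transition."""
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        return np.dot(x[:-1], x[1:]) / np.dot(x, x)

    rng = np.random.default_rng(3)
    # hypothetical trajectory of one candidate gene with growing fluctuations
    signal = np.cumsum(rng.normal(0, 1, 60)) * np.linspace(0.2, 1.0, 60)
    window = 20
    for start in range(0, len(signal) - window + 1, 10):
        seg = signal[start:start + window]
        print(f"t={start:2d}..{start + window - 1}: AC(1) = {lag1_autocorr(seg):.2f}")
    ```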

  20. Taxi-Out Time Prediction for Departures at Charlotte Airport Using Machine Learning Techniques

    NASA Technical Reports Server (NTRS)

    Lee, Hanbong; Malik, Waqar; Jung, Yoon C.

    2016-01-01

    Predicting the taxi-out times of departures accurately is important for improving airport efficiency and takeoff time predictability. In this paper, we attempt to apply machine learning techniques to actual traffic data at Charlotte Douglas International Airport for taxi-out time prediction. To find the key factors affecting aircraft taxi times, surface surveillance data is first analyzed. From this data analysis, several variables, including terminal concourse, spot, runway, departure fix and weight class, are selected for taxi time prediction. Then, various machine learning methods such as linear regression, support vector machines, k-nearest neighbors, random forest, and neural networks are applied to actual flight data. Different traffic flow and weather conditions at Charlotte airport are also taken into account for more accurate prediction. The taxi-out time prediction results show that linear regression and random forest techniques can provide the most accurate prediction in terms of root-mean-square errors. We also discuss the operational complexity and uncertainties that make it difficult to predict the taxi times accurately.
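
    A sketch of the random forest variant, one of the two best performers reported above; the feature columns borrow the variable names the paper selects, but the data and the synthetic taxi-out target are assumptions.

    ```python
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    n = 5000
    X = pd.DataFrame({                      # hypothetical surveillance features
        "concourse": rng.integers(0, 5, n),
        "spot": rng.integers(0, 30, n),
        "runway": rng.integers(0, 3, n),
        "departure_fix": rng.integers(0, 8, n),
        "weight_class": rng.integers(0, 4, n),
    })
    y = 12 + 2 * X["runway"] + 0.3 * X["spot"] + rng.normal(0, 2, n)  # minutes

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
    rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_tr, y_tr)
    rmse = np.sqrt(mean_squared_error(y_te, rf.predict(X_te)))
    print(f"taxi-out RMSE: {rmse:.2f} min")
    ```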

  1. Recovery of known T-cell epitopes by computational scanning of a viral genome

    NASA Astrophysics Data System (ADS)

    Logean, Antoine; Rognan, Didier

    2002-04-01

    A new computational method (EpiDock) is proposed for predicting peptide binding to class I MHC proteins, from the amino acid sequence of any protein of immunological interest. Starting from the primary structure of the target protein, individual three-dimensional structures of all possible MHC-peptide (8-, 9- and 10-mers) complexes are obtained by homology modelling. A free energy scoring function (Fresno) is then used to predict the absolute binding free energy of all possible peptides to the class I MHC restriction protein. Assuming that immunodominant epitopes are usually found among the top MHC binders, the method can thus be applied to predict the location of immunogenic peptides on the sequence of the protein target. When applied to the prediction of HLA-A*0201-restricted T-cell epitopes from the Hepatitis B virus, EpiDock was able to recover 92% of known high affinity binders and 80% of known epitopes within a filtered subset of all possible nonapeptides corresponding to about one tenth of the full theoretical list. The proposed method is fully automated and fast enough to scan a viral genome in less than an hour on a parallel computing architecture. As it requires very few starting experimental data, EpiDock can be used: (i) to predict potential T-cell epitopes from viral genomes, and (ii) to roughly predict still unknown peptide binding motifs for novel class I MHC alleles.

  2. Using General Outcome Measures to Predict Student Performance on State-Mandated Assessments: An Applied Approach for Establishing Predictive Cutscores

    ERIC Educational Resources Information Center

    Leblanc, Michael; Dufore, Emily; McDougal, James

    2012-01-01

    Cutscores for reading and math (general outcome measures) to predict passage on New York state-mandated assessments were created by using a freely available Excel workbook. The authors used linear regression to create the cutscores, and diagnostic indicators were provided. A rationale and procedure for using this method are outlined. This method…

  3. Disease Modeling via Large-Scale Network Analysis

    DTIC Science & Technology

    2015-05-20

    A central goal of genetics is to learn how the genotype of an organism determines its phenotype. We address the implicit problem of predicting the association of genes with … guarantees for the methods. In the past, we have developed predictive methods general enough to apply to potentially any genetic trait, varying from…

  4. A simple method to predict regional fish abundance: an example in the McKenzie River Basin, Oregon

    Treesearch

    D.J. McGarvey; J.M. Johnston

    2011-01-01

    Regional assessments of fisheries resources are increasingly called for, but tools with which to perform them are limited. We present a simple method that can be used to estimate regional carrying capacity and apply it to the McKenzie River Basin, Oregon. First, we use a macroecological model to predict trout densities within small, medium, and large streams in the...

  5. Prediction of mean monthly river discharges in Colombia through Empirical Mode Decomposition

    NASA Astrophysics Data System (ADS)

    Carmona, A. M.; Poveda, G.

    2015-04-01

    The hydro-climatology of Colombia exhibits strong natural variability at a broad range of time scales including: inter-decadal, decadal, inter-annual, annual, intra-annual, intra-seasonal, and diurnal. Diverse applied sectors rely on quantitative predictions of river discharges for operational purposes including hydropower generation, agriculture, human health, fluvial navigation, territorial planning and management, and risk preparedness and mitigation, among others. Various methodologies have been used to predict monthly mean river discharges that are based on "Predictive Analytics", an area of statistical analysis that studies the extraction of information from historical data to infer future trends and patterns. Our study couples the Empirical Mode Decomposition (EMD) with traditional methods, e.g. the Autoregressive Model of Order 1 (AR1) and Neural Networks (NN), to predict mean monthly river discharges in Colombia, South America. The EMD allows us to decompose the historical time series of river discharges into a finite number of intrinsic mode functions (IMFs) that capture the different oscillatory modes of different frequencies associated with the inherent time scales coexisting simultaneously in the signal (Huang et al. 1998, Huang and Wu 2008, Rao and Hsu, 2008). The premise of our predictive method is that it is easier and simpler to predict each IMF one at a time and then add the predictions together to obtain the predicted river discharge for a certain month, than to predict the full signal. This method is applied to 10 series of monthly mean river discharges in Colombia, using calibration periods of more than 25 years, and validation periods of about 12 years. Predictions are performed for time horizons spanning from 1 to 12 months. Our results show that predictions obtained through the traditional methods improve when the EMD is used as a previous step, since errors decrease by up to 13% when the AR1 model is used, and by up to 18% when Neural Networks are combined with the EMD.
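
    A minimal sketch of the decompose-predict-sum scheme: forecast each component separately and add the forecasts. The "IMFs" below are simulated sinusoids plus a trend; in practice they would come from an EMD of the discharge record, and the AR(1) forecaster stands in for the paper's AR1/NN component models.

    ```python
    import numpy as np

    def ar1_forecast(series, steps):
        """Least-squares AR(1) fit, iterated forward (minimal sketch)."""
        x0, x1 = series[:-1], series[1:]
        dx0 = x0 - x0.mean()
        phi = np.dot(dx0, x1 - x1.mean()) / np.dot(dx0, dx0)
        c = x1.mean() - phi * x0.mean()
        out, last = [], series[-1]
        for _ in range(steps):
            last = c + phi * last
            out.append(last)
        return np.array(out)

    # stand-in "IMFs": two oscillatory modes plus a residual trend
    t = np.arange(300)
    imfs = [np.sin(2 * np.pi * t / 12), 0.5 * np.sin(2 * np.pi * t / 60), 0.002 * t]
    horizon = 12
    forecast = sum(ar1_forecast(imf, horizon) for imf in imfs)  # predict each, then add
    print(forecast.round(3))
    ```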

  6. Prediction of pH of cola beverage using Vis/NIR spectroscopy and least squares-support vector machine

    NASA Astrophysics Data System (ADS)

    Liu, Fei; He, Yong

    2008-02-01

    Visible and near infrared (Vis/NIR) transmission spectroscopy and chemometric methods were utilized to predict the pH values of cola beverages. Five varieties of cola were prepared and 225 samples (45 samples for each variety) were selected for the calibration set, while 75 samples (15 samples for each variety) formed the validation set. Savitzky-Golay smoothing and standard normal variate (SNV) followed by first derivative were used as the pre-processing methods. Partial least squares (PLS) analysis was employed to extract the principal components (PCs), which were used as the inputs of the least squares-support vector machine (LS-SVM) model according to their accumulative reliabilities. Then LS-SVM with a radial basis function (RBF) kernel and a two-step grid search technique were applied to build the regression model, with a comparison to PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias were 0.961, 0.040 and 0.012 for PLS, and 0.975, 0.031 and 4.697×10⁻³ for LS-SVM, respectively. Both methods obtained a satisfying precision. The results indicated that Vis/NIR spectroscopy combined with chemometric methods could be applied as an alternative way for the prediction of pH of cola beverages.
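
    A sketch of the pipeline shape described above: PLS latent variables feeding a kernel regressor tuned by grid search. scikit-learn has no LS-SVM, so epsilon-SVR with an RBF kernel stands in for it, and the spectra and pH target are random placeholders.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    # hypothetical spectra: 225 calibration samples x 700 wavelengths
    rng = np.random.default_rng(2)
    X = rng.normal(size=(225, 700))
    y = 2.5 + 0.3 * X[:, 100] + rng.normal(0, 0.05, 225)   # toy pH target

    pls = PLSRegression(n_components=8).fit(X, y)          # PLS latent variables
    scores = pls.transform(X)

    grid = {"C": [1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
    svr = GridSearchCV(SVR(kernel="rbf"), grid, cv=5).fit(scores, y)
    print(f"best params: {svr.best_params_}, CV R^2 = {svr.best_score_:.3f}")
    ```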

  7. Predicting Flory-Huggins χ from Simulations

    NASA Astrophysics Data System (ADS)

    Zhang, Wenlin; Gomez, Enrique D.; Milner, Scott T.

    2017-07-01

    We introduce a method, based on a novel thermodynamic integration scheme, to extract the Flory-Huggins χ parameter as small as 10⁻³ kT for polymer blends from molecular dynamics (MD) simulations. We obtain χ for the archetypical coarse-grained model of nonpolar polymer blends: flexible bead-spring chains with different Lennard-Jones interactions between A and B monomers. Using these χ values and a lattice version of self-consistent field theory (SCFT), we predict the shape of planar interfaces for phase-separated binary blends. Our SCFT results agree with MD simulations, validating both the predicted χ values and our thermodynamic integration method. Combined with atomistic simulations, our method can be applied to predict χ for new polymers from their chemical structures.
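
    Once χ is extracted, it enters the standard Flory-Huggins mixing free energy, which is why values of order 10⁻³ matter: for symmetric chains the critical point is χ_c = 2/N, so long chains demix at tiny χ. A sketch with assumed chain lengths and χ follows (the paper's own downstream tool is SCFT, not this bare mean-field form).

    ```python
    import numpy as np

    def fh_free_energy(phi, chi, NA=100, NB=100):
        """Flory-Huggins mixing free energy per site, in units of kT:
           f = (phi/NA) ln phi + ((1-phi)/NB) ln(1-phi) + chi*phi*(1-phi)."""
        return (phi / NA) * np.log(phi) + ((1 - phi) / NB) * np.log(1 - phi) \
               + chi * phi * (1 - phi)

    N = 100
    print("chi_c =", 2 / N)                          # symmetric-blend critical point
    phi = np.linspace(0.01, 0.99, 99)
    f = fh_free_energy(phi, chi=0.025, NA=N, NB=N)   # chi > chi_c: double-well
    print("free-energy minimum near phi =", phi[np.argmin(f)])
    ```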

  8. Tracking perturbations in Boolean networks with spectral methods

    NASA Astrophysics Data System (ADS)

    Kesseli, Juha; Rämö, Pauli; Yli-Harja, Olli

    2005-08-01

    In this paper we present a method for predicting the spread of perturbations in Boolean networks. The method is applicable to networks that have no regular topology. The prediction of perturbations can be performed easily by using a presented result which enables the efficient computation of the required iterative formulas. This result is based on abstract Fourier transform of the functions in the network. In this paper the method is applied to show the spread of perturbations in networks containing a distribution of functions found from biological data. The advances in the study of the spread of perturbations can directly be applied to enable ways of quantifying chaos in Boolean networks. Derrida plots over an arbitrary number of time steps can be computed and thus distributions of functions compared with each other with respect to the amount of order they create in random networks.
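
    The quantity behind a Derrida plot is how an initial Hamming distance between two states maps to the distance after one update. A brute-force simulation sketch follows for a random K=2 network with random truth tables; the paper's networks instead use function distributions found from biological data, and its contribution is computing such curves analytically via the Fourier-transform result rather than by simulation.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    N, K = 1000, 2
    inputs = rng.integers(0, N, size=(N, K))          # random wiring
    tables = rng.integers(0, 2, size=(N, 2 ** K))     # random Boolean functions

    def step(state):
        idx = (state[inputs] * (2 ** np.arange(K))).sum(axis=1)
        return tables[np.arange(N), idx]

    # one-step Derrida map: initial Hamming distance -> distance after update
    for h0 in (1, 10, 50, 200):
        d = []
        for _ in range(50):
            s = rng.integers(0, 2, N)
            t = s.copy()
            flip = rng.choice(N, h0, replace=False)
            t[flip] ^= 1                              # perturb h0 nodes
            d.append(np.mean(step(s) != step(t)))
        print(f"d(0) = {h0 / N:.3f} -> d(1) = {np.mean(d):.3f}")
    ```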

  9. Identification of novel plant peroxisomal targeting signals by a combination of machine learning methods and in vivo subcellular targeting analyses.

    PubMed

    Lingner, Thomas; Kataya, Amr R; Antonicelli, Gerardo E; Benichou, Aline; Nilssen, Kjersti; Chen, Xiong-Yan; Siemsen, Tanja; Morgenstern, Burkhard; Meinicke, Peter; Reumann, Sigrun

    2011-04-01

    In the postgenomic era, accurate prediction tools are essential for identification of the proteomes of cell organelles. Prediction methods have been developed for peroxisome-targeted proteins in animals and fungi but are missing specifically for plants. For development of a predictor for plant proteins carrying peroxisome targeting signals type 1 (PTS1), we assembled more than 2500 homologous plant sequences, mainly from EST databases. We applied a discriminative machine learning approach to derive two different prediction methods, both of which showed high prediction accuracy and recognized specific targeting-enhancing patterns in the regions upstream of the PTS1 tripeptides. Upon application of these methods to the Arabidopsis thaliana genome, 392 gene models were predicted to be peroxisome targeted. These predictions were extensively tested in vivo, resulting in a high experimental verification rate of Arabidopsis proteins previously not known to be peroxisomal. The prediction methods were able to correctly infer novel PTS1 tripeptides, which even included novel residues. Twenty-three newly predicted PTS1 tripeptides were experimentally confirmed, and a high variability of the plant PTS1 motif was discovered. These prediction methods will be instrumental in identifying low-abundance and stress-inducible peroxisomal proteins and defining the entire peroxisomal proteome of Arabidopsis and agronomically important crop plants.

  10. A novel method for landslide displacement prediction by integrating advanced computational intelligence algorithms.

    PubMed

    Zhou, Chao; Yin, Kunlong; Cao, Ying; Ahmed, Bayes; Fu, Xiaolin

    2018-05-08

    Landslide displacement prediction is considered as an essential component for developing early warning systems. The modelling of conventional forecast methods requires enormous monitoring data that limit its application. To conduct accurate displacement prediction with limited data, a novel method is proposed and applied by integrating three computational intelligence algorithms, namely the wavelet transform (WT), the artificial bee colony (ABC) algorithm, and the kernel-based extreme learning machine (KELM). First, the total displacement was decomposed into several sub-sequences with different frequencies using the WT. Next, each sub-sequence was predicted separately by the KELM, whose parameters were optimized by the ABC. Finally, the predicted total displacement was obtained by adding all the predicted sub-sequences. The Shuping landslide in the Three Gorges Reservoir area in China was taken as a case study. The performance of the new method was compared with the WT-ELM, ABC-KELM, ELM, and support vector machine (SVM) methods. Results show that the prediction accuracy can be improved by decomposing the total displacement into sub-sequences with various frequencies and by predicting them separately. The ABC-KELM algorithm shows the highest prediction capacity, followed by the ELM and SVM. Overall, the proposed method achieved excellent performance both in terms of accuracy and stability.

  11. A better way to evaluate remote monitoring programs in chronic disease care: receiver operating characteristic analysis.

    PubMed

    Brown Connolly, Nancy E

    2014-12-01

    This foundational study applies the process of receiver operating characteristic (ROC) analysis to evaluate the utility and predictive value of a disease management (DM) model that uses RM devices for chronic obstructive pulmonary disease (COPD). The literature identifies a need for a more rigorous method to validate and quantify evidence-based value for remote monitoring (RM) systems being used to monitor persons with a chronic disease. ROC analysis is an engineering approach widely applied in medical testing, but one that has not been evaluated for its utility in RM. Classifiers (saturated peripheral oxygen [SPO2], blood pressure [BP], and pulse), optimum threshold, and predictive accuracy are evaluated based on patient outcomes. Parametric and nonparametric methods were used. Event-based patient outcomes included inpatient hospitalization, accident and emergency, and home health visits. Statistical analysis tools included Microsoft (Redmond, WA) Excel(®) and MedCalc(®) (MedCalc Software, Ostend, Belgium) version 12 © 1993-2013 to generate ROC curves and statistics. Persons with COPD were monitored a minimum of 183 days, with at least one inpatient hospitalization within 12 months prior to monitoring. Retrospective, de-identified patient data from a United Kingdom National Health System COPD program were used. Datasets included biometric readings, alerts, and resource utilization. SPO2 was identified as a predictive classifier, with an optimal average threshold setting of 85-86%. BP and pulse were failed classifiers, and areas of design were identified that may improve utility and predictive capacity. A cost avoidance methodology was developed. Results can be applied to health services planning decisions. Methods can be applied to system design and evaluation based on patient outcomes. This study validated the use of ROC in RM program evaluation.
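
    A sketch of the core computation, with entirely hypothetical monitoring data: build the ROC curve for an SPO2-based classifier and pick the threshold maximizing Youden's J, one common choice of "optimum threshold" (the study's own optimum criterion is not specified in the abstract).

    ```python
    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    # hypothetical data: daily SPO2 readings vs. whether an adverse event
    # (hospitalization or A&E visit) followed; lower SPO2 carries the signal,
    # so the risk score is the negated reading
    rng = np.random.default_rng(4)
    event = rng.integers(0, 2, 500)
    spo2 = np.where(event == 1, rng.normal(82, 3, 500), rng.normal(91, 3, 500))

    fpr, tpr, thr = roc_curve(event, -spo2)
    j = tpr - fpr                              # Youden's J statistic
    best = j.argmax()
    print(f"AUC = {roc_auc_score(event, -spo2):.2f}, "
          f"optimal SPO2 cut ~ {-thr[best]:.0f}%")
    ```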

  12. Molecular gas dynamics applied to low-thrust propulsion

    NASA Astrophysics Data System (ADS)

    Zelesnik, Donna; Penko, Paul F.; Boyd, Iain D.

    1993-11-01

    The Direct Simulation Monte Carlo method is currently being applied to study flowfields of small thrusters, including both the internal nozzle and the external plume flow. The DSMC method is employed because of its inherent ability to capture nonequilibrium effects and proper boundary physics in low-density flow that are not readily obtained by continuum methods. Accurate prediction of both the internal and external nozzle flow is important in determining plume expansion which, in turn, bears directly on impingement and contamination effects.

  13. Molecular gas dynamics applied to low-thrust propulsion

    NASA Technical Reports Server (NTRS)

    Zelesnik, Donna; Penko, Paul F.; Boyd, Iain D.

    1993-01-01

    The Direct Simulation Monte Carlo method is currently being applied to study flowfields of small thrusters, including both the internal nozzle and the external plume flow. The DSMC method is employed because of its inherent ability to capture nonequilibrium effects and proper boundary physics in low-density flow that are not readily obtained by continuum methods. Accurate prediction of both the internal and external nozzle flow is important in determining plume expansion which, in turn, bears directly on impingement and contamination effects.

  14. Complete fold annotation of the human proteome using a novel structural feature space.

    PubMed

    Middleton, Sarah A; Illuminati, Joseph; Kim, Junhyong

    2017-04-13

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  15. Complete fold annotation of the human proteome using a novel structural feature space

    PubMed Central

    Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong

    2017-01-01

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families. PMID:28406174

  16. Integrated Detection and Prediction of Influenza Activity for Real-Time Surveillance: Algorithm Design

    PubMed Central

    2017-01-01

    Background Influenza is a viral respiratory disease capable of causing epidemics that represent a threat to communities worldwide. The rapidly growing availability of electronic “big data” from diagnostic and prediagnostic sources in health care and public health settings permits advance of a new generation of methods for local detection and prediction of winter influenza seasons and influenza pandemics. Objective The aim of this study was to present a method for integrated detection and prediction of influenza virus activity in local settings using electronically available surveillance data and to evaluate its performance by retrospective application on authentic data from a Swedish county. Methods An integrated detection and prediction method was formally defined based on a design rationale for influenza detection and prediction methods adapted for local surveillance. The novel method was retrospectively applied on data from the winter influenza season 2008-09 in a Swedish county (population 445,000). Outcome data represented individuals who met a clinical case definition for influenza (based on International Classification of Diseases version 10 [ICD-10] codes) from an electronic health data repository. Information from calls to a telenursing service in the county was used as syndromic data source. Results The novel integrated detection and prediction method is based on nonmechanistic statistical models and is designed for integration in local health information systems. The method is divided into separate modules for detection and prediction of local influenza virus activity. The function of the detection module is to alert for an upcoming period of increased load of influenza cases on local health care (using influenza-diagnosis data), whereas the function of the prediction module is to predict the timing of the activity peak (using syndromic data) and its intensity (using influenza-diagnosis data). For detection modeling, exponential regression was used based on the assumption that the beginning of a winter influenza season has an exponential growth of infected individuals. For prediction modeling, linear regression was applied on 7-day periods at a time in order to find the peak timing, whereas a derivative of a normal distribution density function was used to find the peak intensity. We found that the integrated detection and prediction method detected the 2008-09 winter influenza season on its starting day (optimal timeliness 0 days), whereas the predicted peak was estimated to occur 7 days ahead of the factual peak and the predicted peak intensity was estimated to be 26% lower than the factual intensity (6.3 compared with 8.5 influenza-diagnosis cases/100,000). Conclusions Our detection and prediction method is one of the first integrated methods specifically designed for local application on influenza data electronically available for surveillance. The performance of the method in a retrospective study indicates that further prospective evaluations of the methods are justified. PMID:28619700
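
    A minimal sketch of the detection-module idea stated above: fit exponential growth to a recent window of diagnosis counts (linear regression on log counts) and alert when the growth rate is clearly positive. The counts, window length, and alert threshold are all assumed for illustration.

    ```python
    import numpy as np

    counts = np.array([2, 1, 2, 3, 2, 4, 5, 7, 9, 13, 17, 24])   # toy daily cases
    w = 7                                        # look-back window (days)
    t = np.arange(w)
    y = np.log(counts[-w:] + 1.0)                # +1 guards against log(0)
    b, a = np.polyfit(t, y, 1)                   # slope b = exponential growth rate
    doubling = np.log(2) / b if b > 0 else np.inf
    alert = b > 0.05                             # assumed alert threshold
    print(f"growth rate {b:.2f}/day, doubling time {doubling:.1f} d, alert={alert}")
    ```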

  17. The Fractional Step Method Applied to Simulations of Natural Convective Flows

    NASA Technical Reports Server (NTRS)

    Westra, Douglas G.; Heinrich, Juan C.; Saxon, Jeff (Technical Monitor)

    2002-01-01

    This paper describes research done to apply the Fractional Step Method to finite-element simulations of natural convective flows in pure liquids, permeable media, and in a directionally solidified metal alloy casting. The Fractional Step Method has been applied commonly to high Reynolds number flow simulations, but is less common for low Reynolds number flows, such as natural convection in liquids and in permeable media. The Fractional Step Method offers increased speed and reduced memory requirements by allowing non-coupled solution of the pressure and the velocity components. The Fractional Step Method has particular benefits for predicting flows in a directionally solidified alloy, since other methods presently employed are not very efficient. Previously, the most suitable method for predicting flows in a directionally solidified binary alloy was the penalty method. The penalty method requires direct matrix solvers, due to the penalty term. The Fractional Step Method allows iterative solution of the finite element stiffness matrices, thereby allowing more efficient solution of the matrices. The Fractional Step Method also lends itself to parallel processing, since the velocity component stiffness matrices can be built and solved independently of each other. The finite-element simulations of a directionally solidified casting are used to predict macrosegregation in directionally solidified castings. In particular, the finite-element simulations predict the existence of 'channels' within the processing mushy zone and subsequently 'freckles' within the fully processed solid, which are known to result from macrosegregation, or what is often referred to as thermo-solutal convection. These freckles cause material property non-uniformities in directionally solidified castings; therefore many of these castings are scrapped. The phenomenon of natural convection in an alloy undergoing directional solidification, or thermo-solutal convection, will be explained. The development of the momentum and continuity equations for natural convection in a fluid, a permeable medium, and in a binary alloy undergoing directional solidification will be presented. Finally, results for natural convection in a pure liquid, natural convection in a medium with a constant permeability, and for directional solidification will be presented.
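
    The decoupling of pressure and velocity that the paragraph credits for the speed and memory gains can be written compactly in its classical (Chorin-type) projection form; the paper's finite-element formulation differs in detail, but the three-step structure is the same.

    ```latex
    % Classical projection (fractional step) splitting for incompressible flow:
    \begin{align*}
    \frac{\mathbf{u}^{*}-\mathbf{u}^{n}}{\Delta t} &=
      -(\mathbf{u}^{n}\!\cdot\!\nabla)\mathbf{u}^{n}
      + \nu\nabla^{2}\mathbf{u}^{n} + \mathbf{f}^{n}
      && \text{(momentum without pressure)}\\
    \nabla^{2}p^{n+1} &= \frac{\rho}{\Delta t}\,\nabla\!\cdot\!\mathbf{u}^{*}
      && \text{(pressure Poisson equation)}\\
    \mathbf{u}^{n+1} &= \mathbf{u}^{*}
      - \frac{\Delta t}{\rho}\,\nabla p^{n+1}
      && \text{(projection/correction step)}
    \end{align*}
    ```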

  18. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

    PubMed Central

    Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

    2012-01-01

    Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606

  19. Learning predictive models that use pattern discovery--a bootstrap evaluative approach applied in organ functioning sequences.

    PubMed

    Toma, Tudor; Bosman, Robert-Jan; Siebes, Arno; Peek, Niels; Abu-Hanna, Ameen

    2010-08-01

    An important problem in the Intensive Care is how to predict, on a given day of stay, the eventual hospital mortality for a specific patient. A recent approach to solve this problem suggested the use of frequent temporal sequences (FTSs) as predictors. Methods following this approach were evaluated in the past by inducing a model from a training set and validating the prognostic performance on an independent test set. Although this evaluative approach addresses the validity of the specific models induced in an experiment, it falls short of evaluating the inductive method itself. To achieve this, one must account for the inherent sources of variation in the experimental design. The main aim of this work is to demonstrate a procedure based on bootstrapping, specifically the .632 bootstrap procedure, for evaluating inductive methods that discover patterns, such as FTSs. A second aim is to apply this approach to find out whether a recently suggested inductive method that discovers FTSs of organ functioning status is superior to a traditional method that does not use temporal sequences when compared on each successive day of stay at the Intensive Care Unit. The use of bootstrapping with logistic regression using pre-specified covariates is known in the statistical literature. Using inductive methods of prognostic models based on temporal sequence discovery within the bootstrap procedure is, however, novel, at least in predictive models in the Intensive Care. Our results of applying the bootstrap-based evaluative procedure demonstrate the superiority of the FTS-based inductive method over the traditional method in terms of discrimination as well as accuracy. In addition, we illustrate the insights gained by the analyst into the discovered FTSs from the bootstrap samples. Copyright 2010 Elsevier Inc. All rights reserved.
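
    The .632 bootstrap combines the optimistic apparent (training) error with the pessimistic out-of-bag error as 0.368·err_train + 0.632·err_oob; the key point of the paper is that the *entire* induction, including pattern discovery, is re-run inside each resample. A generic sketch follows, with `fit` and `loss` as placeholders for any inductive method and error measure.

    ```python
    import numpy as np

    def err632(err_train, err_oob):
        """The .632 bootstrap estimate of prediction error."""
        return 0.368 * err_train + 0.632 * err_oob

    def bootstrap632(X, y, fit, loss, B=100, seed=0):
        rng = np.random.default_rng(seed)
        model = fit(X, y)
        err_app = loss(model, X, y)                    # apparent (training) error
        oob_errs = []
        for _ in range(B):
            idx = rng.integers(0, len(y), len(y))      # bootstrap sample
            oob = np.setdiff1d(np.arange(len(y)), idx) # out-of-bag indices
            m = fit(X[idx], y[idx])                    # re-run the whole induction,
            oob_errs.append(loss(m, X[oob], y[oob]))   # including pattern discovery
        return err632(err_app, np.mean(oob_errs))
    ```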

  20. Atmospheric dispersion prediction and source estimation of hazardous gas using artificial neural network, particle swarm optimization and expectation maximization

    NASA Astrophysics Data System (ADS)

    Qiu, Sihang; Chen, Bin; Wang, Rongxiao; Zhu, Zhengqiu; Wang, Yuan; Qiu, Xiaogang

    2018-04-01

    Hazardous gas leak accidents pose a potential threat to human beings. Predicting atmospheric dispersion and estimating its source are becoming increasingly important in emergency management. Current dispersion prediction and source estimation models cannot satisfy the requirements of emergency management because they do not offer both high efficiency and high accuracy. In this paper, we develop a fast and accurate dispersion prediction and source estimation method based on an artificial neural network (ANN), particle swarm optimization (PSO) and expectation maximization (EM). The novel method uses a large number of pre-determined scenarios to train the ANN for dispersion prediction, so that the ANN can predict the concentration distribution accurately and efficiently. PSO and EM are applied to estimate the source parameters, which effectively accelerates the process of convergence. The method is verified by the Indianapolis field study with an SF6 release source. The results demonstrate the effectiveness of the method.
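
    A minimal PSO sketch for the source estimation half of the pipeline: find the 2-D source location minimizing the misfit between sensor observations and a forward model. The Gaussian forward model below is a hypothetical stand-in for the trained ANN, and all parameters are assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(8)
    sensors = rng.uniform(0, 10, size=(20, 2))
    true_src = np.array([6.0, 3.0])

    def forward(src):                        # concentration at each sensor
        d2 = ((sensors - src) ** 2).sum(axis=1)
        return np.exp(-d2 / 4.0)

    obs = forward(true_src) + rng.normal(0, 0.01, len(sensors))
    misfit = lambda src: np.sum((forward(src) - obs) ** 2)

    n, w, c1, c2 = 30, 0.7, 1.5, 1.5         # swarm size, inertia, pulls
    pos = rng.uniform(0, 10, size=(n, 2)); vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([misfit(p) for p in pos])
    gbest = pbest[pbest_f.argmin()]
    for _ in range(100):
        r1, r2 = rng.uniform(size=(2, n, 1))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, 10)
        f = np.array([misfit(p) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmin()]
    print("estimated source:", gbest.round(2), "true:", true_src)
    ```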

  1. WegoLoc: accurate prediction of protein subcellular localization using weighted Gene Ontology terms.

    PubMed

    Chi, Sang-Mun; Nam, Dougu

    2012-04-01

    We present an accurate and fast web server, WegoLoc, for predicting subcellular localization of proteins based on sequence similarity and weighted Gene Ontology (GO) information. A term weighting method from the text categorization process is applied to GO terms for a support vector machine classifier. As a result, WegoLoc surpasses the state-of-the-art methods for previously used test datasets. WegoLoc supports three eukaryotic kingdoms (animals, fungi and plants), provides human-specific analysis, and covers several sets of cellular locations. In addition, WegoLoc provides (i) multiple possible localizations of input protein(s) as well as their corresponding probability scores, (ii) weights of GO terms representing the contribution of each GO term in the prediction, and (iii) a BLAST E-value for the best hit with GO terms. If the similarity score does not meet a given threshold, an amino acid composition-based prediction is applied as a backup method. WegoLoc and the User's guide are freely available at http://www.btool.org/WegoLoc (contact: smchiks@ks.ac.kr; dougnam@unist.ac.kr). Supplementary data is available at http://www.btool.org/WegoLoc.
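
    The text-categorization analogy can be made concrete by treating each protein's GO terms as a "document": TF-IDF supplies a term weighting and a linear SVM classifies the location. This is a generic sketch of that idea, not WegoLoc's actual weighting scheme; the GO IDs and labels are toy data.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    # each protein represented by the GO terms of its sequence-similar hits
    proteins = [
        "GO:0005634 GO:0003677 GO:0006355",   # nucleus-like annotation
        "GO:0005739 GO:0006119 GO:0016491",   # mitochondrion-like annotation
        "GO:0005634 GO:0003700",
        "GO:0005739 GO:0016491",
    ]
    labels = ["nucleus", "mitochondrion", "nucleus", "mitochondrion"]

    clf = make_pipeline(
        TfidfVectorizer(lowercase=False, token_pattern=r"GO:\d+"),  # term weighting
        LinearSVC(),                                                # SVM classifier
    )
    clf.fit(proteins, labels)
    print(clf.predict(["GO:0003677 GO:0005634"]))     # -> ['nucleus']
    ```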

  2. Multiple grid arrangement improves ligand docking with unknown binding sites: Application to the inverse docking problem.

    PubMed

    Ban, Tomohiro; Ohue, Masahito; Akiyama, Yutaka

    2018-04-01

    The identification of comprehensive drug-target interactions is important in drug discovery. Although numerous computational methods have been developed over the years, a gold standard technique has not been established. Computational ligand docking and structure-based drug design allow researchers to predict the binding affinity between a compound and a target protein, and thus they are often used to virtually screen compound libraries. In addition, docking techniques have also been applied to the virtual screening of target proteins (inverse docking) to predict the target proteins of a drug candidate. Nevertheless, a more accurate docking method is still required. In this study, we proposed a method in which a predicted ligand-binding site is covered by multiple grids, termed multiple grid arrangement. Notably, multiple grid arrangement facilitates the conformational search of grid-based ligand docking software and can be applied to the state-of-the-art commercial docking software Glide (Schrödinger, LLC). We validated the proposed method by re-docking with the Astex diverse benchmark dataset under blind binding site conditions, which improved the rate of correctly predicting the top-scoring docking pose from 27.1% to 34.1%; however, only a slight improvement in target prediction accuracy was observed in inverse docking scenarios. These findings highlight the limitations and challenges of current scoring functions and the need for more accurate docking methods. The proposed multiple grid arrangement method was implemented in Glide by modifying a cross-docking script for Glide, xglide.py. The script of our method is freely available online at http://www.bi.cs.titech.ac.jp/mga_glide/. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.

  3. Constrained off-line synthesis approach of model predictive control for networked control systems with network-induced delays.

    PubMed

    Tang, Xiaoming; Qu, Hongchun; Wang, Ping; Zhao, Meng

    2015-03-01

    This paper investigates an off-line synthesis approach of model predictive control (MPC) for a class of networked control systems (NCSs) with network-induced delays. A new augmented model, which can readily be applied to a time-varying control law, is proposed to describe the NCS, where bounded deterministic network-induced delays may occur in both the sensor-to-controller (S-C) and controller-to-actuator (C-A) links. Based on this augmented model, a sufficient condition for closed-loop stability is derived by applying the Lyapunov method. The off-line synthesis approach of model predictive control is addressed using the stability results of the system, which explicitly considers the satisfaction of input and state constraints. A numerical example is given to illustrate the effectiveness of the proposed method. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  4. Model predictive control based on reduced order models applied to belt conveyor system.

    PubMed

    Chen, Wei; Li, Xin

    2016-11-01

    In this paper, a model predictive controller based on a reduced order model is proposed to control a belt conveyor system, which is an electro-mechanical complex system with a long visco-elastic body. Firstly, in order to design a low-order controller, the balanced truncation method is used for belt conveyor model reduction. Secondly, an MPC algorithm based on the reduced order model of the belt conveyor system is presented. Because of the error bound between the full-order model and the reduced order model, two Kalman state estimators are applied in the control scheme to achieve better system performance. Finally, simulation experiments show that the balanced truncation method can significantly reduce the model order with high accuracy and that model predictive control based on the reduced model performs well in controlling the belt conveyor system. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
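
    Balanced truncation itself is available off the shelf. The sketch below uses the python-control package on a stand-in random stable system; a real belt-conveyor model would come from discretizing the visco-elastic belt dynamics, and the chosen reduced order is illustrative.

        import numpy as np
        import control

        full = control.rss(states=30, outputs=1, inputs=1)   # stand-in full-order model

        # Balanced truncation to a low-order model for controller design.
        reduced = control.balred(full, orders=6, method="truncate")

        # Quick check of the reduction error on a step response.
        t = np.linspace(0, 10, 500)
        _, y_full = control.step_response(full, t)
        _, y_red = control.step_response(reduced, t)
        print("max step-response error:", np.max(np.abs(y_full - y_red)))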

  5. Predictability of Extreme Climate Events via a Complex Network Approach

    NASA Astrophysics Data System (ADS)

    Muhkin, D.; Kurths, J.

    2017-12-01

    We analyse climate dynamics using a complex network approach. This leads to an inverse problem: is there a backbone-like structure underlying the climate system? To address it, we propose a method to reconstruct and analyze a complex network from data generated by a spatio-temporal dynamical system. This approach enables us to uncover relations to global circulation patterns in the oceans and atmosphere. The concept is then applied to monsoon data; in particular, we develop a general framework to predict extreme events by combining a non-linear synchronization technique with complex networks. Applying this method, we uncover a new mechanism behind extreme floods in the eastern Central Andes which could be used for operational forecasts. Moreover, we analyze the Indian Summer Monsoon (ISM) and identify two regions of high importance. By estimating an underlying critical point, this leads to an improved prediction of the onset of the ISM; the scheme was successful in 2016 and 2017.

  6. Evaluation of image features and classification methods for Barrett's cancer detection using VLE imaging

    NASA Astrophysics Data System (ADS)

    Klomp, Sander; van der Sommen, Fons; Swager, Anne-Fré; Zinger, Svitlana; Schoon, Erik J.; Curvers, Wouter L.; Bergman, Jacques J.; de With, Peter H. N.

    2017-03-01

    Volumetric Laser Endomicroscopy (VLE) is a promising technique for the detection of early neoplasia in Barrett's Esophagus (BE). VLE generates hundreds of high-resolution, grayscale, cross-sectional images of the esophagus. However, at present, classifying these images is a time-consuming and cumbersome effort performed by an expert using a clinical prediction model. This paper explores the feasibility of using computer vision techniques to accurately predict the presence of dysplastic tissue in VLE BE images. Our contribution is threefold. First, a benchmark is performed for widely applied machine learning techniques and feature extraction methods. Second, three new features based on the clinical detection model are proposed, with superior classification accuracy and speed compared to earlier work. Third, we evaluate automated parameter tuning by applying a simple grid search and feature selection methods. The results are evaluated on a clinically validated dataset of 30 dysplastic and 30 non-dysplastic VLE images. Optimal classification accuracy is obtained by applying a support vector machine with our modified Haralick features and optimal image cropping, obtaining an area under the receiver operating characteristic curve of 0.95, compared to 0.81 for the clinical prediction model. Optimal execution time is achieved using the proposed mean and median features, which are extracted at least a factor of 2.5 faster than alternative features with comparable performance.
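
    The parameter-tuning step maps directly onto standard tooling. A minimal sketch with scikit-learn follows; the synthetic data stand in for the per-image texture features (e.g. the modified Haralick features) and dysplasia labels, which are not shown here.

        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV, StratifiedKFold
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Stand-in for the real feature matrix and 0/1 dysplasia labels.
        X, y = make_classification(n_samples=60, n_features=30, random_state=0)

        pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.001]}

        search = GridSearchCV(pipe, grid, scoring="roc_auc",
                              cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
        search.fit(X, y)
        print(search.best_params_, round(search.best_score_, 3))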

  7. Multimodel predictive system for carbon dioxide solubility in saline formation waters.

    PubMed

    Wang, Zan; Small, Mitchell J; Karamalidis, Athanasios K

    2013-02-05

    The prediction of carbon dioxide solubility in brine at conditions relevant to carbon sequestration (i.e., high temperature, pressure, and salt concentration (T-P-X)) is crucial when this technology is applied. Eleven mathematical models for predicting CO2 solubility in brine are compared and considered for inclusion in a multimodel predictive system. Model goodness of fit is evaluated over the temperature range 304-433 K, pressure range 74-500 bar, and salt concentration range 0-7 m (NaCl equivalent), using 173 published CO2 solubility measurements, particularly selected for those conditions. The performance of each model is assessed using various statistical methods, including the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Different models emerge as best fits for different subranges of the input conditions. A classification tree is generated using machine learning methods to predict the best-performing model under different T-P-X subranges, allowing development of a multimodel predictive system (MMoPS) that selects and applies the model expected to yield the most accurate CO2 solubility prediction. Statistical analysis of the MMoPS predictions, including a stratified 5-fold cross validation, shows that MMoPS outperforms each individual model and increases the overall accuracy of CO2 solubility prediction across the range of T-P-X conditions likely to be encountered in carbon sequestration applications.
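
    The MMoPS idea, learning which solubility model is most accurate in which T-P-X subrange, reduces to fitting a classifier whose label is the index of the best model per measurement. A hedged sketch, with the error matrix and model list as assumed inputs:

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        def fit_model_selector(conditions, abs_err, max_depth=4):
            """Learn which solubility model performs best where.

            conditions: n x 3 array of (T, P, X) per measurement;
            abs_err:    n x m array of |model prediction - observation|."""
            best_model = np.argmin(abs_err, axis=1)      # label = index of best model
            tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
            return tree.fit(conditions, best_model)

        def mmops_predict(tree, models, T, P, X):
            """Apply the model the tree expects to be most accurate at (T, P, X)."""
            m = tree.predict([[T, P, X]])[0]
            return models[m](T, P, X)                    # models: list of callables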

  8. Using Chemoinformatics, Bioinformatics, and Bioassay to Predict and Explain the Antibacterial Activity of Nonantibiotic Food and Drug Administration Drugs.

    PubMed

    Kahlous, Nour Aldin; Bawarish, Muhammad Al Mohdi; Sarhan, Muhammad Arabi; Küpper, Manfred; Hasaba, Ali; Rajab, Mazen

    2017-04-01

    The discovery of new and effective antibiotics is a major challenge facing scientists today. Fortunately, the development of computer science offers new methods to address this challenge. In this study, a set of computer software was used to predict the antibacterial activity of nonantibiotic Food and Drug Administration (FDA)-approved drugs and to explain their action by possible binding to well-known bacterial protein targets, along with testing their antibacterial activity against Gram-positive and Gram-negative bacteria. A three-dimensional virtual screening method that relies on chemical and shape similarity was applied using rapid overlay of chemical structures (ROCS) software to select candidate compounds from the FDA-approved drugs database that share similarity with 17 known antibiotics. Then, to check their antibacterial activity, a disk diffusion test was applied to Staphylococcus aureus and Escherichia coli. Finally, a protein docking method was applied using HYBRID software to predict the binding of each active candidate to the target receptor of its similar antibiotic. Of the 1,991 drugs that were screened, 34 were selected, and among them 10 drugs showed antibacterial activity, whereby the activities of drotaverine and metoclopramide had not been previously reported. Furthermore, the docking process predicted that diclofenac, drotaverine, (S)-flurbiprofen, (S)-ibuprofen, and indomethacin could bind to the protein targets of their similar antibiotics. Nevertheless, their antibacterial activities are weak compared with those of their similar antibiotics, and could be potentiated further by performing chemical modifications on their structures.

  9. Strainrange partitioning behavior of the nickel-base superalloys, Rene' 80 and IN 100

    NASA Technical Reports Server (NTRS)

    Halford, G. R.; Nachtigall, A. J.

    1978-01-01

    A study was made to assess the ability of the method of Strainrange Partitioning (SRP) to both correlate and predict high-temperature, low cycle fatigue lives of nickel base superalloys for gas turbine applications. The partitioned strainrange versus life relationships for uncoated Rene' 80 and cast IN 100 were also determined from the ductility normalized-Strainrange Partitioning equations. These were used to predict the cyclic lives of the baseline tests. The life predictability of the method was verified for cast IN 100 by applying the baseline results to the cyclic life prediction of a series of complex strain cycling tests with multiple hold periods at constant strain. It was concluded that the method of SRP can correlate and predict the cyclic lives of laboratory specimens of the nickel base superalloys evaluated in this program.

  10. Study on Development of 1D-2D Coupled Real-time Urban Inundation Prediction model

    NASA Astrophysics Data System (ADS)

    Lee, Seungsoo

    2017-04-01

    In recent years, abnormal weather conditions due to climate change have caused suffering around the world, and countermeasures for flood defense are therefore an urgent task. In this research, a study on the development of a 1D-2D coupled real-time urban inundation prediction model using predicted precipitation data based on remote sensing technology is conducted. The one-dimensional (1D) sewerage system analysis model introduced by Lee et al. (2015) is used to simulate inlet and overflow phenomena by interacting with surface flow as well as flows in conduits. A two-dimensional (2D) grid mesh refinement method is applied to depict road networks while keeping calculation times effective. The 2D surface model is coupled with the 1D sewerage analysis model in order to consider bi-directional flow between the two. A parallel computing method, OpenMP, is also applied to reduce calculation time. The model is evaluated by applying it to the 25 August 2014 extreme rainfall event, which caused severe inundation damage in Busan, Korea. The Oncheoncheon basin is selected as the study basin, and observed radar data are used in place of predicted rainfall data. The model shows acceptable calculation speed with accuracy. It is therefore expected that the model can be used in a real-time urban inundation forecasting system to minimize damages.

  11. Expected number of asbestos-related lung cancers in the Netherlands in the next two decades: a comparison of methods.

    PubMed

    Van der Bij, Sjoukje; Vermeulen, Roel C H; Portengen, Lützen; Moons, Karel G M; Koffijberg, Hendrik

    2016-05-01

    Exposure to asbestos fibres increases the risk of mesothelioma and lung cancer. Although the vast majority of mesothelioma cases are caused by asbestos exposure, the number of asbestos-related lung cancers is less clear. This number cannot be determined directly as lung cancer causes are not clinically distinguishable but may be estimated using varying modelling methods. We applied three different modelling methods to the Dutch population supplemented with uncertainty ranges (UR) due to uncertainty in model input values. The first method estimated asbestos-related lung cancer cases directly from observed and predicted mesothelioma cases in an age-period-cohort analysis. The second method used evidence on the fraction of lung cancer cases attributable (population attributable risk (PAR)) to asbestos exposure. The third method incorporated risk estimates and population exposure estimates to perform a life table analysis. The three methods varied substantially in incorporated evidence. Moreover, the estimated number of asbestos-related lung cancer cases in the Netherlands between 2011 and 2030 depended crucially on the actual method applied, as the mesothelioma method predicts 17 500 expected cases (UR 7000-57 000), the PAR method predicts 12 150 cases (UR 6700-19 000), and the life table analysis predicts 6800 cases (UR 6800-33 850). The three different methods described resulted in absolute estimates varying by a factor of ∼2.5. These results show that accurate estimation of the impact of asbestos exposure on the lung cancer burden remains a challenge. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  12. A modified artificial neural network based prediction technique for tropospheric radio refractivity

    PubMed Central

    Javeed, Shumaila; Javed, Wajahat; Atif, M.; Uddin, Mueen

    2018-01-01

    Radio refractivity plays a significant role in the development and design of radio systems for attaining the best level of performance. Refractivity in the troposphere is one of the features affecting electromagnetic waves, and it can therefore disrupt communication systems. In this work, a modified artificial neural network (ANN) based model is applied to predict refractivity. The suggested ANN model comprises three modules: the data preparation module, the feature selection module, and the forecast module. The first module applies pre-processing to make the data compatible with the feature selection module. The second module discards irrelevant and redundant data from the input set. The third module uses the ANN for prediction. The ANN model applies a sigmoid activation function and a multi-variate auto-regressive model to update the weights during the training process. In this work, refractivity is predicted and estimated based on ten years (2002–2011) of meteorological data, such as temperature, pressure, and humidity, obtained from the Pakistan Meteorological Department (PMD), Islamabad. The refractivity is estimated using the method suggested by the International Telecommunication Union (ITU). The refractivity for the year 2012 is then predicted from the database of the previous ten years with the help of the ANN. The ANN model is implemented in MATLAB. Next, the estimated and predicted refractivity levels are validated against each other. The predicted and actual values (PMD data) of the atmospheric parameters agree with each other well and demonstrate the accuracy of the proposed ANN method. It was further found that all parameters have a strong relationship with refractivity, in particular temperature and humidity. The refractivity values are higher during the rainy season owing to a strong association with relative humidity. Therefore, it is important to make proper provision for signal communication systems during hot and humid weather. Based on the results, the proposed ANN method can be used to develop a refractivity database, which is highly important for a radio communication system. PMID:29494609
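
    For reference, the ITU refractivity computation used to build the target values can be coded in a few lines; the expression and the Buck-type saturation fit below follow ITU-R P.453, with pressures in hPa and temperature in kelvin.

        import math

        def refractivity(p_hpa, t_kelvin, rh_percent):
            """Radio refractivity N (N-units): N = 77.6/T * (P + 4810*e/T)."""
            t_c = t_kelvin - 273.15
            # Saturation vapour pressure over water (Buck-type fit).
            es = 6.1121 * math.exp(17.502 * t_c / (t_c + 240.97))
            e = rh_percent / 100.0 * es              # water vapour pressure
            return 77.6 / t_kelvin * (p_hpa + 4810.0 * e / t_kelvin)

        print(refractivity(1013.0, 288.15, 60.0))    # ~319 N-units, a typical surface value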

  13. MetaKTSP: a meta-analytic top scoring pair method for robust cross-study validation of omics prediction analysis.

    PubMed

    Kim, SungHwan; Lin, Chien-Wei; Tseng, George C

    2016-07-01

    Supervised machine learning is widely applied to transcriptomic data to predict disease diagnosis, prognosis or survival. Robust and interpretable classifiers with high accuracy are usually favored for their clinical and translational potential. The top scoring pair (TSP) algorithm is an example that applies a simple rank-based procedure to identify rank-altered gene pairs for classifier construction. Although many classification methods perform well in cross-validation of a single expression profile, performance usually degrades greatly in cross-study validation (i.e. when the prediction model is established in a training study and applied to an independent test study) for all machine learning methods, including TSP. The failure of cross-study validation has largely diminished the potential translational and clinical value of the models. The purpose of this article is to develop a meta-analytic top scoring pair (MetaKTSP) framework that combines multiple transcriptomic studies and generates a robust prediction model applicable to independent test studies. We proposed two frameworks, which average TSP scores or combine P-values from individual studies, to select the top gene pairs for model construction. We applied the proposed methods to simulated data sets and three large-scale real applications in breast cancer, idiopathic pulmonary fibrosis and pan-cancer methylation. The results showed superior cross-study validation accuracy and biomarker selection for the new meta-analytic framework. In conclusion, combining multiple omics data sets in the public domain increases the robustness and accuracy of the classification model, which will ultimately improve disease understanding and clinical treatment decisions to benefit patients. An R package, MetaKTSP, is available online (http://tsenglab.biostat.pitt.edu/software.htm). ctseng@pitt.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
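
    The core TSP statistic is simple to state: for a gene pair (i, j), score how differently the two classes order the pair's expression values within a sample. A sketch of the per-study score, which the meta-analytic variant then averages (or combines via P-values) across studies:

        import numpy as np

        def tsp_score(expr, labels, i, j):
            """TSP score |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|.

            expr: genes x samples matrix; labels: 0/1 class per sample.
            Pairs whose rank order flips between classes score near 1."""
            flip = expr[i] < expr[j]
            return abs(flip[labels == 0].mean() - flip[labels == 1].mean())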

  14. Very-short-term wind power prediction by a hybrid model with single- and multi-step approaches

    NASA Astrophysics Data System (ADS)

    Mohammed, E.; Wang, S.; Yu, J.

    2017-05-01

    Very-short-term wind power prediction (VSTWPP) plays an essential role in the operation of electric power systems. This paper aims at improving and applying a hybrid method of VSTWPP based on historical data. The hybrid method combines multiple linear regression and least squares (MLR&LS) and is intended to reduce prediction errors. The predicted values are obtained through two sub-processes: 1) transform the time-series data of actual wind power into the power ratio, and then predict the power ratio; 2) use the predicted power ratio to predict the wind power. The proposed method includes two prediction approaches: single-step prediction (SSP) and multi-step prediction (MSP). The hybrid method is tested comparatively against an auto-regressive moving average (ARMA) model in terms of predicted values and errors. Its validity is confirmed by error analysis using the probability density function (PDF), mean absolute percent error (MAPE) and mean square error (MSE). Meanwhile, a comparison of the correlation coefficients between the actual and predicted values for different prediction times and windows confirms that the MSP approach using the hybrid model is the most accurate, compared with the SSP approach and ARMA. The MLR&LS method is accurate and promising for solving problems in wind power prediction.

  15. Using a topographic index to distribute variable source area runoff predicted with the SCS curve-number equation

    NASA Astrophysics Data System (ADS)

    Lyon, Steve W.; Walter, M. Todd; Gérard-Marchant, Pierre; Steenhuis, Tammo S.

    2004-10-01

    Because the traditional Soil Conservation Service curve-number (SCS-CN) approach continues to be used ubiquitously in water quality models, new application methods are needed that are consistent with variable source area (VSA) hydrological processes in the landscape. We developed and tested a distributed approach for applying the traditional SCS-CN equation to watersheds where VSA hydrology is a dominant process. Predicting the location of source areas is important for watershed planning because restricting potentially polluting activities in runoff source areas is fundamental to controlling non-point-source pollution. The method presented here uses the traditional SCS-CN approach to predict runoff volume and the spatial extent of saturated areas, and a topographic index, like that used in TOPMODEL, to distribute runoff source areas through watersheds. The resulting distributed CN-VSA method was applied to two subwatersheds of the Delaware basin in the Catskill Mountains region of New York State and one watershed in south-eastern Australia to produce runoff-probability maps. Observed saturated area locations in the watersheds agreed with the distributed CN-VSA method. Results showed good agreement with those obtained from the previously validated soil moisture routing (SMR) model. When compared with the traditional SCS-CN method, the distributed CN-VSA method predicted a similar total volume of runoff, but vastly different locations of runoff generation. Thus, the distributed CN-VSA approach provides a physically based method that is simple enough to be incorporated into water quality models, and other tools that currently use the traditional SCS-CN method, while still adhering to the principles of VSA hydrology.
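
    The traditional SCS-CN runoff equation at the heart of the distributed method is compact. A sketch, with depths in inches and the common initial-abstraction ratio Ia = 0.2S; the distributed CN-VSA step then spreads this volume over the watershed fraction ranked wettest by the topographic index.

        def scs_cn_runoff(p_in, cn):
            """SCS curve-number direct runoff depth Q (inches) for storm depth P:
            S = 1000/CN - 10, Q = (P - 0.2S)^2 / (P - 0.2S + S) when P > 0.2S."""
            s = 1000.0 / cn - 10.0
            ia = 0.2 * s                       # initial abstraction
            if p_in <= ia:
                return 0.0
            return (p_in - ia) ** 2 / (p_in - ia + s)

        print(scs_cn_runoff(3.0, 75))          # ~0.96 inches of runoff for a 3-inch storm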

  16. Subcellular location prediction of proteins using support vector machines with alignment of block sequences utilizing amino acid composition.

    PubMed

    Tamura, Takeyuki; Akutsu, Tatsuya

    2007-11-30

    Subcellular location prediction of proteins is an important and well-studied problem in bioinformatics. This is the problem of predicting to which part of a cell a given protein is transported, where an amino acid sequence of the protein is given as input. The problem is becoming more important since information on subcellular location is helpful for the annotation of proteins and genes and the number of complete genomes is rapidly increasing. Since existing predictors are based on various heuristics, it is important to develop a simple method with high prediction accuracy. In this paper, we propose a novel and general prediction method that combines techniques for sequence alignment with feature vectors based on amino acid composition. We implemented this method with support vector machines on plant data sets extracted from the TargetP database. Through fivefold cross-validation tests, the obtained overall accuracy and average MCC were 0.9096 and 0.8655, respectively. We also applied our method to other datasets, including that of WoLF PSORT. Although there is a predictor which uses gene ontology information and yields higher accuracy than ours, our accuracies are higher than those of existing predictors which use only sequence information. Since information such as gene ontology can be obtained only for known proteins, our predictor is considered to be useful for the subcellular location prediction of newly discovered proteins. Furthermore, the idea of combining alignment and amino acid frequency is novel and general, so it may be applied to other problems in bioinformatics. Our method for plants is also implemented as a web system, available at http://sunflower.kuicr.kyoto-u.ac.jp/~tamura/slpfa.html.

  17. Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms

    DOE PAGES

    Tramontana, Gianluca; Jung, Martin; Schwalm, Christopher R.; ...

    2016-07-29

    Spatio-temporal fields of land–atmosphere fluxes derived from data-driven models can complement simulations by process-based land surface models. While a number of strategies for empirical models with eddy-covariance flux data have been applied, a systematic intercomparison of these methods has been missing so far. In this study, we performed a cross-validation experiment for predicting carbon dioxide, latent heat, sensible heat and net radiation fluxes across different ecosystem types with 11 machine learning (ML) methods from four different classes (kernel methods, neural networks, tree methods, and regression splines). We applied two complementary setups: (1) 8-day average fluxes based on remotely sensed data and (2) daily mean fluxes based on meteorological data and a mean seasonal cycle of remotely sensed variables. The patterns of predictions from different ML and experimental setups were highly consistent. There were systematic differences in performance among the fluxes, with the following ascending order: net ecosystem exchange (R² < 0.5), ecosystem respiration (R² > 0.6), gross primary production (R² > 0.7), latent heat (R² > 0.7), sensible heat (R² > 0.7), and net radiation (R² > 0.8). The ML methods predicted the across-site variability and the mean seasonal cycle of the observed fluxes very well (R² > 0.7), while the 8-day deviations from the mean seasonal cycle were not well predicted (R² < 0.5). Fluxes were better predicted at forested and temperate climate sites than at sites in extreme climates or less represented by training data (e.g., the tropics). Finally, the evaluated large ensemble of ML-based models will be the basis of new global flux products.

  18. Integrated Detection and Prediction of Influenza Activity for Real-Time Surveillance: Algorithm Design.

    PubMed

    Spreco, Armin; Eriksson, Olle; Dahlström, Örjan; Cowling, Benjamin John; Timpka, Toomas

    2017-06-15

    Influenza is a viral respiratory disease capable of causing epidemics that represent a threat to communities worldwide. The rapidly growing availability of electronic "big data" from diagnostic and prediagnostic sources in health care and public health settings permits the advance of a new generation of methods for local detection and prediction of winter influenza seasons and influenza pandemics. The aim of this study was to present a method for integrated detection and prediction of influenza virus activity in local settings using electronically available surveillance data and to evaluate its performance by retrospective application on authentic data from a Swedish county. An integrated detection and prediction method was formally defined based on a design rationale for influenza detection and prediction methods adapted for local surveillance. The novel method was retrospectively applied on data from the winter influenza season 2008-09 in a Swedish county (population 445,000). Outcome data represented individuals who met a clinical case definition for influenza (based on International Classification of Diseases version 10 [ICD-10] codes) from an electronic health data repository. Information from calls to a telenursing service in the county was used as a syndromic data source. The novel integrated detection and prediction method is based on nonmechanistic statistical models and is designed for integration in local health information systems. The method is divided into separate modules for detection and prediction of local influenza virus activity. The function of the detection module is to alert for an upcoming period of increased load of influenza cases on local health care (using influenza-diagnosis data), whereas the function of the prediction module is to predict the timing of the activity peak (using syndromic data) and its intensity (using influenza-diagnosis data). For detection modeling, exponential regression was used based on the assumption that the beginning of a winter influenza season has an exponential growth of infected individuals. For prediction modeling, linear regression was applied on 7-day periods at a time in order to find the peak timing, whereas a derivative of a normal distribution density function was used to find the peak intensity. We found that the integrated detection and prediction method detected the 2008-09 winter influenza season on its starting day (optimal timeliness 0 days), whereas the predicted peak was estimated to occur 7 days ahead of the factual peak and the predicted peak intensity was estimated to be 26% lower than the factual intensity (6.3 compared with 8.5 influenza-diagnosis cases/100,000). Our detection and prediction method is one of the first integrated methods specifically designed for local application on influenza data electronically available for surveillance. The performance of the method in a retrospective study indicates that further prospective evaluations of the method are justified. ©Armin Spreco, Olle Eriksson, Örjan Dahlström, Benjamin John Cowling, Toomas Timpka. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 15.06.2017.
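
    The detection module's working assumption, exponential case growth at the season start, can be checked with an ordinary least-squares fit on the log scale. A minimal sketch with an assumed daily count series and a hypothetical alert threshold:

        import numpy as np

        def exponential_growth_rate(cases, window=14):
            """Fit log(cases) ~ a + b*t over the trailing window and return b;
            under exponential growth cases(t) = exp(a + b*t)."""
            y = np.log(np.asarray(cases[-window:], float) + 1.0)   # +1 guards against zeros
            t = np.arange(window)
            b, a = np.polyfit(t, y, 1)
            return b

        # Hypothetical alert rule: flag a season start once the daily growth
        # rate stays above 5% for several consecutive days.
        # alert = exponential_growth_rate(daily_counts) > 0.05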

  19. Degradation trend estimation of slewing bearing based on LSSVM model

    NASA Astrophysics Data System (ADS)

    Lu, Chao; Chen, Jie; Hong, Rongjing; Feng, Yang; Li, Yuanyuan

    2016-08-01

    A novel prediction method is proposed based on the least squares support vector machine (LSSVM) to estimate a slewing bearing's degradation trend from small-sample data. The method takes the vibration signal, which contains rich state information, as the object of study. Principal component analysis (PCA) was applied to fuse multiple feature vectors that reflect the health state of the slewing bearing, such as the root mean square, kurtosis, wavelet energy entropy, and intrinsic mode function (IMF) energy. The degradation indicator fused by PCA reflects the degradation more comprehensively and effectively. The degradation trend of the slewing bearing was then predicted using the LSSVM model optimized by particle swarm optimization (PSO). The proposed method was demonstrated to be more accurate and effective through a whole-life experiment on a slewing bearing. Therefore, it can be applied in engineering practice.
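
    The fusion step can be sketched with standard tooling: standardize the per-segment feature matrix and keep the first principal component as the degradation indicator. A minimal sketch, assuming the feature columns (RMS, kurtosis, wavelet energy entropy, IMF energies) are precomputed:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        def degradation_indicator(features):
            """Fuse a (time x feature) matrix into one degradation indicator."""
            z = StandardScaler().fit_transform(features)
            pc1 = PCA(n_components=1).fit_transform(z).ravel()
            # Orient the indicator to increase over time, assuming the raw
            # features grow with accumulating damage.
            trend = np.corrcoef(pc1, np.arange(len(pc1)))[0, 1]
            return pc1 if trend >= 0 else -pc1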

  1. Empirical source noise prediction method with application to subsonic coaxial jet mixing noise

    NASA Technical Reports Server (NTRS)

    Zorumski, W. E.; Weir, D. S.

    1982-01-01

    A general empirical method, developed for source noise predictions, uses tensor splines to represent the dependence of the acoustic field on frequency and direction and Taylor series to represent the dependence on source state parameters. The method is applied to the prediction of mixing noise from subsonic circular and coaxial jets. A noise database of 1/3-octave-band sound pressure levels (SPLs) from 540 tests was gathered from three countries: the United States, the United Kingdom, and France. The SPLs depend on seven variables: frequency, polar direction angle, and five source state parameters: inner and outer nozzle pressure ratios, inner and outer stream total temperatures, and nozzle area ratio. A least-squares seven-dimensional curve fit defines a table of constants which is used for the prediction method. The resulting prediction has a mean error of 0 dB and a standard deviation of 1.2 dB. The prediction method is used to search for the coaxial jet which has the greatest coaxial noise benefit as compared with an equivalent single jet. It is found that benefits of about 6 dB are possible.

  2. Techniques for the Enhancement of Linear Predictive Speech Coding in Adverse Conditions

    NASA Astrophysics Data System (ADS)

    Wrench, Alan A.

    Available from UMI in association with The British Library. Requires signed TDF. The Linear Prediction model was first applied to speech two and a half decades ago. Since then it has been the subject of intense research and continues to be one of the principal tools in the analysis of speech. Its mathematical tractability makes it a suitable subject for study, and its proven success in practical applications makes the study worthwhile. The model is known to be unsuited to speech corrupted by background noise. This has led many researchers to investigate ways of enhancing the speech signal prior to Linear Predictive analysis. In this thesis, this body of work is extended. The chosen application is low bit-rate (2.4 kbits/sec) speech coding. For this task the performance of the Linear Prediction algorithm is crucial because there is insufficient bandwidth to encode the error between the modelled speech and the original input. A review of the fundamentals of Linear Prediction and an independent assessment of the relative performance of methods of Linear Prediction modelling are presented. A new method is proposed which is fast and facilitates stability checking; however, its stability is shown to be unacceptably poorer than that of existing methods. A novel supposition governing the positioning of the analysis frame relative to a voiced speech signal is proposed and supported by observation. The problem of coding noisy speech is then examined. Four frequency domain speech processing techniques are developed and tested. These are: (i) Combined Order Linear Prediction Spectral Estimation; (ii) Frequency Scaling According to an Aural Model; (iii) Amplitude Weighting Based on Perceived Loudness; (iv) Power Spectrum Squaring. These methods are compared with the Recursive Linearised Maximum a Posteriori method. Following on from the work done in the frequency domain, a time domain implementation of spectrum squaring is developed. In addition, a new method of power spectrum estimation is developed based on the Minimum Variance approach. This new algorithm is shown to be closely related to Linear Prediction but produces slightly broader spectral peaks. Spectrum squaring is applied to both the new algorithm and standard Linear Prediction and their relative performance is assessed. (Abstract shortened by UMI.)
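
    The autocorrelation method of Linear Prediction that underlies such coders fits in a short routine. A sketch of the Levinson-Durbin recursion on a windowed frame, with the predictor convention x[n] ~ a[0]x[n-1] + ... + a[p-1]x[n-p]:

        import numpy as np

        def lpc(frame, order):
            """LPC coefficients by the autocorrelation method (Levinson-Durbin)."""
            x = np.asarray(frame, float) * np.hamming(len(frame))  # analysis window
            r = np.correlate(x, x, mode="full")[len(x) - 1:]       # r[0], r[1], ...
            a = np.zeros(order)
            err = r[0]                                             # prediction error power
            for i in range(order):
                k = (r[i + 1] - a[:i] @ r[1:i + 1][::-1]) / err    # reflection coefficient
                a[:i] = a[:i] - k * a[:i][::-1]                    # update lower orders
                a[i] = k
                err *= 1.0 - k * k                                 # shrink residual energy
            return a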

  3. Predictability of monthly temperature and precipitation using automatic time series forecasting methods

    NASA Astrophysics Data System (ADS)

    Papacharalampous, Georgia; Tyralis, Hristos; Koutsoyiannis, Demetris

    2018-02-01

    We investigate the predictability of monthly temperature and precipitation by applying automatic univariate time series forecasting methods to a sample of 985 40-year-long monthly temperature and 1552 40-year-long monthly precipitation time series. The methods include a naïve one based on the monthly values of the last year, as well as the random walk (with drift), AutoRegressive Fractionally Integrated Moving Average (ARFIMA), exponential smoothing state-space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components (BATS), simple exponential smoothing, Theta and Prophet methods. Prophet is a recently introduced model inspired by the nature of time series forecasted at Facebook and has not been applied to hydrometeorological time series before, while the use of random walk, BATS, simple exponential smoothing and Theta is rare in hydrology. The methods are tested in performing multi-step ahead forecasts for the last 48 months of the data. We further investigate how different choices of handling the seasonality and non-normality affect the performance of the models. The results indicate that: (a) all the examined methods apart from the naïve and random walk ones are accurate enough to be used in long-term applications; (b) monthly temperature and precipitation can be forecasted to a level of accuracy which can barely be improved using other methods; (c) the externally applied classical seasonal decomposition results mostly in better forecasts compared to the automatic seasonal decomposition used by the BATS and Prophet methods; and (d) Prophet is competitive, especially when it is combined with externally applied classical seasonal decomposition.

  4. Recommendations for evaluation of computational methods

    NASA Astrophysics Data System (ADS)

    Jain, Ajay N.; Nicholls, Anthony

    2008-03-01

    The field of computational chemistry, particularly as applied to drug design, has become increasingly important in terms of the practical application of predictive modeling to pharmaceutical research and development. Tools for exploiting protein structures or sets of ligands known to bind particular targets can be used for binding-mode prediction, virtual screening, and prediction of activity. A serious weakness within the field is a lack of standards with respect to quantitative evaluation of methods, data set preparation, and data set sharing. Our goal should be to report new methods or comparative evaluations of methods in a manner that supports decision making for practical applications. Here we propose a modest beginning, with recommendations for requirements on statistical reporting, requirements for data sharing, and best practices for benchmark preparation and usage.

  5. HMMBinder: DNA-Binding Protein Prediction Using HMM Profile Based Features.

    PubMed

    Zaman, Rianon; Chowdhury, Shahana Yasmin; Rashid, Mahmood A; Sharma, Alok; Dehzangi, Abdollah; Shatabda, Swakkhar

    2017-01-01

    DNA-binding proteins often play an important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to solve the problem of identifying them. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM profile based features to the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as the classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We show experimentally that our method outperforms the state-of-the-art methods found in the literature.
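
    One plausible reading of the feature construction: with an HMM profile given as an L x 20 matrix of per-position emission values, monograms average each column and bigrams average products of consecutive rows, yielding a fixed-length 420-dimensional vector for the SVM regardless of sequence length. This is a hedged sketch, not the authors' code:

        import numpy as np

        def hmm_features(profile):
            """Monogram (20) + bigram (400) features from an L x 20 HMM profile."""
            P = np.asarray(profile, float)
            mono = P.mean(axis=0)                                  # per-column average
            bi = (P[:-1, :, None] * P[1:, None, :]).mean(axis=0)   # consecutive-row products
            return np.concatenate([mono, bi.ravel()])              # length 420 vector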

  6. International Standards on stability of digital prints

    NASA Astrophysics Data System (ADS)

    Adelstein, Peter Z.

    2010-06-01

    The International Standards Organization (ISO) is an internationally recognized standardizing body responsible for standards on the permanence of digital prints. This paper is an update on the progress made to date by ISO in writing test methods in this area. Three technologies are involved, namely ink jet, dye diffusion thermal transfer (dye-sublimation) and electrophotography. Two types of test methods are possible, namely comparative tests and predictive tests. To date a comparative test on water fastness has been published, and final balloting is underway on a comparative test on humidity fastness. Predictive tests are being finalized on thermal stability and pollution susceptibility. The test method on thermal stability is intended to predict the print life during normal aging. One of the testing concerns is that some prints do not show significant image change in practical testing times. The test method on pollution susceptibility deals only with ozone and assumes that the reciprocity law applies. This law assumes that a long time under a low pollutant concentration is equivalent to a short time under the high concentration used in the test procedure. Longer term studies include a predictive test for light stability and the preparation of a material specification. The latter requires a decision about the proper colour target to be used and what constitutes an unacceptable colour change. Moreover, a specification which gives a predictive life is very dependent upon the conditions the print encounters and will only apply to specific levels of temperature, ozone and light.

  7. Linear Elastic and Cohesive Fracture Analysis to Model Hydraulic Fracture in Brittle and Ductile Rocks

    NASA Astrophysics Data System (ADS)

    Yao, Yao

    2012-05-01

    Hydraulic fracturing technology is being widely used within the oil and gas industry for both waste injection and unconventional gas production wells. It is essential to predict the behavior of hydraulic fractures accurately based on understanding the fundamental mechanism(s). The prevailing approach for hydraulic fracture modeling continues to rely on computational methods based on Linear Elastic Fracture Mechanics (LEFM). Generally, these methods give reasonable predictions for hard rock hydraulic fracture processes, but still have inherent limitations, especially when fluid injection is performed in soft rock/sand or other non-conventional formations. These methods typically give very conservative predictions on fracture geometry and inaccurate estimation of required fracture pressure. One of the reasons the LEFM-based methods fail to give accurate predictions for these materials is that the fracture process zone ahead of the crack tip and softening effect should not be neglected in ductile rock fracture analysis. A 3D pore pressure cohesive zone model has been developed and applied to predict hydraulic fracturing under fluid injection. The cohesive zone method is a numerical tool developed to model crack initiation and growth in quasi-brittle materials considering the material softening effect. The pore pressure cohesive zone model has been applied to investigate the hydraulic fracture with different rock properties. The hydraulic fracture predictions of a three-layer water injection case have been compared using the pore pressure cohesive zone model with revised parameters, LEFM-based pseudo 3D model, a Perkins-Kern-Nordgren (PKN) model, and an analytical solution. Based on the size of the fracture process zone and its effect on crack extension in ductile rock, the fundamental mechanical difference of LEFM and cohesive fracture mechanics-based methods is discussed. An effective fracture toughness method has been proposed to consider the fracture process zone effect on the ductile rock fracture.

  8. A comparison of radiosity with current methods of sound level prediction in commercial spaces

    NASA Astrophysics Data System (ADS)

    Beamer, C. Walter, IV; Muehleisen, Ralph T.

    2002-11-01

    The ray tracing and image methods (and variations thereof) are widely used for the computation of sound fields in architectural spaces. The ray tracing and image methods are best suited for spaces with mostly specular reflecting surfaces. The radiosity method, a method based on solving a system of energy balance equations, is best applied to spaces with mainly diffusely reflective surfaces. Because very few spaces are either purely specular or purely diffuse, all methods must deal with both types of reflecting surfaces. A comparison of the radiosity method to other methods for the prediction of sound levels in commercial environments is presented. [Work supported by NSF.]

  9. A class-based link prediction using Distance Dependent Chinese Restaurant Process

    NASA Astrophysics Data System (ADS)

    Andalib, Azam; Babamir, Seyed Morteza

    2016-08-01

    One of the important tasks in relational data analysis is link prediction, which has been successfully applied in many areas such as bioinformatics and information retrieval. Link prediction is defined as predicting the existence or absence of edges between nodes of a network. In this paper, we propose a novel method for link prediction based on the Distance Dependent Chinese Restaurant Process (DDCRP) model, which enables us to utilize information about the topological structure of the network, such as shortest paths and the connectivity of the nodes. We also propose a new Gibbs sampling algorithm for computing the posterior distribution of the hidden variables based on the training data. Experimental results on three real-world datasets show the superiority of the proposed method over other probabilistic models for the link prediction problem.

  10. The Recalibrated Sunspot Number: Impact on Solar Cycle Predictions

    NASA Astrophysics Data System (ADS)

    Clette, F.; Lefevre, L.

    2017-12-01

    Recently, and for the first time since their creation, the sunspot number and group number series were entirely revisited, and a first fully recalibrated version was officially released in July 2015 by the World Data Center SILSO (Brussels). These reference long-term series are widely used as input data or as a calibration reference by various solar cycle prediction methods. Therefore, past predictions may now need to be redone using the new sunspot series, and methods already used for predicting cycle 24 will require adaptations before attempting predictions of the next cycles. In order to clarify the nature of the applied changes, we describe the different corrections applied to the sunspot and group number series, which affect extended time periods and can reach up to 40%. While some changes simply involve constant scale factors, other corrections vary with time or follow the solar cycle modulation. Depending on the prediction method and on the selected time interval, this can lead to different responses and biases. Moreover, together with the new series, standard error estimates are progressively being added to the new sunspot numbers, which may help derive more accurate uncertainties for predicted activity indices. We conclude with the new round of recalibration now being undertaken in the framework of a broad multi-team collaboration articulated around upcoming ISSI workshops, and outline the corrections that can still be expected as part of a permanent upgrading and quality-control process. From now on, sunspot-based predictive models should thus be made more adaptable, and regular updates of predictions should become common practice in order to track periodic upgrades of the sunspot number series, just as is done with other modern solar observational series.

  11. Use of model calibration to achieve high accuracy in analysis of computer networks

    DOEpatents

    Frogner, Bjorn; Guarro, Sergio; Scharf, Guy

    2004-05-11

    A system and method are provided for creating a network performance prediction model, and calibrating the prediction model, through application of network load statistical analyses. The method includes characterizing the measured load on the network, which may include background load data obtained over time, and may further include directed load data representative of a transaction-level event. Probabilistic representations of load data are derived to characterize the statistical persistence of the network performance variability and to determine delays throughout the network. The probabilistic representations are applied to the network performance prediction model to adapt the model for accurate prediction of network performance. Certain embodiments of the method and system may be used for analysis of the performance of a distributed application characterized as data packet streams.

  12. Influential factors of red-light running at signalized intersection and prediction using a rare events logistic regression model.

    PubMed

    Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan

    2016-10-01

    Red light running (RLR) has become a major safety concern at signalized intersections. To prevent RLR-related crashes, it is critical to identify the factors that significantly influence drivers' RLR behavior and to predict potential RLR in real time. In this research, nine months of RLR events extracted from high-resolution traffic data collected by loop detectors at three signalized intersections were used to identify the factors that significantly affect RLR behavior. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle runs through the intersection during yellow, and whether there is a vehicle passing through the intersection on the adjacent lane were significant factors for RLR behavior. Furthermore, owing to the rare-events nature of RLR, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare-events studies and shows impressive performance, but so far no previous research has applied this method to study RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is based purely on loop detector data collected from a single advance loop detector located 400 feet away from the stop bar. This brings great potential for future field applications of the proposed method, since loops have been widely implemented at many intersections and can collect data in real time. This research is expected to contribute significantly to the improvement of intersection safety. Copyright © 2016 Elsevier Ltd. All rights reserved.
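
    A standard way to implement such a correction is King and Zeng's prior correction: fit an ordinary logit on a sample in which events are over-represented, then shift the intercept back to match the true event rate. A hedged sketch, where tau (the population RLR rate) is an assumed input:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def rare_events_logit(X, y, tau):
            """Logistic regression with a King-Zeng prior correction."""
            model = LogisticRegression(max_iter=1000).fit(X, y)
            y_bar = y.mean()                     # event fraction in the sample
            # Intercept shift: b0 <- b0 - ln[((1 - tau)/tau) * (y_bar/(1 - y_bar))]
            model.intercept_ -= np.log(((1 - tau) / tau) * (y_bar / (1 - y_bar)))
            return model

        # Hypothetical use: X holds per-vehicle covariates (occupancy time,
        # time gap, used yellow time, ...), y = 1 for an RLR event, and tau
        # is the long-run share of RLR among all approaching vehicles.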

  13. The Fatigue Approach to Vibration and Health: is it a Practical and Viable way of Predicting the Effects on People?

    NASA Astrophysics Data System (ADS)

    Sandover, J.

    1998-08-01

    The fatigue approach assumes that the vertebral end-plates are the weak link in the spine subjected to shock and vibration, and fail as a result of material fatigue. The theory assumes that end-plate damage leads to degeneration and pain in the lumbar spine. There is evidence for both the damage predicted and the fatigue mode of failure, so the approach may provide a basis for predictive methods for use in epidemiology and standards. An available data set from a variety of heavy vehicles in practical situations was used for predictions of spinal stress and fatigue life. Although there was some disparity between the predictive methods used, the more developed methods indicated fatigue lives that appeared reasonable, taking into account the vehicles tested and our knowledge of spinal degeneration. It is argued that the modelling and fatigue approaches combined offer a basis for estimating the effects of vibration and shock on health. Although the human variables are such that the approach, as yet, only offers rough estimates, it offers a good basis for understanding. The approach indicates that peak values are important and that large peaks dominate risk. The method indicates that long-term r.m.s. methods probably underestimate the risk of injury. The BS 6841 Wb and ISO 2631 Wk weightings have shortcomings when used where peak values are important. A simple model may be more appropriate. The principle can be applied to continuous vibration as well as high acceleration events, so one method can be applied universally to continuous vibrations, high acceleration events and mixtures of these. An endurance limit can be hypothesised and, if this limit is sufficiently high, then the need for many measurements can be reduced.

  14. A new mathematical solution for predicting char activation reactions

    USGS Publications Warehouse

    Rafsanjani, H.H.; Jamshidi, E.; Rostam-Abadi, M.

    2002-01-01

    The differential conservation equations that describe typical gas-solid reactions, such as activation of coal chars, yield a set of coupled second-order partial differential equations. The solution of these coupled equations by exact analytical methods is impossible. In addition, an approximate or exact solution only provides predictions for either reaction- or diffusion-controlling cases. A new mathematical solution, the quantize method (QM), was applied to predict the gasification rates of coal char when both chemical reaction and diffusion through the porous char are present. Carbon conversion rates predicted by the QM were in closer agreement with the experimental data than those predicted by the random pore model and the simple particle model. ?? 2002 Elsevier Science Ltd. All rights reserved.

  15. Application of two direct runoff prediction methods in Puerto Rico

    USGS Publications Warehouse

    Sepulveda, N.

    1997-01-01

    Two methods for predicting direct runoff from rainfall data were applied to several basins and the resulting hydrographs compared to measured values. The first method uses a geomorphology-based unit hydrograph to predict direct runoff through its convolution with the excess rainfall hyetograph. The second method shows how the resulting hydraulic routing flow equation from a kinematic wave approximation is solved using a spectral method based on the matrix representation of the spatial derivative with Chebyshev collocation and a fourth-order Runge-Kutta time discretization scheme. The calibrated Green-Ampt (GA) infiltration parameters are obtained by minimizing the sum, over several rainfall events, of absolute differences between the total excess rainfall volume computed from the GA equations and the total direct runoff volume computed from a hydrograph separation technique. The improvement made in predicting direct runoff using a geomorphology-based unit hydrograph with the ephemeral and perennial stream network instead of the strictly perennial stream network is negligible. The hydraulic routing scheme presented here is highly accurate in predicting the magnitude and time of the hydrograph peak although the much faster unit hydrograph method also yields reasonable results.
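
    The unit-hydrograph step of the first method is a one-line discrete convolution. A sketch with illustrative numbers (the ordinates and units are placeholders):

        import numpy as np

        def direct_runoff(unit_hydrograph, excess_rain):
            """Direct-runoff hydrograph: convolution of UH ordinates with the
            excess-rainfall hyetograph."""
            return np.convolve(excess_rain, unit_hydrograph)

        uh = np.array([0.2, 0.5, 0.2, 0.1])   # flow per unit excess depth (placeholder)
        rain = np.array([0.3, 1.1, 0.4])      # excess rainfall per interval (placeholder)
        print(direct_runoff(uh, rain))        # six-ordinate runoff hydrograph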

  16. The application of artificial neural networks and support vector regression for simultaneous spectrophotometric determination of commercial eye drop contents

    NASA Astrophysics Data System (ADS)

    Valizadeh, Maryam; Sohrabi, Mahmoud Reza

    2018-03-01

    In the present study, artificial neural networks (ANNs) and support vector regression (SVR), as intelligent methods coupled with UV spectroscopy, were applied to the simultaneous quantitative determination of Dorzolamide (DOR) and Timolol (TIM) in eye drops. Several synthetic mixtures were analyzed to validate the proposed methods. At first, a neural network time series model, which is one type of artificial neural network, was employed and its efficiency was evaluated. Afterwards, a radial basis network was applied as another neural network. Results showed that the performance of this method is suitable for prediction. Finally, support vector regression was proposed to construct the Zilomole prediction model. Also, the root mean square error (RMSE) and mean recovery (%) were calculated for the SVR method. Moreover, the proposed methods were compared to high-performance liquid chromatography (HPLC) as a reference method. A one-way analysis of variance (ANOVA) test at the 95% confidence level, applied to the comparison of the suggested and reference methods, showed that there were no significant differences between them. Also, the effect of interferences was investigated in spiked solutions.

  17. Applications of alignment-free methods in epigenomics.

    PubMed

    Pinello, Luca; Lo Bosco, Giosuè; Yuan, Guo-Cheng

    2014-05-01

    Epigenetic mechanisms play an important role in the regulation of cell type-specific gene activities, yet how epigenetic patterns are established and maintained remains poorly understood. Recent studies have supported a role of DNA sequences in recruitment of epigenetic regulators. Alignment-free methods have been applied to identify distinct sequence features that are associated with epigenetic patterns and to predict epigenomic profiles. Here, we review recent advances in such applications, including the methods to map DNA sequence to feature space, sequence comparison and prediction models. Computational studies using these methods have provided important insights into the epigenetic regulatory mechanisms.

  18. Applying a machine learning model using a locally preserving projection based feature regeneration algorithm to predict breast cancer risk

    NASA Astrophysics Data System (ADS)

    Heidari, Morteza; Zargari Khuzani, Abolfazl; Danala, Gopichandh; Mirniaharikandehei, Seyedehnafiseh; Qian, Wei; Zheng, Bin

    2018-03-01

    Both conventional and deep machine learning have been used to develop decision-support tools applied in medical imaging informatics. In order to take advantage of both the conventional and deep learning approaches, this study aims to investigate the feasibility of applying a locally preserving projection (LPP) based feature regeneration algorithm to build a new machine learning classifier model to predict short-term breast cancer risk. First, a computer-aided image processing scheme was used to segment and quantify breast fibro-glandular tissue volume. Next, an initial set of 44 image features related to the bilateral mammographic tissue density asymmetry was extracted. Then, an LPP-based feature combination method was applied to regenerate a new operational feature vector using a maximal variance approach. Last, a k-nearest neighborhood (KNN) algorithm based machine learning classifier using the LPP-generated new feature vectors was developed to predict breast cancer risk. A testing dataset involving negative mammograms acquired from 500 women was used. Among them, 250 became positive and 250 remained negative in the next subsequent mammography screening. Applied to this dataset, the LPP-generated feature vector reduced the number of features from 44 to 4. Using a leave-one-case-out validation method, the area under the ROC curve produced by the KNN classifier significantly increased from 0.62 to 0.68 (p < 0.05) and the odds ratio was 4.60 with a 95% confidence interval of [3.16, 6.70]. The study demonstrated that this new LPP-based feature regeneration approach was able to produce an optimal feature vector and yield improved performance in assisting the prediction of the risk of women having breast cancer detected in the next subsequent mammography screening.
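
    As a rough illustration of the pipeline described above, the sketch below pairs a classical locality-preserving projection (built from a k-nearest-neighbor graph Laplacian with binary weights) with a KNN classifier on synthetic data. The lpp helper, its parameters, and the data generator are stand-ins, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier, kneighbors_graph

def lpp(X, n_components=4, n_neighbors=10):
    """Simple LPP: solve X^T L X a = lam X^T D X a for the smallest lam."""
    W = kneighbors_graph(X, n_neighbors, include_self=False).toarray()
    W = np.maximum(W, W.T)                        # symmetric 0/1 adjacency
    D = np.diag(W.sum(axis=1))
    L = D - W                                     # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-9 * np.eye(X.shape[1])   # regularize for stability
    vals, vecs = eigh(A, B)                       # generalized eigenproblem
    return vecs[:, :n_components]                 # projection matrix

# Synthetic stand-in for the 44 mammographic features of 500 cases
X, y = make_classification(n_samples=500, n_features=44, n_informative=8,
                           random_state=0)
P = lpp(X)                                        # 44 -> 4 dimensions
scores = cross_val_score(KNeighborsClassifier(15), X @ P, y,
                         cv=5, scoring="roc_auc")
print("AUC:", scores.mean().round(3))
```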

  19. The predictive protective control of the heat exchanger

    NASA Astrophysics Data System (ADS)

    Nevriva, Pavel; Filipova, Blanka; Vilimec, Ladislav

    2016-06-01

    The paper deals with predictive control applied to the flexible cogeneration energy system FES. FES was designed and developed by the VITKOVICE POWER ENGINEERING joint-stock company and represents a new solution for decentralized cogeneration energy sources. In FES, the heating medium is flue gas generated by combustion of a solid fuel. The heated medium is power gas, a gas mixture of air and water steam. Power gas is superheated in the main heat exchanger and led to gas turbines. To protect the main heat exchanger against damage by overheating, a novel predictive protective control based on a mathematical model of the exchanger was developed. The paper describes the principle, the design and the simulation of the predictive protective method applied to the main heat exchanger of FES.

  20. Predicting the Coupling Properties of Axially-Textured Materials.

    PubMed

    Fuentes-Cobas, Luis E; Muñoz-Romero, Alejandro; Montero-Cabrera, María E; Fuentes-Montero, Luis; Fuentes-Montero, María E

    2013-10-30

    A description of methods and computer programs for the prediction of "coupling properties" in axially-textured polycrystals is presented. Starting data are the single-crystal properties, texture and stereography. The validity of, and proper protocols for, applying the Voigt, Reuss and Hill approximations to estimate effective values of coupling properties are analyzed. Working algorithms for predicting the mentioned averages are given. Bunge's symmetrized spherical harmonics expansion of orientation distribution functions, inverse pole figures and (single- and polycrystal) physical properties is applied in all stages of the proposed methodology. The established mathematical route has been systematized in a working computer program. The discussion of piezoelectricity in a representative textured ferro-piezoelectric ceramic illustrates the application of the proposed methodology. Polycrystal coupling properties, predicted by the suggested route, are fairly close to experimentally measured ones.

  1. Predicting the Coupling Properties of Axially-Textured Materials

    PubMed Central

    Fuentes-Cobas, Luis E.; Muñoz-Romero, Alejandro; Montero-Cabrera, María E.; Fuentes-Montero, Luis; Fuentes-Montero, María E.

    2013-01-01

    A description of methods and computer programs for the prediction of “coupling properties” in axially-textured polycrystals is presented. Starting data are the single-crystal properties, texture and stereography. The validity of, and proper protocols for, applying the Voigt, Reuss and Hill approximations to estimate effective values of coupling properties are analyzed. Working algorithms for predicting the mentioned averages are given. Bunge’s symmetrized spherical harmonics expansion of orientation distribution functions, inverse pole figures and (single- and polycrystal) physical properties is applied in all stages of the proposed methodology. The established mathematical route has been systematized in a working computer program. The discussion of piezoelectricity in a representative textured ferro-piezoelectric ceramic illustrates the application of the proposed methodology. Polycrystal coupling properties, predicted by the suggested route, are fairly close to experimentally measured ones. PMID:28788370
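
    The Voigt, Reuss and Hill averages named in these records can be illustrated in the simplest possible setting: bounding the effective bulk modulus of a two-phase aggregate. The sketch below uses invented phase moduli; the paper's full treatment averages tensor properties over the texture function.

```python
import numpy as np

# Hedged illustration of the Voigt, Reuss and Hill averages for the
# effective bulk modulus of a two-phase aggregate (values are invented).
K = np.array([37.0, 130.0])   # phase bulk moduli, GPa (illustrative)
f = np.array([0.6, 0.4])      # volume fractions, must sum to 1

K_voigt = np.sum(f * K)             # arithmetic (upper) bound
K_reuss = 1.0 / np.sum(f / K)       # harmonic (lower) bound
K_hill = 0.5 * (K_voigt + K_reuss)  # Hill average

print(f"Voigt {K_voigt:.1f}  Reuss {K_reuss:.1f}  Hill {K_hill:.1f} GPa")
```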

  2. A novel principal component analysis for spatially misaligned multivariate air pollution data.

    PubMed

    Jandarov, Roman A; Sheppard, Lianne A; Sampson, Paul D; Szpiro, Adam A

    2017-01-01

    We propose novel methods for predictive (sparse) PCA with spatially misaligned data. These methods identify principal component loading vectors that explain as much variability in the observed data as possible, while also ensuring the corresponding principal component scores can be predicted accurately by means of spatial statistics at locations where air pollution measurements are not available. This will make it possible to identify important mixtures of air pollutants and to quantify their health effects in cohort studies, where currently available methods cannot be used. We demonstrate the utility of predictive (sparse) PCA in simulated data and apply the approach to annual averages of particulate matter speciation data from national Environmental Protection Agency (EPA) regulatory monitors.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simonetto, Andrea; Dall'Anese, Emiliano

    This article develops online algorithms to track solutions of time-varying constrained optimization problems. Particularly, resembling workhorse Kalman filtering-based approaches for dynamical systems, the proposed methods involve prediction-correction steps to provably track the trajectory of the optimal solutions of time-varying convex problems. The merits of existing prediction-correction methods have been shown for unconstrained problems and for setups where computing the inverse of the Hessian of the cost function is computationally affordable. This paper addresses the limitations of existing methods by tackling constrained problems and by designing first-order prediction steps that rely on the Hessian of the cost function (and do not require the computation of its inverse). In addition, the proposed methods are shown to improve the convergence speed of existing prediction-correction methods when applied to unconstrained problems. Numerical simulations corroborate the analytical results and showcase performance and benefits of the proposed algorithms. A realistic application of the proposed method to real-time control of energy resources is presented.
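
    A hedged sketch of the prediction-correction idea is given below for a toy time-varying quadratic with a drifting optimum. The prediction step uses a finite-difference estimate of the gradient's time drift (no Hessian inverse, in the spirit of the first-order steps described above); the correction step is one gradient descent pass. All parameters are illustrative.

```python
import numpy as np

# Toy problem: min_x f(x; t) with f = 0.5*||x - b(t)||^2 and drifting b(t).
h, alpha, steps = 0.1, 0.5, 50                    # period, step size, horizon
b = lambda t: np.array([np.sin(t), np.cos(t)])    # time-varying optimum
grad = lambda x, t: x - b(t)

x = np.zeros(2)
for k in range(steps):
    t, t_next = k * h, (k + 1) * h
    # Prediction: compensate the time drift of the gradient
    # (finite-difference estimate of d/dt grad, avoiding Hessian inverses).
    drift = (grad(x, t_next) - grad(x, t)) / h
    x = x - h * drift
    # Correction: one gradient step on the new cost f(.; t_next)
    x = x - alpha * grad(x, t_next)
print("final tracking error:", np.linalg.norm(x - b(steps * h)).round(5))
```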

  4. Estimation and prediction under local volatility jump-diffusion model

    NASA Astrophysics Data System (ADS)

    Kim, Namhyoung; Lee, Younhee

    2018-02-01

    Volatility is an important factor in operating a company and managing risk. In the portfolio optimization and risk hedging using the option, the value of the option is evaluated using the volatility model. Various attempts have been made to predict option value. Recent studies have shown that stochastic volatility models and jump-diffusion models reflect stock price movements accurately. However, these models have practical limitations. Combining them with the local volatility model, which is widely used among practitioners, may lead to better performance. In this study, we propose a more effective and efficient method of estimating option prices by combining the local volatility model with the jump-diffusion model and apply it using both artificial and actual market data to evaluate its performance. The calibration process for estimating the jump parameters and local volatility surfaces is divided into three stages. We apply the local volatility model, stochastic volatility model, and local volatility jump-diffusion model estimated by the proposed method to KOSPI 200 index option pricing. The proposed method displays good estimation and prediction performance.
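
    To make the jump-diffusion ingredient concrete, the following sketch prices a European call by Monte Carlo under Merton-style dynamics. The parameters are invented, and the staged calibration to a local-volatility surface described in the study is not reproduced.

```python
import numpy as np

# Illustrative Monte Carlo pricing of a European call under a Merton-style
# jump-diffusion. All parameters are invented, not calibrated values.
rng = np.random.default_rng(0)
S0, K, r, T = 100.0, 100.0, 0.02, 1.0
sigma = 0.2                          # diffusive volatility
lam, mu_j, sig_j = 0.3, -0.1, 0.15   # jump intensity, log-jump mean and std

n = 200_000
N = rng.poisson(lam * T, n)          # number of jumps per path
Z = rng.standard_normal(n)
# Sum of N normal log-jumps is Normal(N*mu_j, N*sig_j^2)
J = mu_j * N + sig_j * np.sqrt(N) * rng.standard_normal(n)
kappa = np.exp(mu_j + 0.5 * sig_j**2) - 1        # jump compensator
ST = S0 * np.exp((r - lam * kappa - 0.5 * sigma**2) * T
                 + sigma * np.sqrt(T) * Z + J)
price = np.exp(-r * T) * np.maximum(ST - K, 0).mean()
print(f"call price ~ {price:.3f}")
```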

  5. Semi-supervised prediction of gene regulatory networks using machine learning algorithms.

    PubMed

    Patel, Nihir; Wang, Jason T L

    2015-10-01

    Use of computational methods to predict gene regulatory networks (GRNs) from gene expression data is a challenging task. Many studies have been conducted using unsupervised methods to fulfill the task; however, such methods usually yield low prediction accuracies due to the lack of training data. In this article, we propose semi-supervised methods for GRN prediction by utilizing two machine learning algorithms, namely, support vector machines (SVM) and random forests (RF). The semi-supervised methods make use of unlabelled data for training. We investigated inductive and transductive learning approaches, both of which adopt an iterative procedure to obtain reliable negative training data from the unlabelled data. We then applied our semi-supervised methods to gene expression data of Escherichia coli and Saccharomyces cerevisiae, and evaluated the performance of our methods using the expression data. Our analysis indicated that the transductive learning approach outperformed the inductive learning approach for both organisms. However, there was no conclusive difference identified in the performance of SVM and RF. Experimental results also showed that the proposed semi-supervised methods performed better than existing supervised methods for both organisms.
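
    The iterative mining of reliable negative training data from the unlabelled pool can be sketched as follows; the synthetic features, the random-forest choice, and the fixed iteration count are simplifying stand-ins for the paper's procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Sketch: iteratively select "reliable negatives" from unlabelled gene pairs.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, (100, 20))    # labelled interacting pairs (toy)
X_unl = rng.normal(0.0, 1.0, (1000, 20))   # unlabelled pairs (toy)

neg_idx = rng.choice(len(X_unl), 100, replace=False)  # initial pseudo-negatives
for it in range(5):
    X = np.vstack([X_pos, X_unl[neg_idx]])
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(neg_idx))]
    clf = RandomForestClassifier(200, random_state=0).fit(X, y)
    # Re-select as negatives the unlabelled pairs scored lowest by the model
    p = clf.predict_proba(X_unl)[:, 1]
    neg_idx = np.argsort(p)[:100]
print("score range of final negatives:", p[neg_idx].min(), p[neg_idx].max())
```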

  6. Predicting critical transitions in dynamical systems from time series using nonstationary probability density modeling.

    PubMed

    Kwasniok, Frank

    2013-11-01

    A time series analysis method for predicting the probability density of a dynamical system is proposed. A nonstationary parametric model of the probability density is estimated from data within a maximum likelihood framework and then extrapolated to forecast the future probability density and explore the system for critical transitions or tipping points. A full systematic account of parameter uncertainty is taken. The technique is generic, independent of the underlying dynamics of the system. The method is verified on simulated data and then applied to prediction of Arctic sea-ice extent.
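
    As a minimal illustration of the approach, the sketch below fits a Gaussian whose mean drifts linearly in time by maximum likelihood (which reduces to least squares in this special case) and extrapolates the density to a future time; the systematic treatment of parameter uncertainty is omitted.

```python
import numpy as np

# Fit a nonstationary Gaussian density N(a + b*t, sigma^2) by maximum
# likelihood and extrapolate it forward. Data are simulated, not sea-ice.
rng = np.random.default_rng(1)
t = np.arange(200.0)
x = 5.0 - 0.02 * t + rng.normal(0, 0.5, t.size)   # declining toy series

# For a Gaussian with linearly drifting mean, the MLE is least squares
b, a = np.polyfit(t, x, 1)
sigma = np.std(x - (a + b * t))

t_future = 300.0
print(f"forecast density at t={t_future}: "
      f"N({a + b * t_future:.2f}, {sigma:.2f}^2)")
```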

  7. Applying different independent component analysis algorithms and support vector regression for IT chain store sales forecasting.

    PubMed

    Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie

    2014-01-01

    Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales, since an IT chain store has many branches. Integrating a feature extraction method and a prediction tool, such as support vector regression (SVR), is a useful approach for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique that has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., the temporal ICA model) has been applied to the sales forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of an IT chain store. Experimental results from real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data, and the extracted features can improve the prediction performance of SVR for sales forecasting.

  8. Applying Different Independent Component Analysis Algorithms and Support Vector Regression for IT Chain Store Sales Forecasting

    PubMed Central

    Dai, Wensheng

    2014-01-01

    Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales, since an IT chain store has many branches. Integrating a feature extraction method and a prediction tool, such as support vector regression (SVR), is a useful approach for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique that has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., the temporal ICA model) has been applied to the sales forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of an IT chain store. Experimental results from real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data, and the extracted features can improve the prediction performance of SVR for sales forecasting. PMID:25165740
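
    A hedged sketch of the integrated ICA-SVR scheme follows, using scikit-learn's FastICA as a temporal-ICA stand-in on synthetic branch sales; the spatiotemporal variant favored by the study is not shown.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.svm import SVR

# Extract independent components from multi-branch sales, then feed lagged
# components to an SVR one-step-ahead forecaster. Data are synthetic.
rng = np.random.default_rng(0)
T, branches = 200, 8
sales = rng.gamma(2.0, 50.0, (T, branches)).cumsum(axis=0) / 10  # toy series

ica = FastICA(n_components=3, random_state=0)
S = ica.fit_transform(sales)           # T x 3 independent components

lag = 4                                # predict total sales from lagged ICs
X = np.hstack([S[i:T - lag + i] for i in range(lag)])
y = sales.sum(axis=1)[lag:]
model = SVR(C=10.0, epsilon=0.1).fit(X[:-20], y[:-20])
rmse = np.sqrt(np.mean((model.predict(X[-20:]) - y[-20:]) ** 2))
print("test RMSE:", rmse.round(3))
```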

  9. Retreatment Predictions in Odontology by means of CBR Systems.

    PubMed

    Campo, Livia; Aliaga, Ignacio J; De Paz, Juan F; García, Alvaro Enrique; Bajo, Javier; Villarubia, Gabriel; Corchado, Juan M

    2016-01-01

    The field of odontology requires an appropriate adjustment of treatments according to the circumstances of each patient. A follow-up treatment for a patient experiencing problems from a previous procedure such as endodontic therapy, for example, may not necessarily preclude the possibility of extraction. It is therefore necessary to investigate new solutions aimed at analyzing data and, with regard to the given values, determine whether dental retreatment is required. In this work, we present a decision support system which applies the case-based reasoning (CBR) paradigm, specifically designed to predict the practicality of performing or not performing a retreatment. Thus, the system uses previous experiences to provide new predictions, which is completely innovative in the field of odontology. The proposed prediction technique includes an innovative combination of methods that minimizes false negatives to the greatest possible extent. False negatives refer to a prediction favoring a retreatment when in fact it would be ineffective. The combination of methods is performed by applying an optimization problem to reduce incorrect classifications and takes into account different parameters, such as precision, recall, and statistical probabilities. The proposed system was tested in a real environment and the results obtained are promising.

  10. Retreatment Predictions in Odontology by means of CBR Systems

    PubMed Central

    Campo, Livia; Aliaga, Ignacio J.; García, Alvaro Enrique; Villarubia, Gabriel; Corchado, Juan M.

    2016-01-01

    The field of odontology requires an appropriate adjustment of treatments according to the circumstances of each patient. A follow-up treatment for a patient experiencing problems from a previous procedure such as endodontic therapy, for example, may not necessarily preclude the possibility of extraction. It is therefore necessary to investigate new solutions aimed at analyzing data and, with regard to the given values, determine whether dental retreatment is required. In this work, we present a decision support system which applies the case-based reasoning (CBR) paradigm, specifically designed to predict the practicality of performing or not performing a retreatment. Thus, the system uses previous experiences to provide new predictions, which is completely innovative in the field of odontology. The proposed prediction technique includes an innovative combination of methods that minimizes false negatives to the greatest possible extent. False negatives refer to a prediction favoring a retreatment when in fact it would be ineffective. The combination of methods is performed by applying an optimization problem to reduce incorrect classifications and takes into account different parameters, such as precision, recall, and statistical probabilities. The proposed system was tested in a real environment and the results obtained are promising. PMID:26884749

  11. Can Mathematical Models Predict the Outcomes of Prostate Cancer Patients Undergoing Intermittent Androgen Deprivation Therapy?

    NASA Astrophysics Data System (ADS)

    Everett, R. A.; Packer, A. M.; Kuang, Y.

    Androgen deprivation therapy is a common treatment for advanced or metastatic prostate cancer. Like the normal prostate, most tumors depend on androgens for proliferation and survival but often develop treatment resistance. Hormonal treatment causes many undesirable side effects which significantly decrease the quality of life for patients. Intermittently applying androgen deprivation in cycles reduces the total duration with these negative effects and may reduce selective pressure for resistance. We extend an existing model which used measurements of patient testosterone levels to accurately fit measured serum prostate specific antigen (PSA) levels. We test the model's predictive accuracy, using only a subset of the data to find parameter values. The results are compared with those of an existing piecewise linear model which does not use testosterone as an input. Since actual treatment protocol is to re-apply therapy when PSA levels recover beyond some threshold value, we develop a second method for predicting the PSA levels. Based on a small set of data from seven patients, our results showed that the piecewise linear model produced slightly more accurate results while the two predictive methods are comparable. This suggests that a simpler model may be more beneficial for a predictive use compared to a more biologically insightful model, although further research is needed in this field prior to implementing mathematical models as a predictive method in a clinical setting. Nevertheless, both models are an important step in this direction.

  12. Can Mathematical Models Predict the Outcomes of Prostate Cancer Patients Undergoing Intermittent Androgen Deprivation Therapy?

    NASA Astrophysics Data System (ADS)

    Everett, R. A.; Packer, A. M.; Kuang, Y.

    2014-04-01

    Androgen deprivation therapy is a common treatment for advanced or metastatic prostate cancer. Like the normal prostate, most tumors depend on androgens for proliferation and survival but often develop treatment resistance. Hormonal treatment causes many undesirable side effects which significantly decrease the quality of life for patients. Intermittently applying androgen deprivation in cycles reduces the total duration with these negative effects and may reduce selective pressure for resistance. We extend an existing model which used measurements of patient testosterone levels to accurately fit measured serum prostate specific antigen (PSA) levels. We test the model's predictive accuracy, using only a subset of the data to find parameter values. The results are compared with those of an existing piecewise linear model which does not use testosterone as an input. Since actual treatment protocol is to re-apply therapy when PSA levels recover beyond some threshold value, we develop a second method for predicting the PSA levels. Based on a small set of data from seven patients, our results showed that the piecewise linear model produced slightly more accurate results while the two predictive methods are comparable. This suggests that a simpler model may be more beneficial for a predictive use compared to a more biologically insightful model, although further research is needed in this field prior to implementing mathematical models as a predictive method in a clinical setting. Nevertheless, both models are an important step in this direction.

  13. Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach

    NASA Astrophysics Data System (ADS)

    Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew

    2017-05-01

    This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.
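
    The CLR model itself can be illustrated with a naive alternating scheme: assign each observation to the line that fits it best, then refit each line on its cluster. This is only a sketch of the model on synthetic data; the paper solves the underlying optimization with a purpose-built incremental algorithm.

```python
import numpy as np

# Naive clusterwise linear regression by alternating assignment and refit.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
true = np.where(x < 5, 2.0 * x + 1.0, -1.5 * x + 20.0)   # two regimes
y = true + rng.normal(0, 0.5, x.size)

k = 2
coef = rng.normal(size=(k, 2))                  # [slope, intercept] per cluster
for _ in range(20):
    # Assignment: each point joins the line with the smallest residual
    resid = np.abs(coef[:, 0][:, None] * x + coef[:, 1][:, None] - y)
    labels = resid.argmin(axis=0)
    # Refit: ordinary least squares within each cluster
    for j in range(k):
        if (labels == j).any():
            coef[j] = np.polyfit(x[labels == j], y[labels == j], 1)
print(coef)  # should approach the two generating lines
```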

  14. Variable context Markov chains for HIV protease cleavage site prediction.

    PubMed

    Oğul, Hasan

    2009-06-01

    Deciphering the specificity of HIV protease and developing computational tools for detecting its cleavage sites in a protein polypeptide chain are very desirable for designing efficient and specific chemical inhibitors to prevent acquired immunodeficiency syndrome. In this study, we developed a generative model based on a generalization of variable order Markov chains (VOMC) for peptide sequences and adapted the model for prediction of their cleavability by certain proteases. The new method, called variable context Markov chains (VCMC), attempts to identify context equivalence based on the evolutionary similarities between individual amino acids. It was applied to the HIV-1 protease cleavage site prediction problem and shown to outperform existing methods in terms of prediction accuracy on a common dataset. In general, the method is a promising tool for prediction of cleavage sites of all proteases, and its use is encouraged for any kind of peptide classification problem as well.
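
    A greatly simplified stand-in for the idea, shown below, trains a fixed first-order Markov chain on a toy set of cleaved octapeptides and scores new sequences by log-likelihood; the variable-context machinery and the evolutionary context equivalence of VCMC are not reproduced.

```python
import numpy as np

# First-order Markov scoring of peptides (a much-simplified VCMC stand-in).
AA = "ACDEFGHIKLMNPQRSTVWY"

def train(seqs, alpha=1.0):
    """Transition probabilities with Laplace smoothing."""
    counts = {a: {b: alpha for b in AA} for a in AA}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in AA}
            for a in AA}

def log_score(model, s):
    return sum(np.log(model[a][b]) for a, b in zip(s, s[1:]))

cleaved = ["SQNYPIVQ", "ARVLAEAM", "SQNYPIVL"]   # toy positive set
model = train(cleaved)
print(log_score(model, "SQNYPIVQ"), log_score(model, "GGGGGGGG"))
```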

  15. Identifying Important Attributes for Prognostic Prediction in Traumatic Brain Injury Patients. A Hybrid Method of Decision Tree and Neural Network.

    PubMed

    Pourahmad, Saeedeh; Hafizi-Rastani, Iman; Khalili, Hosseinali; Paydar, Shahram

    2016-10-17

    Generally, traumatic brain injury (TBI) patients do not have a stable condition, particularly after the first week of TBI. Hence, identifying the prognostic attributes through a prediction model is of utmost importance, since it helps caregivers with treatment decisions and prepares the relatives for the most likely outcome. This study attempted to determine and rank the attributes for prognostic prediction in TBI patients, based on early clinical findings. A hybrid method was employed, which combines a decision tree (DT) and an artificial neural network (ANN) in order to improve the modeling process. The DT approach was applied as the initial analysis of the network architecture to increase accuracy in prediction. Afterwards, the ANN structure was mapped from the initial DT based on part of the data. Subsequently, the designed network was trained and validated on the remaining data. A 5-fold cross-validation method was applied to train the network. The area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, and accuracy rate were utilized as performance measures. The important attributes were then determined from the trained network using two methods: change of mean squared error (MSE), and sensitivity analysis (SA). The hybrid method offered better results compared to the DT method. An accuracy rate of 86.3% vs. 82.2%, a sensitivity of 55.1% vs. 47.6%, a specificity of 93.6% vs. 91.1%, and an area under the ROC curve of 0.705 vs. 0.695 were achieved for the hybrid method and DT, respectively. However, the attribute ordering produced by the DT method was more consistent with the clinical literature. The combination of different modeling methods can enhance their performance, although it may create some complexity in computation and interpretation. The outcome of the present study could deliver some useful hints for prognostic prediction on the basis of early clinical findings in TBI patients.

  16. A general-purpose machine learning framework for predicting properties of inorganic materials

    DOE PAGES

    Ward, Logan; Agrawal, Ankit; Choudhary, Alok; ...

    2016-08-26

    A very active area of materials research is to devise methods that use machine learning to automatically extract predictive models from existing materials data. While prior examples have demonstrated successful models for some applications, many more applications exist where machine learning can make a strong impact. To enable faster development of machine-learning-based models for such applications, we have created a framework capable of being applied to a broad range of materials data. Our method works by using a chemically diverse list of attributes, which we demonstrate are suitable for describing a wide variety of properties, and a novel method for partitioning the data set into groups of similar materials to boost the predictive accuracy. In this manuscript, we demonstrate how this new method can be used to predict diverse properties of crystalline and amorphous materials, such as band gap energy and glass-forming ability.

  17. A general-purpose machine learning framework for predicting properties of inorganic materials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ward, Logan; Agrawal, Ankit; Choudhary, Alok

    A very active area of materials research is to devise methods that use machine learning to automatically extract predictive models from existing materials data. While prior examples have demonstrated successful models for some applications, many more applications exist where machine learning can make a strong impact. To enable faster development of machine-learning-based models for such applications, we have created a framework capable of being applied to a broad range of materials data. Our method works by using a chemically diverse list of attributes, which we demonstrate are suitable for describing a wide variety of properties, and a novel method for partitioning the data set into groups of similar materials to boost the predictive accuracy. In this manuscript, we demonstrate how this new method can be used to predict diverse properties of crystalline and amorphous materials, such as band gap energy and glass-forming ability.

  18. A vortex-filament and core model for wings with edge vortex separation

    NASA Technical Reports Server (NTRS)

    Pao, J. L.; Lan, C. E.

    1981-01-01

    A method for predicting aerodynamic characteristics of slender wings with edge vortex separation was developed. Semiempirical but simple methods were used to determine the initial positions of the free sheet and vortex core. Comparison with available data indicates that: the present method is generally accurate in predicting the lift and induced drag coefficients, but the predicted pitching moment is too positive; the spanwise lifting pressure distributions estimated by the one vortex core solution of the present method are significantly better than the results of Mehrotra's method relative to the pressure peak values for the flat delta; the two vortex core system applied to the double delta and strake wings produces overall aerodynamic characteristics which have good agreement with data except for the pitching moment; and the computer time for the present method is about two thirds of that of Mehrotra's method.

  19. A vortex-filament and core model for wings with edge vortex separation

    NASA Technical Reports Server (NTRS)

    Pao, J. L.; Lan, C. E.

    1982-01-01

    A vortex filament-vortex core method for predicting aerodynamic characteristics of slender wings with edge vortex separation was developed. Semi-empirical but simple methods were used to determine the initial positions of the free sheet and vortex core. Comparison with available data indicates that: (1) the present method is generally accurate in predicting the lift and induced drag coefficients, but the predicted pitching moment is too positive; (2) the spanwise lifting pressure distributions estimated by the one vortex core solution of the present method are significantly better than the results of Mehrotra's method relative to the pressure peak values for the flat delta; (3) the two vortex core system applied to the double delta and strake wings produces overall aerodynamic characteristics which have good agreement with data except for the pitching moment; and (4) the computer time for the present method is about two thirds of that of Mehrotra's method.

  20. Wave propagation modeling in composites reinforced by randomly oriented fibers

    NASA Astrophysics Data System (ADS)

    Kudela, Pawel; Radzienski, Maciej; Ostachowicz, Wieslaw

    2018-02-01

    A new method for prediction of elastic constants in randomly oriented fiber composites is proposed. It is based on mechanics of composites, the rule of mixtures and total mass balance tailored to the spectral element mesh composed of 3D brick elements. Selected elastic properties predicted by the proposed method are compared with values obtained by another theoretical method. The proposed method is applied for simulation of Lamb waves in glass-epoxy composite plate reinforced by randomly oriented fibers. Full wavefield measurements conducted by the scanning laser Doppler vibrometer are in good agreement with simulations performed by using the time domain spectral element method.
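
    The rule-of-mixtures step can be illustrated with the classical planar-random approximation: longitudinal and transverse moduli from the Voigt and Reuss mixtures, blended with the 3/8-5/8 rule. Property values below are generic glass/epoxy-like numbers, not the paper's inputs.

```python
# Rule-of-mixtures sketch for a planar randomly oriented fiber composite.
# All property values are illustrative, not taken from the study.
Ef, Em = 72.0, 3.5      # fiber and matrix Young's moduli, GPa
Vf = 0.3                # fiber volume fraction

E_long = Vf * Ef + (1 - Vf) * Em            # Voigt (parallel) mixture
E_trans = 1.0 / (Vf / Ef + (1 - Vf) / Em)   # Reuss (series) mixture
E_random = 3.0 / 8.0 * E_long + 5.0 / 8.0 * E_trans  # planar-random blend
print(f"E_L={E_long:.2f}  E_T={E_trans:.2f}  E_random~{E_random:.2f} GPa")
```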

  1. System and Method for Providing Model-Based Alerting of Spatial Disorientation to a Pilot

    NASA Technical Reports Server (NTRS)

    Johnson, Steve (Inventor); Conner, Kevin J (Inventor); Mathan, Santosh (Inventor)

    2015-01-01

    A system and method monitor aircraft state parameters, for example, aircraft movement and flight parameters, apply those inputs to a spatial disorientation model, and make a prediction of when the pilot may become spatially disoriented. Once the system predicts a potentially disoriented pilot, the sensitivity for alerting the pilot to conditions exceeding a threshold can be increased, allowing an earlier alert to mitigate the possibility of an incorrect control input.

  2. Analytical and Experimental Vibration Analysis of a Faulty Gear System.

    DTIC Science & Technology

    1994-10-01

    The Wigner-Ville Distribution (WVD) was used to give a comprehensive comparison of the predicted and...experimental results. The WVD method applied to the experimental results was also compared to other fault detection techniques to verify the WVD's ability to...of the damaged test gear and the predicted vibration from the model with simulated gear tooth pitting damage. Results also verified that the WVD method can successfully detect and locate gear tooth wear and pitting damage.

  3. [Study of beta-turns in globular proteins].

    PubMed

    Amirova, S R; Milchevskiĭ, Iu V; Filatov, I V; Esipova, N G; Tumanian, V G

    2005-01-01

    The formation of beta-turns in globular proteins has been studied by the method of molecular mechanics. The statistical method of discriminant analysis was applied to the calculated energy components and sequences of oligopeptide segments, after which the prediction of type I beta-turns was made. The accuracy of true positive prediction is 65%. Components of conformational energy that considerably affect beta-turn formation were delineated: torsional energy, energy of hydrogen bonds, and van der Waals energy.

  4. A novel multi-target regression framework for time-series prediction of drug efficacy.

    PubMed

    Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin

    2017-01-18

    Extracting knowledge from small samples is a challenging pharmacokinetic problem to which statistical methods can be applied. Pharmacokinetic data are special due to their small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescriptions. The main purpose of our study is to obtain some knowledge of the correlations in TCM prescriptions. Here, a novel method named the Multi-target Regression Framework is proposed to deal with the problem of efficacy prediction. We employ the correlation between the values of different time sequences and add the predictive targets of previous times as features to predict the value of the current time. Several experiments were conducted to test the validity of our method, and the results of leave-one-out cross-validation clearly demonstrate the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance and appears to be more suitable for this task.

  5. A novel multi-target regression framework for time-series prediction of drug efficacy

    PubMed Central

    Li, Haiqing; Zhang, Wei; Chen, Ying; Guo, Yumeng; Li, Guo-Zheng; Zhu, Xiaoxin

    2017-01-01

    Extracting knowledge from small samples is a challenging pharmacokinetic problem to which statistical methods can be applied. Pharmacokinetic data are special due to their small samples of high dimensionality, which makes it difficult to adopt conventional methods to predict the efficacy of traditional Chinese medicine (TCM) prescriptions. The main purpose of our study is to obtain some knowledge of the correlations in TCM prescriptions. Here, a novel method named the Multi-target Regression Framework is proposed to deal with the problem of efficacy prediction. We employ the correlation between the values of different time sequences and add the predictive targets of previous times as features to predict the value of the current time. Several experiments were conducted to test the validity of our method, and the results of leave-one-out cross-validation clearly demonstrate the competitiveness of our framework. Compared with linear regression, artificial neural networks, and partial least squares, support vector regression combined with our framework demonstrates the best performance and appears to be more suitable for this task. PMID:28098186

  6. Prediction of TF target sites based on atomistic models of protein-DNA complexes

    PubMed Central

    Angarica, Vladimir Espinosa; Pérez, Abel González; Vasconcelos, Ana T; Collado-Vides, Julio; Contreras-Moreira, Bruno

    2008-01-01

    Background The specific recognition of genomic cis-regulatory elements by transcription factors (TFs) plays an essential role in the regulation of coordinated gene expression. Studying the mechanisms determining binding specificity in protein-DNA interactions is thus an important goal. Most current approaches for modeling TF specific recognition rely on the knowledge of large sets of cognate target sites and consider only the information contained in their primary sequence. Results Here we describe a structure-based methodology for predicting sequence motifs starting from the coordinates of a TF-DNA complex. Our algorithm combines information regarding the direct and indirect readout of DNA into an atomistic statistical model, which is used to estimate the interaction potential. We first measure the ability of our method to correctly estimate the binding specificities of eight prokaryotic and eukaryotic TFs that belong to different structural superfamilies. Secondly, the method is applied to two homology models, finding that sampling of interface side-chain rotamers remarkably improves the results. Thirdly, the algorithm is compared with a reference structural method based on contact counts, obtaining comparable predictions for the experimental complexes and more accurate sequence motifs for the homology models. Conclusion Our results demonstrate that atomic-detail structural information can be feasibly used to predict TF binding sites. The computational method presented here is universal and might be applied to other systems involving protein-DNA recognition. PMID:18922190

  7. An Improved MUSIC Model for Gibbsite Surfaces

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mitchell, Scott C.; Bickmore, Barry R.; Tadanier, Christopher J.

    2004-06-01

    Here we use gibbsite as a model system with which to test a recently published, bond-valence method for predicting intrinsic pKa values for surface functional groups on oxides. At issue is whether the method is adequate when valence parameters for the functional groups are derived from ab initio structure optimization of surfaces terminated by vacuum. If not, ab initio molecular dynamics (AIMD) simulations of solvated surfaces (which are much more computationally expensive) will have to be used. To do this, we had to evaluate extant gibbsite potentiometric titration data for which some estimate of edge and basal surface area was available. Applying BET and recently developed atomic force microscopy methods, we found that most of these data sets were flawed, in that their surface area estimates were probably wrong. Similarly, there may have been problems with many of the titration procedures. However, one data set was adequate on both counts, and we applied our method of surface intrinsic pKa prediction to fitting a MUSIC model to these data with considerable success—several features of the titration data were predicted well. However, the model fit was certainly not perfect, and we experienced some difficulties optimizing highly charged, vacuum-terminated surfaces. Therefore, we conclude that we probably need to do AIMD simulations of solvated surfaces to adequately predict intrinsic pKa values for surface functional groups.

  8. Development of Physics-Based Hurricane Wave Response Functions: Application to Selected Sites on the U.S. Gulf Coast

    NASA Astrophysics Data System (ADS)

    McLaughlin, P. W.; Kaihatu, J. M.; Irish, J. L.; Taylor, N. R.; Slinn, D.

    2013-12-01

    Recent hurricane activity in the Gulf of Mexico has led to a need for accurate, computationally efficient prediction of hurricane damage so that communities can better assess risk of local socio-economic disruption. This study focuses on developing robust, physics based non-dimensional equations that accurately predict maximum significant wave height at different locations near a given hurricane track. These equations (denoted as Wave Response Functions, or WRFs) were developed from presumed physical dependencies between wave heights and hurricane characteristics and fit with data from numerical models of waves and surge under hurricane conditions. After curve fitting, constraints which correct for fully developed sea state were used to limit the wind wave growth. When applied to the region near Gulfport, MS, back prediction of maximum significant wave height yielded root mean square errors between 0.22-0.42 (m) at open coast stations and 0.07-0.30 (m) at bay stations when compared to the numerical model data. The WRF method was also applied to Corpus Christi, TX and Panama City, FL with similar results. Back prediction errors will be included in uncertainty evaluations connected to risk calculations using joint probability methods. These methods require thousands of simulations to quantify extreme value statistics, thus requiring the use of reduced methods such as the WRF to represent the relevant physical processes.

  9. Improved numerical methods for turbulent viscous recirculating flows

    NASA Technical Reports Server (NTRS)

    Turan, A.; Vandoormaal, J. P.

    1988-01-01

    The performance of discrete methods for the prediction of fluid flows can be enhanced by improving the convergence rate of solvers and by increasing the accuracy of the discrete representation of the equations of motion. This report evaluates the gains in solver performance that are available when various acceleration methods are applied. Various discretizations are also examined, and two are recommended because of their accuracy and robustness. Insertion of the improved discretization and solver accelerator into a TEACH code, which has been widely applied to combustor flows, illustrates the substantial gains to be achieved.

  10. Empirical Flutter Prediction Method.

    DTIC Science & Technology

    1988-03-05

    been used in this way to discover species or subspecies of animals, and to discover different types of voter or consumer requiring different persuasions...respect to behavior or performance or response variables. Once this was done, corresponding clusters might be sought among descriptive or predictive or...jump in a response. The first sort of usage does not apply to the flutter prediction problem. Here the types of behavior are the different kinds of

  11. Sea Ice Detection Based on an Improved Similarity Measurement Method Using Hyperspectral Data.

    PubMed

    Han, Yanling; Li, Jue; Zhang, Yun; Hong, Zhonghua; Wang, Jing

    2017-05-15

    Hyperspectral remote sensing technology can acquire nearly continuous spectrum information and rich sea ice image information, thus providing an important means of sea ice detection. However, the correlation and redundancy among hyperspectral bands reduce the accuracy of traditional sea ice detection methods. Based on the spectral characteristics of sea ice, this study presents an improved similarity measurement method based on linear prediction (ISMLP) to detect sea ice. First, the first original band with a large amount of information is determined based on mutual information theory. Subsequently, a second original band with the least similarity is chosen by the spectral correlation measuring method. Finally, subsequent bands are selected through the linear prediction method, and a support vector machine classifier model is applied to classify sea ice. In experiments performed on images of Baffin Bay and Bohai Bay, comparative analyses were conducted to compare the proposed method and traditional sea ice detection methods. Our proposed ISMLP method achieved the highest classification accuracies (91.18% and 94.22%) in both experiments. From these results the ISMLP method exhibits better performance overall than other methods and can be effectively applied to hyperspectral sea ice detection.

  12. Sea Ice Detection Based on an Improved Similarity Measurement Method Using Hyperspectral Data

    PubMed Central

    Han, Yanling; Li, Jue; Zhang, Yun; Hong, Zhonghua; Wang, Jing

    2017-01-01

    Hyperspectral remote sensing technology can acquire nearly continuous spectrum information and rich sea ice image information, thus providing an important means of sea ice detection. However, the correlation and redundancy among hyperspectral bands reduce the accuracy of traditional sea ice detection methods. Based on the spectral characteristics of sea ice, this study presents an improved similarity measurement method based on linear prediction (ISMLP) to detect sea ice. First, the first original band with a large amount of information is determined based on mutual information theory. Subsequently, a second original band with the least similarity is chosen by the spectral correlation measuring method. Finally, subsequent bands are selected through the linear prediction method, and a support vector machine classifier model is applied to classify sea ice. In experiments performed on images of Baffin Bay and Bohai Bay, comparative analyses were conducted to compare the proposed method and traditional sea ice detection methods. Our proposed ISMLP method achieved the highest classification accuracies (91.18% and 94.22%) in both experiments. From these results the ISMLP method exhibits better performance overall than other methods and can be effectively applied to hyperspectral sea ice detection. PMID:28505135
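
    The band-selection loop at the heart of ISMLP can be sketched as follows: after choosing two initial bands, repeatedly add the band worst predicted by a least-squares combination of the bands already selected. The data are synthetic, and the mutual-information and spectral-correlation criteria are replaced by simple proxies.

```python
import numpy as np

# Sketch of linear-prediction band selection on synthetic "pixels x bands"
# data; variance and correlation stand in for the paper's exact criteria.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))            # 500 pixels, 30 spectral bands
X[:, 5] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=500)  # a redundant band

selected = [int(np.argmax(X.var(axis=0)))]            # proxy first band
corr = np.abs(np.corrcoef(X.T)[selected[0]])
corr[selected[0]] = np.inf
selected.append(int(np.argmin(corr)))                 # least-similar second band

while len(selected) < 6:
    A = X[:, selected]
    resid = []
    for b in range(X.shape[1]):
        if b in selected:
            resid.append(-np.inf)
            continue
        coef, *_ = np.linalg.lstsq(A, X[:, b], rcond=None)
        resid.append(np.sum((X[:, b] - A @ coef) ** 2))
    selected.append(int(np.argmax(resid)))   # band worst explained so far
print("selected bands:", selected)
```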

  13. Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors.

    PubMed

    Sun, Meijian; Wang, Xia; Zou, Chuanxin; He, Zenghui; Liu, Wei; Li, Honglin

    2016-06-07

    RNA-binding proteins participate in many important biological processes concerning RNA-mediated gene regulation, and several computational methods have been recently developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the prediction accuracy of these prediction methods and provide further meaningful information for researchers. In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity) and according to the statistical and structural analysis of protein-RNA complexes, the two features were powerful for identifying RNA-binding protein residues. Using these two features and other excellent structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on training set RBP195 was 0.900, and when applied to the test set RBP68, the prediction accuracy (ACC) was 0.868, and the F-score was 0.631. The good prediction performance of our method revealed that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web-server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind.

  14. Prediction of resource volumes at untested locations using simple local prediction models

    USGS Publications Warehouse

    Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.

    2006-01-01

    This paper shows how local spatial nonparametric prediction models can be applied to estimate volumes of recoverable gas resources at individual undrilled sites, at multiple sites on a regional scale, and to compute confidence bounds for regional volumes based on the distribution of those estimates. An approach that combines cross-validation, the jackknife, and bootstrap procedures is used to accomplish this task. Simulation experiments show that cross-validation can be applied beneficially to select an appropriate prediction model. The cross-validation procedure worked well for a wide range of different states of nature and levels of information. Jackknife procedures are used to compute individual prediction estimation errors at undrilled locations. The jackknife replicates also are used with a bootstrap resampling procedure to compute confidence bounds for the total volume. The method was applied to data (partitioned into a training set and target set) from the Devonian Antrim Shale continuous-type gas play in the Michigan Basin in Otsego County, Michigan. The analysis showed that the model estimate of total recoverable volumes at prediction sites is within 4 percent of the total observed volume. The model predictions also provide frequency distributions of the cell volumes at the production unit scale. Such distributions are the basis for subsequent economic analyses. © Springer Science+Business Media, LLC 2007.
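
    The resampling logic combining the jackknife and the bootstrap can be sketched on a toy estimator of total volume, as below; the local spatial prediction model itself is replaced by a placeholder mean predictor.

```python
import numpy as np

# Toy resampling sketch: jackknife per-site prediction errors for a
# placeholder predictor, and bootstrap bounds on the estimated total volume.
rng = np.random.default_rng(0)
vols = rng.lognormal(mean=1.0, sigma=0.6, size=80)   # toy cell volumes

# Jackknife: predict each held-out site from the others, record the error
n = vols.size
jack_pred = np.array([vols[np.arange(n) != i].mean() for i in range(n)])
jack_err = vols - jack_pred
print("mean absolute jackknife error:", np.abs(jack_err).mean().round(3))

# Bootstrap: resample sites to bound the estimated total volume
totals = rng.choice(vols, size=(2000, n), replace=True).sum(axis=1)
lo, hi = np.percentile(totals, [2.5, 97.5])
print(f"total ~ {vols.sum():.1f}, 95% bounds [{lo:.1f}, {hi:.1f}]")
```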

  15. A Complete Procedure for Predicting and Improving the Performance of HAWT's

    NASA Astrophysics Data System (ADS)

    Al-Abadi, Ali; Ertunç, Özgür; Sittig, Florian; Delgado, Antonio

    2014-06-01

    A complete procedure for predicting and improving the performance of horizontal axis wind turbines (HAWTs) has been developed. The first process is predicting the power extracted by the turbine and the derived rotor torque, which should be identical to that of the drive unit. The BEM method and a developed post-stall treatment for resolving stall-regulated HAWTs are incorporated in the prediction. For that, a modified stall-regulated prediction model, which can predict the HAWT performance over the operating range of oncoming wind velocity, is derived from existing models. The model involves radius and chord, which makes it more general for predicting the performance of different scales and rotor shapes of HAWTs. The second process is modifying the rotor shape by an optimization process, which can be applied to any existing HAWT, to improve its performance. A gradient-based optimization is used to adjust the chord and twist angle distribution of the rotor blade to increase the power extraction while keeping the drive torque constant, so the same drive unit can be kept. The final process is testing the modified turbine to predict its enhanced performance. The procedure is applied to the NREL Phase VI 10 kW turbine as a baseline. The study has proven the applicability of the developed model in predicting the performance of the baseline as well as the optimized turbine. In addition, the optimization method has shown that the power coefficient can be increased while keeping the same design rotational speed.
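
    The BEM iteration at the core of the prediction step can be sketched for a single blade element, as below; the airfoil polars, geometry, and the absence of tip-loss corrections are deliberate simplifications, not the study's model.

```python
import numpy as np

# Classic BEM fixed-point iteration for one blade element: converge the
# axial (a) and tangential (ap) induction factors. Toy inputs throughout.
B, R, r, chord = 3, 5.0, 3.5, 0.3            # blades, radii [m], chord [m]
twist, U, omega = np.radians(5.0), 8.0, 7.0  # twist, wind [m/s], rotor [rad/s]
sigma = B * chord / (2 * np.pi * r)          # local solidity

a, ap = 0.3, 0.0
for _ in range(100):
    phi = np.arctan2((1 - a) * U, (1 + ap) * omega * r)  # inflow angle
    alpha = phi - twist
    Cl, Cd = 2 * np.pi * alpha, 0.01                     # toy airfoil polars
    Cn = Cl * np.cos(phi) + Cd * np.sin(phi)             # normal coefficient
    Ct = Cl * np.sin(phi) - Cd * np.cos(phi)             # tangential coefficient
    a_new = 1.0 / (4 * np.sin(phi) ** 2 / (sigma * Cn) + 1)
    ap_new = 1.0 / (4 * np.sin(phi) * np.cos(phi) / (sigma * Ct) - 1)
    if abs(a_new - a) < 1e-8 and abs(ap_new - ap) < 1e-8:
        break
    a, ap = 0.5 * (a + a_new), 0.5 * (ap + ap_new)       # relaxed update
print(f"a={a:.4f}  a'={ap:.4f}  alpha={np.degrees(alpha):.2f} deg")
```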

  16. Reliability of tanoak volume equations when applied to different areas

    Treesearch

    Norman H. Pillsbury; Philip M. McDonald; Victor Simon

    1995-01-01

    Tree volume equations for tanoak (Lithocarpus densiflorus) were developed for seven stands throughout its natural range and compared by a volume prediction and a parameter difference method. The objective was to test if volume estimates from a species growing in a local, relatively uniform habitat could be applied more widely. Results indicated...

  17. Application of the backstepping method to the prediction of increase or decrease of infected population.

    PubMed

    Kuniya, Toshikazu; Sano, Hideki

    2016-05-10

    In mathematical epidemiology, age-structured epidemic models have usually been formulated as boundary-value problems of partial differential equations. On the other hand, in engineering, the backstepping method has recently been developed and widely studied by many authors. Using the backstepping method, we obtained a boundary feedback control which plays the role of a threshold criterion for the prediction of increase or decrease of the newly infected population. Under the assumption that the period of infectiousness is the same for all infected individuals (that is, the recovery rate is given by the Dirac delta function multiplied by a sufficiently large positive constant), the prediction method is simplified to a comparison of the numbers of reported cases at the current and previous time steps. Our prediction method was applied to the reported cases per sentinel of influenza in Japan from 2006 to 2015, and its accuracy was 0.81 (404 correct predictions out of the total 500 predictions). This was higher than that of ARIMA models with different orders of the autoregressive part, differencing and moving-average process. In addition, a proposed method for estimating the number of reported cases, which is consistent with our prediction method, performed better than the best-fitted ARIMA model, ARIMA(1,1,0), in the sense of mean square error. Our prediction method based on the backstepping method can be simplified to a comparison of the numbers of reported cases at the current and previous time steps. In spite of its simplicity, it can provide a good prediction for the spread of influenza in Japan.
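
    The simplified rule is easy to state in code: predict an increase whenever the current count exceeds the previous one, and score the prediction one step ahead. The series below is simulated, not the Japanese sentinel data.

```python
import numpy as np

# Score the "compare current vs. previous reported cases" prediction rule
# one step ahead on a simulated case-count series.
rng = np.random.default_rng(0)
cases = np.maximum(0, 50 + np.cumsum(rng.normal(0, 5, 120))).round()

pred_up = cases[1:-1] > cases[:-2]     # predicted direction for the next step
true_up = cases[2:] > cases[1:-1]      # what actually happened
accuracy = np.mean(pred_up == true_up)
print(f"accuracy of the comparison rule: {accuracy:.2f}")
```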

  18. Covert Network Analysis for Key Player Detection and Event Prediction Using a Hybrid Classifier

    PubMed Central

    Akram, M. Usman; Khan, Shoab A.; Javed, Muhammad Younus

    2014-01-01

    National security has gained vital importance due to increasing number of suspicious and terrorist events across the globe. Use of different subfields of information technology has also gained much attraction of researchers and practitioners to design systems which can detect main members which are actually responsible for such kind of events. In this paper, we present a novel method to predict key players from a covert network by applying a hybrid framework. The proposed system calculates certain centrality measures for each node in the network and then applies novel hybrid classifier for detection of key players. Our system also applies anomaly detection to predict any terrorist activity in order to help law enforcement agencies to destabilize the involved network. As a proof of concept, the proposed framework has been implemented and tested using different case studies including two publicly available datasets and one local network. PMID:25136674
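
    A hedged sketch of the centrality-feature pipeline follows: per-node centrality measures feed a classifier that flags likely key players. The graph, the labels, and the plain random forest are synthetic stand-ins for the paper's hybrid classifier and case studies.

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Centrality features per node -> classifier flagging likely key players.
rng = np.random.default_rng(0)
G = nx.barabasi_albert_graph(200, 2, seed=0)   # synthetic covert network

feats = np.column_stack([
    list(nx.degree_centrality(G).values()),
    list(nx.betweenness_centrality(G).values()),
    list(nx.closeness_centrality(G).values()),
    list(nx.eigenvector_centrality(G, max_iter=500).values()),
])
# Toy labels: call the top decile by degree centrality "key players"
deg = feats[:, 0]
y = (deg > np.quantile(deg, 0.9)).astype(int)

clf = RandomForestClassifier(100, random_state=0).fit(feats[:150], y[:150])
print("held-out accuracy:", clf.score(feats[150:], y[150:]))
```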

  19. Network regularised Cox regression and multiplex network models to predict disease comorbidities and survival of cancer.

    PubMed

    Xu, Haoming; Moni, Mohammad Ali; Liò, Pietro

    2015-12-01

    In cancer genomics, gene expression levels provide important molecular signatures for all types of cancer, and this could be very useful for predicting the survival of cancer patients. However, the main challenge of gene expression data analysis is high dimensionality, and microarray is characterised by few number of samples with large number of genes. To overcome this problem, a variety of penalised Cox proportional hazard models have been proposed. We introduce a novel network regularised Cox proportional hazard model and a novel multiplex network model to measure the disease comorbidities and to predict survival of the cancer patient. Our methods are applied to analyse seven microarray cancer gene expression datasets: breast cancer, ovarian cancer, lung cancer, liver cancer, renal cancer and osteosarcoma. Firstly, we applied a principal component analysis to reduce the dimensionality of original gene expression data. Secondly, we applied a network regularised Cox regression model on the reduced gene expression datasets. By using normalised mutual information method and multiplex network model, we predict the comorbidities for the liver cancer based on the integration of diverse set of omics and clinical data, and we find the diseasome associations (disease-gene association) among different cancers based on the identified common significant genes. Finally, we evaluated the precision of the approach with respect to the accuracy of survival prediction using ROC curves. We report that colon cancer, liver cancer and renal cancer share the CXCL5 gene, and breast cancer, ovarian cancer and renal cancer share the CCND2 gene. Our methods are useful to predict survival of the patient and disease comorbidities more accurately and helpful for improvement of the care of patients with comorbidity. Software in Matlab and R is available on our GitHub page: https://github.com/ssnhcom/NetworkRegularisedCox.git. Copyright © 2015. Published by Elsevier Ltd.

  20. Free energy of formation of a crystal nucleus in incongruent solidification: Implication for modeling the crystallization of aqueous nitric acid droplets in polar stratospheric clouds

    NASA Astrophysics Data System (ADS)

    Djikaev, Yuri S.; Ruckenstein, Eli

    2017-04-01

    Using the formalism of classical thermodynamics in the framework of the classical nucleation theory, we derive an expression for the reversible work W* of formation of a binary crystal nucleus in a liquid binary solution of non-stoichiometric composition (incongruent crystallization). Applied to the crystallization of aqueous nitric acid droplets, the new expression more adequately takes account of the effects of nitric acid vapor compared to the conventional expression of MacKenzie, Kulmala, Laaksonen, and Vesala (MKLV) [J. Geophys. Res.: Atmos. 102, 19729 (1997)]. The predictions of both MKLV and modified expressions for the average liquid-solid interfacial tension σls of nitric acid dihydrate (NAD) crystals are compared by using existing experimental data on the incongruent crystallization of aqueous nitric acid droplets of composition relevant to polar stratospheric clouds (PSCs). The predictions for σls based on the MKLV expression are higher by about 5% compared to predictions based on our modified expression. This results in similar differences between the predictions of both expressions for the solid-vapor interfacial tension σsv of NAD crystal nuclei. The latter can be obtained by using the method based on the analysis of experimental data on crystal nucleation rates in aqueous nitric acid droplets; it exploits the dominance of the surface-stimulated mode of crystal nucleation in small droplets and its negligibility in large ones. Applying that method to existing experimental data, our expression for the free energy of formation provides an estimate for σsv of NAD in the range ≈92 dyn/cm to ≈100 dyn/cm, while the MKLV expression predicts it in the range ≈95 dyn/cm to ≈105 dyn/cm. The predictions of both expressions for W* become identical for the case of congruent crystallization; this was also demonstrated by applying our method for determining σsv to the nucleation of nitric acid trihydrate crystals in PSC droplets of stoichiometric composition.

  1. Convolutional neural networks for seizure prediction using intracranial and scalp electroencephalogram.

    PubMed

    Truong, Nhan Duy; Nguyen, Anh Duy; Kuhlmann, Levin; Bonyadi, Mohammad Reza; Yang, Jiawei; Ippolito, Samuel; Kavehei, Omid

    2018-05-07

    Seizure prediction has attracted growing attention as one of the most challenging predictive data analysis efforts to improve the life of patients with drug-resistant epilepsy and tonic seizures. Many outstanding studies have reported great results in providing sensible indirect (warning systems) or direct (interactive neural stimulation) control over refractory seizures, some of which achieved high performance. However, to achieve high sensitivity and a low false prediction rate, many of these studies relied on handcrafted and/or tailored feature extraction, performed for each patient independently. This approach, however, is not generalizable and requires significant modifications for each new patient within a new dataset. In this article, we apply convolutional neural networks to different intracranial and scalp electroencephalogram (EEG) datasets and propose a generalized retrospective and patient-specific seizure prediction method. We use the short-time Fourier transform on 30-s EEG windows to extract information in both the frequency domain and the time domain. The algorithm automatically generates optimized features for each patient to best classify preictal and interictal segments. The method can be applied to any other patient from any dataset without the need for manual feature extraction. The proposed approach achieves sensitivity of 81.4%, 81.2%, and 75% and a false prediction rate of 0.06/h, 0.16/h, and 0.21/h on the Freiburg Hospital intracranial EEG dataset, the Boston Children's Hospital-MIT scalp EEG dataset, and the American Epilepsy Society Seizure Prediction Challenge dataset, respectively. Our prediction method is also statistically better than an unspecific random predictor for most of the patients in all three datasets. Copyright © 2018 Elsevier Ltd. All rights reserved.
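    The time-frequency front end can be sketched with SciPy; the sampling rate and the random stand-in signal below are assumptions:

        import numpy as np
        from scipy.signal import stft

        fs = 256                           # sampling rate in Hz (assumed)
        window = np.random.randn(30 * fs)  # toy 30-s single-channel EEG window

        # Short-time Fourier transform: the time-frequency image fed to the CNN
        freqs, times, Z = stft(window, fs=fs, nperseg=fs)
        spectrogram = np.abs(Z)
        print(spectrogram.shape)  # (frequency bins, time frames)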

  2. [Study on the detection of active ingredient contents of Paecilomyces hepiali mycelium via near infrared spectroscopy].

    PubMed

    Teng, Wei-Zhuo; Song, Jia; Meng, Fan-Xin; Meng, Qing-Fan; Lu, Jia-Hui; Hu, Shuang; Teng, Li-Rong; Wang, Di; Xie, Jing

    2014-10-01

    Partial least squares (PLS) and radial basis function neural network (RBFNN) models combined with near-infrared spectroscopy (NIR) were applied to develop models for the analysis of cordycepic acid, polysaccharide and adenosine in Paecilomyces hepiali fermentation mycelium. The developed models possess good generalization and predictive ability and can be applied to the determination of crude drugs and related products. During the experiment, 214 Paecilomyces hepiali mycelium samples were obtained via chemical mutagenesis combined with submerged fermentation. The contents of cordycepic acid, polysaccharide and adenosine were determined via traditional methods, and the near-infrared spectroscopy data were collected. The outliers were removed and the composition of the calibration set was confirmed via the Monte Carlo partial least squares (MCPLS) method. Based on the values of the degree of approach (Da), both moving window partial least squares (MWPLS) and moving window radial basis function neural network (MWRBFNN) approaches were applied to optimize the characteristic wavelength variables, the optimum preprocessing methods and other important variables in the models. After comparison, RBFNN, RBFNN and PLS models were developed successfully for cordycepic acid, polysaccharide and adenosine detection, respectively, and the correlations between reference values and predicted values in the calibration set (R2c) and validation set (R2p) of the optimum models were 0.9417 and 0.9663, 0.9803 and 0.9850, and 0.9761 and 0.9728, respectively. All the data suggest that these models possess good fitness and predictive ability.
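    A minimal PLS calibration sketch with scikit-learn is shown below; the synthetic spectra and reference values are stand-ins for the NIR measurements and wet-chemistry assays described above:

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(1)

        # Toy stand-ins: NIR spectra (samples x wavelength variables) and a
        # reference assay value such as polysaccharide content
        X = rng.normal(size=(214, 700))
        y = X[:, 100:110].sum(axis=1) + rng.normal(scale=0.1, size=214)

        X_cal, X_val, y_cal, y_val = train_test_split(X, y, random_state=0)

        pls = PLSRegression(n_components=8)
        pls.fit(X_cal, y_cal)
        print("Validation R^2:", pls.score(X_val, y_val))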

  3. A novel computer algorithm improves antibody epitope prediction using affinity-selected mimotopes: a case study using monoclonal antibodies against the West Nile virus E protein.

    PubMed

    Denisova, Galina F; Denisov, Dimitri A; Yeung, Jeffrey; Loeb, Mark B; Diamond, Michael S; Bramson, Jonathan L

    2008-11-01

    Understanding antibody function is often enhanced by knowledge of the specific binding epitope. Here, we describe a computer algorithm that permits epitope prediction based on a collection of random peptide epitopes (mimotopes) isolated by antibody affinity purification. We applied this methodology to the prediction of epitopes for five monoclonal antibodies against the West Nile virus (WNV) E protein, two of which exhibit therapeutic activity in vivo. This strategy was validated by comparison of our results with existing F(ab)-E protein crystal structures and mutational analysis by yeast surface display. We demonstrate that by combining the results of the mimotope method with our data from mutational analysis, epitopes could be predicted with greater certainty. The two methods displayed great complementarity as the mutational analysis facilitated epitope prediction when the results with the mimotope method were equivocal and the mimotope method revealed a broader number of residues within the epitope than the mutational analysis. Our results demonstrate that the combination of these two prediction strategies provides a robust platform for epitope characterization.

  4. Prediction of multi-drug resistance transporters using a novel sequence analysis method [version 2; referees: 2 approved

    DOE PAGES

    McDermott, Jason E.; Bruillard, Paul; Overall, Christopher C.; ...

    2015-03-09

    There are many examples of groups of proteins that have similar function, but the determinants of functional specificity may be hidden by lack of sequence similarity, or by large groups of similar sequences with different functions. Transporters are one such protein group: the general function, transport, can be easily inferred from the sequence, but the substrate specificity can be impossible to predict from sequence with current methods. In this paper we describe a linguistic-based approach to identify functional patterns from groups of unaligned protein sequences and its application to predict multi-drug resistance transporters (MDRs) from bacteria. We first show that our method can recreate known patterns from PROSITE for several motifs from unaligned sequences. We then show that the method, MDRpred, can predict MDRs with greater accuracy and positive predictive value than a collection of currently available family-based models from the Pfam database. Finally, we apply MDRpred to a large collection of protein sequences from an environmental microbiome study to make novel predictions about drug resistance in a potential environmental reservoir.

  5. Building blocks for automated elucidation of metabolites: machine learning methods for NMR prediction.

    PubMed

    Kuhn, Stefan; Egert, Björn; Neumann, Steffen; Steinbeck, Christoph

    2008-09-25

    Current efforts in Metabolomics, such as the Human Metabolome Project, collect structures of biological metabolites as well as data for their characterisation, such as spectra for identification of substances and measurements of their concentration. Still, only a fraction of existing metabolites and their spectral fingerprints are known. Computer-Assisted Structure Elucidation (CASE) of biological metabolites will be an important tool to address this lack of knowledge. Indispensable for CASE are modules to predict spectra for hypothetical structures. This paper evaluates different statistical and machine learning methods for predicting proton NMR spectra based on data from our open database NMRShiftDB. A mean absolute error of 0.18 ppm was achieved for the prediction of proton NMR shifts ranging from 0 to 11 ppm. Random forest, J48 decision tree and support vector machines achieved similar overall errors. HOSE codes, a notably simple method, achieved a comparatively good result of 0.17 ppm mean absolute error. The NMR prediction methods applied in the course of this work delivered precise predictions which can serve as a building block for Computer-Assisted Structure Elucidation of biological metabolites.

  6. MultiDK: A Multiple Descriptor Multiple Kernel Approach for Molecular Discovery and Its Application to Organic Flow Battery Electrolytes.

    PubMed

    Kim, Sungjin; Jinich, Adrián; Aspuru-Guzik, Alán

    2017-04-24

    We propose a multiple descriptor multiple kernel (MultiDK) method for efficient molecular discovery using machine learning. We show that the MultiDK method improves both the speed and accuracy of molecular property prediction. We apply the method to the discovery of electrolyte molecules for aqueous redox flow batteries. Using multiple-type, as opposed to single-type, descriptors, we obtain more relevant features for machine learning. Following the principle of the "wisdom of the crowds", the combination of multiple-type descriptors significantly boosts prediction performance. Moreover, by employing multiple kernels (more than one kernel function for a set of the input descriptors), MultiDK exploits nonlinear relations between molecular structure and properties better than a linear regression approach. The multiple kernels consist of a Tanimoto similarity kernel and a linear kernel for a set of binary descriptors and a set of nonbinary descriptors, respectively. Using MultiDK, we achieve an average performance of r^2 = 0.92 with a test set of molecules for solubility prediction. We also extend MultiDK to predict pH-dependent solubility and apply it to a set of quinone molecules with different ionizable functional groups to assess their performance as flow battery electrolytes.
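    The combined-kernel idea can be sketched as follows; the toy fingerprints, descriptors and target values are assumptions, and kernel ridge regression stands in for the paper's exact learner:

        import numpy as np
        from sklearn.kernel_ridge import KernelRidge

        def tanimoto_kernel(A, B):
            """Tanimoto similarity between two sets of binary fingerprints."""
            inter = A @ B.T
            norm = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - inter
            return inter / np.maximum(norm, 1)

        rng = np.random.default_rng(2)
        fp = rng.integers(0, 2, size=(60, 128)).astype(float)  # binary descriptors
        desc = rng.normal(size=(60, 10))                       # non-binary descriptors
        y = desc[:, 0] + 0.1 * fp[:, :5].sum(axis=1)           # toy solubility target

        # Combined kernel: Tanimoto on binary features plus linear on the rest
        K = tanimoto_kernel(fp, fp) + desc @ desc.T

        model = KernelRidge(kernel="precomputed", alpha=1.0)
        model.fit(K, y)
        print(model.predict(K)[:3])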

  7. Whole-genome regression and prediction methods applied to plant and animal breeding.

    PubMed

    de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L

    2013-02-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.

  8. Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding

    PubMed Central

    de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.

    2013-01-01

    Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228

  9. Next-generation genome-scale models for metabolic engineering.

    PubMed

    King, Zachary A; Lloyd, Colton J; Feist, Adam M; Palsson, Bernhard O

    2015-12-01

    Constraint-based reconstruction and analysis (COBRA) methods have become widely used tools for metabolic engineering in both academic and industrial laboratories. By employing a genome-scale in silico representation of the metabolic network of a host organism, COBRA methods can be used to predict optimal genetic modifications that improve the rate and yield of chemical production. A new generation of COBRA models and methods, encompassing many biological processes and simulation strategies, is now being developed, and these next-generation models enable new types of predictions. Here, three key examples of applying COBRA methods to strain optimization are presented and discussed. Then, an outlook is provided on the next generation of COBRA models and the new types of predictions they will enable for systems metabolic engineering. Copyright © 2014 Elsevier Ltd. All rights reserved.
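    The core COBRA prediction step, flux balance analysis, can be sketched with the cobrapy package; the SBML file name and the PGI knockout below are placeholder assumptions:

        import cobra

        # Load a genome-scale reconstruction ("e_coli_core.xml" is a placeholder
        # path for any SBML model file)
        model = cobra.io.read_sbml_model("e_coli_core.xml")

        # Flux balance analysis: predict the optimal flux distribution for the
        # current objective (typically biomass production)
        solution = model.optimize()
        print("Predicted growth rate:", solution.objective_value)

        # A simple in silico modification: knock out one reaction and re-predict
        # growth, the kind of query used in strain optimization
        with model:
            model.reactions.get_by_id("PGI").knock_out()
            print("Growth after PGI knockout:", model.optimize().objective_value)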

  10. Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach.

    PubMed

    Nielsen, Morten; Lundegaard, Claus; Worning, Peder; Hvid, Christina Sylvester; Lamberth, Kasper; Buus, Søren; Brunak, Søren; Lund, Ole

    2004-06-12

    Prediction of which peptides will bind a specific major histocompatibility complex (MHC) constitutes an important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a broad length distribution, complicating such predictions. Thus, identifying the correct alignment is a crucial part of identifying the core of an MHC class II binding motif. In this context, we describe a novel Gibbs motif sampler method ideally suited for recognizing such weak sequence motifs. The method is based on the Gibbs sampling method and incorporates novel features optimized for the task of recognizing the binding motifs of MHC classes I and II. The method locates the binding motif in a set of sequences and characterizes the motif in terms of a weight-matrix. Subsequently, the weight-matrix can be applied to effectively identify potential MHC-binding peptides and to guide the process of rational vaccine design. We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method is amino acid peptide sequences extracted from the public databases SYFPEITHI and MHCPEP and known to bind to the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform the use of a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristic (ROC) plots, and we make a detailed comparison of the performance with that of both the TEPITOPE method and a weight-matrix derived using the conventional alignment algorithm of ClustalW. The calculation demonstrates that the predictive performance of the Gibbs sampler is higher than that of ClustalW and, in most cases, also higher than that of the TEPITOPE method.
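    The core of a Gibbs motif sampler of this kind can be sketched in a few dozen lines; the toy peptides, uniform background and fixed motif width below are simplifying assumptions rather than the authors' implementation:

        import numpy as np

        AA = "ACDEFGHIKLMNPQRSTVWY"
        IDX = {a: i for i, a in enumerate(AA)}

        def gibbs_motif_sampler(seqs, width=9, iters=200, seed=0):
            """Toy Gibbs sampler locating a shared motif core of fixed width.
            Uses a uniform background and pseudocounts for simplicity."""
            rng = np.random.default_rng(seed)
            starts = [rng.integers(0, len(s) - width + 1) for s in seqs]
            for _ in range(iters):
                for i, seq in enumerate(seqs):
                    # Build the weight matrix from all sequences but the held-out one
                    counts = np.ones((width, 20))  # pseudocounts
                    for j, s in enumerate(seqs):
                        if j != i:
                            for k in range(width):
                                counts[k, IDX[s[starts[j] + k]]] += 1
                    pwm = counts / counts.sum(axis=1, keepdims=True)
                    # Score every window of the held-out sequence and resample
                    n_pos = len(seq) - width + 1
                    scores = np.array([
                        np.prod([pwm[k, IDX[seq[p + k]]] for k in range(width)])
                        for p in range(n_pos)
                    ])
                    starts[i] = rng.choice(n_pos, p=scores / scores.sum())
            return starts, np.log(pwm / 0.05)  # log-odds vs uniform background

        # Toy peptides with a planted 9-residue core (illustrative data only)
        peptides = ["GGLKAVYNFATAA", "TTLKAVYNFATCD", "NMLKAVYNFATQQ", "APLKAVYNFATWS"]
        starts, matrix = gibbs_motif_sampler(peptides)
        print(starts)  # the planted core sits at position 2 in each peptide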

  11. Sparse Bayesian Learning for Identifying Imaging Biomarkers in AD Prediction

    PubMed Central

    Shen, Li; Qi, Yuan; Kim, Sungeun; Nho, Kwangsik; Wan, Jing; Risacher, Shannon L.; Saykin, Andrew J.

    2010-01-01

    We apply sparse Bayesian learning methods, automatic relevance determination (ARD) and predictive ARD (PARD), to Alzheimer's disease (AD) classification to make accurate predictions and simultaneously identify critical imaging markers relevant to AD. ARD is one of the most successful Bayesian feature selection methods. PARD is a powerful Bayesian feature selection method that provides sparse models that are easy to interpret. PARD selects the model with the best estimate of the predictive performance instead of choosing the one with the largest marginal model likelihood. A comparative study with the support vector machine (SVM) shows that ARD/PARD in general outperform SVM in terms of prediction accuracy. An additional comparison with surface-based general linear model (GLM) analysis shows that the regions with the strongest signals are identified by both GLM and ARD/PARD. While the GLM P-map returns significant regions all over the cortex, ARD/PARD provide a small number of relevant and meaningful imaging markers with predictive power, including both cortical and subcortical measures. PMID:20879451
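    ARD is available in scikit-learn, which makes the sparsity effect easy to demonstrate; the synthetic features and the 0.1 coefficient threshold below are assumptions:

        import numpy as np
        from sklearn.linear_model import ARDRegression

        rng = np.random.default_rng(3)

        # Toy imaging features: only the first 5 of 100 carry signal
        X = rng.normal(size=(150, 100))
        w = np.array([2.0, -1.5, 1.0, 0.8, -0.5])
        y = X[:, :5] @ w + rng.normal(scale=0.1, size=150)

        ard = ARDRegression()
        ard.fit(X, y)

        # ARD drives irrelevant coefficients toward zero, yielding a small,
        # interpretable set of predictive "imaging markers"
        print("Selected features:", np.flatnonzero(np.abs(ard.coef_) > 0.1))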

  12. Link prediction measures considering different neighbors’ effects and application in social networks

    NASA Astrophysics Data System (ADS)

    Luo, Peng; Wu, Chong; Li, Yongli

    Link prediction measures have attracted particular attention in the field of mathematical physics. In this paper, we consider the different effects of neighbors in link prediction and focus on four different situations: considering only the individual's own effects; considering the effects of the individual, neighbors and neighbors' neighbors; considering the effects of the individual and neighbors up to four degrees of separation; and considering the effects of all network participants. Then, according to the four situations, we present our link prediction models, which also take the effects of social characteristics into consideration. An artificial network is adopted to illustrate the parameter estimation based on logistic regression. Furthermore, we compare our methods with some other link prediction methods (LPMs) to examine the validity of our proposed model in online social networks. The results show the superiority of our proposed link prediction methods compared with others. In the application part, our models are applied to study social network evolution and to recommend friends and cooperators in social networks.
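    Classical neighborhood-based link prediction scores of the kind such models are compared against can be computed with networkx; the graph and candidate pairs below are illustrative:

        import networkx as nx

        # Toy social network and candidate non-edges (illustrative choices)
        G = nx.karate_club_graph()
        pairs = [(0, 9), (1, 33), (5, 24)]

        # Classical neighborhood-based scores used as baselines
        for u, v, score in nx.jaccard_coefficient(G, pairs):
            print(f"Jaccard({u},{v}) = {score:.3f}")
        for u, v, score in nx.adamic_adar_index(G, pairs):
            print(f"Adamic-Adar({u},{v}) = {score:.3f}")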

  13. Strain intensity factor approach for predicting the strength of continuously reinforced metal matrix composites

    NASA Technical Reports Server (NTRS)

    Poe, C. C., Jr.

    1988-01-01

    A method was previously developed to predict the fracture toughness (stress intensity factor at failure) of composites in terms of the elastic constants and the tensile failing strain of the fibers. The method was applied to boron/aluminum composites made with various proportions of 0° and ±45° plies. Predicted values of fracture toughness were in gross error because widespread yielding of the aluminum matrix made the compliance very nonlinear. An alternate method was developed to predict the strain intensity factor at failure rather than the stress intensity factor, because the singular strain field was not affected by yielding as much as the stress field. Strengths of specimens containing crack-like slits were calculated from predicted failing strains using uniaxial stress-strain curves. Predicted strengths were in good agreement with experimental values, even for the very nonlinear laminates that contained only ±45° plies. This approach should be valid for other metal matrix composites that have continuous fibers.

  14. On some methods for assessing earthquake predictions

    NASA Astrophysics Data System (ADS)

    Molchan, G.; Romashkova, L.; Peresan, A.

    2017-09-01

    A regional approach to the problem of assessing earthquake predictions inevitably faces a deficit of data. We point out some basic limits of assessment methods reported in the literature, considering the practical case of the performance of the CN pattern recognition method in the prediction of large Italian earthquakes. Along with the classical hypothesis testing, a new game approach, the so-called parimutuel gambling (PG) method, is examined. The PG, originally proposed for the evaluation of the probabilistic earthquake forecast, has been recently adapted for the case of 'alarm-based' CN prediction. The PG approach is a non-standard method; therefore it deserves careful examination and theoretical analysis. We show that the PG alarm-based version leads to an almost complete loss of information about predicted earthquakes (even for a large sample). As a result, any conclusions based on the alarm-based PG approach are not to be trusted. We also show that the original probabilistic PG approach does not necessarily identify the genuine forecast correctly among competing seismicity rate models, even when applied to extensive data.

  15. A Solar Time-Based Analog Ensemble Method for Regional Solar Power Forecasting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hodge, Brian S; Zhang, Xinmin; Li, Yuan

    This paper presents a new analog ensemble method for day-ahead regional photovoltaic (PV) power forecasting with hourly resolution. By utilizing open weather forecast and power measurement data, this prediction method is processed within a set of historical data with similar meteorological data (temperature and irradiance) and astronomical date (solar time and earth declination angle). Further, clustering and blending strategies are applied to improve its accuracy in regional PV forecasting. The robustness of the proposed method is demonstrated with three different numerical weather prediction models, the North American Mesoscale Forecast System, the Global Forecast System, and the Short-Range Ensemble Forecast, for both region-level and single-site-level PV forecasts. Using real measured data, the new forecasting approach is applied to the load zone in Southeastern Massachusetts as a case study. The normalized root mean square error (NRMSE) has been reduced by 13.80%-61.21% when compared with three tested baselines.

  16. Systems and methods for knowledge discovery in spatial data

    DOEpatents

    Obradovic, Zoran; Fiez, Timothy E.; Vucetic, Slobodan; Lazarevic, Aleksandar; Pokrajac, Dragoljub; Hoskinson, Reed L.

    2005-03-08

    Systems and methods are provided for knowledge discovery in spatial data, as well as for optimizing recipes used in spatial environments such as may be found in precision agriculture. A spatial data analysis and modeling module is provided which allows users to interactively and flexibly analyze and mine spatial data. The spatial data analysis and modeling module applies spatial data mining algorithms through a number of steps. The data loading and generation module obtains or generates spatial data and allows for basic partitioning. The inspection module provides basic statistical analysis. The preprocessing module smoothes and cleans the data and allows for basic manipulation of the data. The partitioning module provides for more advanced data partitioning. The prediction module applies regression and classification algorithms to the spatial data. The integration module enhances prediction methods by combining and integrating models. The recommendation module provides the user with site-specific recommendations on how to optimize a recipe for a spatial environment, such as a fertilizer recipe for an agricultural field.

  17. Numerical Evaluation of P-Multigrid Method for the Solution of Discontinuous Galerkin Discretizations of Diffusive Equations

    NASA Technical Reports Server (NTRS)

    Atkins, H. L.; Helenbrook, B. T.

    2005-01-01

    This paper describes numerical experiments with P-multigrid to corroborate analysis, validate the present implementation, and examine issues that arise in implementations of the various combinations of relaxation schemes, discretizations and P-multigrid methods. The two approaches to implementing P-multigrid presented here are equivalent for most high-order discretization methods, such as spectral element, SUPG, and discontinuous Galerkin applied to advection; however, it is discovered that the approach that mimics the common geometric multigrid implementation is less robust, and frequently unstable, when applied to discontinuous Galerkin discretizations of diffusion. Gauss-Seidel relaxation converges 40% faster than block Jacobi, as predicted by analysis; however, the implementation of Gauss-Seidel is considerably more expensive than one would expect, because gradients in most neighboring elements must be updated. A compromise quasi Gauss-Seidel relaxation method that evaluates the gradient in each element twice per iteration converges at rates similar to those predicted for true Gauss-Seidel.

  18. A review of machine learning in obesity.

    PubMed

    DeGregory, K W; Kuiper, P; DeSilvio, T; Pleuss, J D; Miller, R; Roginski, J W; Fisher, C B; Harness, D; Viswanath, S; Heymsfield, S B; Dungan, I; Thomas, D M

    2018-05-01

    Rich sources of obesity-related data arising from sensors, smartphone apps, electronic medical health records and insurance data can bring new insights for understanding, preventing and treating obesity. For such large datasets, machine learning provides sophisticated and elegant tools to describe, classify and predict obesity-related risks and outcomes. Here, we review machine learning methods that predict and/or classify, such as linear and logistic regression, artificial neural networks, deep learning and decision tree analysis. We also review methods that describe and characterize data, such as cluster analysis, principal component analysis, network science and topological data analysis. We introduce each method with a high-level overview followed by examples of successful applications. The algorithms were then applied to the National Health and Nutrition Examination Survey (NHANES) to demonstrate methodology, utility and outcomes. The strengths and limitations of each method were also evaluated. This summary of machine learning algorithms provides a unique overview of the state of data analysis applied specifically to obesity. © 2018 World Obesity Federation.

  19. UVPROM dosimetry, microdosimetry and applications to SEU and extreme value theory

    NASA Astrophysics Data System (ADS)

    Scheick, Leif Zebediah

    A new method is described for characterizing a device in terms of the statistical distribution of first failures. The method is based on the erasure of a commercial Ultra-Violet erasable Programmable Read Only Memory (UVPROM). The method of readout could be used on a spacecraft or in other restrictive radiation environments. The measurement of the charge remaining on the floating gate is used to determine absorbed dose. The method of determining dose does not require the detector to be destroyed or erased, nor does it affect the ability to take further measurements. This is compared to extreme value theory applied to the statistical distributions that apply to this device. This technique predicts the threshold of Single Event Effects (SEE), such as anomalous changes in erasure time in programmable devices due to high microdose energy-deposition events. This technique also allows for advanced, non-destructive screening of single microelectronic devices for predictable response in stressful (e.g., radiation) environments.

  20. Artificial neural network intelligent method for prediction

    NASA Astrophysics Data System (ADS)

    Trifonov, Roumen; Yoshinov, Radoslav; Pavlova, Galya; Tsochev, Georgi

    2017-09-01

    Accounting and financial classification and prediction problems are highly challenging, and researchers use different methods to solve them. Methods and instruments for short-term prediction of financial operations using artificial neural networks are considered. The methods used for prediction of financial data, as well as the developed forecasting system with a neural network, are described in the paper. The architecture of a neural network that uses four different technical indicators, based on the raw data, together with the current day of the week is presented. The network developed is used for forecasting the movement of stock prices one day ahead and consists of an input layer, one hidden layer and an output layer. The training method is an algorithm with back propagation of the error. The main advantage of the developed system is self-determination of the optimal topology of the neural network, due to which it becomes flexible and more precise. The proposed system with a neural network is universal and can be applied to various financial instruments using only basic technical indicators as input data.
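    A minimal stand-in for such a network, one hidden layer trained by backpropagation, can be sketched with scikit-learn; the indicator data and layer size are assumptions (the paper's system additionally self-determines its topology):

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(4)

        # Toy inputs: four technical indicators plus the day of the week;
        # the target is the next-day price movement (all data illustrative)
        n = 300
        X = np.column_stack([rng.normal(size=(n, 4)), rng.integers(0, 5, size=n)])
        y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.05, size=n)

        # One hidden layer trained by backpropagation of the error
        net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
        net.fit(X[:-50], y[:-50])
        print("Held-out R^2:", net.score(X[-50:], y[-50:]))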

  1. Predicting the equilibrium solubility of solid polycyclic aromatic hydrocarbons and dibenzothiophene using a combination of MOSCED plus molecular simulation or electronic structure calculations

    NASA Astrophysics Data System (ADS)

    Phifer, Jeremy R.; Cox, Courtney E.; da Silva, Larissa Ferreira; Nogueira, Gabriel Gonçalves; Barbosa, Ana Karolyne Pereira; Ley, Ryan T.; Bozada, Samantha M.; O'Loughlin, Elizabeth J.; Paluch, Andrew S.

    2017-06-01

    Methods to predict the equilibrium solubility of non-electrolyte solids are important for the design of novel separation processes. Here we demonstrate how conventional molecular simulation free energy calculations, or electronic structure calculations in a continuum solvent (here SMD or SM8), can be used to predict parameters for the MOdified Separation of Cohesive Energy Density (MOSCED) method. The method is applied to the solutes naphthalene, anthracene, phenanthrene, pyrene and dibenzothiophene, compounds of interest to the petroleum industry and for environmental remediation. Adopting the melting point temperature and enthalpy of fusion of these compounds from experiment, we are able to predict equilibrium solubilities. Comparing to a total of 422 non-aqueous and 193 aqueous experimental solubilities, we find the proposed method is able to correlate the data well. The use of MOSCED is additionally advantageous as it is a solubility parameter-based method useful for intuitive solvent selection and formulation.
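    For reference, the standard thermodynamic route from melting properties to solubility is the ideal solubility relation (a textbook expression, not quoted from the paper), which the MOSCED-predicted activity coefficient then corrects:

        \ln\left(x_2\,\gamma_2\right) = -\frac{\Delta H_\mathrm{fus}}{R}\left(\frac{1}{T} - \frac{1}{T_\mathrm{m}}\right)

    Here x_2 is the solute mole-fraction solubility, gamma_2 the activity coefficient of the solute in the solvent, Delta H_fus the enthalpy of fusion, and T_m the melting point temperature.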

  2. Effects of Moisture and Particle Size on Quantitative Determination of Total Organic Carbon (TOC) in Soils Using Near-Infrared Spectroscopy.

    PubMed

    Tamburini, Elena; Vincenzi, Fabio; Costa, Stefania; Mantovi, Paolo; Pedrini, Paola; Castaldelli, Giuseppe

    2017-10-17

    Near-Infrared Spectroscopy is a cost-effective and environmentally friendly technique that could represent an alternative to conventional soil analysis methods, including the determination of total organic carbon (TOC). Soil fertility and quality are usually measured by traditional methods that involve the use of hazardous and strong chemicals. The effects of physical soil characteristics, such as moisture content and particle size, on spectral signals are of great interest for understanding and optimizing prediction capability and for setting up a robust and reliable calibration model, with the future perspective of application in the field. Spectra of 46 soil samples were collected. Soil samples were divided into three data sets: unprocessed; only dried; and dried, ground and sieved, in order to evaluate the effects of moisture and particle size on spectral signals. Both separate and combined normalization methods, including standard normal variate (SNV), multiplicative scatter correction (MSC) and normalization by closure (NCL), as well as smoothing using first and second derivatives (DV1 and DV2), were applied in a total of seven cases. Pretreatments for model optimization were designed and compared for each data set. The best combination of pretreatments was achieved by applying SNV and DV2 in partial least squares (PLS) modelling. There were no significant differences between the predictions using the three different data sets (p < 0.05). Finally, a unique database including all three data sets was built to cover all the sources of sample variability that were tested and used for final prediction. External validation of TOC was carried out on 16 unknown soil samples to evaluate the predictive ability of the final combined calibration model. Hence, we demonstrate that sample preprocessing has a minor influence on the quality of near-infrared (NIR) predictions, laying the groundwork for a direct and fast in situ application of the method. Data can be acquired outside the laboratory, since the method is simple and needs no more than a simple band ratio of the spectra.
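    The best-performing pretreatment combination (SNV followed by a second derivative) can be sketched with NumPy and SciPy; the synthetic spectra are stand-ins:

        import numpy as np
        from scipy.signal import savgol_filter

        def snv(spectra):
            """Standard normal variate: centre and scale each spectrum row-wise."""
            return (spectra - spectra.mean(axis=1, keepdims=True)) / \
                   spectra.std(axis=1, keepdims=True)

        rng = np.random.default_rng(5)
        raw = rng.normal(size=(46, 1000)) + np.linspace(0, 1, 1000)  # toy spectra

        # SNV followed by a Savitzky-Golay second derivative (DV2)
        pretreated = savgol_filter(snv(raw), window_length=15, polyorder=2,
                                   deriv=2, axis=1)
        print(pretreated.shape)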

  3. Predicting the evolution of spreading on complex networks

    PubMed Central

    Chen, Duan-Bing; Xiao, Rui; Zeng, An

    2014-01-01

    Due to their wide applications, spreading processes on complex networks have been intensively studied. However, one of the most fundamental problems has not yet been well addressed: predicting the evolution of spreading based on a given snapshot of the propagation on networks. With this problem solved, one can accelerate or slow down the spreading in advance if the predicted propagation result is narrower or wider than expected. In this paper, we propose an iterative algorithm to estimate the infection probability of the spreading process and then apply it to a mean-field approach to predict the spreading coverage. The validation of the method is performed in both artificial and real networks. The results show that our method is accurate in both infection probability estimation and spreading coverage prediction. PMID:25130862

  4. A simplified approach to the pooled analysis of calibration of clinical prediction rules for systematic reviews of validation studies

    PubMed Central

    Dimitrov, Borislav D; Motterlini, Nicola; Fahey, Tom

    2015-01-01

    Objective: Estimating the calibration performance of clinical prediction rules (CPRs) in systematic reviews of validation studies is not possible when predicted values are neither published nor accessible, or when no individual participant or patient data are available. Our aims were to describe a simplified approach for outcome prediction and calibration assessment and to evaluate its functionality and validity. Study design and methods: Methodological study of systematic reviews of validation studies of CPRs: a) the ABCD2 rule for prediction of 7-day stroke; and b) the CRB-65 rule for prediction of 30-day mortality. Predicted outcomes in a sample validation study were computed from CPR distribution patterns ("derivation model"). As confirmation, a logistic regression model (with derivation study coefficients) was applied to CPR-based dummy variables in the validation study. Meta-analysis of validation studies provided pooled estimates of "predicted:observed" risk ratios (RRs), 95% confidence intervals (CIs) and indexes of heterogeneity (I2) on forest plots (fixed and random effects models), with and without adjustment of intercepts. The same approach was also applied to the CRB-65 rule. Results: Our simplified method, applied to the ABCD2 rule in three risk strata (low, 0–3; intermediate, 4–5; high, 6–7 points), indicated that predictions are identical to those computed by a univariate, CPR-based logistic regression model. Discrimination was good (c-statistics = 0.61–0.82); however, calibration in some studies was low. In such cases of miscalibration, the under-prediction (RRs = 0.73–0.91, 95% CIs 0.41–1.48) could be further corrected by intercept adjustment to account for incidence differences. An improvement in both heterogeneity and P-values (Hosmer-Lemeshow goodness-of-fit test) was observed. Better calibration and improved pooled RRs (0.90–1.06), with narrower 95% CIs (0.57–1.41), were achieved. Conclusion: Our results have an immediate clinical implication in situations where predicted outcomes in CPR validation studies are lacking or deficient, by describing how such predictions can be obtained by anyone using the derivation study alone, without any need for highly specialized knowledge or sophisticated statistics. PMID:25931829

  5. Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.

    2013-09-01

    This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the set of predictors to around 8 factors that can be validated using reputable medical and public health resources.
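    A bare-bones version of GA-driven feature selection with a tree-based fitness function might look as follows; the dataset, population size and GA settings are assumptions, not the authors' configuration:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(6)
        X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                                   random_state=0)

        def fitness(mask):
            """Score a feature subset by cross-validated tree accuracy."""
            if not mask.any():
                return 0.0
            tree = DecisionTreeClassifier(max_depth=4, random_state=0)
            return cross_val_score(tree, X[:, mask], y, cv=3).mean()

        # A minimal genetic algorithm over binary feature masks
        pop = rng.random((20, 40)) < 0.2
        for gen in range(15):
            scores = np.array([fitness(ind) for ind in pop])
            parents = pop[np.argsort(scores)[-10:]]         # selection
            children = []
            for _ in range(10):
                a, b = parents[rng.integers(0, 10, size=2)]
                cut = rng.integers(1, 40)
                child = np.concatenate([a[:cut], b[cut:]])  # crossover
                flip = rng.random(40) < 0.02                # mutation
                children.append(child ^ flip)
            pop = np.vstack([parents, children])

        best = pop[np.argmax([fitness(ind) for ind in pop])]
        print("Selected features:", np.flatnonzero(best))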

  6. Novel near-infrared sampling apparatus for single kernel analysis of oil content in maize.

    PubMed

    Janni, James; Weinstock, B André; Hagen, Lisa; Wright, Steve

    2008-04-01

    A method of rapid, nondestructive chemical and physical analysis of individual maize (Zea mays L.) kernels is needed for the development of high value food, feed, and fuel traits. Near-infrared (NIR) spectroscopy offers a robust nondestructive method of trait determination. However, traditional NIR bulk sampling techniques cannot be applied successfully to individual kernels. Obtaining optimized single kernel NIR spectra for applied chemometric predictive analysis requires a novel sampling technique that can account for the heterogeneous forms, morphologies, and opacities exhibited in individual maize kernels. In this study such a novel technique is described and compared to less effective means of single kernel NIR analysis. Results of the application of a partial least squares (PLS) derived model for predictive determination of percent oil content per individual kernel are shown.

  7. Limitations of diagnostic precision and predictive utility in the individual case: a challenge for forensic practice.

    PubMed

    Cooke, David J; Michie, Christine

    2010-08-01

    Knowledge of group tendencies may not assist accurate predictions in the individual case. This has importance for forensic decision making and for the assessment tools routinely applied in forensic evaluations. In this article, we applied Monte Carlo methods to examine diagnostic agreement with different levels of inter-rater agreement given the distributional characteristics of PCL-R scores. Diagnostic agreement and score agreement were substantially less than expected. In addition, we examined the confidence intervals associated with individual predictions of violent recidivism. On the basis of empirical findings, statistical theory, and logic, we conclude that predictions of future offending cannot be achieved in the individual case with any degree of confidence. We discuss the problems identified in relation to the PCL-R in terms of the broader relevance to all instruments used in forensic decision making.

  8. Univariate Time Series Prediction of Solar Power Using a Hybrid Wavelet-ARMA-NARX Prediction Method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nazaripouya, Hamidreza; Wang, Yubo; Chu, Chi-Cheng

    This paper proposes a new hybrid method for super short-term solar power prediction. Solar output power usually has complex, nonstationary, and nonlinear characteristics due to the intermittent and time-varying behavior of solar radiance. In addition, solar power dynamics are fast and nearly inertialess. An accurate super short-term prediction is required to compensate for the fluctuations and reduce the impact of solar power penetration on the power system. The objective is to predict one-step-ahead solar power generation based only on historical solar power time series data. The proposed method incorporates discrete wavelet transform (DWT), Auto-Regressive Moving Average (ARMA) models, and Recurrent Neural Networks (RNN), where the RNN architecture is based on Nonlinear Auto-Regressive models with eXogenous inputs (NARX). The wavelet transform is utilized to decompose the solar power time series into a set of better-behaved forming series for prediction. The ARMA model is employed as a linear predictor, while NARX is used as a nonlinear pattern recognition tool to estimate and compensate for the error of the wavelet-ARMA prediction. The proposed method is applied to data captured from UCLA solar PV panels, and the results are compared with some of the most common and recent solar power prediction methods. The results validate the effectiveness of the proposed approach and show a considerable improvement in prediction precision.
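    The linear stage of the hybrid (DWT decomposition plus an ARMA forecast per sub-series) can be sketched with pywt and statsmodels; the toy PV series, wavelet choice and ARMA order are assumptions, and a full implementation would reconstruct the forecast with pywt.waverec and then correct it with the NARX network:

        import numpy as np
        import pywt
        from statsmodels.tsa.arima.model import ARIMA

        rng = np.random.default_rng(7)
        t = np.linspace(0, 6 * np.pi, 288)
        power = np.clip(np.sin(t) + rng.normal(scale=0.1, size=t.size), 0, None)

        # Decompose the series into better-behaved sub-series with a DWT
        coeffs = pywt.wavedec(power, "db4", level=2)

        # Fit a linear ARMA predictor to each sub-series, one step ahead
        forecasts = [ARIMA(c, order=(2, 0, 1)).fit().forecast(1)[0]
                     for c in coeffs]
        print("Sub-series forecasts:", forecasts)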

  9. Global analysis of bacterial transcription factors to predict cellular target processes.

    PubMed

    Doerks, Tobias; Andrade, Miguel A; Lathe, Warren; von Mering, Christian; Bork, Peer

    2004-03-01

    Whole-genome sequences are now available for >100 bacterial species, giving unprecedented power to comparative genomics approaches. We have applied genome-context methods to predict target processes that are regulated by transcription factors (TFs). Of 128 orthologous groups of proteins annotated as TFs, to date, 36 are functionally uncharacterized; in our analysis we predict a probable cellular target process or biochemical pathway for half of these functionally uncharacterized TFs.

  10. Net analyte signal standard addition method (NASSAM) as a novel spectrofluorimetric and spectrophotometric technique for simultaneous determination, application to assay of melatonin and pyridoxine

    NASA Astrophysics Data System (ADS)

    Asadpour-Zeynali, Karim; Bastami, Mohammad

    2010-02-01

    In this work a new modification of the standard addition method, called the "net analyte signal standard addition method (NASSAM)", is presented for simultaneous spectrofluorimetric and spectrophotometric analysis. The proposed method combines the advantages of the standard addition method with those of the net analyte signal concept. The method can be applied for the determination of an analyte in the presence of known interferents. In contrast to the H-point standard addition method, the accuracy of the predictions does not depend on the shape of the analyte and interferent spectra. The method was successfully applied to the simultaneous spectrofluorimetric and spectrophotometric determination of pyridoxine (PY) and melatonin (MT) in synthetic mixtures and in a pharmaceutical formulation.

  11. Innovation in prediction planning for anterior open bite correction.

    PubMed

    Almuzian, Mohammed; Almukhtar, Anas; O'Neil, Michael; Benington, Philip; Al Anezi, Thamer; Ayoub, Ashraf

    2015-05-01

    This study applies recent advances in 3D virtual imaging to the prediction planning of dentofacial deformities. Stereo-photogrammetry has been used to create virtual and physical models, which are creatively combined in planning the surgical correction of anterior open bite. The application of these novel methods is demonstrated through the surgical correction of a case.

  12. Free variable selection QSPR study to predict 19F chemical shifts of some fluorinated organic compounds using Random Forest and RBF-PLS methods

    NASA Astrophysics Data System (ADS)

    Goudarzi, Nasser

    2016-04-01

    In this work, two new and powerful chemometrics methods are applied for the modeling and prediction of the 19F chemical shift values of some fluorinated organic compounds. The radial basis function-partial least squares (RBF-PLS) and random forest (RF) methods are employed to construct models to predict the 19F chemical shifts. No separate variable selection method was used in this study, since the RF method can serve as both a variable selection and a modeling technique. The effects of important parameters on the RF prediction power, such as the number of trees (nt) and the number of randomly selected variables to split each node (m), were investigated. The root-mean-square errors of prediction (RMSEP) for the training and prediction sets of the RBF-PLS and RF models were 44.70, 23.86, 29.77, and 23.69, respectively. Also, the correlation coefficients of the prediction set for the RBF-PLS and RF models were 0.8684 and 0.9313, respectively. The results obtained reveal that the RF model can be used as a powerful chemometrics tool for quantitative structure-property relationship (QSPR) studies.
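    The two tuning parameters discussed above map directly onto scikit-learn's RandomForestRegressor arguments; the descriptor data and parameter values below are assumptions:

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(8)

        # Toy molecular descriptors and 19F chemical shift targets (illustrative)
        X = rng.normal(size=(120, 30))
        y = X[:, 0] * 10 - X[:, 3] * 5 + rng.normal(scale=0.5, size=120)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # n_estimators corresponds to the number of trees (nt) and max_features
        # to the variables sampled at each split (m) discussed above
        rf = RandomForestRegressor(n_estimators=500, max_features=10, random_state=0)
        rf.fit(X_tr, y_tr)
        print("RMSEP:", np.sqrt(np.mean((rf.predict(X_te) - y_te) ** 2)))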

  13. Epileptic Seizures Prediction Using Machine Learning Methods

    PubMed Central

    Usman, Syed Muhammad

    2017-01-01

    Epileptic seizures occur due to disorder in brain functionality which can affect a patient's health. Prediction of epileptic seizures before their onset is quite useful for preventing the seizure by medication. Machine learning techniques and computational methods are used for predicting epileptic seizures from electroencephalogram (EEG) signals. However, preprocessing of EEG signals for noise removal and feature extraction are two major issues that have an adverse effect on both anticipation time and true positive prediction rate. Therefore, we propose a model that provides reliable methods of both preprocessing and feature extraction. Our model predicts epileptic seizures sufficiently ahead of onset and provides a better true positive rate. We have applied empirical mode decomposition (EMD) for preprocessing and have extracted time and frequency domain features for training a prediction model. The proposed model detects the start of the preictal state, the state that begins a few minutes before the onset of the seizure, with a higher true positive rate (92.23%) compared to traditional methods, a maximum anticipation time of 33 minutes, and an average prediction time of 23.6 minutes on the scalp EEG CHB-MIT dataset of 22 subjects. PMID:29410700
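    The EMD preprocessing step can be sketched with the PyEMD package (from the EMD-signal distribution, an assumption about tooling); the toy signal and the feature pair per IMF are illustrative:

        import numpy as np
        from PyEMD import EMD  # provided by the EMD-signal package (assumed)

        fs = 256
        t = np.arange(0, 10, 1 / fs)
        signal = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)

        # Empirical mode decomposition into intrinsic mode functions (IMFs)
        imfs = EMD().emd(signal)

        # Simple time-domain features per IMF for training a prediction model
        features = [(imf.mean(), imf.var()) for imf in imfs]
        print(len(imfs), "IMFs; first feature pair:", features[0])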

  14. Application of a hierarchical enzyme classification method reveals the role of gut microbiome in human metabolism

    PubMed Central

    2015-01-01

    Background Enzymes are known as the molecular machines that drive the metabolism of an organism; hence identification of the full enzyme complement of an organism is essential to build the metabolic blueprint of that species as well as to understand the interplay of multiple species in an ecosystem. Experimental characterization of the enzymatic reactions of all enzymes in a genome is a tedious and expensive task. The problem is more pronounced in the metagenomic samples where even the species are not adequately cultured or characterized. Enzymes encoded by the gut microbiota play an essential role in the host metabolism; thus, warranting the need to accurately identify and annotate the full enzyme complements of species in the genomic and metagenomic projects. To fulfill this need, we develop and apply a method called ECemble, an ensemble approach to identify enzymes and enzyme classes and study the human gut metabolic pathways. Results ECemble method uses an ensemble of machine-learning methods to accurately model and predict enzymes from protein sequences and also identifies the enzyme classes and subclasses at the finest resolution. A tenfold cross-validation result shows accuracy between 97 and 99% at different levels in the hierarchy of enzyme classification, which is superior to comparable methods. We applied ECemble to predict the entire complements of enzymes from ten sequenced proteomes including the human proteome. We also applied this method to predict enzymes encoded by the human gut microbiome from gut metagenomic samples, and to study the role played by the microbe-derived enzymes in the human metabolism. After mapping the known and predicted enzymes to canonical human pathways, we identified 48 pathways that have at least one bacteria-encoded enzyme, which demonstrates the complementary role of gut microbiome in human gut metabolism. These pathways are primarily involved in metabolizing dietary nutrients such as carbohydrates, amino acids, lipids, cofactors and vitamins. Conclusions The ECemble method is able to hierarchically assign high quality enzyme annotations to genomic and metagenomic data. This study demonstrated the real application of ECemble to understand the indispensable role played by microbe-encoded enzymes in the healthy functioning of human metabolic systems. PMID:26099921

  15. Application of a hierarchical enzyme classification method reveals the role of gut microbiome in human metabolism.

    PubMed

    Mohammed, Akram; Guda, Chittibabu

    2015-01-01

    Enzymes are known as the molecular machines that drive the metabolism of an organism; hence identification of the full enzyme complement of an organism is essential to build the metabolic blueprint of that species as well as to understand the interplay of multiple species in an ecosystem. Experimental characterization of the enzymatic reactions of all enzymes in a genome is a tedious and expensive task. The problem is more pronounced in the metagenomic samples where even the species are not adequately cultured or characterized. Enzymes encoded by the gut microbiota play an essential role in the host metabolism; thus, warranting the need to accurately identify and annotate the full enzyme complements of species in the genomic and metagenomic projects. To fulfill this need, we develop and apply a method called ECemble, an ensemble approach to identify enzymes and enzyme classes and study the human gut metabolic pathways. ECemble method uses an ensemble of machine-learning methods to accurately model and predict enzymes from protein sequences and also identifies the enzyme classes and subclasses at the finest resolution. A tenfold cross-validation result shows accuracy between 97 and 99% at different levels in the hierarchy of enzyme classification, which is superior to comparable methods. We applied ECemble to predict the entire complements of enzymes from ten sequenced proteomes including the human proteome. We also applied this method to predict enzymes encoded by the human gut microbiome from gut metagenomic samples, and to study the role played by the microbe-derived enzymes in the human metabolism. After mapping the known and predicted enzymes to canonical human pathways, we identified 48 pathways that have at least one bacteria-encoded enzyme, which demonstrates the complementary role of gut microbiome in human gut metabolism. These pathways are primarily involved in metabolizing dietary nutrients such as carbohydrates, amino acids, lipids, cofactors and vitamins. The ECemble method is able to hierarchically assign high quality enzyme annotations to genomic and metagenomic data. This study demonstrated the real application of ECemble to understand the indispensable role played by microbe-encoded enzymes in the healthy functioning of human metabolic systems.
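    An ensemble of heterogeneous learners in the spirit of ECemble can be sketched with scikit-learn; the features, labels and choice of base learners are assumptions, and the actual method adds hierarchical EC-class stages:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier, VotingClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import SVC

        rng = np.random.default_rng(9)

        # Toy stand-in for sequence-derived features and enzyme/non-enzyme labels
        X = rng.normal(size=(400, 50))
        y = (X[:, :3].sum(axis=1) > 0).astype(int)

        # Heterogeneous base learners combined by soft voting
        ensemble = VotingClassifier(
            estimators=[
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("lr", LogisticRegression(max_iter=1000)),
                ("svm", SVC(probability=True, random_state=0)),
            ],
            voting="soft",
        )
        ensemble.fit(X[:300], y[:300])
        print("Held-out accuracy:", ensemble.score(X[300:], y[300:]))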

  16. Validation of catchment models for predicting land-use and climate change impacts. 1. Method

    NASA Astrophysics Data System (ADS)

    Ewen, J.; Parkin, G.

    1996-02-01

    Computer simulation models are increasingly being proposed as tools capable of giving water resource managers accurate predictions of the impact of changes in land-use and climate. Previous validation testing of catchment models is reviewed, and it is concluded that the methods used do not clearly test a model's fitness for such a purpose. A new generally applicable method is proposed. This involves the direct testing of fitness for purpose, uses established scientific techniques, and may be implemented within a quality assured programme of work. The new method is applied in Part 2 of this study (Parkin et al., J. Hydrol., 175:595-613, 1996).

  17. A method for validation of finite element forming simulation on basis of a pointwise comparison of distance and curvature

    NASA Astrophysics Data System (ADS)

    Dörr, Dominik; Joppich, Tobias; Schirmaier, Fabian; Mosthaf, Tobias; Kärger, Luise; Henning, Frank

    2016-10-01

    Thermoforming of continuously fiber reinforced thermoplastics (CFRTP) is ideally suited to thin-walled and complex-shaped products. By means of forming simulation, an initial validation of the producibility of a specific geometry, an optimization of the forming process and the prediction of fiber reorientation due to forming are possible. Nevertheless, the applied methods need to be validated. Therefore a method is presented which enables the calculation of error measures for the mismatch between simulation results and experimental tests, based on measurements with a conventional coordinate measuring device. As a quantitative measure describing the curvature is provided, the presented method is also suitable for numerical or experimental sensitivity studies on wrinkling behavior. The methods for forming simulation, implemented in Abaqus explicit, are presented and applied to a generic geometry. The same geometry is tested experimentally, and simulation and test results are compared by the proposed validation method.

  18. An empirical approach to improving tidal predictions using recent real-time tide gauge data

    NASA Astrophysics Data System (ADS)

    Hibbert, Angela; Royston, Samantha; Horsburgh, Kevin J.; Leach, Harry

    2014-05-01

    Classical harmonic methods of tidal prediction are often problematic in estuarine environments due to the distortion of tidal fluctuations in shallow water, which results in a disparity between predicted and observed sea levels. This is of particular concern in the Bristol Channel, where the error associated with tidal predictions is potentially greater due to an unusually large tidal range of around 12 m. As such predictions are fundamental to the short-term forecasting of High Water (HW) extremes, it is vital that alternative solutions are found. In a pilot study, using a year-long observational sea level record from the Port of Avonmouth in the Bristol Channel, the UK National Tidal and Sea Level Facility (NTSLF) tested the potential for reducing tidal prediction errors using three alternatives to the Harmonic Method of tidal prediction. The three methods evaluated were (1) the use of Artificial Neural Network (ANN) models, (2) the Species Concordance technique and (3) a simple empirical procedure for correcting Harmonic Method High Water predictions based upon a few recent observations (referred to as the Empirical Correction Method). This latter method was then successfully applied to sea level records from an additional 42 of the 45 tide gauges that comprise the UK Tide Gauge Network. Consequently, it is to be incorporated into the operational systems of the UK Coastal Monitoring and Forecasting Partnership in order to improve short-term sea level predictions for the UK and, in particular, the accurate estimation of HW extremes.
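    The abstract does not give the exact operational formula, but the underlying idea of the Empirical Correction Method (adjusting harmonic high-water predictions by recent residuals) might be sketched as follows; the formulation and the toy levels are assumptions:

        import numpy as np

        def corrected_prediction(harmonic_pred, recent_obs, recent_pred):
            """Adjust a harmonic high-water prediction using the mean residual
            of the last few observed high waters (illustrative formulation)."""
            residuals = np.asarray(recent_obs) - np.asarray(recent_pred)
            return harmonic_pred + residuals.mean()

        # Toy high-water levels in metres (illustrative data)
        recent_observed = [6.91, 7.02, 6.85]
        recent_predicted = [6.75, 6.88, 6.70]
        print(corrected_prediction(6.80, recent_observed, recent_predicted))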

  19. Estimating carbon monoxide exposure

    NASA Technical Reports Server (NTRS)

    Edgerley, R. H.

    1971-01-01

    Method predicts effects of carbon monoxide on astronauts confined in spacecraft cabin atmospheres. Information on need for low toxicity level also applies to confined spaces. Benefits are applicable to industry and public health.

  20. Accurate prediction of protein–protein interactions from sequence alignments using a Bayesian method

    PubMed Central

    Burger, Lukas; van Nimwegen, Erik

    2008-01-01

    Accurate and large-scale prediction of protein–protein interactions directly from amino-acid sequences is one of the great challenges in computational biology. Here we present a new Bayesian network method that predicts interaction partners using only multiple alignments of amino-acid sequences of interacting protein domains, without tunable parameters, and without the need for any training examples. We first apply the method to bacterial two-component systems and comprehensively reconstruct two-component signaling networks across all sequenced bacteria. Comparisons of our predictions with known interactions show that our method infers interaction partners genome-wide with high accuracy. To demonstrate the general applicability of our method we show that it also accurately predicts interaction partners in a recent dataset of polyketide synthases. Analysis of the predicted genome-wide two-component signaling networks shows that cognates (interacting kinase/regulator pairs, which lie adjacent on the genome) and orphans (which lie isolated) form two relatively independent components of the signaling network in each genome. In addition, while most genes are predicted to have only a small number of interaction partners, we find that 10% of orphans form a separate class of 'hub' nodes that distribute and integrate signals to and from up to tens of different interaction partners. PMID:18277381

  1. Application of indoor noise prediction in the real world

    NASA Astrophysics Data System (ADS)

    Lewis, David N.

    2002-11-01

    Predicting indoor noise in industrial workrooms is an important part of the process of designing industrial plants. Predicted levels are used in the design process to determine compliance with occupational-noise regulations, and to estimate levels inside the walls in order to predict community noise radiated from the building. Once predicted levels are known, noise-control strategies can be developed. In this paper an overview of over 20 years of experience is given with the use of various prediction approaches to manage noise in Unilever plants. This work has applied empirical and ray-tracing approaches separately, and in combination, to design various packaging and production plants and other facilities. The advantages of prediction methods in general, and of the various approaches in particular, will be discussed. A case-study application of prediction methods to the optimization of noise-control measures in a food-packaging plant will be presented. Plans to acquire a simplified prediction model for use as a company noise-screening tool will be discussed.

  2. Intelligent monitoring and control of semiconductor manufacturing equipment

    NASA Technical Reports Server (NTRS)

    Murdock, Janet L.; Hayes-Roth, Barbara

    1991-01-01

    The use of AI methods to monitor and control semiconductor fabrication in a state-of-the-art manufacturing environment called the Rapid Thermal Multiprocessor is described. Semiconductor fabrication involves many complex processing steps with limited opportunities to measure process and product properties. By applying additional process and product knowledge to that limited data, AI methods augment classical control methods by detecting abnormalities and trends, predicting failures, diagnosing, planning corrective action sequences, explaining diagnoses or predictions, and reacting to anomalous conditions that classical control systems typically would not correct. Research methodology and issues are discussed, and two diagnosis scenarios are examined.

  3. Accurate low-cost methods for performance evaluation of cache memory systems

    NASA Technical Reports Server (NTRS)

    Laha, Subhasis; Patel, Janak H.; Iyer, Ravishankar K.

    1988-01-01

    Methods of simulation based on statistical techniques are proposed to decrease the need for large trace measurements and for predicting true program behavior. Sampling techniques are applied while the address trace is collected from a workload. This drastically reduces the space and time needed to collect the trace. Simulation techniques are developed to use the sampled data not only to predict the mean miss rate of the cache, but also to provide an empirical estimate of its actual distribution. Finally, a concept of primed cache is introduced to simulate large caches by the sampling-based method.
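
    A toy illustration of the sampling idea, not the paper's simulator: a direct-mapped cache is simulated over a few contiguous slices of a synthetic address trace and the sampled miss rates are averaged. The primed-cache correction for cold-start bias described in the abstract is omitted, and all parameters are illustrative.

        import random

        def miss_rate(trace, n_sets=256, block=64):
            # Direct-mapped cache simulation returning the miss rate.
            tags = [None] * n_sets
            misses = 0
            for addr in trace:
                blk = addr // block
                s, tag = blk % n_sets, blk // n_sets
                if tags[s] != tag:           # tag mismatch: cache miss
                    tags[s] = tag
                    misses += 1
            return misses / len(trace)

        # sampling: simulate only a few contiguous slices of the full trace
        random.seed(0)
        full = [random.randrange(1 << 20) for _ in range(100_000)]
        samples = [full[i:i + 2000] for i in range(0, len(full), 20_000)]
        est = sum(miss_rate(s) for s in samples) / len(samples)
        print(f"estimated miss rate from samples: {est:.3f}")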

  4. Calculation of precise firing statistics in a neural network model

    NASA Astrophysics Data System (ADS)

    Cho, Myoung Won

    2017-08-01

    A precise prediction of neural firing dynamics is requisite to understand the function of, and the learning process in, a biological neural network which works depending on exact spike timings. Basically, the prediction of firing statistics is a delicate many-body problem because the firing probability of a neuron at a time is determined by the summation over all effects from past firing states. A neural network model with the Feynman path integral formulation was recently introduced. In this paper, we present several methods to calculate firing statistics in the model. We apply the methods to some cases and compare the theoretical predictions with simulation results.

  5. Virtual screening of cocrystal formers for CL-20

    NASA Astrophysics Data System (ADS)

    Zhou, Jun-Hong; Chen, Min-Bo; Chen, Wei-Ming; Shi, Liang-Wei; Zhang, Chao-Yang; Li, Hong-Zhen

    2014-08-01

    According to the structure characteristics of 2,4,6,8,10,12-hexanitrohexaazaisowurtzitane (CL-20) and the kinetic mechanism of the cocrystal formation, the method of virtual screening CL-20 cocrystal formers by the criterion of the strongest intermolecular site pairing energy (ISPE) was proposed. In this method the strongest ISPE was thought to determine the first step of the cocrystal formation. The prediction results for four sets of common drug molecule cocrystals by this method were compared with those by the total ISPE method from the reference (Musumeci et al., 2011), and the experimental results. This method was then applied to virtually screen the CL-20 cocrystal formers, and the prediction results were compared with the experimental results.

  6. An investigation of gear mesh failure prediction techniques. M.S. Thesis - Cleveland State Univ.

    NASA Technical Reports Server (NTRS)

    Zakrajsek, James J.

    1989-01-01

    A study was performed in which several gear failure prediction methods were investigated and applied to experimental data from a gear fatigue test apparatus. The primary objective was to provide a baseline understanding of the prediction methods and to evaluate their diagnostic capabilities. The methods investigated use the signal average in both the time and frequency domain to detect gear failure. Data from eleven gear fatigue tests were recorded at periodic time intervals as the gears were run from initiation to failure. Four major failure modes, consisting of heavy wear, tooth breakage, single pits, and distributed pitting were observed among the failed gears. Results show that the prediction methods were able to detect only those gear failures which involved heavy wear or distributed pitting. None of the methods could predict fatigue cracks, which resulted in tooth breakage, or single pits. It is suspected that the fatigue cracks were not detected because of limitations in data acquisition rather than in methodology. Additionally, the frequency response between the gear shaft and the transducer was found to significantly affect the vibration signal. The specific frequencies affected were filtered out of the signal average prior to application of the methods.

  7. A theoretical approach for estimation of ultimate size of bimetallic nanocomposites synthesized in microemulsion systems

    NASA Astrophysics Data System (ADS)

    Salabat, Alireza; Saydi, Hassan

    2012-12-01

    In this research, a new idea for predicting the ultimate sizes of bimetallic nanocomposites synthesized in a water-in-oil microemulsion system is proposed. In this method, an effective Hamaker constant is introduced by modifying the Tabor-Winterton approximation. This effective Hamaker constant is applied in the van der Waals attractive interaction energy, and the resulting effective van der Waals interaction energy is used as the attractive contribution to the total interaction energy. The modified interaction energy was applied successfully to predict the sizes of some bimetallic nanoparticles, at different mass fractions, synthesized in a microemulsion system of dioctyl sodium sulfosuccinate (AOT)/isooctane.

  8. The Effect of Nondeterministic Parameters on Shock-Associated Noise Prediction Modeling

    NASA Technical Reports Server (NTRS)

    Dahl, Milo D.; Khavaran, Abbas

    2010-01-01

    Engineering applications for aircraft noise prediction contain models for physical phenomenon that enable solutions to be computed quickly. These models contain parameters that have an uncertainty not accounted for in the solution. To include uncertainty in the solution, nondeterministic computational methods are applied. Using prediction models for supersonic jet broadband shock-associated noise, fixed model parameters are replaced by probability distributions to illustrate one of these methods. The results show the impact of using nondeterministic parameters both on estimating the model output uncertainty and on the model spectral level prediction. In addition, a global sensitivity analysis is used to determine the influence of the model parameters on the output, and to identify the parameters with the least influence on model output.

  9. A Data Assimilation System For Operational Weather Forecast In Galicia Region (nw Spain)

    NASA Astrophysics Data System (ADS)

    Balseiro, C. F.; Souto, M. J.; Pérez-Muñuzuri, V.; Brewster, K.; Xue, M.

    Regional weather forecast models, such as the Advanced Regional Prediction System (ARPS), over complex environments with varying local influences require an accurate meteorological analysis that should include all local meteorological measurements available. In this work, the ARPS Data Analysis System (ADAS) (Xue et al., 2001) is applied as a three-dimensional weather analysis tool to include surface station and rawinsonde data with the NCEP AVN forecasts as the analysis background. Currently in ADAS, a set of five meteorological variables is considered during the analysis: horizontal grid-relative wind components, pressure, potential temperature and specific humidity. The analysis is used for high-resolution numerical weather prediction for the Galicia region. The analysis method used in ADAS is based on the successive correction scheme of Bratseth (1986), which asymptotically approaches the result of a statistical (optimal) interpolation, but at lower computational cost. As in the optimal interpolation scheme, the Bratseth method can take into account the relative error between background and observational data; it is therefore relatively insensitive to large variations in data density and can integrate data of mixed accuracy. This method can be applied economically in an operational setting, providing significant improvement over the background model forecast as well as over any analysis without high-resolution local observations. A one-way nesting is applied for weather forecast in the Galicia region, and the use of this assimilation system in both domains shows better results not only in the initial conditions but also in all forecast periods. Bratseth, A.M. (1986): "Statistical interpolation by means of successive corrections." Tellus, 38A, 439-447. Souto, M. J., Balseiro, C. F., Pérez-Muñuzuri, V., Xue, M., Brewster, K. (2001): "Impact of cloud analysis on numerical weather prediction in the Galician region of Spain." Submitted to Journal of Applied Meteorology. Xue, M., Wang, D., Gao, J., Brewster, K., Droegemeier, K. K. (2001): "The Advanced Regional Prediction System (ARPS), storm-scale numerical weather prediction and data assimilation." Meteor. Atmos. Physics, accepted.
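
    A minimal sketch of a successive-correction analysis on a one-dimensional grid, in the spirit of (but much simpler than) the Bratseth scheme: each iteration spreads the observation-minus-analysis innovations back onto the grid. The Gaussian weighting, length scale L and damping term are illustrative assumptions, not the ADAS implementation.

        import numpy as np

        def successive_corrections(grid_x, background, obs_x, obs, L=1.0, n_iter=10):
            # Iteratively correct a background field toward observations;
            # Bratseth's scheme adds weights that make the iteration
            # converge to optimal interpolation, omitted here for brevity.
            analysis = background.copy()
            for _ in range(n_iter):
                # innovation: observation minus analysis at the obs location
                innov = obs - np.interp(obs_x, grid_x, analysis)
                # Gaussian weights spreading each innovation to grid points
                w = np.exp(-((grid_x[:, None] - obs_x[None, :]) / L) ** 2)
                analysis += (w @ innov) / (w.sum(axis=1) + 1.0)  # +1 damps the update
            return analysis

        x = np.linspace(0, 10, 101)
        bg = np.zeros_like(x)
        print(successive_corrections(x, bg, obs_x=np.array([3.0, 7.0]),
                                     obs=np.array([1.0, -0.5]))[::25])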

  10. Hidden state prediction: a modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses.

    PubMed

    Zaneveld, Jesse R R; Thurber, Rebecca L V

    2014-01-01

    Complex symbioses between animal or plant hosts and their associated microbiotas can involve thousands of species and millions of genes. Because of the number of interacting partners, it is often impractical to study all organisms or genes in these host-microbe symbioses individually. Yet new phylogenetic predictive methods can use the wealth of accumulated data on diverse model organisms to make inferences into the properties of less well-studied species and gene families. Predictive functional profiling methods use evolutionary models based on the properties of studied relatives to put bounds on the likely characteristics of an organism or gene that has not yet been studied in detail. These techniques have been applied to predict diverse features of host-associated microbial communities ranging from the enzymatic function of uncharacterized genes to the gene content of uncultured microorganisms. We consider these phylogenetically informed predictive techniques from disparate fields as examples of a general class of algorithms for Hidden State Prediction (HSP), and argue that HSP methods have broad value in predicting organismal traits in a variety of contexts, including the study of complex host-microbe symbioses.

  11. Minimum curvilinearity to enhance topological prediction of protein interactions by network embedding

    PubMed Central

    Cannistraci, Carlo Vittorio; Alanis-Lobato, Gregorio; Ravasi, Timothy

    2013-01-01

    Motivation: Most functions within the cell emerge thanks to protein–protein interactions (PPIs), yet experimental determination of PPIs is both expensive and time-consuming. PPI networks present significant levels of noise and incompleteness. Predicting interactions using only PPI-network topology (topological prediction) is difficult but essential when prior biological knowledge is absent or unreliable. Methods: Network embedding emphasizes the relations between network proteins embedded in a low-dimensional space, in which protein pairs that are closer to each other represent good candidate interactions. To achieve network denoising, which boosts prediction performance, we first applied minimum curvilinear embedding (MCE), and then adopted shortest path (SP) in the reduced space to assign likelihood scores to candidate interactions. Furthermore, we introduce (i) a new valid variation of MCE, named non-centred MCE (ncMCE); (ii) two automatic strategies for selecting the appropriate embedding dimension; and (iii) two new randomized procedures for evaluating predictions. Results: We compared our method against several unsupervised and supervisedly tuned embedding approaches and node neighbourhood techniques. Despite its computational simplicity, ncMCE-SP was the overall leader, outperforming the current methods in topological link prediction. Conclusion: Minimum curvilinearity is a valuable non-linear framework that we successfully applied to the embedding of protein networks for the unsupervised prediction of novel PPIs. The rationale for our approach is that biological and evolutionary information is imprinted in the non-linear patterns hidden behind the protein network topology, and can be exploited for predicting new protein links. The predicted PPIs represent good candidates for testing in high-throughput experiments or for exploitation in systems biology tools such as those used for network-based inference and prediction of disease-related functional modules. Availability: https://sites.google.com/site/carlovittoriocannistraci/home Contact: kalokagathos.agon@gmail.com or timothy.ravasi@kaust.edu.sa Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23812985

  12. MO-G-18C-05: Real-Time Prediction in Free-Breathing Perfusion MRI

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Song, H; Liu, W; Ruan, D

    Purpose: The aim is to minimize frame-wise difference errors caused by respiratory motion and eliminate the need for breath-holds in magnetic resonance imaging (MRI) sequences with long acquisition and repeat times (TRs). The technique is being applied to perfusion MRI using arterial spin labeling (ASL). Methods: Respiratory motion prediction (RMP) using navigator echoes was implemented in ASL. A least-squares method was used to extract the respiratory motion information from the 1D navigator. A generalized artificial neural network (ANN) with three layers was developed to simultaneously predict 10 time points forward in time and correct for respiratory motion during MRI acquisition. During the training phase, the parameters of the ANN were optimized to minimize the aggregated prediction error based on acquired navigator data. During real-time prediction, the trained ANN was applied to the most recent estimated displacement trajectory to determine in real time the amount of spatial correction required. Results: The respiratory motion information extracted by the least-squares method can accurately represent the navigator profiles, with a normalized chi-square value of 0.037±0.015 across the training phase. During the 60-second training phase, the ANN successfully learned the respiratory motion pattern from the navigator training data. During real-time prediction, the ANN received displacement estimates and predicted the motion in the continuum of a 1.0 s prediction window. The ANN prediction was able to provide corrections for different respiratory states (i.e., inhalation/exhalation) during real-time scanning with a mean absolute error of < 1.8 mm. Conclusion: A new technique enabling free-breathing acquisition during MRI is being developed. A generalized ANN has demonstrated its efficacy in predicting a continuum of motion profiles for volumetric imaging based on navigator inputs. Future work will enhance the robustness of the ANN and verify its effectiveness with human subjects. Research supported by National Institutes of Health National Cancer Institute Grant R01 CA159471-01.
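
    A rough sketch of the sliding-window idea, assuming scikit-learn: a small multilayer network maps a short history of displacement samples to the next 10 samples, as in the abstract. The synthetic trace, window lengths and network size are illustrative assumptions, not the study's trained ANN.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        # synthetic 5 Hz respiratory trace (mm), a stand-in for navigator displacements
        t = np.arange(0, 120, 0.2)
        x = 8 * np.sin(2 * np.pi * t / 4.0) + 0.3 * np.random.randn(t.size)

        lag, horizon = 15, 10          # 3 s of history -> 10 future samples (2 s)
        X = np.array([x[i:i + lag] for i in range(x.size - lag - horizon)])
        Y = np.array([x[i + lag:i + lag + horizon] for i in range(x.size - lag - horizon)])

        # train on the first part of the trace, evaluate on the rest
        net = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000,
                           random_state=0).fit(X[:400], Y[:400])
        err = np.abs(net.predict(X[400:]) - Y[400:]).mean()
        print(f"mean absolute prediction error: {err:.2f} mm")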

  13. The impact of different aperture distribution models and critical stress criteria on equivalent permeability in fractured rocks

    NASA Astrophysics Data System (ADS)

    Bisdom, Kevin; Bertotti, Giovanni; Nick, Hamidreza M.

    2016-05-01

    Predicting equivalent permeability in fractured reservoirs requires an understanding of the fracture network geometry and apertures. There are different methods for defining aperture, based on outcrop observations (power law scaling), fundamental mechanics (sublinear length-aperture scaling), and experiments (Barton-Bandis conductive shearing). Each method predicts heterogeneous apertures, even along single fractures (i.e., intrafracture variations), but most fractured reservoir models imply constant apertures for single fractures. We compare the relative differences in aperture and permeability predicted by three aperture methods, where permeability is modeled in explicit fracture networks with coupled fracture-matrix flow. Aperture varies along single fractures, and geomechanical relations are used to identify which fractures are critically stressed. The aperture models are applied to real-world large-scale fracture networks. (Sub)linear length scaling predicts the largest average aperture and equivalent permeability. Barton-Bandis aperture is smaller, predicting on average a sixfold increase compared to matrix permeability. Application of critical stress criteria results in a decrease in the fraction of open fractures. For the applied stress conditions, Coulomb predicts that 50% of the network is critically stressed, compared to 80% for Barton-Bandis peak shear. The impact of the fracture network on equivalent permeability depends on the matrix hydraulic properties, as in a low-permeable matrix, intrafracture connectivity, i.e., the opening along a single fracture, controls equivalent permeability, whereas for a more permeable matrix, absolute apertures have a larger impact. Quantification of fracture flow regimes using only the ratio of fracture versus matrix permeability is insufficient, as these regimes also depend on aperture variations within fractures.

  14. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts

    NASA Astrophysics Data System (ADS)

    Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun

    2018-02-01

    For a drug, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article first briefly introduces the computational methods used in the prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article is on the recent progress of predictive models built for various toxicities. Available databases and web servers are also provided. Though the methods and models are very helpful for drug design, there are still challenges and limitations to be addressed for drug safety assessment in the future.

  15. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts

    PubMed Central

    Yang, Hongbin; Sun, Lixia; Li, Weihua; Liu, Guixia; Tang, Yun

    2018-01-01

    During drug development, safety is always the most important issue, including a variety of toxicities and adverse drug effects, which should be evaluated in preclinical and clinical trial phases. This review article first briefly introduces the computational methods used in the prediction of chemical toxicity for drug design, including machine learning methods and structural alerts. Machine learning methods have been widely applied in qualitative classification and quantitative regression studies, while structural alerts can be regarded as a complementary tool for lead optimization. The emphasis of this article is on the recent progress of predictive models built for various toxicities. Available databases and web servers are also provided. Though the methods and models are very helpful for drug design, there are still challenges and limitations to be addressed for drug safety assessment in the future. PMID:29515993

  16. Methods of generating synthetic acoustic logs from resistivity logs for gas-hydrate-bearing sediments

    USGS Publications Warehouse

    Lee, Myung W.

    1999-01-01

    Methods of predicting acoustic logs from resistivity logs for hydrate-bearing sediments are presented. Modified time average equations derived from the weighted equation provide a means of relating the velocity of the sediment to the resistivity of the sediment. These methods can be used to transform resistivity logs into acoustic logs with or without using the gas hydrate concentration in the pore space. All the parameters except the unconsolidation constants, necessary for the prediction of acoustic log from resistivity log, can be estimated from a cross plot of resistivity versus porosity values. Unconsolidation constants in equations may be assumed without rendering significant errors in the prediction. These methods were applied to the acoustic and resistivity logs acquired at the Mallik 2L-38 gas hydrate research well drilled at the Mackenzie Delta, northern Canada. The results indicate that the proposed method is simple and accurate.
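
    The modified time-average equations themselves are not reproduced in the abstract; the sketch below shows only the general shape of such a resistivity-to-velocity transform under textbook assumptions (Archie's law for porosity, a classical Wyllie time average for velocity, water-filled pores), not Lee's modified, hydrate-aware forms. All parameter values are illustrative.

        import numpy as np

        def velocity_from_resistivity(Rt, Rw=0.2, a=1.0, m=2.0,
                                      v_matrix=5500.0, v_fluid=1500.0):
            # Archie's law inverts formation resistivity Rt to porosity,
            # then a time-average equation mixes matrix and fluid slowness.
            phi = (a * Rw / np.asarray(Rt)) ** (1.0 / m)      # Archie porosity
            slowness = (1 - phi) / v_matrix + phi / v_fluid   # Wyllie time average
            return 1.0 / slowness                             # velocity (m/s)

        print(velocity_from_resistivity([1.0, 2.0, 5.0]))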

  17. Testing for intracycle determinism in pseudoperiodic time series.

    PubMed

    Coelho, Mara C S; Mendes, Eduardo M A M; Aguirre, Luis A

    2008-06-01

    A determinism test is proposed based on the well-known method of the surrogate data. Assuming predictability to be a signature of determinism, the proposed method checks for intracycle (e.g., short-term) determinism in the pseudoperiodic time series for which standard methods of surrogate analysis do not apply. The approach presented is composed of two steps. First, the data are preprocessed to reduce the effects of seasonal and trend components. Second, standard tests of surrogate analysis can then be used. The determinism test is applied to simulated and experimental pseudoperiodic time series and the results show the applicability of the proposed test.
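
    A standard ingredient of such surrogate analysis is phase randomization, which preserves the power spectrum of the series while destroying any deterministic structure; a minimal sketch is below (the paper's seasonal/trend preprocessing step is not shown, and the helper name is illustrative).

        import numpy as np

        def phase_randomized_surrogate(x, rng=None):
            # Surrogate with the same power spectrum as x but randomized
            # phases; predictability beyond that of the surrogates is
            # taken as a signature of determinism.
            rng = rng or np.random.default_rng()
            spec = np.fft.rfft(x)
            phases = rng.uniform(0, 2 * np.pi, spec.size)
            phases[0] = 0.0                      # keep the mean untouched
            return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=x.size)

        x = np.sin(np.linspace(0, 20 * np.pi, 1000))   # deterministic cycle
        s = phase_randomized_surrogate(x)
        print(x.std(), s.std())   # similar variance, but scrambled structure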

  18. On the prediction of far field computational aeroacoustics of advanced propellers

    NASA Technical Reports Server (NTRS)

    Jaeger, Stephen M.; Korkan, Kenneth D.

    1990-01-01

    A numerical method for determining the acoustic far field generated by a high-speed subsonic aircraft propeller was developed. The approach used in this method was to generate the entire three-dimensional pressure field about the propeller (using an Euler flowfield solver) and then to apply a solution of the wave equation on a cylindrical surface enveloping the propeller. The method is applied to generate the three-dimensional flowfield between two blades of an advanced propeller. The results are compared with experimental data obtained in a wind-tunnel test at a Mach number of 0.6.

  19. Unpredictability of escape trajectory explains predator evasion ability and microhabitat preference of desert rodents.

    PubMed

    Moore, Talia Y; Cooper, Kimberly L; Biewener, Andrew A; Vasudevan, Ramanarayan

    2017-09-05

    Mechanistically linking movement behaviors and ecology is key to understanding the adaptive evolution of locomotion. Predator evasion, a behavior that enhances fitness, may depend upon short bursts or complex patterns of locomotion. However, such movements are poorly characterized by existing biomechanical metrics. We present methods based on the entropy measure of randomness from Information Theory to quantitatively characterize the unpredictability of non-steady-state locomotion. We then apply the method by examining sympatric rodent species whose escape trajectories differ in dimensionality. Unlike the speed-regulated gait use of cursorial animals to enhance locomotor economy, bipedal jerboa (family Dipodidae) gait transitions likely enhance maneuverability. In field-based observations, jerboa trajectories are significantly less predictable than those of quadrupedal rodents, likely increasing predator evasion ability. Consistent with this hypothesis, jerboas exhibit lower anxiety in open fields than quadrupedal rodents, a behavior that varies inversely with predator evasion ability. Our unpredictability metric expands the scope of quantitative biomechanical studies to include non-steady-state locomotion in a variety of evolutionary and ecologically significant contexts. Biomechanical understanding of animal gait and maneuverability has primarily been limited to species with more predictable, steady-state movement patterns. Here, the authors develop a method to quantify movement predictability, and apply the method to study escape-related movement in several species of desert rodents.
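
    As a rough illustration of entropy-based unpredictability (not the authors' full Information Theory metric over escape trajectories), one can compute the Shannon entropy of a trajectory's turn-angle distribution; the function name and binning below are assumptions.

        import numpy as np

        def heading_entropy(xy, n_bins=8):
            # Shannon entropy (bits) of the turn-angle distribution of a
            # 2-D trajectory; higher entropy = less predictable path.
            v = np.diff(np.asarray(xy, float), axis=0)
            headings = np.arctan2(v[:, 1], v[:, 0])
            turns = np.angle(np.exp(1j * np.diff(headings)))  # wrap to (-pi, pi]
            p, _ = np.histogram(turns, bins=n_bins, range=(-np.pi, np.pi))
            p = p[p > 0] / p.sum()
            return -(p * np.log2(p)).sum()

        straight = np.c_[np.arange(100), np.zeros(100)]       # constant heading
        erratic = np.cumsum(np.random.randn(100, 2), axis=0)  # random turning
        print(heading_entropy(straight), heading_entropy(erratic))  # 0.0 vs > 0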

  20. Evaluation of the impacts of climate change on disease vectors through ecological niche modelling.

    PubMed

    Carvalho, B M; Rangel, E F; Vale, M M

    2017-08-01

    Vector-borne diseases are exceptionally sensitive to climate change. Predicting vector occurrence in specific regions is a challenge that disease control programs must meet in order to plan and execute control interventions and climate change adaptation measures. Recently, an increasing number of scientific articles have applied ecological niche modelling (ENM) to study medically important insects and ticks. With a myriad of available methods, it is challenging to interpret their results. Here we review the future projections of disease vectors produced by ENM, and assess their trends and limitations. Tropical regions are currently occupied by many vector species; but future projections indicate poleward expansions of suitable climates for their occurrence and, therefore, entomological surveillance must be continuously done in areas projected to become suitable. The most commonly applied methods were the maximum entropy algorithm, generalized linear models, the genetic algorithm for rule set prediction, and discriminant analysis. Lack of consideration of the full-known current distribution of the target species on models with future projections has led to questionable predictions. We conclude that there is no ideal 'gold standard' method to model vector distributions; researchers are encouraged to test different methods for the same data. Such practice is becoming common in the field of ENM, but still lags behind in studies of disease vectors.

  1. Predicting protein concentrations with ELISA microarray assays, monotonic splines and Monte Carlo simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daly, Don S.; Anderson, Kevin K.; White, Amanda M.

    Background: A microarray of enzyme-linked immunosorbent assays, or ELISA microarray, predicts simultaneously the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Making sound biological inferences as well as improving the ELISA microarray process require both concentration predictions and credible estimates of their errors. Methods: We present a statistical method based on monotonic spline statistical models, penalized constrained least squares fitting (PCLS) and Monte Carlo simulation (MC) to predict concentrations and estimate prediction errors in ELISA microarray. PCLS restrains the flexible spline to a fit of assay intensity that is a monotone function of protein concentration. With MC, both modeling and measurement errors are combined to estimate prediction error. The spline/PCLS/MC method is compared to a common method using simulated and real ELISA microarray data sets. Results: In contrast to the rigid logistic model, the flexible spline model gave credible fits in almost all test cases, including troublesome cases with left and/or right censoring, or other asymmetries. For the real data sets, 61% of the spline predictions were more accurate than their comparable logistic predictions, especially the spline predictions at the extremes of the prediction curve. The relative errors of 50% of comparable spline and logistic predictions differed by less than 20%. Monte Carlo simulation rendered acceptable asymmetric prediction intervals for both spline and logistic models, while propagation of error produced symmetric intervals that diverged unrealistically as the standard curves approached horizontal asymptotes. Conclusions: The spline/PCLS/MC method is a flexible, robust alternative to a logistic/NLS/propagation-of-error method to reliably predict protein concentrations and estimate their errors. The spline method simplifies model selection and fitting, and reliably estimates believable prediction errors. For the 50% of the real data sets fit well by both methods, spline and logistic predictions are practically indistinguishable, varying in accuracy by less than 15%. The spline method may be useful when automated prediction across simultaneous assays of numerous proteins must be applied routinely with minimal user intervention.
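
    A loose sketch of the monotone-fit-plus-Monte-Carlo idea, assuming scikit-learn and substituting isotonic regression for the penalized constrained spline: the standard curve is fitted monotonically, and Monte Carlo perturbation of an observed intensity yields an asymmetric prediction interval for concentration. The synthetic standards and noise level are assumptions.

        import numpy as np
        from sklearn.isotonic import IsotonicRegression

        rng = np.random.default_rng(0)
        conc = np.repeat(np.logspace(-2, 2, 9), 3)                  # standards, 3 reps
        inten = 1000 * conc / (conc + 1) + rng.normal(0, 20, conc.size)

        # monotone standard curve: intensity as a nondecreasing function of log conc
        curve = IsotonicRegression().fit(np.log10(conc), inten)

        def predict_conc(y_obs, sigma=20, n_mc=2000):
            # Invert the monotone curve; MC perturbation of the observed
            # intensity gives an empirical 95% prediction interval.
            grid = np.linspace(-2, 2, 400)
            fitted = curve.predict(grid)
            draws = y_obs + rng.normal(0, sigma, n_mc)
            idx = np.searchsorted(fitted, draws).clip(0, grid.size - 1)
            return 10 ** np.percentile(grid[idx], [2.5, 50, 97.5])

        print(predict_conc(500.0))   # interval around conc ~ 1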

  2. Challenges in predicting climate change impacts on pome fruit phenology

    NASA Astrophysics Data System (ADS)

    Darbyshire, Rebecca; Webb, Leanne; Goodwin, Ian; Barlow, E. W. R.

    2014-08-01

    Climate projection data were applied to two commonly used pome fruit flowering models to investigate potential differences in predicted full bloom timing. The two methods, fixed thermal time and sequential chill-growth, produced different results for seven apple and pear varieties at two Australian locations. The fixed thermal time model predicted incremental advancement of full bloom, while results were mixed from the sequential chill-growth model. To further investigate how the sequential chill-growth model reacts under climate perturbed conditions, four simulations were created to represent a wider range of species physiological requirements. These were applied to five Australian locations covering varied climates. Lengthening of the chill period and contraction of the growth period was common to most results. The relative dominance of the chill or growth component tended to predict whether full bloom advanced, remained similar or was delayed with climate warming. The simplistic structure of the fixed thermal time model and the exclusion of winter chill conditions in this method indicate it is unlikely to be suitable for projection analyses. The sequential chill-growth model includes greater complexity; however, reservations in using this model for impact analyses remain. The results demonstrate that appropriate representation of physiological processes is essential to adequately predict changes to full bloom under climate perturbed conditions with greater model development needed.

  3. Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning

    NASA Astrophysics Data System (ADS)

    Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao

    2017-04-01

    Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.

  4. Predicting Protein–protein Association Rates using Coarse-grained Simulation and Machine Learning

    PubMed Central

    Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao

    2017-01-01

    Protein–protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate. PMID:28418043

  5. Predicting Protein-protein Association Rates using Coarse-grained Simulation and Machine Learning.

    PubMed

    Xie, Zhong-Ru; Chen, Jiawen; Wu, Yinghao

    2017-04-18

    Protein-protein interactions dominate all major biological processes in living cells. We have developed a new Monte Carlo-based simulation algorithm to study the kinetic process of protein association. We tested our method on a previously used large benchmark set of 49 protein complexes. The predicted rate was overestimated in the benchmark test compared to the experimental results for a group of protein complexes. We hypothesized that this resulted from molecular flexibility at the interface regions of the interacting proteins. After applying a machine learning algorithm with input variables that accounted for both the conformational flexibility and the energetic factor of binding, we successfully identified most of the protein complexes with overestimated association rates and improved our final prediction by using a cross-validation test. This method was then applied to a new independent test set and resulted in a similar prediction accuracy to that obtained using the training set. It has been thought that diffusion-limited protein association is dominated by long-range interactions. Our results provide strong evidence that the conformational flexibility also plays an important role in regulating protein association. Our studies provide new insights into the mechanism of protein association and offer a computationally efficient tool for predicting its rate.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult, such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  7. An inter-comparison of similarity-based methods for organisation and classification of groundwater hydrographs

    NASA Astrophysics Data System (ADS)

    Haaf, Ezra; Barthel, Roland

    2018-04-01

    Classification and similarity-based methods, which have recently received major attention in the field of surface water hydrology, namely through the PUB (Prediction in Ungauged Basins) initiative, have not yet been applied to groundwater systems. However, it can be hypothesised that the principle of "similar systems responding similarly to similar forcing" applies in subsurface hydrology as well. One fundamental prerequisite for testing this hypothesis, and eventually applying the principle to make "predictions for ungauged groundwater systems", is an efficient method to quantify the similarity of groundwater system responses, i.e. groundwater hydrographs. In this study, a large, spatially extensive, as well as geologically and geomorphologically diverse dataset from Southern Germany and Western Austria was used to test and compare a set of 32 grouping methods, which have previously only been used individually in local-scale studies. The resulting groupings are compared to a heuristic visual classification, which serves as a baseline. A performance ranking of these classification methods is carried out, and differences in the homogeneity of grouping results are shown, whereby selected groups are related to hydrogeological indices and geological descriptors. This exploratory empirical study shows that the choice of grouping method has a large impact on the object distribution within groups, as well as on the homogeneity of patterns captured in groups. The study provides a comprehensive overview of a large number of grouping methods, which can guide researchers when attempting similarity-based groundwater hydrograph classification.
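
    One of the simpler grouping strategies of this kind clusters standardized hydrographs by correlation distance; a minimal sketch with synthetic series follows. This is one plausible member of the family of methods compared in the study, not a reconstruction of any specific one of the 32.

        import numpy as np
        from scipy.cluster.hierarchy import fcluster, linkage
        from scipy.spatial.distance import pdist

        rng = np.random.default_rng(4)
        days = np.linspace(0, 8 * np.pi, 365)
        # rows = synthetic stand-ins for standardized groundwater hydrographs
        wells = np.vstack(
            [np.sin(days) + rng.normal(0, 0.2, days.size) for _ in range(2)]
            + [np.sin(days / 2) + rng.normal(0, 0.2, days.size) for _ in range(2)])

        d = pdist(wells, metric="correlation")   # 1 - Pearson r between hydrographs
        groups = fcluster(linkage(d, method="average"), t=2, criterion="maxclust")
        print(groups)   # similar wells fall in the same group, e.g. [1 1 2 2]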

  8. Dynamic Loads Generation for Multi-Point Vibration Excitation Problems

    NASA Technical Reports Server (NTRS)

    Shen, Lawrence

    2011-01-01

    A random-force method has been developed to predict dynamic loads produced by rocket-engine random vibrations for new rocket-engine designs. The method develops random forces at multiple excitation points based on random vibration environments scaled from accelerometer data obtained during hot-fire tests of existing rocket engines. The random-force method applies random forces to the model to create the expected dynamic response, simulating the way the operating engine applies self-generated random vibration forces (random pressure acting on an area) and produces the responses measured with accelerometers. This innovation includes the methodology (implementation sequence), the computer code, two methods to generate the random-force vibration spectra, and two methods to reduce some of the inherent conservatism in the dynamic loads. The methodology would be implemented to generate the random-force spectra at excitation nodes without requiring the use of artificial boundary conditions in a finite element model. More accurate random dynamic loads than those predicted by current industry methods can then be generated using the random-force spectra. The scaling method used to develop the initial power spectral density (PSD) environments for deriving the random forces for the rocket-engine case is based on the Barrett criteria developed at Marshall Space Flight Center in 1963. This approach can be applied in the aerospace, automotive, and other industries to obtain reliable dynamic loads and responses from a finite element model for any structure subject to multipoint random vibration excitations.

  9. How to regress and predict in a Bland-Altman plot? Review and contribution based on tolerance intervals and correlated-errors-in-variables models.

    PubMed

    Francq, Bernard G; Govaerts, Bernadette

    2016-06-30

    Two main methodologies for assessing equivalence in method-comparison studies are presented separately in the literature. The first one is the well-known and widely applied Bland-Altman approach with its agreement intervals, where two methods are considered interchangeable if their differences are not clinically significant. The second approach is based on errors-in-variables regression in a classical (X,Y) plot and focuses on confidence intervals, whereby two methods are considered equivalent when providing similar measures notwithstanding the random measurement errors. This paper reconciles these two methodologies and shows their similarities and differences using both real data and simulations. A new consistent correlated-errors-in-variables regression is introduced, as the errors are shown to be correlated in the Bland-Altman plot. Indeed, the coverage probabilities collapse and the biases soar when this correlation is ignored. Novel tolerance intervals are compared with agreement intervals with or without replicated data, and novel predictive intervals are introduced to predict a single measure in an (X,Y) plot or in a Bland-Altman plot with excellent coverage probabilities. We conclude that (correlated-)errors-in-variables regressions should not be avoided in method-comparison studies, although the Bland-Altman approach is usually applied to avert their complexity. We argue that tolerance or predictive intervals are better alternatives than agreement intervals, and we provide guidelines for practitioners regarding method-comparison studies. Copyright © 2016 John Wiley & Sons, Ltd.

  10. Harmonize input selection for sediment transport prediction

    NASA Astrophysics Data System (ADS)

    Afan, Haitham Abdulmohsin; Keshtegar, Behrooz; Mohtar, Wan Hanna Melini Wan; El-Shafie, Ahmed

    2017-09-01

    In this paper, three modeling approaches, a Neural Network (NN), the Response Surface Method (RSM), and a response surface method based on Global Harmony Search (GHS), are applied to predict the daily time series of suspended sediment load. Generally, the input variables for forecasting the suspended sediment load are selected manually, based on the maximum correlations of the input variables, in the modeling approaches based on NN and RSM. Here, the RSM is improved to select the input variables using the error terms of the training data based on the GHS; the result is referred to as the response surface method with global harmony search (RSM-GHS) modeling method. A second-order polynomial function with cross terms is applied to calibrate the time series of suspended sediment load with three, four and five input variables in the proposed RSM-GHS. The linear, square and cross terms of twenty input variables, comprising antecedent values of suspended sediment load and water discharge, are investigated to achieve the best predictions of the RSM based on the GHS method. The performances of the NN, RSM and proposed RSM-GHS, in terms of both accuracy and simplicity, are compared through several comparative prediction and error statistics. The results illustrate that the proposed RSM-GHS is as uncomplicated as the RSM but performs better, with fewer errors and better correlation (R = 0.95, MAE = 18.09 ton/day, RMSE = 25.16 ton/day) compared to the ANN (R = 0.91, MAE = 20.17 ton/day, RMSE = 33.09 ton/day) and RSM (R = 0.91, MAE = 20.06 ton/day, RMSE = 31.92 ton/day) for all types of input variables.
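
    A minimal sketch of the RSM calibration step, assuming scikit-learn: a second-order polynomial with cross terms is fitted to synthetic predictors. The GHS input-selection loop and the real hydrological data are omitted, and the synthetic relationship is an assumption for illustration only.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import PolynomialFeatures

        rng = np.random.default_rng(1)
        # stand-ins for antecedent discharge/sediment predictors (3 inputs)
        X = rng.uniform(0, 1, (200, 3))
        y = 50 + 30 * X[:, 0] + 20 * X[:, 1] * X[:, 2] + 5 * rng.standard_normal(200)

        # second-order polynomial with cross terms, as in the RSM calibration
        rsm = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
        rsm.fit(X[:150], y[:150])
        resid = y[150:] - rsm.predict(X[150:])
        print("test RMSE:", np.sqrt((resid ** 2).mean()))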

  11. Algorithm for predicting the evolution of series of dynamics of complex systems in solving information problems

    NASA Astrophysics Data System (ADS)

    Kasatkina, T. I.; Dushkin, A. V.; Pavlov, V. A.; Shatovkin, R. R.

    2018-03-01

    In the development of information systems and software for predicting dynamic series, neural network methods have recently been applied. They are more flexible than existing analogues and are capable of taking the nonlinearities of a series into account. In this paper, we propose a modified algorithm for predicting series of dynamics, which includes a method for training neural networks and an approach to describing and presenting the input data, based on prediction by the multilayer perceptron method. To construct the neural network, the values of the series at its extremum points, and the corresponding time values, formed using the sliding window method, are used as input data. The proposed algorithm can act as an independent approach to predicting series of dynamics, or serve as one part of a forecasting system. The efficiency of predicting the evolution of a dynamics series for a short-term one-step and a long-term multi-step forecast is compared between the classical multilayer perceptron method and the modified algorithm, using synthetic and real data. The result of this modification is the minimization of the iterative error that arises when previously predicted values are fed back as inputs to the neural network, as well as an increase in the accuracy of the iterative prediction of the neural network.

  12. PAUSE: Predictive Analytics Using SPARQL-Endpoints

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sukumar, Sreenivas R; Ainsworth, Keela; Bond, Nathaniel

    2014-07-11

    This invention relates to the medical industry and more specifically to methods of predicting risks. With the impetus towards personalized and evidence-based medicine, the need for a framework to analyze/interpret quantitative measurements (blood work, toxicology, etc.) with qualitative descriptions (specialist reports after reading images, bio-medical knowledgebase, etc.) to predict diagnostic risks is fast emerging. We describe a software solution that leverages hardware for scalable in-memory analytics and applies next-generation semantic query tools on medical data.

  13. Cell fate reprogramming by control of intracellular network dynamics

    NASA Astrophysics Data System (ADS)

    Zanudo, Jorge G. T.; Albert, Reka

    Identifying control strategies for biological networks is paramount for practical applications that involve reprogramming a cell's fate, such as disease therapeutics and stem cell reprogramming. Although the topic of controlling the dynamics of a system has a long history in control theory, most of this work is not directly applicable to intracellular networks. Here we present a network control method that integrates the structural and functional information available for intracellular networks to predict control targets. Formulated in a logical dynamic scheme, our control method takes advantage of certain function-dependent network components and their relation to steady states in order to identify control targets, which are guaranteed to drive any initial state to the target state with 100% effectiveness and need to be applied only transiently for the system to reach and stay in the desired state. We illustrate our method's potential to find intervention targets for cancer treatment and cell differentiation by applying it to a leukemia signaling network and to the network controlling the differentiation of T cells. We find that the predicted control targets are effective in a broad dynamic framework. Moreover, several of the predicted interventions are supported by experiments. This work was supported by NSF Grant PHY 1205840.

  14. Prediction of thermal coagulation from the instantaneous strain distribution induced by high-intensity focused ultrasound

    NASA Astrophysics Data System (ADS)

    Iwasaki, Ryosuke; Takagi, Ryo; Tomiyasu, Kentaro; Yoshizawa, Shin; Umemura, Shin-ichiro

    2017-07-01

    Targeting of the ultrasound beam and prediction of thermal lesion formation in advance are requirements for monitoring high-intensity focused ultrasound (HIFU) treatment safely and reproducibly. To visualize the HIFU focal zone, we utilized an acoustic radiation force impulse (ARFI) imaging-based method. After pulsed HIFU, called the push pulse exposure, induces displacements inside tissues, the distribution of axial displacements starts expanding and moving. To acquire RF data immediately after and during the HIFU push pulse exposure and thereby improve prediction accuracy, we attempted methods using extrapolation estimation and HIFU noise elimination. The distributions, traced back in the time domain from the end of the push pulse exposure, are in good agreement with tissue coagulation at the center. The results suggest that the proposed focal zone visualization, employing pulsed HIFU with the high-speed ARFI imaging method, is useful for predicting thermal coagulation in advance.

  15. An Efficient Scheme for Crystal Structure Prediction Based on Structural Motifs

    DOE PAGES

    Zhu, Zizhong; Wu, Ping; Wu, Shunqing; ...

    2017-05-15

    An efficient scheme based on structural motifs is proposed for the crystal structure prediction of materials. The key advantage of the present method is twofold: first, the degrees of freedom of the system are greatly reduced, since each structural motif, regardless of its size, can always be described by a set of parameters (R, θ, φ) with five degrees of freedom; second, the motifs always appear in the predicted structures when the energies of the structures are relatively low. Both features make the present scheme a very efficient method for predicting desired materials. The method has been applied to the case of LiFePO4, an important cathode material for lithium-ion batteries. Numerous new structures of LiFePO4 have been found, compared to those currently available, demonstrating the reliability of the present methodology and illustrating the promise of the concept of structural motifs.

  16. An Efficient Scheme for Crystal Structure Prediction Based on Structural Motifs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Zizhong; Wu, Ping; Wu, Shunqing

    An efficient scheme based on structural motifs is proposed for the crystal structure prediction of materials. The key advantage of the present method is twofold: first, the degrees of freedom of the system are greatly reduced, since each structural motif, regardless of its size, can always be described by a set of parameters (R, θ, φ) with five degrees of freedom; second, the motifs always appear in the predicted structures when the energies of the structures are relatively low. Both features make the present scheme a very efficient method for predicting desired materials. The method has been applied to the case of LiFePO4, an important cathode material for lithium-ion batteries. Numerous new structures of LiFePO4 have been found, compared to those currently available, demonstrating the reliability of the present methodology and illustrating the promise of the concept of structural motifs.

  17. Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data.

    PubMed

    Ching, Travers; Zhu, Xun; Garmire, Lana X

    2018-04-01

    Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high-throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox proportional hazards regression (with LASSO, ridge, and minimax concave penalties), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer nodes provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high-throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.
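
    At the core of such survival networks is the Cox partial likelihood used as the training loss; a minimal NumPy sketch of its negative log form is given below (tie handling via Breslow or Efron corrections is omitted, and the function name and toy data are illustrative, not the Cox-nnet code).

        import numpy as np

        def cox_negative_log_partial_likelihood(risk, time, event):
            # risk:  predicted risk scores (e.g. the output node of a Cox ANN)
            # time:  follow-up times; event: 1 = death observed, 0 = censored.
            # The risk set of a subject is everyone with time >= their time.
            order = np.argsort(-np.asarray(time))       # longest survivors first
            r = np.asarray(risk, float)[order]
            e = np.asarray(event)[order]
            log_risk_set = np.logaddexp.accumulate(r)   # log sum exp(r) over risk sets
            return -np.sum((r - log_risk_set)[e == 1])

        rng = np.random.default_rng(0)
        scores = rng.normal(size=8)
        print(cox_negative_log_partial_likelihood(scores,
                                                  time=rng.exponential(5, 8),
                                                  event=rng.integers(0, 2, 8)))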

  18. Blind predictions of protein interfaces by docking calculations in CAPRI.

    PubMed

    Lensink, Marc F; Wodak, Shoshana J

    2010-11-15

    Reliable prediction of the amino acid residues involved in protein-protein interfaces can provide valuable insight into protein function and inform mutagenesis studies and drug design applications. A fast-growing number of methods are being proposed for predicting protein interfaces, using structural information, energetic criteria, or sequence conservation, or by integrating multiple criteria and approaches. Overall, however, their performance remains limited, especially when applied to nonobligate protein complexes, where the individual components are also stable on their own. Here, we evaluate interface predictions derived from protein-protein docking calculations. To this end we measure the overlap between the interfaces in models of protein complexes submitted by 76 participants in CAPRI (Critical Assessment of Predicted Interactions) and those of 46 observed interfaces in 20 CAPRI targets corresponding to nonobligate complexes. Our evaluation considers multiple models for each target interface, submitted by different participants, using a variety of docking methods. Although this results in a substantial variability in the prediction performance across participants and targets, clear trends emerge. Docking methods that perform best in our evaluation predict interfaces with average recall and precision levels of about 60%, for a small majority (60%) of the analyzed interfaces. These levels are significantly higher than those obtained for nonobligate complexes by most extant interface prediction methods. We find furthermore that a sizable fraction (24%) of the interfaces in models ranked as incorrect in the CAPRI assessment are actually correctly predicted (recall and precision ≥50%), and that these models contribute 70% of the correct docking-based interface predictions overall. Our analysis shows that docking methods are much more successful in identifying interfaces than in predicting complexes, and suggests that these methods have an excellent potential for addressing the interface prediction challenge. © 2010 Wiley-Liss, Inc.

  19. Improving Prediction Accuracy for WSN Data Reduction by Applying Multivariate Spatio-Temporal Correlation

    PubMed Central

    Carvalho, Carlos; Gomes, Danielo G.; Agoulmine, Nazim; de Souza, José Neuman

    2011-01-01

    This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSN). Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, it may not be very accurate. Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher correlation between gathered inputs than with time, the independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate. In addition, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, this is the first work to address prediction based on multivariate correlation for WSN data reduction. PMID:22346626
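
    As a rough illustration of the multivariate idea (predicting one sensed quantity from correlated co-located measurements rather than from time alone), the sketch below fits a multiple linear regression with NumPy. The data and coefficients are synthetic, invented purely for the example.

    ```python
    import numpy as np

    # Hypothetical sensor readings: predict humidity from temperature and
    # light, i.e., from correlated inputs rather than from time.
    rng = np.random.default_rng(0)
    temp = rng.uniform(15, 35, 200)
    light = rng.uniform(100, 900, 200)
    humidity = 80 - 1.2 * temp - 0.01 * light + rng.normal(0, 1, 200)

    # Multiple linear regression via least squares: humidity ~ 1 + temp + light
    X = np.column_stack([np.ones_like(temp), temp, light])
    coef, *_ = np.linalg.lstsq(X, humidity, rcond=None)
    pred = X @ coef
    rmse = np.sqrt(np.mean((pred - humidity) ** 2))
    print(f"coefficients={coef}, RMSE={rmse:.3f}")
    ```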

  20. Predicting who will drop out of nursing courses: a machine learning exercise.

    PubMed

    Moseley, Laurence G; Mead, Donna M

    2008-05-01

    The concepts of causation and prediction are different, and have different implications for practice. This distinction is applied here to studies of the problem of student attrition (although it is more widely applicable). Studies of attrition from nursing courses have tended to concentrate on causation, trying, largely unsuccessfully, to elicit what causes drop out. However, the problem may more fruitfully be cast in terms of predicting who is likely to drop out. One powerful method for attempting to make predictions is rule induction. This paper reports the use of the Answer Tree package from SPSS for that purpose. The main data set consisted of 3978 records on 528 nursing students, split into a training set and a test set. The source was standard university student records. The method obtained 84% sensitivity, 70% specificity, and 94% accuracy on previously unseen cases. The method requires large amounts of high quality data. When such data are available, rule induction offers a way to reduce attrition. It would be desirable to compare its results with those of predictions made by tutors using more informal conventional methods.
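
    The study used SPSS's Answer Tree package; as a hedged stand-in, the sketch below induces a comparable decision tree with scikit-learn on synthetic student records and reports the sensitivity and specificity measures the abstract quotes. All features and thresholds are invented for illustration.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for student records (the study used real registry data)
    rng = np.random.default_rng(1)
    X = rng.normal(size=(528, 4))   # e.g., age, grades, absences, entry scores
    y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 1, 528) > 1.2).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)
    pred = tree.predict(X_te)

    tp = np.sum((pred == 1) & (y_te == 1)); fn = np.sum((pred == 0) & (y_te == 1))
    tn = np.sum((pred == 0) & (y_te == 0)); fp = np.sum((pred == 1) & (y_te == 0))
    print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
    print(export_text(tree))        # the induced, human-readable rules
    ```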

  1. Characterization of the Dynamics of Climate Systems and Identification of Missing Mechanisms Impacting the Long Term Predictive Capabilities of Global Climate Models Utilizing Dynamical Systems Approaches to the Analysis of Observed and Modeled Climate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bhatt, Uma S.; Wackerbauer, Renate; Polyakov, Igor V.

    The goal of this research was to apply fractional and non-linear analysis techniques in order to develop a more complete characterization of climate change and variability for the oceanic, sea ice and atmospheric components of the Earth System. This research applied two measures of dynamical characteristics of time series, the R/S method of calculating the Hurst exponent and Renyi entropy, to observational and modeled climate data in order to evaluate how well climate models capture the long-term dynamics evident in observations. Fractional diffusion analysis was applied to ARGO ocean buoy data to quantify ocean transport. Self-organized maps were applied to North Pacific sea level pressure and analyzed in ways to improve seasonal predictability for Alaska fire weather. This body of research shows that these methods can be used to evaluate climate models and shed light on climate mechanisms (i.e., understanding why something happens). With further research, these methods show promise for improving seasonal to longer time scale forecasts of climate.
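
    A compact version of the R/S (rescaled-range) estimate of the Hurst exponent mentioned above can be written in a few lines of NumPy. This is a generic textbook formulation, not the project's own code, and the window sizes are arbitrary choices.

    ```python
    import numpy as np

    def hurst_rs(x, min_chunk=8):
        """Estimate the Hurst exponent of a 1-D series by rescaled-range analysis."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        sizes, rs_vals = [], []
        size = min_chunk
        while size <= n // 2:
            rs = []
            for start in range(0, n - size + 1, size):   # non-overlapping windows
                chunk = x[start:start + size]
                dev = np.cumsum(chunk - chunk.mean())     # cumulative deviations
                r = dev.max() - dev.min()                 # range
                s = chunk.std()                           # standard deviation
                if s > 0:
                    rs.append(r / s)
            sizes.append(size)
            rs_vals.append(np.mean(rs))
            size *= 2
        # slope of log(R/S) versus log(window size) estimates the Hurst exponent
        h, _ = np.polyfit(np.log(sizes), np.log(rs_vals), 1)
        return h

    # roughly 0.5 is expected for uncorrelated white noise
    print(hurst_rs(np.random.default_rng(0).normal(size=4096)))
    ```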

  2. Automatic peak selection by a Benjamini-Hochberg-based algorithm.

    PubMed

    Abbas, Ahmed; Kong, Xin-Bing; Liu, Zhi; Jing, Bing-Yi; Gao, Xin

    2013-01-01

    A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming more and more common, although expert knowledge remains the method of choice to determine how many peaks among thousands of candidate peaks should be taken into consideration to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on the state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without using the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and straightforwardly applied to some other prediction selection problems in bioinformatics. The source code, documentation and example data of the proposed method are available at http://sfb.kaust.edu.sa/pages/software.aspx.
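
    The core of the selection step is the standard Benjamini-Hochberg step-up rule; a minimal NumPy sketch follows. The mapping from peak volumes or intensities to p-values is assumed to have been done beforehand, as the abstract describes, and the p-values here are invented.

    ```python
    import numpy as np

    def bh_select(pvalues, alpha=0.05):
        """Benjamini-Hochberg step-up: return indices of selected hypotheses."""
        p = np.asarray(pvalues)
        order = np.argsort(p)
        m = len(p)
        thresh = alpha * np.arange(1, m + 1) / m    # alpha * i / m for rank i
        below = p[order] <= thresh
        if not below.any():
            return np.array([], dtype=int)
        k = np.nonzero(below)[0].max()              # largest i with p_(i) <= alpha*i/m
        return order[:k + 1]                        # accept all ranks up to k

    # toy usage: p-values derived from peak volumes/intensities would go here
    pvals = np.array([0.001, 0.009, 0.04, 0.06, 0.2, 0.7])
    print(bh_select(pvals, alpha=0.05))             # indices of the selected peaks
    ```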

  3. Automatic Peak Selection by a Benjamini-Hochberg-Based Algorithm

    PubMed Central

    Abbas, Ahmed; Kong, Xin-Bing; Liu, Zhi; Jing, Bing-Yi; Gao, Xin

    2013-01-01

    A common issue in bioinformatics is that computational methods often generate a large number of predictions sorted according to certain confidence scores. A key problem is then determining how many predictions must be selected to include most of the true predictions while maintaining reasonably high precision. In nuclear magnetic resonance (NMR)-based protein structure determination, for instance, computational peak picking methods are becoming more and more common, although expert knowledge remains the method of choice to determine how many peaks among thousands of candidate peaks should be taken into consideration to capture the true peaks. Here, we propose a Benjamini-Hochberg (B-H)-based approach that automatically selects the number of peaks. We formulate the peak selection problem as a multiple testing problem. Given a candidate peak list sorted by either volumes or intensities, we first convert the peaks into p-values and then apply the B-H-based algorithm to automatically select the number of peaks. The proposed approach is tested on the state-of-the-art peak picking methods, including WaVPeak [1] and PICKY [2]. Compared with the traditional fixed number-based approach, our approach returns significantly more true peaks. For instance, by combining WaVPeak or PICKY with the proposed method, the missing peak rates are on average reduced by 20% and 26%, respectively, in a benchmark set of 32 spectra extracted from eight proteins. The consensus of the B-H-selected peaks from both WaVPeak and PICKY achieves 88% recall and 83% precision, which significantly outperforms each individual method and the consensus method without using the B-H algorithm. The proposed method can be used as a standard procedure for any peak picking method and straightforwardly applied to some other prediction selection problems in bioinformatics. The source code, documentation and example data of the proposed method are available at http://sfb.kaust.edu.sa/pages/software.aspx. PMID:23308147

  4. The transfer function method for gear system dynamics applied to conventional and minimum excitation gearing designs

    NASA Technical Reports Server (NTRS)

    Mark, W. D.

    1982-01-01

    A transfer function method for predicting the dynamic responses of gear systems with more than one gear mesh is developed and applied to the NASA Lewis four-square gear fatigue test apparatus. Methods for computing bearing-support force spectra and temporal histories of the total force transmitted by a gear mesh, the force transmitted by a single pair of teeth, and the maximum root stress in a single tooth are developed. Dynamic effects arising from other gear meshes in the system are included. A profile modification design method to minimize the vibration excitation arising from a pair of meshing gears is reviewed and extended. Families of tooth loading functions required for such designs are developed and examined for potential excitation of individual tooth vibrations. The profile modification design method is applied to a pair of test gears.

  5. Assessing the impact of land use change on hydrology by ensemble modeling (LUCHEM) III: Scenario analysis

    USGS Publications Warehouse

    Huisman, J.A.; Breuer, L.; Bormann, H.; Bronstert, A.; Croke, B.F.W.; Frede, H.-G.; Graff, T.; Hubrechts, L.; Jakeman, A.J.; Kite, G.; Lanini, J.; Leavesley, G.; Lettenmaier, D.P.; Lindstrom, G.; Seibert, J.; Sivapalan, M.; Viney, N.R.; Willems, P.

    2009-01-01

    An ensemble of 10 hydrological models was applied to the same set of land use change scenarios. There was general agreement about the direction of changes in the mean annual discharge and 90% discharge percentile predicted by the ensemble members, although a considerable range in the magnitude of predictions for the scenarios and catchments under consideration was evident. Differences in the magnitude of the increase were attributed to the different mean annual actual evapotranspiration rates for each land use type. The ensemble of model runs was further analyzed with deterministic and probabilistic ensemble methods. The deterministic ensemble method based on a trimmed mean resulted in a single, somewhat more reliable scenario prediction. The probabilistic reliability ensemble averaging (REA) method allowed a quantification of the model structure uncertainty in the scenario predictions. It was concluded that the use of a model ensemble has greatly increased our confidence in the reliability of the model predictions. ?? 2008 Elsevier Ltd.
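
    As a small illustration of the deterministic ensemble step, the snippet below computes a trimmed-mean scenario prediction from hypothetical ensemble members with SciPy. The discharge values are invented, and the 10% trimming fraction is an assumption, not a figure from the study.

    ```python
    import numpy as np
    from scipy.stats import trim_mean

    # Hypothetical scenario predictions of mean annual discharge (mm)
    # from a 10-member hydrological model ensemble
    discharge = np.array([412., 405., 398., 451., 420., 388., 430., 415., 402., 509.])

    # Deterministic ensemble: the trimmed mean discards the most extreme
    # members (here 10% from each tail) before averaging, damping outliers.
    print("trimmed-mean prediction:", trim_mean(discharge, proportiontocut=0.1))
    print("plain mean:", discharge.mean())
    ```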

  6. Strain intensity factor approach for predicting the strength of continuously reinforced metal matrix composites

    NASA Technical Reports Server (NTRS)

    Poe, Clarence C., Jr.

    1989-01-01

    A method was previously developed to predict the fracture toughness (stress intensity factor at failure) of composites in terms of the elastic constants and the tensile failing strain of the fibers. The method was applied to boron/aluminum composites made with various proportions of 0 deg and +/- 45 deg plies. Predicted values of fracture toughness were in gross error because widespread yielding of the aluminum matrix made the compliance very nonlinear. An alternate method was developed to predict the strain intensity factor at failure rather than the stress intensity factor because the singular strain field was not affected by yielding as much as the stress field. Far-field strains at failure were calculated from the strain intensity factor, and then strengths were calculated from the far-field strains using uniaxial stress-strain curves. The predicted strengths were in good agreement with experimental values, even for the very nonlinear laminates that contained only +/- 45 deg plies. This approach should be valid for other metal matrix composites that have continuous fibers.

  7. Assessment of modification factors for a row of bolts or timber connectors

    Treesearch

    Thomas Lee Wilkinson

    1980-01-01

    When bolts or timber connectors are used in a row, with load applied parallel to the row, load will be unequally distributed among the fasteners. This study assessed methods of predicting this unequal load distribution, looked at how joint variables can affect the distribution, and compared the predictions with data existing in the literature. Presently used design...

  8. Why did the bear cross the road? Comparing the performance of multiple resistance surfaces and connectivity modeling methods

    Treesearch

    Samuel A. Cushman; Jesse S. Lewis; Erin L. Landguth

    2014-01-01

    There have been few assessments of the performance of alternative resistance surfaces, and little is known about how connectivity modeling approaches differ in their ability to predict organism movements. In this paper, we evaluate the performance of four connectivity modeling approaches applied to two resistance surfaces in predicting the locations of highway...

  9. Watershed-scale evaluation of the Water Erosion Prediction Project (WEPP) model in the Lake Tahoe basin

    Treesearch

    Erin S. Brooks; Mariana Dobre; William J. Elliot; Joan Q. Wu; Jan Boll

    2016-01-01

    Forest managers need methods to evaluate the impacts of management at the watershed scale. The Water Erosion Prediction Project (WEPP) has the ability to model disturbed forested hillslopes, but has difficulty addressing some of the critical processes that are important at a watershed scale, including baseflow and water yield. In order to apply WEPP to...

  10. The Prediction of the Students' Academic Underachievement in Mathematics Using the DEA Model: A Developing Country Case Study

    ERIC Educational Resources Information Center

    Moradi, Fatemeh; Amiripour, Parvaneh

    2017-01-01

    In this study, an attempt was made to predict the students' mathematical academic underachievement at the Islamic Azad University-Yadegare-Imam branch and the appropriate strategies in mathematical academic achievement to be applied using the Data Envelopment Analysis (DEA) model. Survey research methods were used to select 91 students from the…

  11. Predicting the Extension of Biomedical Ontologies

    PubMed Central

    Pesquita, Catia; Couto, Francisco M.

    2012-01-01

    Developing and extending a biomedical ontology is a very demanding task that can never be considered complete given our ever-evolving understanding of the life sciences. Extension in particular can benefit from the automation of some of its steps, thus freeing experts to focus on harder tasks. Here we present a strategy to support the automation of change capturing within ontology extension, where the need for new concepts or relations is identified. Our strategy is based on predicting areas of an ontology that will undergo extension in a future version by applying supervised learning over features of previous ontology versions. We used the Gene Ontology as our test bed and obtained encouraging results with average f-measure reaching 0.79 for a subset of biological process terms. Our strategy was also able to outperform state-of-the-art change-capturing methods. In addition we have identified several issues concerning prediction of ontology evolution, and have delineated a general framework for ontology extension prediction. Our strategy can be applied to any biomedical ontology with versioning, to help focus either manual or semi-automated extension methods on areas of the ontology that need extension. PMID:23028267

  12. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments.

    PubMed

    Zheng, Ce; Kurgan, Lukasz

    2008-10-10

    The beta-turn is a secondary protein structure type that plays a significant role in protein folding, stability, and molecular recognition. To date, several methods for the prediction of beta-turns from protein sequences have been developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based beta-turn predictor stems from the use of window-based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, the secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top ranked features, which corroborates recent studies. The Asn, Asp, Gly, and Pro values indicate potential beta-turns, while the remaining four amino acids are useful for predicting non-beta-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and a Qpredicted higher by over 6% when compared with the second-best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second- and the two third-best (with respect to MCC) competing methods, respectively. Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between beta-turns and non-beta-turns because it obtains lower numbers of false positive predictions. The prediction model and datasets are freely available at http://biomine.ece.ualberta.ca/BTNpred/BTNpred.html.

  13. Neuro-symbolic representation learning on biological knowledge graphs.

    PubMed

    Alshahrani, Mona; Khan, Mohammad Asif; Maddouri, Omar; Kinjo, Akira R; Queralt-Rosinach, Núria; Hoehndorf, Robert

    2017-09-01

    Biological data and knowledge bases increasingly rely on Semantic Web technologies and the use of knowledge graphs for data integration, retrieval and federated queries. In recent years, feature learning methods that are applicable to graph-structured data have become available, but have not yet been widely applied and evaluated on structured biological knowledge. We develop a novel method for feature learning on biological knowledge graphs. Our method combines symbolic methods, in particular knowledge representation using symbolic logic and automated reasoning, with neural networks to generate embeddings of nodes that encode related information within knowledge graphs. Through the use of symbolic logic, these embeddings contain both explicit and implicit information. We apply these embeddings to the prediction of edges in the knowledge graph representing problems of function prediction, finding candidate genes of diseases, protein-protein interactions, or drug target relations, and demonstrate performance that matches and sometimes outperforms traditional approaches based on manually crafted features. Our method can be applied to any biological knowledge graph, and will thereby open up the increasing amount of Semantic Web based knowledge bases in biology to use in machine learning and data analytics. https://github.com/bio-ontology-research-group/walking-rdf-and-owl. robert.hoehndorf@kaust.edu.sa. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  14. Cost-of-illness studies based on massive data: a prevalence-based, top-down regression approach.

    PubMed

    Stollenwerk, Björn; Welchowski, Thomas; Vogl, Matthias; Stock, Stephanie

    2016-04-01

    Despite the increasing availability of routine data, no analysis method has yet been presented for cost-of-illness (COI) studies based on massive data. We aim, first, to present such a method and, second, to assess the relevance of the associated gain in numerical efficiency. We propose a prevalence-based, top-down regression approach consisting of five steps: aggregating the data; fitting a generalized additive model (GAM); predicting costs via the fitted GAM; comparing predicted costs between prevalent and non-prevalent subjects; and quantifying the stochastic uncertainty via error propagation. To demonstrate the method, it was applied, in the context of chronic lung disease, to aggregated German sickness fund data (from 1999) covering over 7.3 million insured. To assess the gain in numerical efficiency, the computational time of the innovative approach was compared with corresponding GAMs applied to simulated individual-level data. Furthermore, the probability of model failure was modeled via logistic regression. Applying the innovative method was reasonably fast (19 min). In contrast, with patient-level data, computational time increased disproportionately with sample size. Furthermore, using patient-level data was accompanied by a substantial risk of model failure (about 80% for 6 million subjects). The gain in computational efficiency of the innovative COI method seems to be of practical relevance. Furthermore, it may yield more precise cost estimates.

  15. Identifying the binding mode of a molecular scaffold

    NASA Astrophysics Data System (ADS)

    Chema, Doron; Eren, Doron; Yayon, Avner; Goldblum, Amiram; Zaliani, Andrea

    2004-01-01

    We describe a method for docking of a scaffold-based series and present its advantages over docking of individual ligands, for determining the binding mode of a molecular scaffold in a binding site. The method has been applied to eight different scaffolds of protein kinase inhibitors (PKI). A single analog of each of these eight scaffolds was previously crystallized with different protein kinases. We have used FlexX to dock a set of molecules that share the same scaffold, rather than docking a single molecule. The main mode of binding is determined by the mode of binding of the largest cluster among the docked molecules that share a scaffold. Clustering is based on our `nearest single neighbor' method [J. Chem. Inf. Comput. Sci., 43 (2003) 208-217]. Additional criteria are applied in those cases in which more than one significant binding mode is found. Using the proposed method, most of the crystallographic binding modes of these scaffolds were reconstructed. Alternative modes, that have not been detected yet by experiments, could also be identified. The method was applied to predict the binding mode of an additional molecular scaffold that was not yet reported and the predicted binding mode has been found to be very similar to experimental results for a closely related scaffold. We suggest that this approach be used as a virtual screening tool for scaffold-based design processes.

  16. MetaDP: a comprehensive web server for disease prediction of 16S rRNA metagenomic datasets.

    PubMed

    Xu, Xilin; Wu, Aiping; Zhang, Xinlei; Su, Mingming; Jiang, Taijiao; Yuan, Zhe-Ming

    2016-01-01

    High-throughput sequencing-based metagenomics has garnered considerable interest in recent years. Numerous methods and tools have been developed for the analysis of metagenomic data. However, it is still a daunting task to install a large number of tools and complete a complicated analysis, especially for researchers with minimal bioinformatics backgrounds. To address this problem, we constructed an automated software platform named MetaDP for 16S rRNA sequencing data analysis, including data quality control, operational taxonomic unit clustering, diversity analysis, and disease risk prediction modeling. Furthermore, a support vector machine-based prediction model for irritable bowel syndrome (IBS) was built by applying MetaDP to microbial 16S sequencing data from 108 children. The success of the IBS prediction model suggests that the platform may also be applied to other diseases related to gut microbes, such as obesity, metabolic syndrome, or intestinal cancer, among others (http://metadp.cn:7001/).

  17. Machine learning classification with confidence: application of transductive conformal predictors to MRI-based diagnostic and prognostic markers in depression.

    PubMed

    Nouretdinov, Ilia; Costafreda, Sergi G; Gammerman, Alexander; Chervonenkis, Alexey; Vovk, Vladimir; Vapnik, Vladimir; Fu, Cynthia H Y

    2011-05-15

    There is rapidly accumulating evidence that the application of machine learning classification to neuroimaging measurements may be valuable for the development of diagnostic and prognostic prediction tools in psychiatry. However, current methods do not produce a measure of the reliability of the predictions. Knowing the risk of the error associated with a given prediction is essential for the development of neuroimaging-based clinical tools. We propose a general probabilistic classification method to produce measures of confidence for magnetic resonance imaging (MRI) data. We describe the application of transductive conformal predictor (TCP) to MRI images. TCP generates the most likely prediction and a valid measure of confidence, as well as the set of all possible predictions for a given confidence level. We present the theoretical motivation for TCP, and we have applied TCP to structural and functional MRI data in patients and healthy controls to investigate diagnostic and prognostic prediction in depression. We verify that TCP predictions are as accurate as those obtained with more standard machine learning methods, such as support vector machine, while providing the additional benefit of a valid measure of confidence for each prediction. Copyright © 2010 Elsevier Inc. All rights reserved.

  18. A Sensor Dynamic Measurement Error Prediction Model Based on NAPSO-SVM.

    PubMed

    Jiang, Minlan; Jiang, Lan; Jiang, Dingde; Li, Fei; Song, Houbing

    2018-01-15

    Dynamic measurement error correction is an effective way to improve sensor precision. Dynamic measurement error prediction is an important part of error correction, and support vector machines (SVM) are often used for predicting the dynamic measurement errors of sensors. Traditionally, the SVM parameters were always set manually, which cannot ensure the model's performance. In this paper, an SVM method based on an improved particle swarm optimization (NAPSO) is proposed to predict the dynamic measurement errors of sensors. Natural selection and simulated annealing are added to the PSO to raise its ability to avoid local optima. To verify the performance of NAPSO-SVM, three algorithms are selected to optimize the SVM's parameters: the particle swarm optimization algorithm (PSO), the improved PSO algorithm (NAPSO), and glowworm swarm optimization (GSO). The dynamic measurement error data of two sensors are used as the test data. The root mean squared error and mean absolute percentage error are employed to evaluate the prediction models' performances. The experimental results show that among the three tested algorithms the NAPSO-SVM method has better prediction precision and smaller prediction errors, and is an effective method for predicting the dynamic measurement errors of sensors.
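
    For flavor, the sketch below tunes an SVM regressor's C and gamma with a plain PSO; the NAPSO additions of natural selection and simulated annealing are omitted, and the swarm size, inertia, and acceleration constants are arbitrary choices rather than the paper's settings. The data are synthetic stand-ins for a sensor error series.

    ```python
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, (120, 1))
    y = np.sin(X).ravel() + rng.normal(0, 0.1, 120)   # stand-in error series

    def fitness(params):
        C, gamma = 10.0 ** params                     # search in log10 space
        mse = -cross_val_score(SVR(C=C, gamma=gamma), X, y,
                               scoring="neg_mean_squared_error", cv=3).mean()
        return mse

    # plain PSO over log10(C) in [-1, 3] and log10(gamma) in [-3, 1]
    lo, hi = np.array([-1.0, -3.0]), np.array([3.0, 1.0])
    pos = rng.uniform(lo, hi, (15, 2))
    vel = np.zeros_like(pos)
    pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(20):
        r1, r2 = rng.random((2, 15, 2))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([fitness(p) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    print("best (C, gamma):", 10.0 ** gbest, "CV MSE:", pbest_f.min())
    ```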

  19. Evaluating concentration estimation errors in ELISA microarray experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daly, Don S.; White, Amanda M.; Varnum, Susan M.

    Enzyme-linked immunosorbent assay (ELISA) is a standard immunoassay to predict a protein concentration in a sample. Deploying ELISA in a microarray format permits simultaneous prediction of the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Evaluating prediction error is critical to interpreting biological significance and improving the ELISA microarray process. Evaluating prediction error must be automated to realize a reliable high-throughput ELISA microarray system. Methods: In this paper, we present a statistical method based on propagation of error to evaluate prediction errors in the ELISA microarray process. Although propagation of error is central to this method, it is effective only when comparable data are available. Therefore, we briefly discuss the roles of experimental design, data screening, normalization and statistical diagnostics when evaluating ELISA microarray prediction errors. We use an ELISA microarray investigation of breast cancer biomarkers to illustrate the evaluation of prediction errors. The illustration begins with a description of the design and resulting data, followed by a brief discussion of data screening and normalization. In our illustration, we fit a standard curve to the screened and normalized data, review the modeling diagnostics, and apply propagation of error.

  20. Analysis of algae growth mechanism and water bloom prediction under the effect of multi-affecting factor.

    PubMed

    Wang, Li; Wang, Xiaoyi; Jin, Xuebo; Xu, Jiping; Zhang, Huiyan; Yu, Jiabin; Sun, Qian; Gao, Chong; Wang, Lingbin

    2017-03-01

    The formation process of algae is described inaccurately, and water blooms are predicted with low precision, by current methods. In this paper, the chemical mechanism of algae growth is analyzed, and a correlation analysis of chlorophyll-a and algal density is conducted by chemical measurement. Taking into account the influence of multiple factors on algae growth and water blooms, a comprehensive prediction method combining multivariate time series analysis with intelligent models is put forward in this paper. Firstly, through the process of photosynthesis, the main factors that affect the reproduction of the algae are analyzed. A compensation prediction method for multivariate time series analysis based on a neural network and a Support Vector Machine is put forward, combined with Kernel Principal Component Analysis for dimension reduction of the bloom influence factors. Then, a Genetic Algorithm is applied to improve the generalization ability of the BP network and the Least Squares Support Vector Machine. Experimental results show that this method can better compensate the multivariate time series prediction model and is an effective way to improve the description accuracy of algae growth and the prediction precision of water blooms.

  1. A novel method for structure-based prediction of ion channel conductance properties.

    PubMed Central

    Smart, O S; Breed, J; Smith, G R; Sansom, M S

    1997-01-01

    A rapid and easy-to-use method of predicting the conductance of an ion channel from its three-dimensional structure is presented. The method combines the pore dimensions of the channel as measured in the HOLE program with an Ohmic model of conductance. An empirically based correction factor is then applied. The method yielded good results for six experimental channel structures (none of which were included in the training set) with predictions accurate to within an average factor of 1.62 to the true values. The predictive r2 was equal to 0.90, which is indicative of a good predictive ability. The procedure is used to validate model structures of alamethicin and phospholamban. Two genuine predictions for the conductance of channels with known structure but without reported conductances are given. A modification of the procedure that calculates the expected results for the effect of the addition of nonelectrolyte polymers on conductance is set out. Results for a cholera toxin B-subunit crystal structure agree well with the measured values. The difficulty in interpreting such studies is discussed, with the conclusion that measurements on channels of known structure are required. PMID:9138559
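
    The Ohmic core of such a prediction treats the pore as a stack of thin conducting slabs defined by the HOLE radius profile; a minimal sketch under that assumption follows. The radius profile, the electrolyte conductivity, and the omitted empirical correction factor are all placeholders, not values from the paper.

    ```python
    import numpy as np

    def ohmic_conductance(z, radius, conductivity=1.5):
        """Ohmic estimate of channel conductance from a pore radius profile.

        z            : positions along the pore axis (nm)
        radius       : HOLE-style pore radius at each z (nm)
        conductivity : bulk electrolyte conductivity (S/m); 1.5 is a rough
                       placeholder for a ~150 mM KCl solution
        """
        # resistance of a stack of thin cylinders: dR = dz / (kappa * pi * r^2)
        dz = np.diff(z) * 1e-9                        # nm -> m
        r = 0.5 * (radius[:-1] + radius[1:]) * 1e-9   # midpoint radii, nm -> m
        resistance = np.sum(dz / (conductivity * np.pi * r ** 2))
        return 1.0 / resistance                       # Siemens

    # toy hourglass-shaped pore, 4 nm long, 0.3 nm at its narrowest
    z = np.linspace(0, 4, 200)
    radius = 0.3 + 0.5 * ((z - 2) / 2) ** 2
    g = ohmic_conductance(z, radius)
    print(f"{g * 1e12:.1f} pS (before any empirical correction factor)")
    ```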

  2. Prediction of a service demand using combined forecasting approach

    NASA Astrophysics Data System (ADS)

    Zhou, Ling

    2017-08-01

    Forecasting facilitates cutting down operational and management costs while ensuring the service level for a logistics service provider. Our case study investigates how to forecast short-term logistics demand for an LTL (less-than-truckload) carrier. A combined approach depends on several forecasting methods simultaneously, instead of a single method. It can offset the weakness of one forecasting method with the strength of another, which can improve the precision of the prediction. The main issues in combined forecast modeling are how to select the methods to combine and how to determine the weight coefficients among them. The principles of method selection are that each method should suit the forecasting problem itself, and that the methods should differ in character as much as possible. Based on these principles, exponential smoothing, ARIMA and a Neural Network were chosen to form the combined approach. In addition, the least squares technique was employed to determine the optimal weight coefficients among the forecasting methods. Simulation results show the advantage of the combined approach over the three single methods. The work done in this paper helps managers select prediction methods in practice.
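
    One common way to realize the least-squares weighting step is to regress the actuals on the in-sample forecasts with the weights constrained to sum to one; the sketch below does this in NumPy, with synthetic stand-ins for the three component forecasts (the paper's own weighting formulation may differ in detail).

    ```python
    import numpy as np

    # Columns: in-sample predictions from three methods (e.g., exponential
    # smoothing, ARIMA, neural network -- here synthetic stand-ins)
    rng = np.random.default_rng(2)
    actual = 100 + np.cumsum(rng.normal(0, 2, 60))
    preds = np.column_stack([actual + rng.normal(0, s, 60) for s in (3.0, 2.0, 4.0)])

    # Least-squares combination weights constrained to sum to one, obtained
    # by regressing (actual - p3) on (p1 - p3, p2 - p3)
    d = preds - preds[:, [2]]
    w12, *_ = np.linalg.lstsq(d[:, :2], actual - preds[:, 2], rcond=None)
    w = np.append(w12, 1 - w12.sum())

    combined = preds @ w
    print("weights:", w)
    print("combined RMSE:", np.sqrt(np.mean((combined - actual) ** 2)))
    print("best single RMSE:",
          min(np.sqrt(np.mean((preds[:, i] - actual) ** 2)) for i in range(3)))
    ```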

  3. Binding Modes of Ligands Using Enhanced Sampling (BLUES): Rapid Decorrelation of Ligand Binding Modes via Nonequilibrium Candidate Monte Carlo.

    PubMed

    Gill, Samuel C; Lim, Nathan M; Grinaway, Patrick B; Rustenburg, Ariën S; Fass, Josh; Ross, Gregory A; Chodera, John D; Mobley, David L

    2018-05-31

    Accurately predicting protein-ligand binding affinities and binding modes is a major goal in computational chemistry, but even the prediction of ligand binding modes in proteins poses major challenges. Here, we focus on solving the binding mode prediction problem for rigid fragments. That is, we focus on computing the dominant placement, conformation, and orientations of a relatively rigid, fragment-like ligand in a receptor, and the populations of the multiple binding modes which may be relevant. This problem is important in its own right, but is even more timely given the recent success of alchemical free energy calculations. Alchemical calculations are increasingly used to predict binding free energies of ligands to receptors. However, the accuracy of these calculations is dependent on proper sampling of the relevant ligand binding modes. Unfortunately, ligand binding modes may often be uncertain, hard to predict, and/or slow to interconvert on simulation time scales, so proper sampling with current techniques can require prohibitively long simulations. We need new methods which dramatically improve sampling of ligand binding modes. Here, we develop and apply a nonequilibrium candidate Monte Carlo (NCMC) method to improve sampling of ligand binding modes. In this technique, the ligand is rotated and subsequently allowed to relax in its new position through alchemical perturbation before accepting or rejecting the rotation and relaxation as a nonequilibrium Monte Carlo move. When applied to a T4 lysozyme model binding system, this NCMC method shows over 2 orders of magnitude improvement in binding mode sampling efficiency compared to a brute force molecular dynamics simulation. This is a first step toward applying this methodology to pharmaceutically relevant binding of fragments and, eventually, drug-like molecules. We are making this approach available via our new Binding modes of ligands using enhanced sampling (BLUES) package which is freely available on GitHub.

  4. Reply to “Earthquake prediction evaluation standards applied to the VAN Method,” by D. D. Jackson

    NASA Astrophysics Data System (ADS)

    Varotsos, P.; Lazaridou, M.; Hadjicontis, V.

    Our earlier publications show that the VAN method does not fail requirements (1) and (2) suggested by Jackson [1996]. No subjective ex-post-facto decision was necessary for the evaluation of the success because, for the large majority of VAN predictions, the values of ΔM, Δr and Δt were published before the period 1987-1989 under discussion; only in a few cases (three out of 29), related to the observation of the new phenomenon of the SES electrical activity, was the value of Δt determined in 1988. Furthermore, a careful inspection from a physical point of view shows that the three plausibility criteria suggested by Jackson (to be obeyed by a candidate prediction technique) are actually met by the VAN method.

  5. Predicting missing links and identifying spurious links via likelihood analysis

    NASA Astrophysics Data System (ADS)

    Pan, Liming; Zhou, Tao; Lü, Linyuan; Hu, Chin-Kun

    2016-03-01

    Real network data is often incomplete and noisy, in which case link prediction algorithms and spurious link identification algorithms can be applied. Thus far, a general method for transforming network organizing mechanisms into link prediction algorithms has been lacking. Here we use an algorithmic framework in which a network's probability is calculated according to a predefined structural Hamiltonian that takes into account the network organizing principles, and a non-observed link is scored by the conditional probability of adding the link to the observed network. Extensive numerical simulations show that the proposed algorithm has remarkably higher accuracy than the state-of-the-art methods in uncovering missing links and identifying spurious links in many complex biological and social networks. Such a method also finds applications in exploring the underlying network evolutionary mechanisms.

  6. Predicting missing links and identifying spurious links via likelihood analysis

    PubMed Central

    Pan, Liming; Zhou, Tao; Lü, Linyuan; Hu, Chin-Kun

    2016-01-01

    Real network data is often incomplete and noisy, in which case link prediction algorithms and spurious link identification algorithms can be applied. Thus far, a general method for transforming network organizing mechanisms into link prediction algorithms has been lacking. Here we use an algorithmic framework in which a network's probability is calculated according to a predefined structural Hamiltonian that takes into account the network organizing principles, and a non-observed link is scored by the conditional probability of adding the link to the observed network. Extensive numerical simulations show that the proposed algorithm has remarkably higher accuracy than the state-of-the-art methods in uncovering missing links and identifying spurious links in many complex biological and social networks. Such a method also finds applications in exploring the underlying network evolutionary mechanisms. PMID:26961965

  7. A Feature Fusion Based Forecasting Model for Financial Time Series

    PubMed Central

    Guo, Zhiqiang; Wang, Huaiqing; Liu, Quan; Yang, Jie

    2014-01-01

    Predicting the stock market has become an increasingly interesting research area for both researchers and investors, and many prediction models have been proposed. In these models, feature selection techniques are used to pre-process the raw data and remove noise. In this paper, a prediction model is constructed to forecast stock market behavior with the aid of independent component analysis, canonical correlation analysis, and a support vector machine. First, two types of features are extracted from the historical closing prices and 39 technical variables obtained by independent component analysis. Second, a canonical correlation analysis method is utilized to combine the two types of features and extract intrinsic features to improve the performance of the prediction model. Finally, a support vector machine is applied to forecast the next day's closing price. The proposed model is applied to the Shanghai stock market index and the Dow Jones index, and experimental results show that the proposed model performs better in prediction than two other similar models. PMID:24971455
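
    The pipeline the abstract describes maps naturally onto scikit-learn components; the sketch below wires FastICA, CCA and an SVR together on synthetic prices. The window length, component counts and SVR settings are guesses for illustration, not the paper's choices.

    ```python
    import numpy as np
    from sklearn.decomposition import FastICA
    from sklearn.cross_decomposition import CCA
    from sklearn.svm import SVR

    rng = np.random.default_rng(3)
    close = 100 + np.cumsum(rng.normal(0, 1, 300))        # synthetic closing prices
    tech = rng.normal(size=(300, 39))                     # stand-in technical variables

    # feature block 1: lagged closing-price window; block 2: ICA of technicals
    lags = np.column_stack([close[i:300 - 5 + i] for i in range(5)])
    ica_feats = FastICA(n_components=5, max_iter=1000,
                        random_state=0).fit_transform(tech)[5:, :]
    target = close[5:]                                    # next-day closing price

    # CCA fuses the two feature blocks into intrinsic components
    cca = CCA(n_components=3).fit(lags, ica_feats)
    fused = np.hstack(cca.transform(lags, ica_feats))

    svr = SVR(C=10.0).fit(fused[:-50], target[:-50])
    pred = svr.predict(fused[-50:])
    print("test RMSE:", np.sqrt(np.mean((pred - target[-50:]) ** 2)))
    ```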

  8. Effect of heteroscedasticity treatment in residual error models on model calibration and prediction uncertainty estimation

    NASA Astrophysics Data System (ADS)

    Sun, Ruochen; Yuan, Huiling; Liu, Xiaoli

    2017-11-01

    The heteroscedasticity treatment in residual error models directly impacts the model calibration and prediction uncertainty estimation. This study compares three methods to deal with the heteroscedasticity, including the explicit linear modeling (LM) method and nonlinear modeling (NL) method using hyperbolic tangent function, as well as the implicit Box-Cox transformation (BC). Then a combined approach (CA) combining the advantages of both LM and BC methods has been proposed. In conjunction with the first order autoregressive model and the skew exponential power (SEP) distribution, four residual error models are generated, namely LM-SEP, NL-SEP, BC-SEP and CA-SEP, and their corresponding likelihood functions are applied to the Variable Infiltration Capacity (VIC) hydrologic model over the Huaihe River basin, China. Results show that the LM-SEP yields the poorest streamflow predictions with the widest uncertainty band and unrealistic negative flows. The NL and BC methods can better deal with the heteroscedasticity and hence their corresponding predictive performances are improved, yet the negative flows cannot be avoided. The CA-SEP produces the most accurate predictions with the highest reliability and effectively avoids the negative flows, because the CA approach is capable of addressing the complicated heteroscedasticity over the study basin.
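
    Of the transformations compared, the Box-Cox step is the easiest to illustrate; the snippet below uses SciPy to fit the transformation parameter by maximum likelihood on synthetic, skewed flow-like data, after which a residual model such as the SEP likelihood would be built in the transformed space.

    ```python
    import numpy as np
    from scipy import stats

    # synthetic streamflow-like values whose spread grows with magnitude
    rng = np.random.default_rng(4)
    flow = rng.gamma(shape=2.0, scale=50.0, size=1000)

    # Box-Cox: z = (y**lam - 1)/lam (log for lam = 0);
    # scipy fits lam by maximum likelihood
    z, lam = stats.boxcox(flow)
    print("fitted lambda:", lam)
    # residual error models are then formulated in z-space, where the
    # variance is much closer to constant across the flow range
    ```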

  9. An Ensemble Successive Project Algorithm for Liquor Detection Using Near Infrared Sensor.

    PubMed

    Qu, Fangfang; Ren, Dong; Wang, Jihua; Zhang, Zhong; Lu, Na; Meng, Lei

    2016-01-11

    Spectral analysis based on near infrared (NIR) sensors is a powerful tool for complex information processing and high precision recognition, and it has been widely applied to quality analysis and online inspection of agricultural products. This paper proposes a new method to address the instability of small sample sizes in the successive projections algorithm (SPA), as well as the lack of association between the selected variables and the analyte. The proposed method is an evaluated bootstrap ensemble SPA method (EBSPA) based on a variable evaluation index (EI) for variable selection, and it is applied to the quantitative prediction of alcohol concentrations in liquor using an NIR sensor. In the experiment, the proposed EBSPA is combined with three kinds of modeling methods to test their performance. In addition, the proposed EBSPA combined with partial least squares is compared with other state-of-the-art variable selection methods. The results show that the proposed method can overcome the defects of SPA, and it has the best generalization performance and stability. Furthermore, the physical meaning of the variables selected from the near infrared sensor data is clear, which can effectively reduce the number of variables and improve prediction accuracy.

  10. PLS-LS-SVM based modeling of ATR-IR as a robust method in detection and qualification of alprazolam

    NASA Astrophysics Data System (ADS)

    Parhizkar, Elahehnaz; Ghazali, Mohammad; Ahmadi, Fatemeh; Sakhteman, Amirhossein

    2017-02-01

    According to the United States Pharmacopeia (USP), the gold standard technique for alprazolam determination in dosage forms is HPLC, an expensive and time-consuming method that is not readily accessible. In this study, chemometrics-assisted ATR-IR was introduced as an alternative that produces similar results while consuming less time and energy. Fifty-eight samples containing different concentrations of commercial alprazolam were evaluated by both HPLC and the ATR-IR method. A preprocessing approach was applied to convert the raw data obtained from the ATR-IR spectra into a normalized matrix. Finally, a relationship between the alprazolam concentrations obtained by HPLC and the ATR-IR data was established using PLS-LS-SVM (partial least squares least squares support vector machines). Consequently, the validity of the method was verified to yield a model with low error values (root mean square error of cross-validation equal to 0.98). The model was able to predict about 99% of the samples according to the R2 of the prediction set. A response permutation test was also applied to confirm that the model did not rest on chance correlations. In conclusion, ATR-IR can be a reliable method in the manufacturing process for the detection and quantification of alprazolam content.
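
    The calibration idea can be sketched with plain PLS regression (the paper's PLS-LS-SVM additionally feeds the PLS output into a least-squares SVM, which is omitted here). The data below are synthetic stand-ins for the 58 preprocessed spectra and their HPLC reference concentrations.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(5)
    # stand-ins: 58 preprocessed ATR-IR spectra (rows) and HPLC reference values
    spectra = rng.normal(size=(58, 400))
    conc = 5 + 0.8 * spectra[:, 50] + 0.4 * spectra[:, 200] + rng.normal(0, 0.05, 58)

    pls = PLSRegression(n_components=5)
    pred = cross_val_predict(pls, spectra, conc, cv=10).ravel()
    rmsecv = np.sqrt(np.mean((pred - conc) ** 2))
    r2 = 1 - np.sum((pred - conc) ** 2) / np.sum((conc - conc.mean()) ** 2)
    print(f"RMSECV={rmsecv:.3f}, cross-validated R^2={r2:.3f}")
    ```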

  11. An automated ranking platform for machine learning regression models for meat spoilage prediction using multi-spectral imaging and metabolic profiling.

    PubMed

    Estelles-Lopez, Lucia; Ropodi, Athina; Pavlidis, Dimitris; Fotopoulou, Jenny; Gkousari, Christina; Peyrodie, Audrey; Panagou, Efstathios; Nychas, George-John; Mohareb, Fady

    2017-09-01

    Over the past decade, analytical approaches based on vibrational spectroscopy, hyperspectral/multispectral imaging and biomimetic sensors have started gaining popularity as rapid and efficient methods for assessing food quality, safety and authentication, as a sensible alternative to the expensive and time-consuming conventional microbiological techniques. Due to the multi-dimensional nature of the data generated from such analyses, the output needs to be coupled with a suitable statistical approach or machine-learning algorithm before the results can be interpreted. Choosing the optimum pattern recognition or machine learning approach for a given analytical platform is often challenging and involves a comparative analysis between various algorithms in order to achieve the best possible prediction accuracy. In this work, "MeatReg", a web-based application, is presented that automates the procedure of identifying the best machine learning method for data from several analytical techniques, in order to predict the counts of the microorganisms responsible for meat spoilage regardless of the packaging system applied. In particular, up to seven regression methods were applied: ordinary least squares regression, stepwise linear regression, partial least squares regression, principal component regression, support vector regression, random forest and k-nearest neighbours. "MeatReg" was tested with minced beef samples stored under aerobic and modified atmosphere packaging and analysed with an electronic nose, HPLC, FT-IR, GC-MS and a multispectral imaging instrument. Populations of total viable counts, lactic acid bacteria, pseudomonads, Enterobacteriaceae and B. thermosphacta were predicted. As a result, recommendations were obtained on which analytical platforms are suitable for predicting each type of bacteria and which machine learning methods to use in each case. The developed system is accessible via the link: www.sorfml.com. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Fractal Theory for Permeability Prediction, Venezuelan and USA Wells

    NASA Astrophysics Data System (ADS)

    Aldana, Milagrosa; Altamiranda, Dignorah; Cabrera, Ana

    2014-05-01

    Inferring petrophysical parameters such as permeability, porosity, water saturation, capillary pressure, etc., from the analysis of well logs or other available core data has always been of critical importance in the oil industry. Permeability in particular, which is considered to be a complex parameter, has been inferred using both empirical and theoretical techniques. The main goal of this work is to predict permeability values in different wells using Fractal Theory, based on a method proposed by Pape et al. (1999). This approach uses the relationship between permeability and the geometric form of the pore space of the rock. It is based on the modified Kozeny-Carman equation and a fractal pattern, which allows permeability to be determined as a function of the cementation exponent, porosity and the fractal dimension. Data from wells located in Venezuela and the United States of America are analyzed. Employing porosity and permeability data obtained from core samples, and applying the Fractal Theory method, we calculated the prediction equations for each well. Initially, this was achieved by training with 50% of the data available for each well. Afterwards, these equations were tested on 100% of the data to analyze possible trends in their distribution. This procedure gave excellent results in all the wells in spite of their geographic distance, generating permeability models with the potential to accurately predict permeability logs in the remaining parts of the well for which there are no core samples, even using only porosity logs. Additionally, empirical models were used to determine permeability and the results were compared with those obtained by applying the fractal method. The results indicated that, although there are empirical equations that give a proper adjustment, the prediction results obtained using fractal theory give a better fit to the core reference data.

  13. Prediction of Landing Gear Noise Reduction and Comparison to Measurements

    NASA Technical Reports Server (NTRS)

    Lopes, Leonard V.

    2010-01-01

    Noise continues to be an ongoing problem for existing aircraft in flight and is projected to be a concern for next generation designs. During landing, when the engines are operating at reduced power, the noise from the airframe, of which landing gear noise is an important part, is equal to the engine noise. There are several methods of predicting landing gear noise, but none have been applied to predict the change in noise due to a change in landing gear design. The current effort uses the Landing Gear Model and Acoustic Prediction (LGMAP) code, developed at The Pennsylvania State University, to predict the noise from landing gear. These predictions include the influence of noise reduction concepts on the landing gear noise. LGMAP is compared to wind tunnel experiments of a 6.3%-scale Boeing 777 main gear performed in the Quiet Flow Facility (QFF) at NASA Langley. The geometries tested in the QFF include the landing gear with and without a toboggan fairing and the door. It is shown that LGMAP is able to predict the noise directivity and spectra from the model-scale test for the baseline configuration as accurately as current gear prediction methods. However, LGMAP is also able to predict the difference in noise caused by the toboggan fairing and by removing the landing gear door. LGMAP is also compared to far-field ground-based flush-mounted microphone measurements from the 2005 Quiet Technology Demonstrator 2 (QTD 2) flight test. These comparisons include a Boeing 777-300ER with and without a toboggan fairing and demonstrate that LGMAP can be applied to full-scale flyover measurements. LGMAP predictions of the noise generated by the nose gear on the main gear measurements are also shown.

  14. Blind test of physics-based prediction of protein structures.

    PubMed

    Shell, M Scott; Ozkan, S Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A

    2009-02-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.

  15. Blind Test of Physics-Based Prediction of Protein Structures

    PubMed Central

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  16. Noise-Source Separation Using Internal and Far-Field Sensors for a Full-Scale Turbofan Engine

    NASA Technical Reports Server (NTRS)

    Hultgren, Lennart S.; Miles, Jeffrey H.

    2009-01-01

    Noise-source separation techniques for the extraction of the sub-dominant combustion noise from the total noise signatures obtained in static-engine tests are described. Three methods are applied to data from a static, full-scale engine test. Both 1/3-octave and narrow-band results are discussed. The results are used to assess the combustion-noise prediction capability of the Aircraft Noise Prediction Program (ANOPP). An additional phase-angle-based discriminator for the three-signal method is also introduced.

  17. Different approaches in Partial Least Squares and Artificial Neural Network models applied for the analysis of a ternary mixture of Amlodipine, Valsartan and Hydrochlorothiazide

    NASA Astrophysics Data System (ADS)

    Darwish, Hany W.; Hassan, Said A.; Salem, Maissa Y.; El-Zeany, Badr A.

    2014-03-01

    Different chemometric models were applied for the quantitative analysis of Amlodipine (AML), Valsartan (VAL) and Hydrochlorothiazide (HCT) in a ternary mixture, namely Partial Least Squares (PLS) as the traditional chemometric model and Artificial Neural Networks (ANN) as the advanced model. PLS and ANN were applied with and without a variable selection procedure (Genetic Algorithm, GA) and a data compression procedure (Principal Component Analysis, PCA). The chemometric methods applied are PLS-1, GA-PLS, ANN, GA-ANN and PCA-ANN. The methods were used for the quantitative analysis of the drugs in raw materials and in pharmaceutical dosage form by processing the UV spectral data. A 3-factor, 5-level experimental design was established, resulting in 25 mixtures containing different ratios of the drugs. Fifteen mixtures were used as a calibration set and the other ten mixtures were used as a validation set to validate the prediction ability of the suggested methods. The validity of the proposed methods was assessed using the standard addition technique.

  18. Predicting the velocity and azimuth of fragments generated by the range destruction or random failure of rocket casings and tankage

    NASA Technical Reports Server (NTRS)

    Eck, Marshall; Mukunda, Meera

    1988-01-01

    A calculational method is described which provides a powerful tool for predicting solid rocket motor (SRM) casing and liquid rocket tankage fragmentation response. The approach properly partitions the available impulse to each major system-mass component. It uses the Pisces code developed by Physics International to couple the forces generated by an Eulerian-modeled gas flow field to a Lagrangian-modeled fuel and casing system. The details of the predictive analytical modeling process and the development of normalized relations for momentum partition as a function of SRM burn time and initial geometry are discussed. Methods for applying similar modeling techniques to liquid-tankage-overpressure failures are also discussed. Good agreement between predictions and observations are obtained for five specific events.

  19. Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records

    PubMed Central

    Weiss, Jeremy C; Page, David; Peissig, Peggy L; Natarajan, Sriraam; McCarty, Catherine

    2013-01-01

    Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SRL algorithm, relational functional gradient boosting, outperforms propositional learners particularly in the medically-relevant high recall region. We observe that both SRL algorithms predict outcomes better than their propositional analogs and suggest how our methods can augment current epidemiological practices. PMID:25360347

  20. Combined adaptive multiple subtraction based on optimized event tracing and extended wiener filtering

    NASA Astrophysics Data System (ADS)

    Tan, Jun; Song, Peng; Li, Jinshan; Wang, Lei; Zhong, Mengxuan; Zhang, Xiaobo

    2017-06-01

    The surface-related multiple elimination (SRME) method is based on a feedback formulation and has become one of the most widely preferred multiple-suppression methods. However, differences remain between the predicted multiples and those in the source seismic records, which may leave conventional adaptive multiple subtraction methods barely able to suppress multiples effectively in actual production. This paper introduces a combined adaptive multiple attenuation method based on an optimized event tracing technique and extended Wiener filtering. The method first uses multiple records predicted by SRME to generate a multiple velocity spectrum, then separates the original record into an approximate primary record and an approximate multiple record by applying the optimized event tracing method and a short-time-window FK filtering method. After the extended Wiener filtering method is applied, residual multiples in the approximate primary record can be eliminated and the damaged primary can be restored from the approximate multiple record. This method combines the advantages of multiple elimination based on optimized event tracing with those of the extended Wiener filtering technique. It is well suited to suppressing typical hyperbolic and other types of multiples, with the advantage of minimizing damage to the primary. Synthetic and field data tests show that this method produces better multiple elimination results than the traditional multi-channel Wiener filter method and is more suitable for multiple elimination in complicated geological areas.
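
    The Wiener-filtering step amounts to a least-squares matching filter that shapes the predicted multiple to the recorded trace before subtraction. A minimal single-channel sketch under that simplification (the paper's method is multi-channel and more elaborate):

    ```python
    import numpy as np
    from scipy.linalg import toeplitz

    def match_and_subtract(recorded, predicted_multiple, filt_len=21, eps=1e-3):
        """Least-squares (Wiener) matching filter: shape the predicted
        multiple to the recorded trace, then subtract it."""
        # Convolution matrix of the predicted multiple (filt_len columns).
        first_row = np.zeros(filt_len)
        first_row[0] = predicted_multiple[0]
        A = toeplitz(predicted_multiple, first_row)      # n x filt_len
        # Damped normal equations for the matching filter.
        ata = A.T @ A + eps * np.eye(filt_len)
        f = np.linalg.solve(ata, A.T @ recorded)
        shaped = A @ f
        return recorded - shaped                         # estimated primaries

    # Toy usage with random stand-ins for a trace and its predicted multiple.
    rng = np.random.default_rng(2)
    trace = rng.standard_normal(500)
    mult = np.convolve(trace, [0.0, 0.5, -0.2], mode="same")
    primaries = match_and_subtract(trace, mult)
    ```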

  1. Hidden markov model for the prediction of transmembrane proteins using MATLAB.

    PubMed

    Chaturvedi, Navaneet; Shanker, Sudhanshu; Singh, Vinay Kumar; Sinha, Dhiraj; Pandey, Paras Nath

    2011-01-01

    Since membrane proteins play a key role in drug targeting, transmembrane protein prediction is an active and challenging area of the biological sciences. Location-based prediction of transmembrane proteins is significant for the functional annotation of protein sequences. Hidden-Markov-model-based methods have been widely applied to transmembrane topology prediction. Here we present a revised and more comprehensible model for transmembrane protein prediction than an existing one. MATLAB scripts were written and compiled for parameter estimation of the model, and the model was applied to amino acid sequences to identify transmembrane segments and their adjacent locations. The estimated model of transmembrane topology was based on the TMHMM model architecture. Only 7 super-states are defined in the given dataset, which were converted to 96 states on the basis of their length in the sequence. The observed prediction accuracy of the model was about 74%, which is good in the area of transmembrane topology prediction. We therefore conclude that the hidden Markov model plays a crucial role in transmembrane helix prediction on the MATLAB platform and could also be useful for drug discovery strategies. The database is available for free at bioinfonavneet@gmail.com; vinaysingh@bhu.ac.in.
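
    The decoding step of such a model is the standard Viterbi algorithm. A compact Python sketch follows (the paper uses MATLAB; the two-state membrane/non-membrane toy parameters here are illustrative, not the 96-state TMHMM-style model described above).

    ```python
    import numpy as np

    def viterbi(obs, log_start, log_trans, log_emit):
        """Most likely state path for an observation sequence (log-space)."""
        n_states = log_start.shape[0]
        T = len(obs)
        score = np.full((T, n_states), -np.inf)
        back = np.zeros((T, n_states), dtype=int)
        score[0] = log_start + log_emit[:, obs[0]]
        for t in range(1, T):
            cand = score[t - 1][:, None] + log_trans        # prev x next
            back[t] = np.argmax(cand, axis=0)
            score[t] = cand[back[t], np.arange(n_states)] + log_emit[:, obs[t]]
        path = [int(np.argmax(score[-1]))]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return path[::-1]

    # Toy two-state model: 0 = non-membrane, 1 = membrane;
    # observations: 0 = polar residue, 1 = hydrophobic residue.
    log_start = np.log(np.array([0.9, 0.1]))
    log_trans = np.log(np.array([[0.95, 0.05], [0.10, 0.90]]))
    log_emit = np.log(np.array([[0.7, 0.3], [0.2, 0.8]]))
    print(viterbi([0, 1, 1, 1, 0, 0], log_start, log_trans, log_emit))
    ```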

  2. Prediction of beta-turns at over 80% accuracy based on an ensemble of predicted secondary structures and multiple alignments

    PubMed Central

    Zheng, Ce; Kurgan, Lukasz

    2008-01-01

    Background β-turn is a secondary protein structure type that plays a significant role in protein folding, stability, and molecular recognition. To date, several methods for the prediction of β-turns from protein sequences have been developed, but they are characterized by relatively poor prediction quality. The novelty of the proposed sequence-based β-turn predictor stems from its use of window-based information extracted from four predicted three-state secondary structures, which together with a selected set of position specific scoring matrix (PSSM) values serve as an input to the support vector machine (SVM) predictor. Results We show that (1) all four predicted secondary structures are useful; (2) the most useful information extracted from the predicted secondary structure includes the structure of the predicted residue, the secondary structure content in a window around the predicted residue, and features that indicate whether the predicted residue is inside a secondary structure segment; (3) the PSSM values of Asn, Asp, Gly, Ile, Leu, Met, Pro, and Val were among the top-ranked features, which corroborates recent studies. Asn, Asp, Gly, and Pro indicate potential β-turns, while the remaining four amino acids are useful for predicting non-β-turns. Empirical evaluation using three nonredundant datasets shows favorable Qtotal, Qpredicted and MCC values when compared with over a dozen modern competing methods. Our method is the first to break the 80% Qtotal barrier and achieves Qtotal = 80.9%, MCC = 0.47, and a Qpredicted higher by over 6% when compared with the second-best method. We use feature selection to reduce the dimensionality of the feature vector used as the input for the proposed prediction method. The applied feature set is smaller by 86, 62 and 37% when compared with the second- and the two third-best (with respect to MCC) competing methods, respectively. Conclusion Experiments show that the proposed method constitutes an improvement over the competing prediction methods. The proposed prediction model can better discriminate between β-turns and non-β-turns because it produces fewer false positive predictions. The prediction model and datasets are freely available at . PMID:18847492

  3. RS-predictor: a new tool for predicting sites of cytochrome P450-mediated metabolism applied to CYP 3A4.

    PubMed

    Zaretzki, Jed; Bergeron, Charles; Rydberg, Patrik; Huang, Tao-wei; Bennett, Kristin P; Breneman, Curt M

    2011-07-25

    This article describes RegioSelectivity-Predictor (RS-Predictor), a new in silico method for generating predictive models of P450-mediated metabolism for drug-like compounds. Within this method, potential sites of metabolism (SOMs) are represented as "metabolophores": a concept that describes the hierarchical combination of topological and quantum chemical descriptors needed to represent the reactivity of potential metabolic reaction sites. RS-Predictor modeling involves the use of metabolophore descriptors together with multiple-instance ranking (MIRank) to generate an optimized descriptor weight vector that encodes regioselectivity trends across all cases in a training set. The resulting pathway-independent (O-dealkylation vs N-oxidation vs Csp(3) hydroxylation, etc.), isozyme-specific regioselectivity model may be used to predict potential metabolic liabilities. In the present work, cross-validated RS-Predictor models were generated for a set of 394 substrates of CYP 3A4 as a proof of principle for the method. Rank aggregation was then employed to merge independently generated predictions for each substrate into a single consensus prediction. The resulting consensus RS-Predictor models were shown to reliably identify at least one observed site of metabolism in the top two rank positions for 78% of the substrates. Comparisons between RS-Predictor and previously described regioselectivity prediction methods reveal new insights into how in silico metabolite prediction methods should be compared.

  4. A traveling salesman approach for predicting protein functions.

    PubMed

    Johnson, Olin; Liu, Jing

    2006-10-12

    Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Here we present a new approach utilizing the classic Traveling Salesman Problem to study protein-protein interactions and to predict protein functions in the budding yeast Saccharomyces cerevisiae. We apply a global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm [1] on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Our method is promising as a general tool for predicting the functions of uncharacterized proteins, and is a successful example of applying computer science knowledge and algorithms to biological problems.

  5. A traveling salesman approach for predicting protein functions

    PubMed Central

    Johnson, Olin; Liu, Jing

    2006-01-01

    Background Protein-protein interaction information can be used to predict unknown protein functions and to help study biological pathways. Results Here we present a new approach utilizing the classic Traveling Salesman Problem to study protein-protein interactions and to predict protein functions in the budding yeast Saccharomyces cerevisiae. We apply a global optimization tool from combinatorial optimization algorithms to cluster the yeast proteins based on the global protein interaction information. We then use this clustering information to help us predict protein functions. We use our algorithm together with the direct neighbor algorithm [1] on characterized proteins and compare the prediction accuracy of the two methods. We show our algorithm can produce better predictions than the direct neighbor algorithm, which only considers the immediate neighbors of the query protein. Conclusion Our method is promising as a general tool for predicting the functions of uncharacterized proteins, and is a successful example of applying computer science knowledge and algorithms to biological problems. PMID:17147783

  6. An influence coefficient method for the application of the modal technique to wing flutter suppression of the DAST ARW-1 wing

    NASA Technical Reports Server (NTRS)

    Pines, S.

    1981-01-01

    The methods used to compute the mass, structural stiffness, and aerodynamic forces in the form of influence coefficient matrices, as applied to a flutter analysis of the Drones for Aerodynamic and Structural Testing (DAST) Aeroelastic Research Wing, are described. The DAST wing was chosen because wind tunnel flutter test data and zero-speed vibration data for the modes and frequencies exist and are available for comparison. A derivation of the equations of motion that can be used to apply the modal method for flutter suppression is included. A comparison of the open-loop flutter predictions with both wind tunnel data and other analytical methods is presented.

  7. Support vector machine applied to predict the zoonotic potential of E. coli O157 cattle isolates

    USDA-ARS?s Scientific Manuscript database

    Methods based on sequence data analysis facilitate the tracking of disease outbreaks, allow relationships between strains to be reconstructed and virulence factors to be identified. However, these methods are used postfactum after an outbreak has happened. Here, we show that support vector machine a...

  8. A Comparison of Conventional Linear Regression Methods and Neural Networks for Forecasting Educational Spending.

    ERIC Educational Resources Information Center

    Baker, Bruce D.; Richards, Craig E.

    1999-01-01

    Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…

  9. GIS Based Distributed Runoff Predictions in Variable Source Area Watersheds Employing the SCS-Curve Number

    NASA Astrophysics Data System (ADS)

    Steenhuis, T. S.; Mendoza, G.; Lyon, S. W.; Gerard Marchant, P.; Walter, M. T.; Schneiderman, E.

    2003-04-01

    Because the traditional Soil Conservation Service Curve Number (SCS-CN) approach continues to be ubiquitously used in GIS-based water quality models, new application methods are needed that are consistent with variable source area (VSA) hydrological processes in the landscape. We developed, within an integrated GIS modeling environment, a distributed approach for applying the traditional SCS-CN equation to watersheds where VSA hydrology is a dominant process. Spatial representation of hydrologic processes is important for watershed planning because restricting potentially polluting activities from runoff source areas is fundamental to controlling non-point source pollution. The methodology presented here uses the traditional SCS-CN method to predict runoff volume and the spatial extent of saturated areas, and uses a topographic index to distribute runoff source areas through watersheds. The resulting distributed CN-VSA method was incorporated in an existing GWLF water quality model and applied to sub-watersheds of the Delaware basin in the Catskill Mountains region of New York State. We found that the distributed CN-VSA approach provides a physically based method that gives realistic results for watersheds with VSA hydrology.
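
    For reference, the traditional SCS-CN runoff relation that the distributed CN-VSA method redistributes is straightforward to compute. A minimal sketch in metric units, with an arbitrary example curve number and storm depth:

    ```python
    def scs_cn_runoff(rain_mm, cn, ia_ratio=0.2):
        """Traditional SCS Curve Number runoff depth (mm).

        S is the potential maximum retention; runoff is zero until
        rainfall exceeds the initial abstraction Ia = ia_ratio * S.
        """
        s = 25400.0 / cn - 254.0
        ia = ia_ratio * s
        if rain_mm <= ia:
            return 0.0
        return (rain_mm - ia) ** 2 / (rain_mm - ia + s)

    # Example: a 50 mm storm on a soil-cover complex with CN = 75.
    print(f"{scs_cn_runoff(50.0, 75):.1f} mm of runoff")
    ```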

  10. Method of empirical dependences in estimation and prediction of activity of creatine kinase isoenzymes in cerebral ischemia

    NASA Astrophysics Data System (ADS)

    Sergeeva, Tatiana F.; Moshkova, Albina N.; Erlykina, Elena I.; Khvatova, Elena M.

    2016-04-01

    Creatine kinase is a key enzyme of energy metabolism in the brain. Both cytoplasmic and mitochondrial creatine kinase isoenzymes are known; mitochondrial creatine kinase exists as a mixture of two oligomeric forms, a dimer and an octamer. The aim of the investigation was to study the catalytic properties of cytoplasmic and mitochondrial creatine kinase and the use of the method of empirical dependences for possible prediction of the activity of these enzymes in cerebral ischemia. Ischemia was found to be accompanied by changes in the activity of the creatine kinase isoenzymes and in the oligomeric state of the mitochondrial isoform. Multiple regression models were constructed that permit the activity of the creatine kinase system in cerebral ischemia to be studied by a calculational method. The mathematical method of empirical dependences can therefore be applied for estimation and prediction of the functional state of the brain from the activity of creatine kinase isoenzymes in cerebral ischemia.

  11. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data.

    PubMed

    Huang, Yi-Fei; Gulko, Brad; Siepel, Adam

    2017-04-01

    Many genetic variants that influence phenotypes of interest are located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which, therefore, are likely to be phenotypically important. LINSIGHT combines a generalized linear model for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the 'big data' available in modern genomics. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell type, tissue specificity, and constraints at associated promoters.

  12. Statistical procedures for evaluating daily and monthly hydrologic model predictions

    USGS Publications Warehouse

    Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.

    2004-01-01

    The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, the t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically accounted for the non-normal distribution of, and dependence between, data points in the daily predicted and observed data. Of the tested methods, median objective functions, the sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than was observed for the corresponding predicted and observed daily values. More statistical methods were therefore available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression r² of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the equal-data-means hypothesis. The Nash-Sutcliffe coefficient and the r² coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.
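
    The two preferred monthly statistics are simple to reproduce. A minimal sketch of the Nash-Sutcliffe efficiency and the regression r² for paired predicted and observed series (toy arrays, not the SWAT data):

    ```python
    import numpy as np

    def nash_sutcliffe(observed, predicted):
        """NSE = 1 - sum((O - P)^2) / sum((O - mean(O))^2); 1.0 is ideal."""
        observed = np.asarray(observed, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        resid = np.sum((observed - predicted) ** 2)
        spread = np.sum((observed - observed.mean()) ** 2)
        return 1.0 - resid / spread

    obs = np.array([12.0, 30.5, 22.1, 8.4, 15.0])
    pred = np.array([10.2, 28.0, 25.3, 9.1, 13.8])
    r = np.corrcoef(obs, pred)[0, 1]
    print(f"NSE = {nash_sutcliffe(obs, pred):.2f}, r^2 = {r**2:.2f}")
    ```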

  13. Examining speed versus selection in connectivity models using elk migration as an example

    USGS Publications Warehouse

    Brennan, Angela; Hanks, Ephraim M.; Merkle, Jerod A.; Cole, Eric K.; Dewey, Sarah R.; Courtemanch, Alyson B.; Cross, Paul C.

    2018-01-01

    Context: Landscape resistance is vital to connectivity modeling and frequently derived from resource selection functions (RSFs). RSFs estimate relative probability of use and tend to focus on understanding habitat preferences during slow, routine animal movements (e.g., foraging). Dispersal and migration, however, can produce rarer, faster movements, in which case models of movement speed rather than resource selection may be more realistic for identifying habitats that facilitate connectivity. Objective: To compare two connectivity modeling approaches applied to resistance estimated from models of movement rate and resource selection. Methods: Using movement data from migrating elk, we evaluated continuous time Markov chain (CTMC) and movement-based RSF models (i.e., step selection functions [SSFs]). We applied circuit theory and shortest random path (SRP) algorithms to CTMC, SSF and null (i.e., flat) resistance surfaces to predict corridors between elk seasonal ranges. We evaluated prediction accuracy by comparing model predictions to empirical elk movements. Results: All connectivity models predicted elk movements well, but models applied to CTMC resistance were more accurate than models applied to SSF and null resistance. Circuit theory models were more accurate on average than SRP models. Conclusions: CTMC can be more realistic than SSFs for estimating resistance for fast movements, though SSFs may demonstrate some predictive ability when animals also move slowly through corridors (e.g., stopover use during migration). High null model accuracy suggests seasonal range data may also be critical for predicting direct migration routes. For animals that migrate or disperse across large landscapes, we recommend incorporating CTMC into the connectivity modeling toolkit.

  14. Examining speed versus selection in connectivity models using elk migration as an example

    USGS Publications Warehouse

    Brennan, Angela; Hanks, EM; Merkle, JA; Cole, EK; Dewey, SR; Courtemanch, AB; Cross, Paul C.

    2018-01-01

    Context: Landscape resistance is vital to connectivity modeling and frequently derived from resource selection functions (RSFs). RSFs estimate relative probability of use and tend to focus on understanding habitat preferences during slow, routine animal movements (e.g., foraging). Dispersal and migration, however, can produce rarer, faster movements, in which case models of movement speed rather than resource selection may be more realistic for identifying habitats that facilitate connectivity. Objective: To compare two connectivity modeling approaches applied to resistance estimated from models of movement rate and resource selection. Methods: Using movement data from migrating elk, we evaluated continuous time Markov chain (CTMC) and movement-based RSF models (i.e., step selection functions [SSFs]). We applied circuit theory and shortest random path (SRP) algorithms to CTMC, SSF and null (i.e., flat) resistance surfaces to predict corridors between elk seasonal ranges. We evaluated prediction accuracy by comparing model predictions to empirical elk movements. Results: All models predicted elk movements well, but models applied to CTMC resistance were more accurate than models applied to SSF and null resistance. Circuit theory models were more accurate on average than SRP algorithms. Conclusions: CTMC can be more realistic than SSFs for estimating resistance for fast movements, though SSFs may demonstrate some predictive ability when animals also move slowly through corridors (e.g., stopover use during migration). High null model accuracy suggests seasonal range data may also be critical for predicting direct migration routes. For animals that migrate or disperse across large landscapes, we recommend incorporating CTMC into the connectivity modeling toolkit.

  15. Brownian systems with spatially inhomogeneous activity

    NASA Astrophysics Data System (ADS)

    Sharma, A.; Brader, J. M.

    2017-09-01

    We generalize the Green-Kubo approach, previously applied to bulk systems of spherically symmetric active particles [J. Chem. Phys. 145, 161101 (2016), 10.1063/1.4966153], to include spatially inhomogeneous activity. The method is applied to predict the spatial dependence of the average orientation per particle and the density. The average orientation is given by an integral over the self part of the Van Hove function and a simple Gaussian approximation to this quantity yields an accurate analytical expression. Taking this analytical result as input to a dynamic density functional theory approximates the spatial dependence of the density in good agreement with simulation data. All theoretical predictions are validated using Brownian dynamics simulations.

  16. Prediction of forest fires occurrences with area-level Poisson mixed models.

    PubMed

    Boubeta, Miguel; Lombardía, María José; Marey-Pérez, Manuel Francisco; Morales, Domingo

    2015-05-01

    The number of fires in forest areas of Galicia (north-west Spain) during the summer period is quite high. Local authorities are interested in analyzing the factors that explain this phenomenon. Poisson regression models are good tools for describing and predicting the number of fires per forest area. This work employs area-level Poisson mixed models to treat real data on fires in forest areas. A parametric bootstrap method is applied for estimating the mean squared errors of the fire predictors. The developed methodology and software are applied to a real data set of fires in forest areas of Galicia.
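
    The parametric bootstrap for the MSE of a Poisson prediction can be sketched generically: refit the model on counts simulated from the fitted means and accumulate the squared errors of the refitted predictor. A simplified Python illustration using a plain Poisson regression as a stand-in for the area-level mixed model of the paper; all data are placeholders.

    ```python
    import numpy as np
    from sklearn.linear_model import PoissonRegressor

    rng = np.random.default_rng(3)
    # Toy area-level covariates and fire counts (not the Galicia data).
    X = rng.random((40, 2))
    y = rng.poisson(np.exp(0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]))

    model = PoissonRegressor().fit(X, y)
    mu_hat = model.predict(X)

    # Parametric bootstrap: treat the fitted means as truth, simulate new
    # counts, refit, and accumulate squared errors of the refitted predictor.
    B = 200
    sq_err = np.zeros_like(mu_hat)
    for _ in range(B):
        y_boot = rng.poisson(mu_hat)
        mu_boot = PoissonRegressor().fit(X, y_boot).predict(X)
        sq_err += (mu_boot - mu_hat) ** 2
    mse_boot = sq_err / B
    print("Bootstrap MSE of the first five area predictors:", mse_boot[:5].round(3))
    ```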

  17. Uncertainty in predicting soil hydraulic properties at the hillslope scale with indirect methods

    NASA Astrophysics Data System (ADS)

    Chirico, G. B.; Medina, H.; Romano, N.

    2007-02-01

    Several hydrological applications require the characterisation of soil hydraulic properties at large spatial scales. Pedotransfer functions (PTFs) are being developed as simplified methods to estimate soil hydraulic properties as an alternative to direct measurements, which are unfeasible in most practical circumstances. The objective of this study is to quantify the uncertainty in PTF spatial predictions at the hillslope scale as related to the sampling density, due to: (i) the error in estimated soil physico-chemical properties and (ii) PTF model error. The analysis is carried out on a 2-km-long experimental hillslope in southern Italy. The method adopted is based on a stochastic generation of patterns of soil variables using sequential Gaussian simulation, conditioned on the observed sample data. Two PTFs are applied: Vereecken's PTF [Vereecken, H., Diels, J., van Orshoven, J., Feyen, J., Bouma, J., 1992. Functional evaluation of pedotransfer functions for the estimation of soil hydraulic properties. Soil Sci. Soc. Am. J. 56, 1371-1378] and the HYPRES PTF [Wösten, J.H.M., Lilly, A., Nemes, A., Le Bas, C., 1999. Development and use of a database of hydraulic properties of European soils. Geoderma 90, 169-185]. The two PTFs reliably estimate the soil water retention characteristic even for a relatively coarse sampling resolution, with prediction uncertainties comparable to the uncertainties in direct laboratory or field measurements. The uncertainty in the soil water retention prediction due to model error is as significant as, or more significant than, the uncertainty associated with the estimated input, even for a relatively coarse sampling resolution. Prediction uncertainties are much more important when PTFs are applied to estimate the saturated hydraulic conductivity. In this case model error dominates the overall prediction uncertainty, making the effect of the input error negligible.

  18. A method for probing the mutational landscape of amyloid structure.

    PubMed

    O'Donnell, Charles W; Waldispühl, Jérôme; Lis, Mieszko; Halfmann, Randal; Devadas, Srinivas; Lindquist, Susan; Berger, Bonnie

    2011-07-01

    Proteins of all kinds can self-assemble into highly ordered β-sheet aggregates known as amyloid fibrils, important both biologically and clinically. However, the specific molecular structure of a fibril can vary dramatically depending on sequence and environmental conditions, and mutations can drastically alter amyloid function and pathogenicity. Experimental structure determination has proven extremely difficult, with only a handful of NMR-based models proposed, suggesting a need for computational methods. We present AmyloidMutants, a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability. Tested on non-mutant, full-length amyloid structures with known chemical shift data, AmyloidMutants offers roughly 2-fold improvement in prediction accuracy over existing tools. Moreover, AmyloidMutants is the only method to predict complete super-secondary structures, enabling accurate discrimination of topologically dissimilar amyloid conformations that correspond to the same sequence locations. Applied to mutant prediction, AmyloidMutants identifies a global conformational switch between Aβ and its highly toxic 'Iowa' mutant in agreement with a recent experimental model based on partial chemical shift data. Predictions on mutant, yeast-toxic strains of HET-s suggest similar alternate folds. When applied to HET-s and a HET-s mutant with core asparagines replaced by glutamines (both highly amyloidogenic, chemically similar residues abundant in many amyloids), AmyloidMutants surprisingly predicts a greatly reduced capacity of the glutamine mutant to form amyloid. We confirm this finding by conducting mutagenesis experiments. Our tool is publicly available on the web at http://amyloid.csail.mit.edu/. Contact: lindquist_admin@wi.mit.edu; bab@csail.mit.edu.

  19. Predicting Oral Health-Related Behaviour in the Parents of Preschool Children: An Application of the Theory of Planned Behaviour

    ERIC Educational Resources Information Center

    Van den Branden, Sigrid; Van den Broucke, Stephan; Leroy, Roos; Declerck, Dominique; Hoppenbrouwers, Karel

    2015-01-01

    Objective: This study aimed to test the predictive validity of the Theory of Planned Behaviour (TPB) when applied to the oral health-related behaviours of parents towards their preschool children in a cross-sectional and prospective design over a 5-year interval. Methods: Data for this study were obtained from parents of 1,057 children born…

  20. Predictive mapping of forest composition and structure with direct gradient analysis and nearest neighbor imputation in coastal Oregon, U.S.A.

    Treesearch

    Janet L. Ohmann; Matthew J. Gregory

    2002-01-01

    Spatially explicit information on the species composition and structure of forest vegetation is needed at broad spatial scales for natural resource policy analysis and ecological research. We present a method for predictive vegetation mapping that applies direct gradient analysis and nearest-neighbor imputation to ascribe detailed ground attributes of vegetation to...

  1. Ab Initio-Based Predictions of Hydrocarbon Combustion Chemistry

    DTIC Science & Technology

    2015-07-15

    There are two prime objectives of the research. One is to develop and apply efficient methods for using ab initio potential energy surfaces (PESs)… [Final report, 31-Mar-2015; approved for public release, distribution unlimited. Keywords: hydrocarbon combustion, ab initio quantum chemistry, potential energy surfaces, chemical…]

  2. Construction of ontology augmented networks for protein complex prediction.

    PubMed

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian

    2013-01-01

    Protein complexes are of great importance for understanding the principles of cellular organization and function. The increase in available protein-protein interaction data, gene ontology and other resources makes it possible to develop computational methods for protein complex prediction. Most existing methods focus mainly on the topological structure of protein-protein interaction networks, and largely ignore the gene ontology annotation information. In this article, we constructed ontology augmented networks from protein-protein interaction data and gene ontology, which effectively combine the topological structure of protein-protein interaction networks and the similarity of gene ontology annotations into a unified distance measure. After constructing ontology augmented networks, a novel method (clustering based on ontology augmented networks) was proposed to predict protein complexes, capable of taking into account both the topological structure of the protein-protein interaction network and the similarity of gene ontology annotations. Our method was applied to two different yeast protein-protein interaction datasets and predicted many well-known complexes. The experimental results showed that (i) ontology augmented networks and the unified distance measure can effectively combine structural closeness and gene ontology annotation similarity; (ii) our method is valuable for predicting protein complexes and has higher F1 and accuracy than other competing methods.
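
    The unified-distance idea can be illustrated generically: blend a topological distance on the PPI graph with a dissimilarity of GO annotation sets. The hypothetical sketch below uses a shortest-path distance and a Jaccard similarity as simple stand-ins for the measures in the paper; the blending weight alpha is an assumption.

    ```python
    import networkx as nx

    def go_similarity(terms_u, terms_v):
        """Jaccard similarity of two GO annotation sets (a simple stand-in
        for the annotation similarity used in the paper)."""
        u, v = set(terms_u), set(terms_v)
        return len(u & v) / len(u | v) if u | v else 0.0

    def unified_distance(G, go, u, v, alpha=0.5):
        """Blend PPI-graph distance with GO-annotation dissimilarity."""
        d_topo = nx.shortest_path_length(G, u, v)
        return alpha * d_topo + (1 - alpha) * (1.0 - go_similarity(go[u], go[v]))

    # Toy PPI graph and GO annotations.
    G = nx.Graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
    go = {"A": {"GO:1", "GO:2"}, "B": {"GO:1"}, "C": {"GO:2", "GO:3"}, "D": {"GO:4"}}
    print(unified_distance(G, go, "A", "D"))
    ```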

  3. A comparison of two adaptive multivariate analysis methods (PLSR and ANN) for winter wheat yield forecasting using Landsat-8 OLI images

    NASA Astrophysics Data System (ADS)

    Chen, Pengfei; Jing, Qi

    2017-02-01

    The assumption that a non-linear method is more reasonable than a linear method when canopy reflectance is used to establish a yield prediction model was proposed and tested in this study. For this purpose, partial least squares regression (PLSR) and artificial neural networks (ANN), representing linear and non-linear analysis methods respectively, were applied and compared for wheat yield prediction. Multi-period Landsat-8 OLI images were collected at two different wheat growth stages, and a field campaign was conducted to obtain grain yields at selected sampling sites in 2014. The field data were divided into a calibration database and a testing database. Using the calibration data, a cross-validation procedure was introduced for the PLSR and ANN model construction to prevent over-fitting. All models were tested using the test data. The ANN yield-prediction model produced R2, RMSE and RMSE% values of 0.61, 979 kg ha-1, and 10.38%, respectively, in the testing phase, performing better than the PLSR yield-prediction model, which produced R2, RMSE, and RMSE% values of 0.39, 1211 kg ha-1, and 12.84%, respectively. The non-linear method was therefore suggested as the better method for yield prediction.
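
    A minimal sketch of the linear-versus-non-linear comparison using scikit-learn stand-ins (PLSRegression versus a small MLP), with random placeholders for the reflectance matrix and yields rather than the Landsat-8 data:

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(4)
    reflectance = rng.random((60, 12))        # placeholder band values, two stages
    yield_kg_ha = 6000 + 2000 * reflectance[:, 0] + 500 * rng.standard_normal(60)

    for name, model in [
        ("PLSR", PLSRegression(n_components=4)),
        ("ANN", MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)),
    ]:
        # Cross-validated predictions guard against over-fitting in the comparison.
        pred = np.asarray(cross_val_predict(model, reflectance, yield_kg_ha, cv=5)).ravel()
        rmse = np.sqrt(np.mean((pred - yield_kg_ha) ** 2))
        print(f"{name}: RMSE = {rmse:.0f} kg/ha")
    ```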

  4. Prediction of the binding affinities of peptides to class II MHC using a regularized thermodynamic model

    PubMed Central

    2010-01-01

    Background The binding of peptide fragments of extracellular peptides to class II MHC is a crucial event in the adaptive immune response. Each MHC allotype generally binds a distinct subset of peptides and the enormous number of possible peptide epitopes prevents their complete experimental characterization. Computational methods can utilize the limited experimental data to predict the binding affinities of peptides to class II MHC. Results We have developed the Regularized Thermodynamic Average, or RTA, method for predicting the affinities of peptides binding to class II MHC. RTA accounts for all possible peptide binding conformations using a thermodynamic average and includes a parameter constraint for regularization to improve accuracy on novel data. RTA was shown to achieve higher accuracy, as measured by AUC, than SMM-align on the same data for all 17 MHC allotypes examined. RTA also gave the highest accuracy on all but three allotypes when compared with results from 9 different prediction methods applied to the same data. In addition, the method correctly predicted the peptide binding register of 17 out of 18 peptide-MHC complexes. Finally, we found that suboptimal peptide binding registers, which are often ignored in other prediction methods, made significant contributions of at least 50% of the total binding energy for approximately 20% of the peptides. Conclusions The RTA method accurately predicts peptide binding affinities to class II MHC and accounts for multiple peptide binding registers while reducing overfitting through regularization. The method has potential applications in vaccine design and in understanding autoimmune disorders. A web server implementing the RTA prediction method is available at http://bordnerlab.org/RTA/. PMID:20089173
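
    The thermodynamic average at the core of RTA can be illustrated in a few lines: the effective binding free energy is a Boltzmann-weighted combination over all candidate binding registers, so suboptimal registers still contribute. A hypothetical sketch with invented register energies (RT in kcal/mol near room temperature):

    ```python
    import numpy as np

    def thermodynamic_average(register_energies, rt=0.593):
        """Effective binding free energy over all binding registers:
        G_eff = -RT * ln(sum_r exp(-G_r / RT)), via log-sum-exp for stability."""
        e = np.asarray(register_energies, dtype=float)
        m = e.min()
        return m - rt * np.log(np.sum(np.exp(-(e - m) / rt)))

    # Toy peptide with three candidate registers; the suboptimal ones
    # still contribute measurably to the total binding energy.
    print(f"{thermodynamic_average([-7.1, -6.5, -6.3]):.2f} kcal/mol")
    ```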

  5. Prediction of lung cancer patient survival via supervised machine learning classification techniques.

    PubMed

    Lynch, Chip M; Abdollahi, Behnaz; Fuqua, Joshua D; de Carlo, Alexandra R; Bartholomai, James A; Balgemann, Rayeanne N; van Berkel, Victor H; Frieboes, Hermann B

    2017-12-01

    Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. In particular for lung cancer, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used in order to determine this information. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal of enabling comparison of predictive power between the various methods. The prediction is treated as a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as they had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles out the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique. We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time, with the ultimate goal of informing patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods.

  6. Co-evolutionary Analysis of Domains in Interacting Proteins Reveals Insights into Domain–Domain Interactions Mediating Protein–Protein Interactions

    PubMed Central

    Jothi, Raja; Cherukuri, Praveen F.; Tasneem, Asba; Przytycka, Teresa M.

    2006-01-01

    Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable towards molecular level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein–protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that interacting domain pair(s) for a given interaction exhibits higher level of co-evolution than the noninteracting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain–domain interactions. Given a protein–protein interaction, the proposed method predicts the domain pair(s) that is most likely to mediate the protein interaction. We applied this method on the yeast interactome to predict domain–domain interactions, and used known domain–domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all the three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain–domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites. PMID:16949097

  7. DNA-Based Methods in the Immunohematology Reference Laboratory

    PubMed Central

    Denomme, Gregory A

    2010-01-01

    Although hemagglutination serves the immunohematology reference laboratory well, when used alone it has limited capability to resolve complex problems. This overview discusses how molecular approaches can be used in the immunohematology reference laboratory. In order to apply molecular approaches to immunohematology, knowledge of genes, DNA-based methods, and the molecular bases of blood groups is required. When applied correctly, DNA-based methods can predict blood groups to resolve ABO/Rh discrepancies, identify variant alleles, and screen donors for antigen-negative units. DNA-based testing in immunohematology is a valuable tool used to resolve blood group incompatibilities and to support patients in their transfusion needs. PMID:21257350

  8. Large-scale model quality assessment for improving protein tertiary structure prediction.

    PubMed

    Cao, Renzhi; Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2015-06-15

    Sampling structural models and ranking them are the two major challenges of protein structure prediction. Traditional protein structure prediction methods generally use one or a few quality assessment (QA) methods to select the best-predicted models, which cannot consistently select relatively better models or rank a large number of models well. Here, we develop a novel large-scale model QA method in conjunction with model clustering to rank and select protein structural models. It applies an unprecedented 14 model QA methods to generate consensus model rankings, followed by model refinement based on model combination (i.e. averaging). Our experiment demonstrates that the large-scale model QA approach is more consistent and robust in selecting models of better quality than any individual QA method. Our method was blindly tested during the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM group. It was officially ranked third out of all 143 human and server predictors according to the total scores of the first models predicted for 78 CASP11 protein domains, and second according to the total scores of the best of the five models predicted for these domains. MULTICOM's outstanding performance in the extremely competitive 2014 CASP11 experiment proves that our large-scale QA approach together with model clustering is a promising solution to one of the two major problems in protein structure modeling. The web server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/human/.
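
    The consensus-ranking idea can be sketched generically: put each QA method's scores on a common scale and average them per model. This is a hypothetical illustration with a random score matrix, not MULTICOM's actual combination rule.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    scores = rng.random((150, 14))  # 150 candidate models x 14 QA methods

    # Z-score each QA method so no single score range dominates,
    # then average across methods to get a consensus score per model.
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    consensus = z.mean(axis=1)
    ranking = np.argsort(consensus)[::-1]   # best model first
    print("Top five model indices:", ranking[:5])
    ```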

  9. How to deal with multiple binding poses in alchemical relative protein-ligand binding free energy calculations.

    PubMed

    Kaus, Joseph W; Harder, Edward; Lin, Teng; Abel, Robert; McCammon, J Andrew; Wang, Lingle

    2015-06-09

    Recent advances in improved force fields and sampling methods have made it possible for the accurate calculation of protein–ligand binding free energies. Alchemical free energy perturbation (FEP) using an explicit solvent model is one of the most rigorous methods to calculate relative binding free energies. However, for cases where there are high energy barriers separating the relevant conformations that are important for ligand binding, the calculated free energy may depend on the initial conformation used in the simulation due to the lack of complete sampling of all the important regions in phase space. This is particularly true for ligands with multiple possible binding modes separated by high energy barriers, making it difficult to sample all relevant binding modes even with modern enhanced sampling methods. In this paper, we apply a previously developed method that provides a corrected binding free energy for ligands with multiple binding modes by combining the free energy results from multiple alchemical FEP calculations starting from all enumerated poses, and the results are compared with Glide docking and MM-GBSA calculations. From these calculations, the dominant ligand binding mode can also be predicted. We apply this method to a series of ligands that bind to c-Jun N-terminal kinase-1 (JNK1) and obtain improved free energy results. The dominant ligand binding modes predicted by this method agree with the available crystallography, while both Glide docking and MM-GBSA calculations incorrectly predict the binding modes for some ligands. The method also helps separate the force field error from the ligand sampling error, such that deviations in the predicted binding free energy from the experimental values likely indicate possible inaccuracies in the force field. An error in the force field for a subset of the ligands studied was identified using this method, and improved free energy results were obtained by correcting the partial charges assigned to the ligands. This improved the root-mean-square error (RMSE) for the predicted binding free energy from 1.9 kcal/mol with the original partial charges to 1.3 kcal/mol with the corrected partial charges.
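
    The pose-combination step can be written compactly. Assuming each alchemical FEP run started from pose i yields a relative binding free energy dG_i, a Boltzmann-weighted combination gives a single corrected value, and the weights identify the dominant binding mode. This sketch uses invented numbers and the standard free-energy combination formula, which may differ in detail from the paper's derivation.

    ```python
    import numpy as np

    def combine_poses(dg_per_pose, rt=0.593):
        """Combine per-pose relative binding free energies (kcal/mol):
        dG = -RT * ln(sum_i exp(-dG_i / RT)); also return pose weights."""
        dg = np.asarray(dg_per_pose, dtype=float)
        shifted = np.exp(-(dg - dg.min()) / rt)       # stable exponentials
        weights = shifted / shifted.sum()             # Boltzmann pose populations
        dg_combined = dg.min() - rt * np.log(shifted.sum())
        return dg_combined, weights

    dg, weights = combine_poses([-1.8, -0.4, 0.9])
    print(f"combined dG = {dg:.2f} kcal/mol; dominant pose weight = {weights.max():.2f}")
    ```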

  10. How To Deal with Multiple Binding Poses in Alchemical Relative Protein–Ligand Binding Free Energy Calculations

    PubMed Central

    2016-01-01

    Recent advances in improved force fields and sampling methods have made it possible for the accurate calculation of protein–ligand binding free energies. Alchemical free energy perturbation (FEP) using an explicit solvent model is one of the most rigorous methods to calculate relative binding free energies. However, for cases where there are high energy barriers separating the relevant conformations that are important for ligand binding, the calculated free energy may depend on the initial conformation used in the simulation due to the lack of complete sampling of all the important regions in phase space. This is particularly true for ligands with multiple possible binding modes separated by high energy barriers, making it difficult to sample all relevant binding modes even with modern enhanced sampling methods. In this paper, we apply a previously developed method that provides a corrected binding free energy for ligands with multiple binding modes by combining the free energy results from multiple alchemical FEP calculations starting from all enumerated poses, and the results are compared with Glide docking and MM-GBSA calculations. From these calculations, the dominant ligand binding mode can also be predicted. We apply this method to a series of ligands that bind to c-Jun N-terminal kinase-1 (JNK1) and obtain improved free energy results. The dominant ligand binding modes predicted by this method agree with the available crystallography, while both Glide docking and MM-GBSA calculations incorrectly predict the binding modes for some ligands. The method also helps separate the force field error from the ligand sampling error, such that deviations in the predicted binding free energy from the experimental values likely indicate possible inaccuracies in the force field. An error in the force field for a subset of the ligands studied was identified using this method, and improved free energy results were obtained by correcting the partial charges assigned to the ligands. This improved the root-mean-square error (RMSE) for the predicted binding free energy from 1.9 kcal/mol with the original partial charges to 1.3 kcal/mol with the corrected partial charges. PMID:26085821

  11. A global goodness-of-fit test for receiver operating characteristic curve analysis via the bootstrap method.

    PubMed

    Zou, Kelly H; Resnic, Frederic S; Talos, Ion-Florin; Goldberg-Zimring, Daniel; Bhagwat, Jui G; Haker, Steven J; Kikinis, Ron; Jolesz, Ferenc A; Ohno-Machado, Lucila

    2005-10-01

    Medical classification accuracy studies often yield continuous data based on predictive models for treatment outcomes. A popular method for evaluating the performance of diagnostic tests is receiver operating characteristic (ROC) curve analysis. The main objective was to develop a global statistical hypothesis test for assessing the goodness-of-fit (GOF) of parametric ROC curves via the bootstrap. A simple log (or logit) transformation and a more flexible Box-Cox normality transformation were applied to untransformed or transformed data from two clinical studies, to predict complications following percutaneous coronary interventions (PCIs) and image-guided neurosurgical resection results predicted by tumor volume, respectively. We compared a non-parametric with a parametric binormal estimate of the underlying ROC curve. To construct such a GOF test, we used the non-parametric and parametric areas under the curve (AUCs) as the metrics, with a resulting p value reported. In the interventional cardiology example, logit and Box-Cox transformations of the predictive probabilities led to satisfactory AUCs (AUC=0.888; p=0.78, and AUC=0.888; p=0.73, respectively), while in the brain tumor resection example, log and Box-Cox transformations of the tumor size also led to satisfactory AUCs (AUC=0.898; p=0.61, and AUC=0.899; p=0.42, respectively). In contrast, significant departures from GOF were observed without applying any transformation prior to assuming a binormal model (AUC=0.766; p=0.004, and AUC=0.831; p=0.03, respectively). In both studies the p values suggested that transformations are important to consider before applying any binormal model to estimate the AUC. Our analyses also demonstrated and confirmed the predictive values of the different classifiers for determining interventional complications following PCIs and resection outcomes in image-guided neurosurgery.

  12. The effect of applied transducer force on acoustic radiation force impulse quantification within the left lobe of the liver.

    PubMed

    Porra, Luke; Swan, Hans; Ho, Chien

    2015-08-01

    Introduction: Acoustic Radiation Force Impulse (ARFI) quantification measures shear wave velocities (SWVs) within the liver. It is a reliable method for predicting the severity of liver fibrosis and has the potential to assess fibrosis in any part of the liver, but previous research has found ARFI quantification in the right lobe more accurate than in the left lobe. A lack of standardised applied transducer force when performing ARFI quantification in the left lobe of the liver may account for some of this inaccuracy. The research hypothesis of the present study predicted that an increase in applied transducer force would result in an increase in the measured SWVs. Methods: ARFI quantification within the left lobe of the liver was performed in a group of healthy volunteers (n = 28). During each examination, each participant was subjected to ARFI quantification at six different levels of transducer force applied to the epigastric abdominal wall. Results: A repeated-measures ANOVA test showed that ARFI quantification was significantly affected by applied transducer force (p = 0.002). Significant pairwise comparisons using Bonferroni correction for multiple comparisons showed that, contrary to the hypothesis, an increase in applied transducer force was associated with a decrease in SWVs. Conclusion: Applied transducer force has a significant effect on SWVs within the left lobe of the liver, and it may explain some of the less accurate and less reliable results in previous studies where transducer force was not taken into consideration. Future studies in the left lobe of the liver should take this into account and control for applied transducer force.

  13. A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning.

    PubMed

    Li, Haiou; Lyu, Qiang; Cheng, Jianlin

    2016-12-01

    Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein structure prediction using only the 3D structural coordinates of homologous template proteins as input. The templates were identified for a target protein by a PSI-BLAST search. 3DRobot (a program that automatically generates diverse and well-packed protein structure decoys) was used to generate initial decoy models for the target from the templates. A stacked denoising autoencoder was trained on the decoys to obtain a deep learning model for the target protein. The trained deep model was then used to reconstruct the final structural model for the target sequence. With target proteins that have highly similar template proteins as benchmarks, the GDT-TS score of the predicted structures is greater than 0.7, suggesting that the deep autoencoder is a promising method for protein structure reconstruction.

  14. Technical note: Combining quantile forecasts and predictive distributions of streamflows

    NASA Astrophysics Data System (ADS)

    Bogner, Konrad; Liechti, Katharina; Zappa, Massimiliano

    2017-11-01

    The enhanced availability of many different hydro-meteorological modelling and forecasting systems raises the issue of how to optimally combine this wealth of information. In particular, the use of deterministic and probabilistic forecasts with sometimes widely divergent predicted future streamflow values makes it complicated for decision makers to sift out the relevant information. In this study, multiple streamflow forecasts are aggregated based on several different predictive distributions and quantile forecasts. For this combination, the Bayesian model averaging (BMA) approach, non-homogeneous Gaussian regression (NGR), also known as the ensemble model output statistics (EMOS) technique, and a novel method called Beta-transformed linear pooling (BLP) are applied. With the help of the quantile score (QS) and the continuous ranked probability score (CRPS), the combination results for the Sihl River in Switzerland, with about 5 years of forecast data, are compared, and the differences between the raw and optimally combined forecasts are highlighted. The results demonstrate the importance of applying proper forecast combination methods for decision makers in the field of flood and water resource management.
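
    The quantile score used in the comparison is the pinball loss averaged over forecast cases. A minimal sketch, with toy observations and a toy forecast of the 0.9 quantile (not the Sihl River data):

    ```python
    import numpy as np

    def quantile_score(obs, q_forecast, tau):
        """Pinball loss for quantile level tau, averaged over cases."""
        obs = np.asarray(obs, dtype=float)
        q = np.asarray(q_forecast, dtype=float)
        diff = obs - q
        return np.mean(np.where(diff >= 0, tau * diff, (tau - 1.0) * diff))

    streamflow = np.array([12.0, 55.0, 31.0, 18.0])   # observed, m^3/s
    q90 = np.array([20.0, 60.0, 40.0, 25.0])          # forecast 0.9 quantiles
    print(f"QS(0.9) = {quantile_score(streamflow, q90, 0.9):.2f}")
    ```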

  15. RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study.

    PubMed

    Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G

    2017-01-01

    Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (the transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or assume that the effects of up- and down-regulated genes cancel each other out during normalization. Here, we present a novel method, moose2, which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/. The proposed RNA-seq normalization method, moose2, is a valuable alternative to existing methods, with two major advantages: (i) the in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variation between replicates.

  16. Uncertainty Analysis Based on Sparse Grid Collocation and Quasi-Monte Carlo Sampling with Application in Groundwater Modeling

    NASA Astrophysics Data System (ADS)

    Zhang, G.; Lu, D.; Ye, M.; Gunzburger, M.

    2011-12-01

    Markov Chain Monte Carlo (MCMC) methods have been widely used in many fields of uncertainty analysis to estimate the posterior distributions of parameters and credible intervals of predictions in the Bayesian framework. In practice, however, MCMC may be computationally unaffordable due to slow convergence and the excessive number of forward model executions required, especially when the forward model is expensive to compute. Both disadvantages arise from the curse of dimensionality, i.e., the posterior distribution is usually a multivariate function of the parameters. Recently, the sparse grid method has been demonstrated to be an effective technique for coping with high-dimensional interpolation or integration problems. Thus, in order to accelerate the forward model and avoid the slow convergence of MCMC, we propose a new method for uncertainty analysis based on sparse grid interpolation and quasi-Monte Carlo sampling. First, we construct a polynomial approximation of the forward model in the parameter space by using sparse grid interpolation. This approximation then defines an accurate surrogate posterior distribution that can be evaluated repeatedly at minimal computational cost. Second, instead of using MCMC, a quasi-Monte Carlo method is applied to draw samples in the parameter space. The desired probability density function (PDF) of each prediction is then approximated by accumulating the posterior density values of all the samples according to the prediction values. Our method has the following advantages: (1) the polynomial approximation of the forward model on the sparse grid provides a very efficient evaluation of the surrogate posterior distribution; (2) the quasi-Monte Carlo method retains the same accuracy in approximating the PDF of predictions but avoids the disadvantages of MCMC. The proposed method is applied to a controlled numerical experiment of groundwater flow modeling. The results show that our method attains the same accuracy much more efficiently than traditional MCMC.
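
    A minimal sketch of the second stage is given below: assuming a cheap surrogate already stands in for the forward model (here a hypothetical polynomial, not an actual sparse grid interpolant), Sobol quasi-Monte Carlo samples are drawn in the parameter box, weighted by the surrogate posterior, and accumulated into the PDF of the prediction.

    ```python
    import numpy as np
    from scipy.stats import qmc

    # Stand-in surrogate: pretend this polynomial replaced the expensive
    # groundwater model on a sparse grid (hypothetical example).
    def surrogate(theta):                      # theta: (n, 2) samples
        return 1.5 * theta[:, 0] ** 2 - 0.5 * theta[:, 0] * theta[:, 1]

    def log_likelihood(pred, obs=0.8, sigma=0.1):
        return -0.5 * ((pred - obs) / sigma) ** 2

    # Quasi-Monte Carlo (Sobol) samples over the 2-D parameter box [0, 1]^2
    sampler = qmc.Sobol(d=2, scramble=True, seed=0)
    theta = sampler.random(2 ** 12)

    pred = surrogate(theta)
    w = np.exp(log_likelihood(pred))           # unnormalized posterior weights
    w /= w.sum()

    # Accumulate the posterior-weighted histogram of the prediction
    hist, edges = np.histogram(pred, bins=50, weights=w, density=True)
    ```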

  17. A feasibility study for long-path multiple detection using a neural network

    NASA Technical Reports Server (NTRS)

    Feuerbacher, G. A.; Moebes, T. A.

    1994-01-01

    Least-squares inverse filters have found widespread use in the deconvolution of seismograms and the removal of multiples. The use of least-squares prediction filters with prediction distances greater than unity leads to the method of predictive deconvolution, which can be used for the removal of long-path multiples. The predictive technique allows one to control the length of the desired output wavelet by controlling the prediction distance, and hence to specify the desired degree of resolution. Events which are periodic within given repetition ranges can be attenuated selectively. The method is thus effective in the suppression of rather complex reverberation patterns. A back-propagation (BP) neural network is constructed to detect the first arrivals of the multiples and thereby aid in the more accurate determination of their prediction distance. The neural detector is applied to synthetic reflection coefficients and synthetic seismic traces. The processing results show that the neural detector is accurate and should lead to an automated, fast method for determining prediction distances across vast amounts of data such as seismic field records. The neural network system used in this study was the NASA Software Technology Branch's NETS system.
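
    For readers unfamiliar with gapped predictive deconvolution, the sketch below designs a least-squares prediction filter with prediction distance alpha from the trace autocorrelation and applies the resulting prediction-error filter. The neural first-arrival picker that supplies alpha is not reproduced here, and the trace is synthetic.

    ```python
    import numpy as np
    from scipy.linalg import solve_toeplitz

    def prediction_error_filter(trace, n_coef, alpha):
        """Design a least-squares prediction filter with prediction
        distance alpha; return the corresponding prediction-error filter."""
        full = np.correlate(trace, trace, mode="full")
        r = full[len(trace) - 1:]              # one-sided autocorrelation
        # Normal equations: Toeplitz(r[0:n]) a = r[alpha : alpha + n]
        a = solve_toeplitz(r[:n_coef], r[alpha:alpha + n_coef])
        # PEF: leading 1, then (alpha - 1) zeros, then -a
        return np.concatenate(([1.0], np.zeros(alpha - 1), -a))

    trace = np.random.default_rng(0).normal(size=1000)  # placeholder trace
    pef = prediction_error_filter(trace, n_coef=20, alpha=8)
    deconvolved = np.convolve(trace, pef, mode="same")  # multiples suppressed
    ```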

  18. Hidden state prediction: a modification of classic ancestral state reconstruction algorithms helps unravel complex symbioses

    PubMed Central

    Zaneveld, Jesse R. R.; Thurber, Rebecca L. V.

    2014-01-01

    Complex symbioses between animal or plant hosts and their associated microbiotas can involve thousands of species and millions of genes. Because of the number of interacting partners, it is often impractical to study all organisms or genes in these host-microbe symbioses individually. Yet new phylogenetic predictive methods can use the wealth of accumulated data on diverse model organisms to make inferences about the properties of less well-studied species and gene families. Predictive functional profiling methods use evolutionary models based on the properties of studied relatives to put bounds on the likely characteristics of an organism or gene that has not yet been studied in detail. These techniques have been applied to predict diverse features of host-associated microbial communities, ranging from the enzymatic function of uncharacterized genes to the gene content of uncultured microorganisms. We consider these phylogenetically informed predictive techniques from disparate fields as examples of a general class of algorithms for Hidden State Prediction (HSP), and argue that HSP methods have broad value in predicting organismal traits in a variety of contexts, including the study of complex host-microbe symbioses. PMID:25202302
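
    The intuition can be stated in a few lines: the sketch below predicts an unmeasured trait as a phylogenetic-distance-weighted average of studied relatives. Real HSP algorithms instead run ancestral state reconstruction over an explicit tree, so this is only a caricature, with invented distances and trait values.

    ```python
    import numpy as np

    def predict_hidden_state(dist_to_relatives, relative_traits):
        """Toy hidden-state prediction: estimate an unmeasured organism's
        trait as a distance-weighted average of studied relatives; closer
        relatives get more weight."""
        w = 1.0 / (np.asarray(dist_to_relatives) + 1e-9)
        w /= w.sum()
        return float(np.dot(w, relative_traits))

    # Hypothetical 16S distances and gene-copy numbers of three relatives
    print(predict_hidden_state([0.02, 0.05, 0.30], [4, 5, 12]))
    ```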

  19. Integrative genetic risk prediction using non-parametric empirical Bayes classification.

    PubMed

    Zhao, Sihai Dave

    2017-06-01

    Genetic risk prediction is an important component of individualized medicine, but prediction accuracies remain low for many complex diseases. A fundamental limitation is the sample sizes of the studies on which the prediction algorithms are trained. One way to increase the effective sample size is to integrate information from previously existing studies. However, it can be difficult to find existing data that examine the target disease of interest, especially if that disease is rare or poorly studied. Furthermore, individual-level genotype data from these auxiliary studies are typically difficult to obtain. This article proposes a new approach to integrative genetic risk prediction of complex diseases with binary phenotypes. It accommodates possible heterogeneity in the genetic etiologies of the target and auxiliary diseases using a tuning parameter-free non-parametric empirical Bayes procedure, and can be trained using only auxiliary summary statistics. Simulation studies show that the proposed method can provide superior predictive accuracy relative to non-integrative as well as integrative classifiers. The method is applied to a recent study of pediatric autoimmune diseases, where it substantially reduces prediction error for certain target/auxiliary disease combinations. The proposed method is implemented in the R package ssa. © 2016, The International Biometric Society.

  20. Protein domain analysis of genomic sequence data reveals regulation of LRR related domains in plant transpiration in Ficus.

    PubMed

    Lang, Tiange; Yin, Kangquan; Liu, Jinyu; Cao, Kunfang; Cannon, Charles H; Du, Fang K

    2014-01-01

    Predicting protein domains is essential for understanding a protein's function at the molecular level. Until now, however, there has been no direct and straightforward method for predicting protein domains in species without a reference genome sequence. In this study, we developed a set of programs that predict protein domains directly from genomic sequence data without a reference genome. Using whole-genome sequence data, the pipeline comprises DNA assembly, combining next-generation sequencing (NGS) assembly methods with traditional methods, followed by peptide prediction and protein domain prediction. The proposed approach avoids problems associated with de novo assembly caused by short reads and small simple repeats. Furthermore, we applied our pipeline to the prediction of leucine rich repeat (LRR) domains in four species of Ficus with no reference genome, based on NGS genomic data. We found that the LRRNT_2 and LRR_8 domains are related to plant transpiration efficiency, as indicated by the stomata index, in the four species of Ficus. The pipeline established in this study provides new insights for protein domain prediction, which is particularly timely in the current age of NGS data expansion.
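
    As one concrete piece of such a pipeline, the sketch below performs the peptide-prediction step by six-frame translation of an assembled contig (using Biopython); the resulting peptides would then be scanned against domain models such as Pfam LRR profiles. The contig and length cutoff are illustrative.

    ```python
    from Bio.Seq import Seq

    def six_frame_peptides(dna, min_len=30):
        """Translate a contig in all six reading frames and keep open
        reading frames long enough to be scanned for protein domains
        (e.g. with HMMER against Pfam LRR models)."""
        seq = Seq(dna)
        peptides = []
        for strand in (seq, seq.reverse_complement()):
            for frame in range(3):
                sub = strand[frame:]
                sub = sub[:len(sub) - len(sub) % 3]   # trim to codon multiple
                for orf in str(sub.translate()).split("*"):
                    if len(orf) >= min_len:
                        peptides.append(orf)
        return peptides

    contig = "ATG" + "GCT" * 40 + "TAA"               # toy contig
    print(len(six_frame_peptides(contig)))
    ```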

  1. On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction.

    PubMed

    Crop, F; Van Rompaye, B; Paelinck, L; Vakaet, L; Thierens, H; De Wagter, C

    2008-07-21

    The purpose of this study was twofold: to put forward a statistically correct model for film calibration and to optimize this process. A reliable calibration is needed in order to perform accurate reference dosimetry with radiochromic (Gafchromic) film. Often, an ordinary least squares (OLS) simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve, either with dose as a function of OD (inverse regression) or with OD as a function of dose (inverse prediction). The application of a simple linear regression fit is an invalid method because the heteroscedasticity of the data is not taken into account. This can lead to erroneous results originating from the calibration process itself and thus to lower accuracy. In this work, we compare the OLS inverse regression method with the correct weighted least squares (WLS) inverse prediction method for creating calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We also developed a Monte-Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to higher accuracy for film dosimetry.
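
    A minimal numerical sketch of WLS inverse prediction is given below, with invented calibration data: OD is fitted as a function of dose with weights inversely proportional to the OD standard deviation, and the fitted curve is then inverted to predict dose. The quadratic form and all numbers are illustrative, not the paper's model.

    ```python
    import numpy as np

    # Hypothetical calibration data: net optical density, delivered dose
    # (cGy), and a measurement sd of OD that grows with dose
    od    = np.array([0.05, 0.12, 0.22, 0.31, 0.40, 0.48])
    dose  = np.array([ 25.,  75., 150., 225., 300., 400.])
    sigma = np.array([0.002, 0.003, 0.005, 0.007, 0.009, 0.012])

    # WLS inverse prediction: fit OD as a function of dose, weighting
    # each point by 1/sigma to handle the heteroscedasticity
    coeffs = np.polyfit(dose, od, deg=2, w=1.0 / sigma)

    def predict_dose(od_measured):
        a, b, c = coeffs                  # od = a*dose^2 + b*dose + c
        roots = np.roots([a, b, c - od_measured])
        real = roots[np.isreal(roots)].real
        return real[(real >= 0) & (real <= 450)][0]

    print(predict_dose(0.35))             # dose (cGy) for a measured OD
    ```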

  2. Extensions to the visual predictive check to facilitate model performance evaluation.

    PubMed

    Post, Teun M; Freijer, Jan I; Ploeger, Bart A; Danhof, Meindert

    2008-04-01

    The Visual Predictive Check (VPC) is a valuable and supportive instrument for evaluating model performance. However, in its most commonly applied form, the method largely depends on a subjective comparison of the distribution of the simulated data with the observed data, without explicitly quantifying and relating the information in both. Recent adaptations of the VPC take this drawback into consideration by presenting the observed and predicted data as percentiles, and some additionally represent the uncertainty in the predictions visually. However, it is not assessed whether the expected random distribution of the observations around the predicted median trend is realised in relation to the number of observations. Moreover, the influence of, and the information residing in, missing data at each time point is not taken into consideration. Therefore, in this investigation the VPC is extended with two methods to support a less subjective and thereby more adequate evaluation of model performance: (i) the Quantified Visual Predictive Check (QVPC) and (ii) the Bootstrap Visual Predictive Check (BVPC). The QVPC presents the distribution of the observations as a percentage above and below the predicted median at each time point, regardless of the density of the data, while also visualising the percentage of unavailable data. The BVPC weighs the predicted median against the 5th, 50th and 95th percentiles resulting from a bootstrap of the observed data median at each time point, while accounting for the number and the theoretical position of unavailable data. The proposed extensions to the VPC are illustrated by a pharmacokinetic simulation example and applied to a pharmacodynamic disease progression example.
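
    The bootstrap ingredient of the BVPC can be sketched compactly: at a single time point, resample the observations, collect the medians, and compare the model-predicted median against the resulting 5th/50th/95th percentile band. The BVPC's handling of missing data is omitted, and all numbers are invented.

    ```python
    import numpy as np

    def bootstrap_median_band(obs_at_t, n_boot=2000, seed=0):
        """Bootstrap the observed median at one time point and return its
        5th/50th/95th percentiles, against which the model-predicted
        median can be weighed."""
        rng = np.random.default_rng(seed)
        medians = np.median(
            rng.choice(obs_at_t, size=(n_boot, len(obs_at_t)),
                       replace=True), axis=1)
        return np.percentile(medians, [5, 50, 95])

    obs = np.array([1.2, 1.9, 2.1, 2.4, 2.8, 3.5, 4.0])  # one time point
    lo, mid, hi = bootstrap_median_band(obs)
    pred_median = 2.0                                    # hypothetical model
    print(lo <= pred_median <= hi)                       # within the band?
    ```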

  3. Load application for the contact mechanics analysis and wear prediction of total knee replacement.

    PubMed

    Zhang, Jing; Chen, Zhenxian; Wang, Ling; Li, Dichen; Jin, Zhongmin

    2017-05-01

    Tibiofemoral contact forces in total knee replacement have been measured at the medial and lateral sites separately using an instrumented prosthesis, and have been predicted with reasonable accuracy by musculoskeletal multibody dynamics models. However, in the majority of current finite element analyses, the medial and lateral forces are rarely applied separately in place of the total axial load specified by the ISO standard. In this study, we quantified the effects of applying the medial and lateral loads separately, versus the traditional total axial load application, on the contact mechanics and wear prediction of a patient-specific knee prosthesis. The load application position played an important role under the medial-lateral load application. The loading set that produced the load distribution closest to the multibody dynamics model was used to predict the contact mechanics and wear of the prosthesis and was compared with the total axial load application. The medial-lateral load distribution obtained with the present method was closer to the multibody dynamics prediction than that of the traditional total axial load application, and the maximum contact pressure and contact area were consistent with the corresponding load variation. The predicted total volumetric wear rate and wear area were similar between the two load applications. However, the split of the predicted wear volumes between the medial and lateral sides differed: the lateral volumetric wear rate was 31.46% smaller than the medial rate with the traditional load application, but only 11.8% smaller with the medial-lateral load application. The medial-lateral load application could therefore provide a new and more accurate method of load application for patient-specific preclinical contact mechanics and wear prediction of knee implants.

  4. A Public-Private Partnership Develops and Externally Validates a 30-Day Hospital Readmission Risk Prediction Model

    PubMed Central

    Choudhry, Shahid A.; Li, Jing; Davis, Darcy; Erdmann, Cole; Sikka, Rishi; Sutariya, Bharat

    2013-01-01

    Introduction: Preventing the occurrence of hospital readmissions is needed to improve quality of care and foster population health across the care continuum. Hospitals are being held accountable for improving transitions of care to avert unnecessary readmissions. Advocate Health Care in Chicago and Cerner (ACC) collaborated to develop all-cause, 30-day hospital readmission risk prediction models to identify patients that need interventional resources. Ideally, prediction models should encompass several qualities: they should have high predictive ability; use reliable and clinically relevant data; use vigorous performance metrics to assess the models; be validated in populations where they are applied; and be scalable in heterogeneous populations. However, a systematic review of prediction models for hospital readmission risk determined that most performed poorly (average C-statistic of 0.66) and efforts to improve their performance are needed for widespread usage. Methods: The ACC team incorporated electronic health record data, utilized a mixed-method approach to evaluate risk factors, and externally validated their prediction models for generalizability. Inclusion and exclusion criteria were applied on the patient cohort and then split for derivation and internal validation. Stepwise logistic regression was performed to develop two predictive models: one for admission and one for discharge. The prediction models were assessed for discrimination ability, calibration, overall performance, and then externally validated. Results: The ACC Admission and Discharge Models demonstrated modest discrimination ability during derivation, internal and external validation post-recalibration (C-statistic of 0.76 and 0.78, respectively), and reasonable model fit during external validation for utility in heterogeneous populations. Conclusions: The ACC Admission and Discharge Models embody the design qualities of ideal prediction models. The ACC plans to continue its partnership to further improve and develop valuable clinical models. PMID:24224068
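
    A toy version of the modelling step, with simulated data standing in for the EHR-derived features, might look as follows; the C-statistic reported above is the ROC AUC of such a model on held-out data.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    # Hypothetical EHR-derived features and 30-day readmission labels
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 12))            # e.g. age, labs, utilization
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 1.5).astype(int)

    # Derivation / internal-validation split, as in the ACC workflow
    X_dev, X_val, y_dev, y_val = train_test_split(
        X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)

    # The C-statistic for a binary outcome equals the ROC AUC
    print("C-statistic:",
          roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
    ```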

  5. Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum

    DOE PAGES

    Chou, Wen-Chi; Ma, Qin; Yang, Shihui; ...

    2015-03-12

    The identification of transcription units (TUs) encoded in a bacterial genome is essential to the elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs were predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads; the evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we also applied it to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program, named SeqTU, is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
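
    A simplified version of the two feature types could be computed as below: for a pair of adjacent genes, expression continuity compares coverage in the intergenic gap with coverage in the flanking genes, and the gap variance measures how even that coverage is. This is only a stand-in for SeqTU's actual feature extraction, with simulated coverage and invented coordinates.

    ```python
    import numpy as np

    def gap_features(cov, gene_a, gene_b):
        """Expression continuity and variance across the intergenic gap
        between two adjacent genes (strand handling omitted)."""
        a_start, a_end = gene_a
        b_start, b_end = gene_b
        gap = cov[a_end:b_start]
        flank = np.concatenate([cov[a_start:a_end], cov[b_start:b_end]])
        continuity = gap.mean() / (flank.mean() + 1e-9)  # ~1 if same TU
        variance = np.log1p(gap).var()                   # low if even
        return continuity, variance

    cov = np.random.default_rng(0).poisson(40, size=3000).astype(float)
    print(gap_features(cov, (100, 1200), (1400, 2600)))
    ```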

  6. Near-Surface Wind Predictions in Complex Terrain with a CFD Approach Optimized for Atmospheric Boundary Layer Flows

    NASA Astrophysics Data System (ADS)

    Wagenbrenner, N. S.; Forthofer, J.; Butler, B.; Shannon, K.

    2014-12-01

    Near-surface wind predictions are important for a number of applications, including transport and dispersion, wind energy forecasting, and wildfire behavior. Researchers and forecasters would benefit from a wind model that could be readily applied to complex terrain for use in these various disciplines. Unfortunately, near-surface winds in complex terrain are not handled well by traditional modeling approaches. Numerical weather prediction models employ coarse horizontal resolutions which do not adequately resolve sub-grid terrain features important to the surface flow. Computational fluid dynamics (CFD) models are increasingly being applied to simulate atmospheric boundary layer (ABL) flows, especially in wind energy applications; however, the standard functionality provided in commercial CFD models is not suitable for ABL flows. Appropriate CFD modeling in the ABL requires modification of empirically-derived wall function parameters and boundary conditions to avoid erroneous streamwise gradients due to inconsistencies between inlet profiles and specified boundary conditions. This work presents a new version of a near-surface wind model for complex terrain called WindNinja. The new version of WindNinja offers two options for flow simulations: 1) the native, fast-running mass-consistent method available in previous model versions and 2) a CFD approach based on the OpenFOAM modeling framework and optimized for ABL flows. The model is described and evaluations of predictions with surface wind data collected from two recent field campaigns in complex terrain are presented. A comparison of predictions from the native mass-consistent method and the new CFD method is also provided.
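
    One ABL-specific ingredient mentioned above is the inlet profile. A neutral log-law profile such as the sketch below is commonly used, and the wall functions must be made consistent with the same friction velocity, roughness length and von Karman constant to avoid the spurious streamwise gradients described; all parameter values here are illustrative.

    ```python
    import numpy as np

    def log_law_profile(z, u_star=0.4, z0=0.05, kappa=0.41):
        """Neutral ABL log-law wind profile, u(z) = (u*/kappa) *
        ln((z + z0)/z0), often imposed as a CFD inlet condition."""
        z = np.asarray(z, dtype=float)
        return (u_star / kappa) * np.log((z + z0) / z0)

    heights = np.array([2.0, 10.0, 40.0, 100.0])   # m above ground
    print(log_law_profile(heights))                # wind speed, m/s
    ```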

  7. Decadal climate prediction with a refined anomaly initialisation approach

    NASA Astrophysics Data System (ADS)

    Volpi, Danila; Guemas, Virginie; Doblas-Reyes, Francisco J.; Hawkins, Ed; Nichols, Nancy K.

    2017-03-01

    In decadal prediction, the objective is to exploit both sources of predictability, the external radiative forcings and the internal variability, to provide the best possible climate information for the next decade. Predicting the internal variability of the climate system relies on initialising the climate model from observational estimates. We present a refined method of anomaly initialisation (AI) applied to the ocean and sea ice components of the global climate forecast model EC-Earth, with the following key innovations: (1) a weight applied to the observed anomalies, in order to avoid introducing observed anomalies whose amplitude does not fit within the range of the internal variability generated by the model; (2) the AI of the ocean density, instead of calculating it from the anomaly-initialised state of temperature and salinity. An experiment initialised with this refined AI method has been compared with full-field and standard AI experiments. Results show that these refinements enhance the surface temperature skill over part of the North and South Atlantic, part of the South Pacific and the Mediterranean Sea for the first forecast year, although part of the improvement is lost in the following forecast years. For the tropical Pacific surface temperature, the full-field initialised experiment performs best. The prediction of the Arctic sea-ice volume is improved by the refined AI method for the first three forecast years, and the skill of the Atlantic multidecadal oscillation is significantly increased compared to a non-initialised forecast over the whole forecast period.
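
    Innovation (1) can be written in one line; the sketch below shows the damped-anomaly update with an illustrative weight and toy fields (in the paper, the weight is tied to the model's own internal variability rather than being a free constant).

    ```python
    import numpy as np

    def weighted_anomaly_init(obs, obs_clim, model_clim, weight=0.7):
        """Refined anomaly initialisation: add a damped observed anomaly
        to the model climatology, so anomalies larger than the model's
        internal variability are not injected at full amplitude."""
        return model_clim + weight * (obs - obs_clim)

    # Toy ocean-temperature field (degrees C)
    obs        = np.array([14.2, 15.1, 13.8])
    obs_clim   = np.array([14.0, 14.6, 14.0])
    model_clim = np.array([13.5, 14.9, 13.6])
    print(weighted_anomaly_init(obs, obs_clim, model_clim))
    ```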

  8. Prediction and measurement of the electromagnetic environment of high-power medium-wave and short-wave broadcast antennas in far field.

    PubMed

    Tang, Zhanghong; Wang, Qun; Ji, Zhijiang; Shi, Meiwu; Hou, Guoyan; Tan, Danjun; Wang, Pengqi; Qiu, Xianbo

    2014-12-01

    With increasing city size, high-power electromagnetic radiation devices such as high-power medium-wave (MW) and short-wave (SW) antennas have inevitably come closer and closer to buildings, worsening indoor electromagnetic radiation pollution. To keep such radiation from exceeding the exposure limits set by national standards, it is necessary to predict and survey the electromagnetic radiation of MW and SW antennas before constructing buildings. In this paper, a modified prediction method for far-field electromagnetic radiation is proposed and successfully applied to predict the electromagnetic environment of an area close to a group of typical high-power MW and SW antennas. Unlike the simplified prediction method defined in the Radiation Protection Management Guidelines (HJ/T 10.3-1996), the new method makes use of more information, such as the antennas' radiation patterns, to predict the electromagnetic environment. It therefore improves the prediction accuracy significantly by resolving the field in different directions. At the end of this article, a comparison between the predicted and measured results demonstrates the effectiveness of the proposed method. © The Author 2014. Published by Oxford University Press. All rights reserved.
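
    The physical core of such far-field predictions is the free-space relation between transmitted power, antenna gain and distance; a minimal sketch with illustrative transmitter parameters is given below. The modified method's improvement comes from substituting the full antenna pattern G(theta, phi) for a single worst-case gain, which is not modelled here.

    ```python
    import numpy as np

    def far_field_power_density(p_t, gain_dbi, r):
        """Free-space far-field estimate: S = P_t * G / (4 * pi * r^2)."""
        g = 10 ** (gain_dbi / 10.0)
        return p_t * g / (4.0 * np.pi * r ** 2)    # W/m^2

    def field_strength(s):
        """Equivalent plane-wave field strength, E = sqrt(S * 377 ohm)."""
        return np.sqrt(s * 377.0)                  # V/m

    # Hypothetical 100 kW SW transmitter seen from 500 m
    s = far_field_power_density(p_t=100e3, gain_dbi=8.0, r=500.0)
    print(s, field_strength(s))
    ```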

  9. Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation

    DOE PAGES

    Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.; ...

    2016-11-24

    Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-means clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.

  10. Computing and Applying Atomic Regulons to Understand Gene Expression and Regulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Faria, José P.; Davis, James J.; Edirisinghe, Janaka N.

    Understanding gene function and regulation is essential for the interpretation, prediction, and ultimate design of cell responses to changes in the environment. A multitude of technologies, abstractions, and interpretive frameworks have emerged to answer the challenges presented by genome function and regulatory network inference. Here, we propose a new approach for producing biologically meaningful clusters of coexpressed genes, called Atomic Regulons (ARs), based on expression data, gene context, and functional relationships. We demonstrate this new approach by computing ARs for Escherichia coli, which we compare with the coexpressed gene clusters predicted by two prevalent existing methods: hierarchical clustering and k-means clustering. We test the consistency of ARs predicted by all methods against expected interactions predicted by the Context Likelihood of Relatedness (CLR) mutual information based method, finding that the ARs produced by our approach show better agreement with CLR interactions. We then apply our method to compute ARs for four other genomes: Shewanella oneidensis, Pseudomonas aeruginosa, Thermus thermophilus, and Staphylococcus aureus. We compare the AR clusters from all genomes to study the similarity of coexpression among a phylogenetically diverse set of species, identifying subsystems that show remarkable similarity over wide phylogenetic distances. We also study the sensitivity of our method for computing ARs to the expression data used in the computation, showing that our new approach requires less data than competing approaches to converge to a near final configuration of ARs. We go on to use our sensitivity analysis to identify the specific experiments that lead most rapidly to the final set of ARs for E. coli. As a result, this analysis produces insights into improving the design of gene expression experiments.

  11. Receptor-based 3D QSAR analysis of estrogen receptor ligands - merging the accuracy of receptor-based alignments with the computational efficiency of ligand-based methods

    NASA Astrophysics Data System (ADS)

    Sippl, Wolfgang

    2000-08-01

    One of the major challenges in computational approaches to drug design is the accurate prediction of the binding affinity of biomolecules. In the present study, several prediction methods for a published set of estrogen receptor ligands are investigated and compared. The binding modes of 30 ligands were determined using the docking program AutoDock and were compared with available X-ray structures of estrogen receptor-ligand complexes. On the basis of the docking results, an interaction energy-based model, which uses the information of the whole ligand-receptor complex, was generated. Several parameters were modified in order to analyze their influence on the correlation between binding affinities and calculated ligand-receptor interaction energies. The highest correlation coefficient (r² = 0.617, q²(LOO) = 0.570) was obtained when protein flexibility was considered during the interaction energy evaluation. The second prediction method uses a combination of receptor-based and 3D quantitative structure-activity relationship (3D QSAR) methods. The ligand alignment obtained from the docking simulations was taken as the basis for a comparative field analysis applying the GRID/GOLPE program. Using the interaction field derived with a water probe and applying the smart region definition (SRD) variable selection, a significant and robust model was obtained (r² = 0.991, q²(LOO) = 0.921). The predictive ability of the established model was further evaluated using a test set of six additional compounds. The comparison with the generated interaction energy-based model and with a traditional CoMFA model obtained using a ligand-based alignment (r² = 0.951, q²(LOO) = 0.796) indicates that the combination of receptor-based and 3D QSAR methods is able to improve the quality of the underlying model.
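
    The q²(LOO) statistic quoted above can be reproduced generically as below, here with a PLS model and simulated interaction-field descriptors standing in for the GRID/GOLPE data.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    # Hypothetical interaction fields (X) and binding affinities (y)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 200))
    y = X[:, :5].sum(axis=1) + 0.3 * rng.normal(size=30)

    model = PLSRegression(n_components=3)
    y_loo = cross_val_predict(model, X, y, cv=LeaveOneOut())

    # q2(LOO) = 1 - PRESS / total sum of squares
    press = np.sum((y - y_loo.ravel()) ** 2)
    q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
    print(f"q2(LOO) = {q2:.3f}")
    ```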

  12. Predicting the future: opportunities and challenges for the chemical industry to apply 21st-century toxicity testing.

    PubMed

    Settivari, Raja S; Ball, Nicholas; Murphy, Lynea; Rasoulpour, Reza; Boverhof, Darrell R; Carney, Edward W

    2015-03-01

    Interest in applying 21st-century toxicity testing tools for safety assessment of industrial chemicals is growing. Whereas conventional toxicology uses mainly animal-based, descriptive methods, a paradigm shift is emerging in which computational approaches, systems biology, high-throughput in vitro toxicity assays, and high-throughput exposure assessments are beginning to be applied to mechanism-based risk assessments in a time- and resource-efficient fashion. Here we describe recent advances in predictive safety assessment, with a focus on their strategic application to meet the changing demands of the chemical industry and its stakeholders. The opportunities to apply these new approaches are extensive and include screening of new chemicals, informing the design of safer and more sustainable chemical alternatives, filling information gaps on data-poor chemicals already in commerce, strengthening read-across methodology for categories of chemicals sharing similar modes of action, and optimizing the design of reduced-risk product formulations. Finally, we discuss how these predictive approaches dovetail with in vivo integrated testing strategies within repeated-dose regulatory toxicity studies, which are in line with the 3Rs principles to refine, reduce, and replace animal testing. Strategic application of these tools is the foundation for informed and efficient safety assessment testing strategies that can be applied at all stages of the product-development process.

  13. Some New Mathematical Methods for Variational Objective Analysis

    NASA Technical Reports Server (NTRS)

    Wahba, G.; Johnson, D. R.

    1984-01-01

    New and/or improved variational methods for simultaneously combining forecast, heterogeneous observational data, a priori climatology, and physics to obtain improved estimates of the initial state of the atmosphere for the purpose of numerical weather prediction are developed. Cross validated spline methods are applied to atmospheric data for the purpose of improved description and analysis of atmospheric phenomena such as the tropopause and frontal boundary surfaces.
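
    A cross-validated choice of smoothing parameter, the idea behind cross-validated splines, can be sketched in a few lines. The example below uses simple K-fold cross-validation on a one-dimensional smoothing spline as a stand-in for the generalized cross-validation machinery used in variational objective analysis; data and the grid of smoothing values are invented.

    ```python
    import numpy as np
    from scipy.interpolate import UnivariateSpline
    from sklearn.model_selection import KFold

    def cv_spline(x, y, s_grid, n_splits=5):
        """Choose the spline smoothing parameter by cross-validation:
        fit on training folds, score on held-out points."""
        best, best_err = None, np.inf
        for s in s_grid:
            err = 0.0
            for tr, te in KFold(n_splits, shuffle=True,
                                random_state=0).split(x):
                tr = np.sort(tr)              # spline needs increasing x
                fit = UnivariateSpline(x[tr], y[tr], s=s)
                err += np.mean((fit(x[te]) - y[te]) ** 2)
            if err < best_err:
                best, best_err = s, err
        return best

    x = np.linspace(0, 10, 200)
    y = np.sin(x) + 0.3 * np.random.default_rng(0).normal(size=200)
    print(cv_spline(x, y, s_grid=[1, 5, 10, 20, 40]))
    ```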

  14. Experimental Validation of the Dynamic Inertia Measurement Method to Find the Mass Properties of an Iron Bird Test Article

    NASA Technical Reports Server (NTRS)

    Chin, Alexander W.; Herrera, Claudia Y.; Spivey, Natalie D.; Fladung, William A.; Cloutier, David

    2015-01-01

    The mass properties of an aerospace vehicle are required by multiple disciplines in the analysis and prediction of flight behavior. Pendulum oscillation methods have been developed and employed for almost a century as a means to measure mass properties. However, these oscillation methods are costly, time consuming, and risky. The NASA Armstrong Flight Research Center has been investigating the Dynamic Inertia Measurement (DIM) method as a possible alternative to oscillation methods. The DIM method uses ground test techniques that are already applied to aerospace vehicles when conducting modal surveys. Ground vibration tests would require minimal additional instrumentation and time to apply the DIM method. The DIM method has been validated on smaller test articles, but has not yet been fully proven on large aerospace vehicles.

  15. Benchmarking protein-protein interface predictions: why you should care about protein size.

    PubMed

    Martin, Juliette

    2014-07-01

    A number of predictive methods have been developed to predict protein-protein binding sites. Each new method is traditionally benchmarked using sets of protein structures of various sizes, and global statistics are used to assess the quality of the prediction. Little attention has been paid to the potential bias due to protein size on these statistics. Indeed, small proteins involve proportionally more residues at interfaces than large ones. If a predictive method is biased toward small proteins, this can lead to an over-estimation of its performance. Here, we investigate the bias due to the size effect when benchmarking protein-protein interface prediction on the widely used docking benchmark 4.0. First, we simulate random scores that favor small proteins over large ones. Instead of the 0.5 AUC (Area Under the Curve) value expected by chance, these biased scores result in an AUC equal to 0.6 using hypergeometric distributions, and up to 0.65 using constant scores. We then use real prediction results to illustrate how to detect the size bias by shuffling, and subsequently correct it using a simple conversion of the scores into normalized ranks. In addition, we investigate the scores produced by eight published methods and show that they are all affected by the size effect, which can change their relative ranking. The size effect also has an impact on linear combination scores by modifying the relative contributions of each method. In the future, systematic corrections should be applied when benchmarking predictive methods using data sets with mixed protein sizes. © 2014 Wiley Periodicals, Inc.
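
    The proposed correction is easy to apply in practice: convert each protein's raw scores to normalized ranks before pooling, as in the sketch below (invented scores).

    ```python
    import numpy as np
    from scipy.stats import rankdata

    def normalize_scores_per_protein(scores_by_protein):
        """Convert raw interface-propensity scores to normalized ranks
        within each protein, so small proteins (proportionally richer in
        interface residues) no longer inflate the pooled AUC."""
        return {name: (rankdata(s) - 1) / (len(s) - 1)  # ranks in [0, 1]
                for name, s in scores_by_protein.items()}

    scores = {"small_protein": np.array([0.9, 0.2, 0.8]),
              "large_protein": np.random.default_rng(0).random(300)}
    normalized = normalize_scores_per_protein(scores)
    ```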

  16. Model for the prediction of subsurface strata movement due to underground mining

    NASA Astrophysics Data System (ADS)

    Cheng, Jianwei; Liu, Fangyuan; Li, Siyuan

    2017-12-01

    The problem of ground control stability due to large underground mining operations is often associated with large movements and deformations of strata. It is a complicated problem, and can induce severe safety or environmental hazards either at the surface or within the strata. Hence, knowing the subsurface strata movement characteristics, and making subsidence predictions in advance, is desirable for mining engineers to estimate any damage likely to affect the ground surface or subsurface strata. Based on previous research findings, this paper applies a surface subsidence prediction model based on the influence function method to subsurface strata, in order to predict subsurface stratum movement. A step-wise prediction model is proposed to investigate the movement of underground strata. The model involves a dynamic iterative calculation process to derive the movements and deformations for each stratum layer; modifications to the influence function method are also made for more precise calculations. The critical subsidence parameters, incorporating stratum mechanical properties and the spatial relationship to the mining level, are thoroughly considered, with the purpose of improving the reliability of the input parameters. Such research can be very helpful to mining engineers' understanding of the movement behavior of all strata over underground excavations, and can assist in making damage mitigation plans. In order to check the reliability of the model, two methods were carried out and cross-validated. One is the use of borehole TV monitor recordings to identify the progress of subsurface stratum bedding and caving in a coal mine; the other is physical modelling of the subsidence in underground strata. The results of these two methods are compared with the theoretical results calculated by the proposed mathematical model. The results agree well with each other, and the acceptable accuracy and reliability of the proposed prediction model are thus validated.
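
    The core of the influence function method can be sketched in one dimension: subsidence at a point is the integral of a bell-shaped influence function over the extracted span. The sketch below uses a Knothe-type exponential influence function with illustrative parameters; the paper's step-wise model applies this layer by layer with modified parameters per stratum.

    ```python
    import numpy as np

    def subsidence_profile(x, extraction, s_max, r):
        """1-D influence-function method: integrate a Knothe-type
        influence function f(u) = (1/r) exp(-pi u^2 / r^2) over the
        extracted span; r is the radius of major influence."""
        x0, x1 = extraction
        xi = np.linspace(x0, x1, 500)
        dxi = xi[1] - xi[0]
        influence = (1.0 / r) * np.exp(
            -np.pi * ((x[:, None] - xi[None, :]) / r) ** 2)
        return s_max * influence.sum(axis=1) * dxi

    x = np.linspace(-200, 400, 121)            # surface coordinates (m)
    s = subsidence_profile(x, extraction=(0.0, 200.0), s_max=2.5, r=80.0)
    ```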

  17. DeepSynergy: predicting anti-cancer drug synergy with Deep Learning

    PubMed Central

    Preuer, Kristina; Lewis, Richard P I; Hochreiter, Sepp; Bender, Andreas; Bulusu, Krishna C; Klambauer, Günter

    2018-01-01

    Motivation: While drug combination therapies are a well-established concept in cancer treatment, identifying novel synergistic combinations is challenging due to the size of combinatorial space. However, computational approaches have emerged as a time- and cost-efficient way to prioritize combinations to test, based on recently available large-scale combination screening data. Deep Learning has had an impact in many research areas by achieving new state-of-the-art model performance. However, Deep Learning has not yet been applied to drug synergy prediction, which is the approach we present here, termed DeepSynergy. DeepSynergy uses chemical and genomic information as input, a normalization strategy to account for input data heterogeneity, and conical layers to model drug synergies. Results: DeepSynergy was compared to other machine learning methods such as Gradient Boosting Machines, Random Forests, Support Vector Machines and Elastic Nets on the largest publicly available synergy dataset with respect to mean squared error. DeepSynergy significantly outperformed the other methods, with an improvement of 7.2% over the second-best method at the prediction of novel drug combinations within the space of explored drugs and cell lines. At this task, the mean Pearson correlation coefficient between the measured and predicted values of DeepSynergy was 0.73. Applying DeepSynergy to classification of these novel drug combinations resulted in a high predictive performance, with an AUC of 0.90. Furthermore, we found that all compared methods exhibit low predictive performance when extrapolating to unexplored drugs or cell lines, which we suggest is due to limitations in the size and diversity of the dataset. We envision that DeepSynergy could be a valuable tool for selecting novel synergistic drug combinations. Availability and implementation: DeepSynergy is available via www.bioinf.jku.at/software/DeepSynergy. Contact: klambauer@bioinf.jku.at. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:29253077
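
    A sketch of such a conical architecture, in PyTorch with illustrative (not the published) layer sizes and feature widths, might look as follows.

    ```python
    import torch
    import torch.nn as nn

    class ConicalMLP(nn.Module):
        """Sketch of a DeepSynergy-style network: concatenated chemical
        features of both drugs plus genomic features of the cell line
        feed tapering ("conical") fully connected layers ending in one
        synergy score. Sizes and dropout here are assumptions."""
        def __init__(self, n_in, sizes=(4096, 2048, 1024)):
            super().__init__()
            layers, prev = [], n_in
            for h in sizes:
                layers += [nn.Linear(prev, h), nn.ReLU(), nn.Dropout(0.5)]
                prev = h
            layers += [nn.Linear(prev, 1)]    # regression: synergy score
            self.net = nn.Sequential(*layers)

        def forward(self, x):
            return self.net(x).squeeze(-1)

    model = ConicalMLP(n_in=2 * 1309 + 3984)  # hypothetical feature widths
    loss_fn = nn.MSELoss()                    # trained against MSE, as above
    ```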

  18. The statistical mechanics of complex signaling networks: nerve growth factor signaling

    NASA Astrophysics Data System (ADS)

    Brown, K. S.; Hill, C. C.; Calero, G. A.; Myers, C. R.; Lee, K. H.; Sethna, J. P.; Cerione, R. A.

    2004-10-01

    The inherent complexity of cellular signaling networks and their importance to a wide range of cellular functions necessitates the development of modeling methods that can be applied toward making predictions and highlighting the appropriate experiments to test our understanding of how these systems are designed and function. We use methods of statistical mechanics to extract useful predictions for complex cellular signaling networks. A key difficulty with signaling models is that, while significant effort is being made to experimentally measure the rate constants for individual steps in these networks, many of the parameters required to describe their behavior remain unknown or at best represent estimates. To establish the usefulness of our approach, we have applied our methods toward modeling the nerve growth factor (NGF)-induced differentiation of neuronal cells. In particular, we study the actions of NGF and mitogenic epidermal growth factor (EGF) in rat pheochromocytoma (PC12) cells. Through a network of intermediate signaling proteins, each of these growth factors stimulates extracellular regulated kinase (Erk) phosphorylation with distinct dynamical profiles. Using our modeling approach, we are able to predict the influence of specific signaling modules in determining the integrated cellular response to the two growth factors. Our methods also yield some interesting insights into the design and possible evolution of cellular systems, highlighting an inherent property of these systems that we call 'sloppiness.'
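
    The ensemble idea behind this approach can be caricatured in a few lines: sample parameter sets that fit the data comparably well and report the spread of the resulting predictions, as in the toy Metropolis sketch below. The real work uses full ODE models of the signaling network; the cost and prediction functions here are invented one-parameter stand-ins.

    ```python
    import numpy as np

    def ensemble_predictions(model, cost, theta0, n_keep=200,
                             step=0.1, seed=0):
        """Random-walk Monte Carlo over rate constants: keep parameter
        sets that fit the data comparably well, then report the spread
        of model predictions across the ensemble."""
        rng = np.random.default_rng(seed)
        theta, kept = np.array(theta0, dtype=float), []
        c = cost(theta)
        while len(kept) < n_keep:
            trial = theta + step * rng.normal(size=theta.size)
            c_trial = cost(trial)
            if rng.random() < np.exp(c - c_trial):  # Metropolis acceptance
                theta, c = trial, c_trial
            kept.append(model(theta))
        return np.percentile(kept, [5, 50, 95])

    # Toy stand-ins: one "rate constant", prediction is an Erk level
    cost = lambda th: 0.5 * ((th[0] - 1.0) / 0.2) ** 2
    model = lambda th: 1.0 / (1.0 + np.exp(-3.0 * th[0]))
    print(ensemble_predictions(model, cost, theta0=[1.0]))
    ```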

  19. Docking-based classification models for exploratory toxicology ...

    EPA Pesticide Factsheets

    Background: Exploratory toxicology is a new emerging research area whose ultimate mission is that of protecting human health and environment from risks posed by chemicals. In this regard, the ethical and practical limitations of animal testing have encouraged the promotion of computational methods for the fast screening of huge collections of chemicals available on the market. Results: We derived 24 reliable docking-based classification models able to predict the estrogenic potential of a large collection of chemicals having high quality experimental data, kindly provided by the U.S. Environmental Protection Agency (EPA). The predictive power of our docking-based models was supported by values of AUC, EF1% (EFmax = 7.1), -LR (at SE = 0.75) and +LR (at SE = 0.25) ranging from 0.63 to 0.72, from 2.5 to 6.2, from 0.35 to 0.67 and from 2.05 to 9.84, respectively. In addition, external predictions were successfully made on some representative known estrogenic chemicals. Conclusion: We show how structure-based methods, widely applied to drug discovery programs, can be adapted to meet the conditions of the regulatory context. Importantly, these methods enable one to employ the physicochemical information contained in the X-ray solved biological target and to screen structurally-unrelated chemicals.

  20. Rapid discrimination and quantification of alkaloids in Corydalis Tuber by near-infrared spectroscopy.

    PubMed

    Lu, Hai-yan; Wang, Shi-sheng; Cai, Rui; Meng, Yu; Xie, Xin; Zhao, Wei-jie

    2012-02-05

    With the application of near-infrared spectroscopy (NIRS), a convenient and rapid method for the determination of alkaloids in Corydalis Tuber extract, and for the classification of samples from different locations, has been developed. Five different samples were collected according to their geographical origin. Second-derivative (2-Der) pre-treatment with a smoothing point of 17 was applied to the spectra, the 1st-to-scaling range algorithm was found to be the optimal approach, and a classification model was constructed over the wavelength ranges of 4582-4270 cm⁻¹, 5562-4976 cm⁻¹ and 7000-7467 cm⁻¹ with a high recognition rate. For the prediction model, the partial least squares (PLS) algorithm was utilized with HPLC-UV as the reference method, and the optimum models were obtained after adjustment. The pre-processing methods of the calibration models were COE for protopine, min-max normalization for palmatine, and MSC for tetrahydropalmatine, respectively. The root mean square errors of cross-validation (RMSECV) for protopine, palmatine and tetrahydropalmatine were 0.884, 1.83 and 3.23 mg/g, and the correlation coefficients (R²) were 99.75, 98.41 and 97.34%. A t-test was applied to the tetrahydropalmatine model; there was no significant difference between the NIR predictions and the HPLC reference method at the 95% confidence interval (t = 0.746).
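
    The PLS calibration-and-RMSECV loop described above is standard and can be sketched as follows, with simulated spectra in place of the measured NIR data.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import KFold, cross_val_predict

    # Hypothetical pre-treated NIR spectra (X) and HPLC-UV reference
    # values (y, mg/g)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 500))             # 60 samples x 500 variables
    y = X[:, 100] * 5.0 + 20.0 + rng.normal(scale=0.5, size=60)

    pls = PLSRegression(n_components=5)
    y_cv = cross_val_predict(pls, X, y,
                             cv=KFold(10, shuffle=True, random_state=0))

    rmsecv = np.sqrt(np.mean((y - y_cv.ravel()) ** 2))
    r2 = np.corrcoef(y, y_cv.ravel())[0, 1] ** 2 * 100  # %, as quoted
    print(f"RMSECV = {rmsecv:.3f} mg/g, R2 = {r2:.2f}%")
    ```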

  1. [Kriging analysis of vegetation index depression in peak cluster karst area].

    PubMed

    Yang, Qi-Yong; Jiang, Zhong-Cheng; Ma, Zu-Lu; Cao, Jian-Hua; Luo, Wei-Qun; Li, Wen-Jun; Duan, Xiao-Fang

    2012-04-01

    To characterize the spatial variability of the normalized difference vegetation index (NDVI) in a peak cluster karst area, and to address the problem of information "missing" in the mountain-shadow areas of remote sensing images of karst terrain, the NDVI of non-shaded areas was extracted in the Guohua Ecological Experimental Area, Pingguo County, Guangxi, using the image processing software ENVI. The spatial variability of the NDVI was analyzed using geostatistical methods, and the NDVI of the mountain-shadow areas was predicted and validated. The results indicated that the NDVI of the study area showed strong spatial variability and spatial autocorrelation resulting from the impact of intrinsic factors, with a range of 300 m. The spatial distribution maps of the NDVI interpolated by the kriging method showed that the mean NDVI was 0.196, with an apparent strip-and-block pattern. The higher NDVI values were distributed where the slope of the peak cluster area was greater than 25 degrees, while the lower values were distributed in areas such as the foot of the peak cluster and the depressions, where the slope was less than 25 degrees. The validation results show that the kriging interpolation has very high prediction accuracy and can predict the NDVI of the shadowed areas, which provides a new idea and method for the monitoring and evaluation of karst rocky desertification.
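
    An ordinary kriging interpolation of this kind can be sketched with the pykrige package, assuming a spherical variogram with the ~300 m range reported above; the coordinates, NDVI values and variogram sill/nugget below are simulated.

    ```python
    import numpy as np
    from pykrige.ok import OrdinaryKriging

    # Hypothetical NDVI samples from non-shaded pixels (x, y in metres)
    rng = np.random.default_rng(0)
    x, y = rng.uniform(0, 3000, 400), rng.uniform(0, 3000, 400)
    ndvi = 0.2 + 0.1 * np.sin(x / 500.0) + 0.02 * rng.normal(size=400)

    # Spherical variogram with a ~300 m range, as reported above
    ok = OrdinaryKriging(x, y, ndvi, variogram_model="spherical",
                         variogram_parameters={"sill": 0.01,
                                               "range": 300.0,
                                               "nugget": 0.002})
    gx = gy = np.arange(0.0, 3000.0, 50.0)
    ndvi_grid, kriging_var = ok.execute("grid", gx, gy)  # fills shadows too
    ```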

  2. THE PRACTICE OF STRUCTURE ACTIVITY RELATIONSHIPS (SAR) IN TOXICOLOGY

    EPA Science Inventory

    Both qualitative and quantitative modeling methods relating chemical structure to biological activity, called structure-activity relationship analyses or SAR, are applied to the prediction and characterization of chemical toxicity. This minireview will discuss some generic issue...

  3. Characterization of Residual Stress Effects on Fatigue Crack Growth of a Friction Stir Welded Aluminum Alloy

    NASA Technical Reports Server (NTRS)

    Newman, John A.; Smith, Stephen W.; Seshadri, Banavara R.; James, Mark A.; Brazill, Richard L.; Schultz, Robert W.; Donald, J. Keith; Blair, Amy

    2015-01-01

    An on-line compliance-based method to account for residual stress effects in stress-intensity factor and fatigue crack growth property determinations has been evaluated. Residual stress intensity factor results determined from specimens containing friction stir weld induced residual stresses are presented, and the on-line method results were found to be in excellent agreement with residual stress-intensity factor data obtained using the cut compliance method. Variable stress-intensity factor tests were designed to demonstrate that a simple superposition model, summing the applied stress-intensity factor with the residual stress-intensity factor, can be used to determine the total crack-tip stress-intensity factor. Finite element, VCCT (virtual crack closure technique), and J-integral analysis methods have been used to characterize weld-induced residual stress using thermal expansion/contraction in the form of an equivalent delta T (change in local temperature during welding) to simulate the welding process. This equivalent delta T was established and applied to analyze different specimen configurations to predict residual stress distributions and associated residual stress-intensity factor values. The predictions were found to agree well with experimental results obtained using the crack- and cut-compliance methods.
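
    The superposition model is a one-line computation, shown below with illustrative stress-intensity values.

    ```python
    import numpy as np

    def total_sif(k_applied, k_residual):
        """Superposition used above: the crack-tip driving force is the
        applied stress-intensity factor plus the (possibly negative)
        residual-stress contribution measured on-line or by cut
        compliance."""
        return np.asarray(k_applied) + np.asarray(k_residual)

    k_app = np.array([5.0, 8.0, 12.0])   # MPa*sqrt(m), load and geometry
    k_res = np.array([-2.0, -1.5, -0.5]) # compressive weld residual stress
    print(total_sif(k_app, k_res))       # effective K controlling growth
    ```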

  4. Improved HPLC method with the aid of chemometric strategy: determination of loxoprofen in pharmaceutical formulation.

    PubMed

    Venkatesan, P; Janardhanan, V Sree; Muralidharan, C; Valliappan, K

    2012-06-01

    Loxoprofen is a nonsteroidal anti-inflammatory drug that acts by inhibiting the cyclo-oxygenase isoforms 1 and 2. In this study, an improved RP-HPLC method was developed for the quantification of loxoprofen in a pharmaceutical dosage form, using an experimental design approach. Independent variables (organic modifier content, pH of the mobile phase, and flow rate) were selected from a preliminary study, and three responses were chosen as dependent variables: the loxoprofen retention factor, the resolution between loxoprofen and probenecid, and the retention time of probenecid. To improve the method development and optimization step, Derringer's desirability function was applied to optimize the three chosen responses simultaneously. The procedure allowed deduction of the optimal conditions: acetonitrile:water (53:47, v/v), with the pH of the mobile phase adjusted to 2.9 with orthophosphoric acid. The separation was achieved in less than 4 minutes. The method was applied in the quality control of commercial tablets and showed good agreement between the experimental data and predicted values throughout the studied parameter space. The optimized assay conditions were validated according to International Conference on Harmonisation guidelines to confirm specificity, linearity, accuracy and precision.
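
    Derringer's approach maps each response onto a desirability in [0, 1] and maximizes their geometric mean; a minimal sketch with invented response values and limits follows.

    ```python
    import numpy as np

    def desirability_larger_is_better(value, low, high, weight=1.0):
        """Derringer d(y) for a response to maximize: 0 below `low`,
        1 above `high`, power-scaled in between."""
        d = np.clip((value - low) / (high - low), 0.0, 1.0)
        return d ** weight

    def global_desirability(ds):
        """Derringer's D: geometric mean of individual desirabilities."""
        ds = np.asarray(ds, dtype=float)
        return float(np.prod(ds) ** (1.0 / len(ds)))

    # Hypothetical responses at one candidate HPLC condition
    d1 = desirability_larger_is_better(2.1, 1.0, 4.0)        # resolution
    d2 = 1.0 - desirability_larger_is_better(3.8, 2.0, 6.0)  # ret. time
    d3 = desirability_larger_is_better(2.5, 1.0, 5.0)        # ret. factor
    print(global_desirability([d1, d2, d3]))
    ```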

  5. Evaluating the evaluation of cancer driver genes

    PubMed Central

    Tokheim, Collin J.; Papadopoulos, Nickolas; Kinzler, Kenneth W.; Vogelstein, Bert; Karchin, Rachel

    2016-01-01

    Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning–based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future. PMID:27911828
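
    One of the checks described, whether reported P values behave as expected, can be sketched directly: for genes that are not drivers, P values should be uniform on [0, 1], which a Kolmogorov-Smirnov test can assess. The data below are a toy stand-in for a miscalibrated method.

    ```python
    import numpy as np
    from scipy import stats

    def check_pvalue_calibration(pvals):
        """Under the null (non-driver genes), reported P values should be
        uniform on [0, 1]; a KS test flags methods whose modelling
        assumptions break that expectation."""
        stat, p = stats.kstest(pvals, "uniform")
        return stat, p

    # Toy example: a method producing deflated null P values
    pvals = np.random.default_rng(0).beta(0.7, 1.0, size=5000)
    print(check_pvalue_calibration(pvals))
    ```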

  6. Development of a three dimensional numerical water quality model for continental shelf applications

    NASA Technical Reports Server (NTRS)

    Spaulding, M.; Hunter, D.

    1975-01-01

    A model to predict the distribution of water quality parameters in three dimensions was developed. The mass transport equation was solved using a non-dimensional vertical axis and an alternating-direction-implicit finite difference technique. The reaction kinetics of the constituents were incorporated into a matrix method which permits computation of the interactions of multiple constituents. Methods for the computation of dispersion coefficients and coliform bacteria decay rates were determined. Numerical investigations of dispersive and dissipative effects showed that the three-dimensional model performs as predicted by analysis of simpler cases. The model was then applied to a two dimensional vertically averaged tidal dynamics model for the Providence River. It was also extended to a steady state application by replacing the time step with an iteration sequence. This modification was verified by comparison to analytical solutions and applied to a river confluence situation.
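
    The implicit sweep at the heart of an alternating-direction-implicit scheme can be sketched in one dimension, as below: each ADI half-step solves a tridiagonal (banded) system along one axis. Reaction kinetics, advection and the non-dimensional vertical coordinate are omitted, and all values are illustrative.

    ```python
    import numpy as np
    from scipy.linalg import solve_banded

    def implicit_diffusion_step(c, d, dx, dt):
        """One implicit (backward Euler) 1-D diffusion sweep with
        zero-gradient boundaries, the building block an ADI scheme
        applies along each axis in turn."""
        n = len(c)
        r = d * dt / dx ** 2
        ab = np.zeros((3, n))            # banded: upper, main, lower
        ab[0, 1:] = -r
        ab[1, :] = 1.0 + 2.0 * r
        ab[2, :-1] = -r
        ab[1, 0] = ab[1, -1] = 1.0 + r   # no-flux ends
        return solve_banded((1, 1), ab, c)

    c = np.zeros(100)
    c[50] = 1.0                          # initial concentration spike
    for _ in range(10):
        c = implicit_diffusion_step(c, d=1e-3, dt=10.0, dx=1.0)
    ```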

  7. Simultaneous determination of three herbicides by differential pulse voltammetry and chemometrics.

    PubMed

    Ni, Yongnian; Wang, Lin; Kokot, Serge

    2011-01-01

    A novel differential pulse voltammetry (DPV) method was developed for the simultaneous determination of Pendimethalin, Dinoseb and sodium 5-nitroguaiacolate (5NG) with the aid of chemometrics. The voltammograms of these three compounds overlapped significantly, so chemometric methods were applied to facilitate their simultaneous determination. These included classical least squares (CLS), principal component regression (PCR), partial least squares (PLS) and radial basis function-artificial neural networks (RBF-ANN). A separately prepared verification data set was used to confirm the calibrations, which were built from the original and first-derivative data matrices of the voltammograms. On the basis of the relative prediction errors and recoveries of the analytes, the RBF-ANN and DPLS (D - first-derivative spectra) models performed best and are particularly recommended for application. The DPLS calibration model was applied satisfactorily to the prediction of the three analytes in market vegetables and lake water samples.

  8. Applying under-sampling techniques and cost-sensitive learning methods on risk assessment of breast cancer.

    PubMed

    Hsu, Jia-Lien; Hung, Ping-Cheng; Lin, Hung-Yen; Hsieh, Chung-Ho

    2015-04-01

    Breast cancer is one of the most common causes of cancer mortality. Early detection through mammography screening could significantly reduce breast cancer mortality; however, most screening methods consume large amounts of resources. We propose a computational model, based solely on personal health information, for breast cancer risk assessment; it can serve as a pre-screening program in low-cost settings. In our study, the data set, consisting of 3976 records, was collected from Taipei City Hospital between 1 January and 31 December 2008. Based on this dataset, we first apply sampling techniques and a dimension reduction method to preprocess the data. Then, we construct various kinds of classifiers (including basic classifiers, ensemble methods, and cost-sensitive methods) to predict the risk. The cost-sensitive method with a random forest classifier is able to achieve a recall (sensitivity) of 100%. At a recall of 100%, the precision (positive predictive value, PPV) and specificity of the cost-sensitive method with the random forest classifier were 2.9% and 14.87%, respectively. In summary, we built a breast cancer risk assessment model using data mining techniques; it has the potential to serve as an assisting tool in breast cancer screening.
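
    A toy version of the cost-sensitive random forest with simulated, imbalanced data might look as follows; class weights and a lowered decision threshold are the two levers used to push recall toward 100% at the expense of precision.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import recall_score, precision_score

    # Hypothetical personal-health features with a rare positive class
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4000, 10))
    y = (X[:, 0] + rng.normal(scale=2.0, size=4000) > 3.5).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                              random_state=0)

    # Cost-sensitive learning: penalize missed cases far more than
    # false alarms
    clf = RandomForestClassifier(n_estimators=300,
                                 class_weight={0: 1, 1: 50},
                                 random_state=0).fit(X_tr, y_tr)

    # Lower the decision threshold, trading precision for recall
    prob = clf.predict_proba(X_te)[:, 1]
    pred = (prob >= 0.05).astype(int)
    print(recall_score(y_te, pred), precision_score(y_te, pred))
    ```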

  9. Local-search based prediction of medical image registration error

    NASA Astrophysics Data System (ADS)

    Saygili, Görkem

    2018-03-01

    Medical image registration is a crucial task in many medical imaging applications, and a considerable amount of work aiming to predict the error of a registration without any human effort has therefore been published recently. If available, these error predictions can be used as feedback to the registration algorithm to further improve its performance. Recent methods generally start by extracting image-based and deformation-based features, then apply feature pooling, and finally train a Random Forest (RF) regressor to predict the real registration error. Image-based features can be calculated after a single registration but provide limited accuracy, whereas deformation-based features, such as the variation of the deformation vector field, may require up to 20 registrations, which is considerably time-consuming. This paper proposes to use features extracted by a local search algorithm as image-based features to estimate the error of a registration. The proposed method comprises a local search algorithm that finds corresponding voxels between registered image pairs and, based on the amount of shift and on stereo confidence measures, densely predicts the registration error in millimetres using an RF regressor. Compared to other algorithms in the literature, the proposed algorithm does not require multiple registrations, can be implemented efficiently on a Graphics Processing Unit (GPU), and can still provide highly accurate error predictions in the presence of large registration errors. Experimental results with real registrations on a public dataset indicate that substantially high accuracy is achieved by using features from the local search algorithm.

  10. Process fault detection and nonlinear time series analysis for anomaly detection in safeguards

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burr, T.L.; Mullen, M.F.; Wangen, L.E.

    In this paper we discuss two advanced techniques, process fault detection and nonlinear time series analysis, and apply them to the analysis of vector-valued and single-valued time-series data. We investigate model-based process fault detection methods for analyzing simulated, multivariate, time-series data from a three-tank system. The model predictions are compared with simulated measurements of the same variables to form residual vectors that are tested for the presence of faults (possible diversions in safeguards terminology). We evaluate two methods, testing all individual residuals with a univariate z-score and testing all variables simultaneously with the Mahalanobis distance, for their ability to detect loss of material from two different leak scenarios from the three-tank system: a leak without and with replacement of the lost volume. Nonlinear time-series analysis tools were compared with the linear methods popularized by Box and Jenkins. We compare prediction results using three nonlinear and two linear modeling methods on each of six simulated time series: two nonlinear and four linear. The nonlinear methods performed better at predicting the nonlinear time series and did as well as the linear methods at predicting the linear values.
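
    The two residual tests described above can be sketched in a few lines: a per-variable z-score with a fixed threshold, and a joint Mahalanobis-distance test against a chi-squared quantile. The residuals here are simulated with an injected bias; in the study they came from a model of the three-tank system.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      n_vars = 3
      residuals = rng.normal(0, 1.0, size=(500, n_vars))  # fault-free residuals
      residuals[400:, 0] += 2.5                           # injected "leak" bias

      mu = residuals[:300].mean(axis=0)                   # calibrate on early data
      sigma = residuals[:300].std(axis=0)
      cov_inv = np.linalg.inv(np.cov(residuals[:300], rowvar=False))

      # Test 1: univariate z-scores; alarm if any |z| exceeds 3.
      z_alarm = (np.abs((residuals - mu) / sigma) > 3).any(axis=1)

      # Test 2: Mahalanobis distance; alarm if d^2 exceeds the chi^2 99% quantile.
      d2 = np.einsum("ij,jk,ik->i", residuals - mu, cov_inv, residuals - mu)
      m_alarm = d2 > stats.chi2.ppf(0.99, df=n_vars)

      print("z-score alarm rate after fault:    ", z_alarm[400:].mean())
      print("Mahalanobis alarm rate after fault:", m_alarm[400:].mean())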

  11. Historical Maps from Modern Images: Using Remote Sensing to Model and Map Century-Long Vegetation Change in a Fire-Prone Region

    PubMed Central

    Callister, Kate E.; Griffioen, Peter A.; Avitabile, Sarah C.; Haslem, Angie; Kelly, Luke T.; Kenny, Sally A.; Nimmo, Dale G.; Farnsworth, Lisa M.; Taylor, Rick S.; Watson, Simon J.; Bennett, Andrew F.; Clarke, Michael F.

    2016-01-01

    Understanding the age structure of vegetation is important for effective land management, especially in fire-prone landscapes where the effects of fire can persist for decades and centuries. In many parts of the world, such information is limited due to an inability to map disturbance histories before the availability of satellite images (~1972). Here, we describe a method for creating a spatial model of the age structure of canopy species that established pre-1972. We built predictive neural network models based on remotely sensed data and ecological field survey data. These models determined the relationship between sites of known fire age and remotely sensed data. The predictive model was applied across a 104,000 km2 study region in semi-arid Australia to create a spatial model of vegetation age structure, which is primarily the result of stand-replacing fires that occurred before 1972. An assessment of the predictive capacity of the model using independent validation data showed a significant correlation (rs = 0.64) between predicted and known age at test sites. Application of the model provides valuable insights into the distribution of vegetation age-classes and fire history in the study region. This relatively straightforward method uses widely available data sources and can be applied in other regions to predict age-class distribution beyond the limits imposed by satellite imagery. PMID:27029046

  12. SPATIO-TEMPORAL MODELING OF AGRICULTURAL YIELD DATA WITH AN APPLICATION TO PRICING CROP INSURANCE CONTRACTS

    PubMed Central

    Ozaki, Vitor A.; Ghosh, Sujit K.; Goodwin, Barry K.; Shirota, Ricardo

    2009-01-01

    This article presents a statistical model of agricultural yield data based on a set of hierarchical Bayesian models that allows joint modeling of temporal and spatial autocorrelation. This method captures a comprehensive range of the various uncertainties involved in predicting crop insurance premium rates as opposed to the more traditional ad hoc, two-stage methods that are typically based on independent estimation and prediction. A panel data set of county-average yield data was analyzed for 290 counties in the State of Paraná (Brazil) for the period of 1990 through 2002. Posterior predictive criteria are used to evaluate different model specifications. This article provides substantial improvements in the statistical and actuarial methods often applied to the calculation of insurance premium rates. These improvements are especially relevant to situations where data are limited. PMID:19890450

  13. Crystal structure prediction supported by incomplete experimental data

    NASA Astrophysics Data System (ADS)

    Tsujimoto, Naoto; Adachi, Daiki; Akashi, Ryosuke; Todo, Synge; Tsuneyuki, Shinji

    2018-05-01

    We propose an efficient theoretical scheme for structure prediction based on the idea of combining methods that optimize against theoretical calculations and experimental data simultaneously. In this scheme, we formulate a cost function based on a weighted sum of interatomic potential energies and a penalty function defined with partial experimental data that would be wholly insufficient for conventional structure analysis. In particular, we define the cost function using "crystallinity" formulated with only the peak positions within a small range of the x-ray-diffraction pattern. We apply this method to well-known polymorphs of SiO2 and C with up to 108 atoms in the simulation cell and show that it reproduces the correct structures efficiently with very limited information from the diffraction peaks. This scheme opens a new avenue for determining and predicting structures that are difficult to determine by conventional methods.
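
    The cost function described above can be sketched as a weighted sum of a potential energy term and a diffraction-peak penalty. Everything below is a toy stand-in: the pair potential, the peak lists, and the weight w are illustrative assumptions, not the authors' formulation of "crystallinity".

      import numpy as np

      def potential_energy(positions):
          """Toy Lennard-Jones-like pair energy (stand-in for a real potential)."""
          d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
          d = d[np.triu_indices(len(positions), k=1)]
          return np.sum(4.0 * (d ** -12 - d ** -6))

      def peak_penalty(simulated_peaks, observed_peaks):
          """Each observed peak position should have a nearby simulated peak."""
          return sum(float(np.abs(simulated_peaks - p).min())
                     for p in observed_peaks)

      def cost(positions, simulated_peaks, observed_peaks, w=0.5):
          return (w * potential_energy(positions)
                  + (1 - w) * peak_penalty(simulated_peaks, observed_peaks))

      # Example call with random coordinates and a handful of 2-theta peaks.
      pos = np.random.default_rng(0).random((8, 3)) * 3.0
      print(cost(pos, np.array([21.3, 29.7, 35.1]), np.array([21.5, 35.0])))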

  14. High speed transition prediction

    NASA Technical Reports Server (NTRS)

    Gasperas, Gediminis

    1993-01-01

    The main objective of this work period was to develop, maintain and exercise state-of-the-art methods for transition prediction in supersonic flow fields. Basic state and stability codes, acquired during the last work period, were exercised and applied to calculate the properties of various flowfields. The development of a code for the prediction of transition location using a currently novel method (the PSE or Parabolized Stability Equation method), initiated during the last work period and continued during the present work period, was cancelled at mid-year for budgetary reasons. Other activities during this period included the presentation of a paper at the APS meeting in Tallahassee, Florida entitled 'Stability of Two-Dimensional Compressible Boundary Layers', as well as the initiation of a paper co-authored with H. Reed of the Arizona State University entitled 'Stability of Boundary Layers'.

  15. A study of methods of prediction and measurement of the transmission sound through the walls of light aircraft

    NASA Technical Reports Server (NTRS)

    Forssen, B.; Wang, Y. S.; Crocker, M. J.

    1981-01-01

    Several aspects were studied. The SEA theory was used to develop a theoretical model to predict the transmission loss through an aircraft window. This work mainly consisted of the writing of two computer programs. One program predicts the sound transmission through a plexiglass window (the case of a single partition). The other program applies to the case of a plexiglass window with a window shade added (the case of a double partition with an air gap). The sound transmission through a structure was measured in experimental studies using several different methods so that the accuracy and complexity of all the methods could be compared. Also, the measurements were conducted on a simple model of a fuselage (a cylindrical shell), on a real aircraft fuselage, and on stiffened panels.

  16. A study of methods of prediction and measurement of the transmission sound through the walls of light aircraft

    NASA Astrophysics Data System (ADS)

    Forssen, B.; Wang, Y. S.; Crocker, M. J.

    1981-12-01

    Several aspects were studied. The SEA theory was used to develop a theoretical model to predict the transmission loss through an aircraft window. This work mainly consisted of the writing of two computer programs. One program predicts the sound transmission through a plexiglass window (the case of a single partition). The other program applies to the case of a plexiglass window with a window shade added (the case of a double partition with an air gap). The sound transmission through a structure was measured in experimental studies using several different methods so that the accuracy and complexity of all the methods could be compared. Also, the measurements were conducted on a simple model of a fuselage (a cylindrical shell), on a real aircraft fuselage, and on stiffened panels.

  17. deepNF: Deep network fusion for protein function prediction.

    PubMed

    Gligorijevic, Vladimir; Barot, Meet; Bonneau, Richard

    2018-06-01

    The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly-nonlinear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks. We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting GO terms of varying type and specificity. deepNF is freely available at: https://github.com/VGligorijevic/deepNF. vgligorijevic@flatironinstitute.org, rb133@nyu.edu. Supplementary data are available at Bioinformatics online.

  18. A new algorithm for modeling friction in dynamic mechanical systems

    NASA Technical Reports Server (NTRS)

    Hill, R. E.

    1988-01-01

    A method of modeling friction forces that impede the motion of parts of dynamic mechanical systems is described. Conventional methods, in which the friction effect is assumed to be a constant force, or torque, in a direction opposite to the relative motion, are applicable only to those cases where applied forces are large in comparison to the friction, and where there is little interest in system behavior close to the times of transitions through zero velocity. An algorithm is described that provides accurate determination of friction forces over a wide range of applied force and velocity conditions. The method avoids the simulation errors resulting from a finite integration interval used in connection with a conventional friction model, as is the case in many digital computer-based simulations. The algorithm incorporates a predictive calculation based on initial conditions of motion, externally applied forces, inertia, and integration step size. The predictive calculation in connection with an external integration process provides an accurate determination of both static and Coulomb friction forces and the resulting motions in dynamic simulations. Accuracy of the results is improved over that obtained with conventional methods, and a relatively large integration step size is permitted. A function block for incorporation in a specific simulation program is described. The general form of the algorithm facilitates implementation with various programming languages such as FORTRAN or C, as well as with other simulation programs.
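
    In the spirit of the predictive step described above, the sketch below decides, before integrating, whether the body sticks (static friction balances the load), breaks away, or would cross zero velocity inside the step. The parameter names and the stick test are an illustrative reconstruction, not the paper's exact algorithm.

      def friction_step(v, f_applied, m, dt, f_static, f_coulomb):
          """Advance velocity one step of size dt; return (new_v, friction)."""
          if v == 0.0 and abs(f_applied) <= f_static:
              return 0.0, -f_applied          # static friction cancels the load
          # Sliding (or breakaway): Coulomb friction opposes the motion.
          direction = v if v != 0.0 else f_applied
          f_fric = -f_coulomb if direction > 0 else f_coulomb
          v_pred = v + (f_applied + f_fric) / m * dt   # predictive calculation
          if v != 0.0 and v * v_pred < 0.0:   # would cross zero inside the step
              return 0.0, f_fric              # stop at zero instead of chattering
          return v_pred, f_fric

      # A block sticks until the applied force exceeds static friction.
      v = 0.0
      for f in [0.5, 0.9, 1.5, 1.5]:
          v, _ = friction_step(v, f, m=1.0, dt=0.01, f_static=1.0, f_coulomb=0.8)
          print(f"applied={f:.1f} N -> v={v:.4f} m/s")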

  19. Application of Deep Learning and Supervised Learning Methods to Recognize Nonlinear Hidden Pattern in Water Stress Levels from Spatiotemporal Datasets across Rural and Urban US Counties

    NASA Astrophysics Data System (ADS)

    Eisenhart, T.; Josset, L.; Rising, J. A.; Devineni, N.; Lall, U.

    2017-12-01

    In the wake of recent water crises, the need to understand and predict the risk of water stress in urban and rural areas has grown. This understanding has the potential to improve decision making in public resource management, policy making, risk management and investment decisions. Assuming an underlying relationship between urban and rural water stress and observable features, we apply Deep Learning and Supervised Learning models to uncover hidden nonlinear patterns from spatiotemporal datasets. Results of interest include prediction accuracy on extreme categories (i.e., urban areas highly prone to water stress) and not solely the average risk for urban or rural areas, which adds complexity to the tuning of model parameters. We first label urban water-stressed counties using annual water quality violations and compile a comprehensive spatiotemporal dataset that captures the yearly evolution of climatic, demographic and economic factors of more than 3,000 US counties over the 1980-2010 period. As county-level data reporting is not done on a yearly basis, we test multiple imputation methods to get around the issue of missing data. Using the Python libraries TensorFlow and scikit-learn, we apply and compare the ability of, amongst other methods, Recurrent Neural Networks (testing both LSTM and GRU cells), Convolutional Neural Networks and Support Vector Machines to predict urban water stress. We evaluate the performance of those models over multiple time spans and combine methods to diminish the risk of overfitting and increase prediction power on test sets. This methodology seeks to identify hidden nonlinear patterns to assess the predominant data features that influence urban and rural water stress. Results from this application at the national scale will assess the performance of deep learning models to predict water stress risk areas across all US counties and will highlight a predominant Machine Learning method for modeling water stress risk using spatiotemporal data.

  20. The Discovery of Novel Biomarkers Improves Breast Cancer Intrinsic Subtype Prediction and Reconciles the Labels in the METABRIC Data Set

    PubMed Central

    Milioli, Heloisa Helena; Vimieiro, Renato; Riveros, Carlos; Tishchenko, Inna; Berretta, Regina; Moscato, Pablo

    2015-01-01

    Background The prediction of breast cancer intrinsic subtypes has been introduced as a valuable strategy to determine patient diagnosis and prognosis, and therapy response. The PAM50 method, based on the expression levels of 50 genes, uses a single sample predictor model to assign subtype labels to samples. Intrinsic errors reported within this assay demonstrate the challenge of identifying and understanding the breast cancer groups. In this study, we aim to: a) identify novel biomarkers for subtype individuation by exploring the competence of a newly proposed method named CM1 score, and b) apply ensemble learning, as opposed to the use of a single classifier, for sample subtype assignment. The overarching objective is to improve class prediction. Methods and Findings The microarray transcriptome data sets used in this study are: the METABRIC breast cancer data recorded for over 2000 patients, and the public integrated source from the ROCK database with 1570 samples. We first computed the CM1 score to identify the probes with highly discriminative patterns of expression across samples of each intrinsic subtype. We further assessed the ability of 42 selected probes to assign correct subtype labels using 24 different classifiers from the Weka software suite. For comparison, the same method was applied to the list of 50 genes from the PAM50 method. Conclusions The CM1 score portrayed 30 novel biomarkers for predicting breast cancer subtypes, with the confirmation of the role of 12 well-established genes. Intrinsic subtypes assigned using the CM1 list and the ensemble of classifiers are more consistent and homogeneous than the original PAM50 labels. The new subtypes show accurate distributions of the current clinical markers ER, PR and HER2, and of survival curves in the METABRIC and ROCK data sets. Remarkably, the paradoxical attribution of the original labels reinforces the limitations of employing a single-sample classifier to predict breast cancer intrinsic subtypes. PMID:26132585

  1. Advancing our thinking in presence-only and used-available analysis.

    PubMed

    Warton, David; Aarts, Geert

    2013-11-01

    1. The problems of analysing used-available data and presence-only data are equivalent, and this paper uses this equivalence as a platform for exploring opportunities for advancing analysis methodology. 2. We suggest some potential methodological advances in used-available analysis, made possible via lessons learnt in the presence-only literature, for example, using modern methods to improve predictive performance. We also consider the converse - potential advances in presence-only analysis inspired by used-available methodology. 3. Notwithstanding these potential advances in methodology, perhaps a greater opportunity is in advancing our thinking about how to apply a given method to a particular data set. 4. It is shown by example that strikingly different results can be achieved for a single data set by applying a given method of analysis in different ways - hence having chosen a method of analysis, the next step of working out how to apply it is critical to performance. 5. We review some key issues to consider in deciding how to apply an analysis method: apply the method in a manner that reflects the study design; consider data properties; and use diagnostic tools to assess how reasonable a given analysis is for the data at hand. © 2013 The Authors. Journal of Animal Ecology © 2013 British Ecological Society.

  2. Neural network based automatic limit prediction and avoidance system and method

    NASA Technical Reports Server (NTRS)

    Calise, Anthony J. (Inventor); Prasad, Jonnalagadda V. R. (Inventor); Horn, Joseph F. (Inventor)

    2001-01-01

    A method for performance envelope boundary cueing for a vehicle control system comprises the steps of formulating a prediction system for a neural network and training the neural network to predict values of limited parameters as a function of current control positions and current vehicle operating conditions. The method further comprises the steps of applying the neural network to the control system of the vehicle, where the vehicle has capability for measuring current control positions and current vehicle operating conditions. The neural network generates a map of current control positions and vehicle operating conditions versus the limited parameters in a pre-determined vehicle operating condition. The method estimates critical control deflections from the current control positions required to drive the vehicle to a performance envelope boundary. Finally, the method comprises the steps of communicating the critical control deflection to the vehicle control system; and driving the vehicle control system to provide a tactile cue to an operator of the vehicle as the control positions approach the critical control deflections.

  3. An Aquatic Decomposition Scoring Method to Potentially Predict the Postmortem Submersion Interval of Bodies Recovered from the North Sea.

    PubMed

    van Daalen, Marjolijn A; de Kat, Dorothée S; Oude Grotebevelsborg, Bernice F L; de Leeuwe, Roosje; Warnaar, Jeroen; Oostra, Roelof Jan; M Duijst-Heesters, Wilma L J

    2017-03-01

    This study aimed to develop an aquatic decomposition scoring (ADS) method and investigated the predictive value of this method in estimating the postmortem submersion interval (PMSI) of bodies recovered from the North Sea. This method, consisting of an ADS item list and a pictorial reference atlas, showed a high interobserver agreement (Krippendorff's alpha ≥ 0.93) and hence proved to be valid. The scoring method was applied to data collected from closed cases (cases in which the PMSI was known) concerning bodies recovered from the North Sea from 1990 to 2013. Thirty-eight cases met the inclusion criteria and were scored by quantifying the observed total aquatic decomposition score (TADS). Statistical analysis demonstrated that TADS accurately predicts the PMSI (p < 0.001), confirming that the decomposition process in the North Sea is strongly correlated with time. © 2017 American Academy of Forensic Sciences.

  4. Prediction of phospholipidosis-inducing potential of drugs by in vitro biochemical and physicochemical assays followed by multivariate analysis.

    PubMed

    Kuroda, Yukihiro; Saito, Madoka

    2010-03-01

    An in vitro method to predict the phospholipidosis-inducing potential of cationic amphiphilic drugs (CADs) was developed using biochemical and physicochemical assays. The following parameters were applied to principal component analysis: the physicochemical parameters pK(a) and clogP; the dissociation constant of CADs from phospholipid; the inhibition of enzymatic phospholipid degradation; and the metabolic stability of CADs. In the score plot, phospholipidosis-inducing drugs (amiodarone, propranolol, imipramine, chloroquine) were plotted locally, forming the subspace for positive CADs, while non-inducing drugs (chlorpromazine, chloramphenicol, disopyramide, lidocaine) were placed scattered outside the subspace, allowing a clear discrimination between both classes of CADs. CADs that often produce false results with conventional physicochemical or cell-based assay methods were accurately determined by our method. Basic and lipophilic disopyramide could be accurately predicted as a nonphospholipidogenic drug. Moreover, chlorpromazine, which is often falsely predicted as a phospholipidosis-inducing drug by in vitro methods, could be accurately determined. Because this method uses the pharmacokinetic parameters pK(a), clogP, and metabolic stability, which are usually obtained in the early stages of drug development, it newly requires only two parameters: binding to phospholipid and inhibition of the lipid-degradation enzyme. Therefore, this method provides a cost-effective approach to predicting the phospholipidosis-inducing potential of a drug. Copyright (c) 2009 Elsevier Ltd. All rights reserved.

  5. Irradiation dose detection of irradiated milk powder using visible and near-infrared spectroscopy and chemometrics.

    PubMed

    Kong, W W; Zhang, C; Liu, F; Gong, A P; He, Y

    2013-08-01

    The objective of this study was to examine the possibility of applying visible and near-infrared spectroscopy to the quantitative detection of the irradiation dose of irradiated milk powder. A total of 150 samples were used: 100 for the calibration set and 50 for the validation set. The samples were irradiated at 5 different dose levels in the dose range 0 to 6.0 kGy. Six different pretreatment methods were compared. The prediction results of full spectra given by linear and nonlinear calibration methods suggested that Savitzky-Golay smoothing and the first derivative were suitable pretreatment methods in this study. Regression coefficient analysis was applied to select effective wavelengths (EW). Fewer than 10 EW were selected, and they are useful for the development of portable detection instruments or sensors. Partial least squares, extreme learning machine, and least squares support vector machine were used. The best prediction performance was achieved by the EW-extreme learning machine model with first-derivative spectra, with a correlation coefficient of 0.97 and a root mean square error of prediction of 0.844. This study provided a new approach for the fast detection of the irradiation dose of milk powder. The results could be helpful for quality detection and safety monitoring of milk powder. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  6. Prospective performance evaluation of selected common virtual screening tools. Case study: Cyclooxygenase (COX) 1 and 2.

    PubMed

    Kaserer, Teresa; Temml, Veronika; Kutil, Zsofia; Vanek, Tomas; Landa, Premysl; Schuster, Daniela

    2015-01-01

    Computational methods can be applied in drug development for the identification of novel lead candidates, but also for the prediction of pharmacokinetic properties and potential adverse effects, thereby helping to prioritize and identify the most promising compounds. In principle, several techniques are available for this purpose; however, which one is the most suitable for a specific research objective still requires further investigation. Within this study, the performance of several programs, representing common virtual screening methods, was compared in a prospective manner. First, we selected top-ranked virtual screening hits from the three methods pharmacophore modeling, shape-based modeling, and docking. For comparison, these hits were then additionally predicted by external pharmacophore- and 2D similarity-based bioactivity profiling tools. Subsequently, the biological activities of the selected hits were assessed in vitro, which allowed for evaluating and comparing the prospective performance of the applied tools. Although all methods performed well, considerable differences were observed concerning hit rates, true positive and true negative hits, and hitlist composition. Our results suggest that a rational selection of the applied method represents a powerful strategy to maximize the success of a research project, tightly linked to its aims. We employed cyclooxygenase as an application example; however, the focus of this study lay in highlighting the differences in virtual screening tool performance and not in the identification of novel COX inhibitors. Copyright © 2015 The Authors. Published by Elsevier Masson SAS. All rights reserved.

  7. Machine learning of swimming data via wisdom of crowd and regression analysis.

    PubMed

    Xie, Jiang; Xu, Junfu; Nie, Celine; Nie, Qing

    2017-04-01

    Every performance by a registered USA swimmer in an officially sanctioned meet is recorded in an online database, with times dating back to 1980. For the first time, statistical analysis and machine learning methods are systematically applied to 4,022,631 swim records. In this study, we investigate performance features for all strokes as a function of age and gender. The variances in performance of males and females for different ages and strokes were studied, and the correlations of performances for different ages were estimated using the Pearson correlation. Regression analysis shows the performance trends for both males and females at different ages and suggests critical ages for peak training. Moreover, we assess twelve popular machine learning methods to predict or classify swimmer performance. Each method exhibited different strengths or weaknesses in different cases, indicating that no one method could predict well for all strokes. To address this problem, we propose a new method that combines multiple inference methods to derive a Wisdom of Crowd Classifier (WoCC). Our simulation experiments demonstrate that the WoCC is a consistent method with better overall prediction accuracy. Our study reveals several new age-dependent trends in swimming and provides an accurate method for classifying and predicting swimming times.
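
    The combination idea behind the WoCC, aggregating several base learners so that no single method's weakness dominates, can be sketched with a plain soft-voting ensemble; this uses scikit-learn's VotingClassifier on synthetic data and is not the authors' exact combination rule.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier, VotingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=1000, random_state=0)

      # A small "crowd" of heterogeneous classifiers, averaged by probability.
      crowd = VotingClassifier(
          estimators=[("lr", LogisticRegression(max_iter=1000)),
                      ("rf", RandomForestClassifier(random_state=0)),
                      ("svm", SVC(probability=True, random_state=0))],
          voting="soft")
      print("ensemble CV accuracy:", cross_val_score(crowd, X, y, cv=5).mean())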

  8. Research of Water Level Prediction for a Continuous Flood due to Typhoons Based on a Machine Learning Method

    NASA Astrophysics Data System (ADS)

    Nakatsugawa, M.; Kobayashi, Y.; Okazaki, R.; Taniguchi, Y.

    2017-12-01

    This research aims to improve the accuracy of water level prediction calculations for more effective river management. In August 2016, Hokkaido was visited by four typhoons, whose heavy rainfall caused severe flooding. In the Tokoro river basin of Eastern Hokkaido, the water level (WL) at the Kamikawazoe gauging station, in the lower reaches, exceeded the design high-water level, and the water rose to the highest level on record. To predict such flood conditions and mitigate disaster damage, it is necessary to improve the accuracy of prediction as well as to prolong the lead time (LT) required for disaster mitigation measures such as flood-fighting activities and evacuation actions by residents. There is a need to predict the river water level around the peak stage earlier and more accurately. Previous research dealing with WL prediction proposed a method in which the WL at the lower reaches is estimated from its correlation with the WL at the upper reaches (hereinafter: "the water level correlation method"). Additionally, a runoff model-based method has generally been used in which the discharge is estimated by feeding rainfall prediction data to a runoff model, such as a storage function model, and the WL is then estimated from that discharge by using a WL-discharge rating curve (H-Q curve). In this research, an attempt was made to predict WL by applying the Random Forest (RF) method, a machine learning method that can estimate the contribution of explanatory variables. Furthermore, from a practical point of view, we investigated the prediction of WL based on a multiple correlation (MC) method involving explanatory variables with high contributions in the RF method, and we examined the proper selection of explanatory variables and the extension of LT. The following results were found: 1) Based on the RF method tuned by learning from previous floods, the WL for the abnormal flood of August 2016 was properly predicted with a lead time of 6 h. 2) Based on the contribution of explanatory variables, factors were selected for the MC method. In this way, plausible prediction results were obtained.
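
    A minimal sketch of RF-based water-level prediction with lagged explanatory variables and a 6-hour lead time, on a synthetic rainfall/stage series; the variables, lags, and lead time are illustrative choices, and the feature importances stand in for the "contribution of explanatory variables" used to select factors for the MC method.

      import numpy as np
      import pandas as pd
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      n = 2000
      rain = rng.gamma(0.3, 2.0, n)                    # hourly rainfall (mm)
      level = (pd.Series(rain).rolling(12, min_periods=1).mean().to_numpy()
               + rng.normal(0, 0.05, n))               # synthetic stage (m)

      lead = 6                                         # lead time (h)
      df = pd.DataFrame({"level": level, "rain": rain})
      for lag in range(12):                            # lagged explanatory vars
          df[f"rain_lag{lag}"] = df["rain"].shift(lag)
          df[f"level_lag{lag}"] = df["level"].shift(lag)
      df["target"] = df["level"].shift(-lead)          # WL six hours ahead
      df = df.dropna()

      X, y = df.filter(like="lag"), df["target"]
      split = int(0.8 * len(df))
      rf = RandomForestRegressor(n_estimators=300, random_state=0)
      rf.fit(X.iloc[:split], y.iloc[:split])
      print("test R^2:", rf.score(X.iloc[split:], y.iloc[split:]))
      print(sorted(zip(X.columns, rf.feature_importances_),
                   key=lambda t: -t[1])[:5])           # top contributors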

  9. Comparing methods for measuring the rate of spread of invading populations

    Treesearch

    Marius Gilbert; Andrew Liebhold

    2010-01-01

    Measuring rates of spread during biological invasions is important for predicting where and when invading organisms will spread in the future as well as for quantifying the influence of environmental conditions on invasion speed. While several methods have been proposed in the literature to measure spread rates, a comprehensive comparison of their accuracy when applied...

  10. A Novel Method for Prediction of Nonlinear Aeroelastic Responses

    DTIC Science & Technology

    2010-01-01

    [No abstract was indexed for this record; the retrieved text consists of fragments from the report body: a note that a and b are respectively the lengths of the longer and shorter sides (Ugural and Fenster, 1981), the caption of Figure 3.5 (twisted beam, 100×5×10), and partial reference-list entries (Computer Methods in Applied Mechanics and Engineering 50(1), 7-91; Ugural, A. C. and S. K. Fenster, 1981).]

  11. An Efficient Pattern Mining Approach for Event Detection in Multivariate Temporal Data

    PubMed Central

    Batal, Iyad; Cooper, Gregory; Fradkin, Dmitriy; Harrison, James; Moerchen, Fabian; Hauskrecht, Milos

    2015-01-01

    This work proposes a pattern mining approach to learn event detection models from complex multivariate temporal data, such as electronic health records. We present Recent Temporal Pattern mining, a novel approach for efficiently finding predictive patterns for event detection problems. This approach first converts the time series data into time-interval sequences of temporal abstractions. It then constructs more complex time-interval patterns backward in time using temporal operators. We also present the Minimal Predictive Recent Temporal Patterns framework for selecting a small set of predictive and non-spurious patterns. We apply our methods for predicting adverse medical events in real-world clinical data. The results demonstrate the benefits of our methods in learning accurate event detection models, which is a key step for developing intelligent patient monitoring and decision support systems. PMID:26752800

  12. Temporal effects in trend prediction: identifying the most popular nodes in the future.

    PubMed

    Zhou, Yanbo; Zeng, An; Wang, Wei-Hong

    2015-01-01

    Prediction is an important problem in different science domains. In this paper, we focus on trend prediction in complex networks, i.e., identifying the most popular nodes in the future. Due to the preferential attachment mechanism in real systems, nodes' recent degree and cumulative degree have been successfully applied to design trend prediction methods. Here we took into account more detailed information about the network evolution and proposed a temporal-based predictor (TBP). The TBP predicts the future trend by the node strength in the weighted network, with the link weight equal to its exponential aging. Three data sets with time information are used to test the performance of the new method. We find that the TBP has high general accuracy in predicting the future most popular nodes. More importantly, it can identify many potential objects with low popularity in the past but high popularity in the future. The effect of the decay speed of the exponential aging on the results is discussed in detail.
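
    The predictor itself is compact: a node's score is its strength in the weighted network, where a link created at time t_l and evaluated at time t carries weight exp(-gamma * (t - t_l)). The toy edge list and decay rate below are illustrative.

      import math
      from collections import defaultdict

      edges = [  # (node_u, node_v, creation_time)
          ("a", "b", 1.0), ("a", "c", 2.0), ("b", "c", 8.0),
          ("c", "d", 9.0), ("c", "e", 9.5),
      ]

      def tbp_scores(edges, t_now, gamma=0.5):
          strength = defaultdict(float)
          for u, v, t_link in edges:
              w = math.exp(-gamma * (t_now - t_link))   # exponential aging
              strength[u] += w
              strength[v] += w
          return dict(strength)

      # Recently active nodes rank highest; old links contribute little.
      print(sorted(tbp_scores(edges, t_now=10.0).items(), key=lambda kv: -kv[1]))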

  13. Temporal Effects in Trend Prediction: Identifying the Most Popular Nodes in the Future

    PubMed Central

    Zhou, Yanbo; Zeng, An; Wang, Wei-Hong

    2015-01-01

    Prediction is an important problem in different science domains. In this paper, we focus on trend prediction in complex networks, i.e., identifying the most popular nodes in the future. Due to the preferential attachment mechanism in real systems, nodes’ recent degree and cumulative degree have been successfully applied to design trend prediction methods. Here we took into account more detailed information about the network evolution and proposed a temporal-based predictor (TBP). The TBP predicts the future trend by the node strength in the weighted network, with the link weight equal to its exponential aging. Three data sets with time information are used to test the performance of the new method. We find that the TBP has high general accuracy in predicting the future most popular nodes. More importantly, it can identify many potential objects with low popularity in the past but high popularity in the future. The effect of the decay speed of the exponential aging on the results is discussed in detail. PMID:25806810

  14. Complete fold annotation of the human proteome using a novel structural feature space

    DOE PAGES

    Middleton, Sarah A.; Illuminati, Joseph; Kim, Junhyong

    2017-04-13

    Recognition of protein structural fold is the starting point for many structure prediction tools and protein function inference. Fold prediction is computationally demanding and recognizing novel folds is difficult such that the majority of proteins have not been annotated for fold classification. Here we describe a new machine learning approach using a novel feature space that can be used for accurate recognition of all 1,221 currently known folds and inference of unknown novel folds. We show that our method achieves better than 94% accuracy even when many folds have only one training example. We demonstrate the utility of this method by predicting the folds of 34,330 human protein domains and showing that these predictions can yield useful insights into potential biological function, such as prediction of RNA-binding ability. Finally, our method can be applied to de novo fold prediction of entire proteomes and identify candidate novel fold families.

  15. Prediction of sound transmission loss through multilayered panels by using Gaussian distribution of directional incident energy

    PubMed

    Kang; Ih; Kim; Kim

    2000-03-01

    In this study, a new prediction method is suggested for the sound transmission loss (STL) of multilayered panels of infinite extent. Conventional methods such as the random or field incidence approaches often give significant discrepancies in predicting the STL of multilayered panels when compared with experiments. In this paper, appropriate directional distributions of incident energy for predicting the STL of multilayered panels are proposed. In order to find a weighting function to represent the directional distribution of incident energy on the wall of a reverberation chamber, numerical simulations using a ray-tracing technique are carried out. The simulation results reveal that the directional distribution can be approximately expressed by a Gaussian distribution function in terms of the angle of incidence. The Gaussian function is applied to predict the STL of various multilayered panel configurations as well as single panels. Comparisons between measurement and prediction show good agreement, which validates the proposed Gaussian function approach.
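
    For a single panel obeying the oblique-incidence mass law, the proposed weighting can be sketched by replacing the uniform field-incidence weighting of the transmission coefficient with a Gaussian in the angle of incidence. The mass-law tau(theta) and the Gaussian mean and spread below are illustrative assumptions, not the paper's fitted values.

      import numpy as np
      from scipy.integrate import quad

      rho_c = 415.0                 # characteristic impedance of air (Pa s/m)
      m = 10.0                      # panel surface mass (kg/m^2)
      omega = 2 * np.pi * 1000.0    # angular frequency at 1 kHz

      def tau(theta):               # oblique-incidence mass law
          return 1.0 / (1.0 + (omega * m * np.cos(theta) / (2 * rho_c)) ** 2)

      def weighted_tl(weight, theta_max=np.pi / 2):
          num = quad(lambda t: tau(t) * weight(t) * np.sin(t) * np.cos(t),
                     0.0, theta_max)[0]
          den = quad(lambda t: weight(t) * np.sin(t) * np.cos(t),
                     0.0, theta_max)[0]
          return -10.0 * np.log10(num / den)

      uniform = lambda t: 1.0
      gauss = lambda t: np.exp(-((t - np.radians(50)) ** 2)
                               / (2 * np.radians(15) ** 2))

      print("field-incidence TL:   %.1f dB" % weighted_tl(uniform, np.radians(78)))
      print("Gaussian-weighted TL: %.1f dB" % weighted_tl(gauss))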

  16. Fast and simultaneous prediction of animal feed nutritive values using near infrared reflectance spectroscopy

    NASA Astrophysics Data System (ADS)

    Samadi; Wajizah, S.; Munawar, A. A.

    2018-02-01

    Feed is an important factor in animal production. The purpose of this study is to apply the NIRS method to determining feed values. NIRS spectral data were acquired for feed samples in the wavelength range of 1000-2500 nm, with 32 scans and a 0.2 nm wavelength interval. Spectral data were corrected by de-trending (DT) and standard normal variate (SNV) methods. Prediction models for in vitro dry matter digestibility (IVDMD) and in vitro organic matter digestibility (IVOMD) were established using principal component regression (PCR) and validated using leave-one-out cross-validation (LOOCV). Prediction performance was quantified using the correlation coefficient (r) and the residual predictive deviation (RPD) index. The results showed that IVDMD and IVOMD can be predicted using SNV spectra data, with r and RPD values of 0.93 and 2.78 for IVDMD, and 0.90 and 2.35 for IVOMD, respectively. In conclusion, the NIRS technique appears feasible for predicting animal feed nutritive values.
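
    The calibration pipeline named above (SNV correction, principal component regression, leave-one-out cross-validation) can be sketched as follows; the spectra and digestibility values are synthetic stand-ins for the measured feed data.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import LeaveOneOut, cross_val_predict
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(0)
      X = rng.normal(size=(60, 750))                # synthetic NIR spectra
      y = 3.0 * X[:, 100] + rng.normal(0, 0.1, 60)  # synthetic IVDMD values

      def snv(spectra):
          """Standard normal variate: centre and scale each spectrum (row)."""
          return ((spectra - spectra.mean(axis=1, keepdims=True))
                  / spectra.std(axis=1, keepdims=True))

      pcr = make_pipeline(PCA(n_components=10), LinearRegression())
      y_cv = cross_val_predict(pcr, snv(X), y, cv=LeaveOneOut())

      r = np.corrcoef(y, y_cv)[0, 1]
      rpd = y.std() / (y - y_cv).std()              # residual predictive deviation
      print(f"r = {r:.2f}, RPD = {rpd:.2f}")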

  17. Prediction of fat-free body mass from bioelectrical impedance and anthropometry among 3-year-old children using DXA

    PubMed Central

    Ejlerskov, Katrine T.; Jensen, Signe M.; Christensen, Line B.; Ritz, Christian; Michaelsen, Kim F.; Mølgaard, Christian

    2014-01-01

    For 3-year-old children suitable methods to estimate body composition are sparse. We aimed to develop predictive equations for estimating fat-free mass (FFM) from bioelectrical impedance (BIA) and anthropometry, using dual-energy X-ray absorptiometry (DXA) as the reference method, with data from 99 healthy 3-year-old Danish children. Predictive equations were derived from two multiple linear regression models, a comprehensive model (height²/resistance (RI), six anthropometric measurements) and a simple model (RI, height, weight). Their uncertainty was quantified by means of a 10-fold cross-validation approach. The prediction error of FFM was 3.0% for both equations (root mean square error: 360 and 356 g, respectively). The derived equations produced BIA-based predictions of FFM and FM near the DXA scan results. We suggest that the predictive equations can be applied in similar population samples aged 2–4 years. The derived equations may prove useful for studies linking body composition to early risk factors and early onset of obesity. PMID:24463487

  18. Prediction of fat-free body mass from bioelectrical impedance and anthropometry among 3-year-old children using DXA.

    PubMed

    Ejlerskov, Katrine T; Jensen, Signe M; Christensen, Line B; Ritz, Christian; Michaelsen, Kim F; Mølgaard, Christian

    2014-01-27

    For 3-year-old children suitable methods to estimate body composition are sparse. We aimed to develop predictive equations for estimating fat-free mass (FFM) from bioelectrical impedance (BIA) and anthropometry, using dual-energy X-ray absorptiometry (DXA) as the reference method, with data from 99 healthy 3-year-old Danish children. Predictive equations were derived from two multiple linear regression models, a comprehensive model (height²/resistance (RI), six anthropometric measurements) and a simple model (RI, height, weight). Their uncertainty was quantified by means of a 10-fold cross-validation approach. The prediction error of FFM was 3.0% for both equations (root mean square error: 360 and 356 g, respectively). The derived equations produced BIA-based predictions of FFM and FM near the DXA scan results. We suggest that the predictive equations can be applied in similar population samples aged 2-4 years. The derived equations may prove useful for studies linking body composition to early risk factors and early onset of obesity.
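
    A minimal sketch of the "simple model" form described in both records above: FFM regressed on the resistance index RI = height²/resistance, height, and weight. The data, and therefore the fitted coefficients, are fabricated for illustration and are not the study's equations.

      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(0)
      n = 99
      height = rng.normal(98, 4, n)               # cm
      weight = rng.normal(15, 1.5, n)             # kg
      resistance = rng.normal(680, 40, n)         # ohm
      ri = height ** 2 / resistance               # resistance index

      # Fabricated "true" FFM, just to have something to fit.
      ffm = 0.6 * ri + 0.04 * height + 0.3 * weight + rng.normal(0, 0.36, n)

      X = np.column_stack([ri, height, weight])
      model = LinearRegression().fit(X, ffm)
      rmse = np.sqrt(np.mean((ffm - model.predict(X)) ** 2))
      print("coefficients:", model.coef_, "intercept:", model.intercept_)
      print(f"RMSE = {rmse:.3f} kg ({100 * rmse / ffm.mean():.1f}% of mean FFM)")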

  19. Automatic Prediction of Conversion from Mild Cognitive Impairment to Probable Alzheimer’s Disease using Structural Magnetic Resonance Imaging

    PubMed Central

    Nho, Kwangsik; Shen, Li; Kim, Sungeun; Risacher, Shannon L.; West, John D.; Foroud, Tatiana; Jack, Clifford R.; Weiner, Michael W.; Saykin, Andrew J.

    2010-01-01

    Mild Cognitive Impairment (MCI) is thought to be a precursor to the development of early Alzheimer’s disease (AD). For early diagnosis of AD, the development of a model that is able to predict the conversion of amnestic MCI to AD is challenging. Using automatic whole-brain MRI analysis techniques and pattern classification methods, we developed a model to differentiate AD from healthy controls (HC), and then applied it to the prediction of MCI conversion to AD. Classification was performed using support vector machines (SVMs) together with a SVM-based feature selection method, which selected a set of most discriminating predictors for optimizing prediction accuracy. We obtained 90.5% cross-validation accuracy for classifying AD and HC, and 72.3% accuracy for predicting MCI conversion to AD. These analyses suggest that a classifier trained to separate HC vs. AD has substantial potential for predicting MCI conversion to AD. PMID:21347037

  20. Prediction With Dimension Reduction of Multiple Molecular Data Sources for Patient Survival.

    PubMed

    Kaplan, Adam; Lock, Eric F

    2017-01-01

    Predictive modeling from high-dimensional genomic data is often preceded by a dimension reduction step, such as principal component analysis (PCA). However, the application of PCA is not straightforward for multisource data, wherein multiple sources of 'omics data measure different but related biological components. In this article, we use recent advances in the dimension reduction of multisource data for predictive modeling. In particular, we apply exploratory results from Joint and Individual Variation Explained (JIVE), an extension of PCA for multisource data, to the prediction of differing response types. We conduct simulations to illustrate the practical advantages and interpretability of our approach. As an application example, we consider predicting survival for patients with glioblastoma multiforme from 3 data sources measuring messenger RNA expression, microRNA expression, and DNA methylation. We also introduce a method to estimate JIVE scores for new samples that were not used in the initial dimension reduction and study its theoretical properties; this method is implemented in the R package R.JIVE on CRAN, in the function jive.predict.

  1. Using EarthScope magnetotelluric data to improve the resilience of the US power grid: rapid predictions of geomagnetically induced currents

    NASA Astrophysics Data System (ADS)

    Schultz, A.; Bonner, L. R., IV

    2016-12-01

    Existing methods to predict Geomagnetically Induced Currents (GICs) in power grids, such as the North American Electric Reliability Corporation standard adopted by the power industry, require explicit knowledge of the electrical resistivity structure of the crust and mantle to solve for ground-level electric fields along transmission lines. The current standard is to apply regional 1-D resistivity models to this problem, which facilitates rapid solution of the governing equations. The systematic mapping of continental resistivity structure from projects such as EarthScope reveals several orders of magnitude of lateral variation in resistivity on local, regional and continental scales, resulting in electric field intensifications, relative to existing 1-D solutions, that can impact GICs to first order. The computational burden of coupled 3-D solutions to the ground resistivity/GIC problem inhibits the prediction of GICs in a timeframe useful for protecting power grids. In this work we reduce the problem to applying a set of filters, recognizing that the magnetotelluric impedance tensors implicitly contain all known information about the resistivity structure beneath a given site and thus provide the required relationship between electric and magnetic fields at each site. We project real-time magnetic field data from distant magnetic observatories through a robustly calculated multivariate transfer function to locations where magnetotelluric impedance tensors had previously been obtained. This provides a real-time prediction of the magnetic field at each of those points. We then project the predicted magnetic fields through the impedance tensors to obtain predictions of the electric fields induced at ground level. Thus, electric field predictions can be generated in real time for an entire array from real-time observatory data, then interpolated onto points representing a power transmission line contained within the array to produce the combined electric field prediction necessary for GIC prediction along that line. This method produces more accurate predictions of ground electric fields in conductively heterogeneous areas, is not limited by distance from the nearest observatory, and retains computational speed comparable to that of existing methods.
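
    The final projection step, obtaining ground electric fields from predicted magnetic fields through a site's impedance tensor, amounts to a frequency-domain multiplication, E(omega) = Z(omega) H(omega). The sketch below uses a frequency-independent toy tensor; a real magnetotelluric Z varies with frequency and would be applied per frequency bin.

      import numpy as np

      fs = 1.0                                 # samples per second
      n = 4096
      t = np.arange(n) / fs
      hx = np.sin(2 * np.pi * 0.005 * t)       # predicted magnetic field (nT)
      hy = 0.3 * np.sin(2 * np.pi * 0.002 * t)

      Hx, Hy = np.fft.rfft(hx), np.fft.rfft(hy)
      Z = np.array([[0.5 + 0.1j, 2.0 + 0.5j],  # toy impedance (mV/km per nT)
                    [-2.0 - 0.5j, -0.4 - 0.1j]])

      # E = Z H, component-wise in the frequency domain, then back to time.
      Ex = np.fft.irfft(Z[0, 0] * Hx + Z[0, 1] * Hy, n)
      Ey = np.fft.irfft(Z[1, 0] * Hx + Z[1, 1] * Hy, n)
      print("peak |E| = %.2f mV/km" % np.max(np.hypot(Ex, Ey)))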

  2. Conditional Density Estimation with HMM Based Support Vector Machines

    NASA Astrophysics Data System (ADS)

    Hu, Fasheng; Liu, Zhenqiu; Jia, Chunxin; Chen, Dechang

    Conditional density estimation is very important in financial engineering, risk management, and other engineering computing problems. However, most regression models have a latent assumption that the probability density is a Gaussian distribution, which is not necessarily true in many real-life applications. In this paper, we give a framework to estimate or predict the conditional density mixture dynamically. By combining the Input-Output HMM with SVM regression and building an SVM model in each state of the HMM, we can estimate a conditional density mixture instead of a single Gaussian. With an SVM in each node, this model can be applied not only to regression but also to classification. We applied this model to denoising ECG data. The proposed method has the potential to be applied to other time series, such as stock-market return prediction.

  3. A Sensor Dynamic Measurement Error Prediction Model Based on NAPSO-SVM

    PubMed Central

    Jiang, Minlan; Jiang, Lan; Jiang, Dingde; Li, Fei

    2018-01-01

    Dynamic measurement error correction is an effective way to improve sensor precision. Dynamic measurement error prediction is an important part of error correction, and support vector machines (SVMs) are often used for predicting the dynamic measurement errors of sensors. Traditionally, the SVM parameters have been set manually, which cannot ensure the model’s performance. In this paper, an SVM method based on an improved particle swarm optimization (NAPSO) is proposed to predict the dynamic measurement errors of sensors. Natural selection and simulated annealing are added to the PSO to improve its ability to avoid local optima. To verify the performance of NAPSO-SVM, three types of algorithms are selected to optimize the SVM’s parameters: the particle swarm optimization algorithm (PSO), the improved PSO optimization algorithm (NAPSO), and the glowworm swarm optimization (GSO). The dynamic measurement error data of two sensors are used as the test data. The root mean squared error and mean absolute percentage error are employed to evaluate the prediction models’ performances. The experimental results show that among the three tested algorithms the NAPSO-SVM method has better prediction precision and smaller prediction errors, and that it is an effective method for predicting the dynamic measurement errors of sensors. PMID:29342942
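
    Leaving out the NAPSO additions (natural selection and simulated annealing), the underlying idea, swarm search over SVM hyperparameters scored by cross-validated error, can be sketched with plain PSO over log10(C) and log10(gamma); the bounds, swarm size, and coefficients below are illustrative.

      import numpy as np
      from sklearn.datasets import make_regression
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVR

      X, y = make_regression(n_samples=200, n_features=5, noise=5.0,
                             random_state=0)

      def fitness(p):  # p = [log10 C, log10 gamma]; higher is better
          svr = SVR(C=10 ** p[0], gamma=10 ** p[1])
          return cross_val_score(svr, X, y, cv=3,
                                 scoring="neg_root_mean_squared_error").mean()

      rng = np.random.default_rng(0)
      n_particles, n_iter = 10, 15
      pos = rng.uniform([-1, -4], [3, 0], size=(n_particles, 2))
      vel = np.zeros_like(pos)
      pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
      gbest = pbest[pbest_f.argmax()]

      for _ in range(n_iter):
          r1, r2 = rng.random((2, n_particles, 1))
          vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
          pos = pos + vel
          f = np.array([fitness(p) for p in pos])
          better = f > pbest_f                 # update personal and global bests
          pbest[better], pbest_f[better] = pos[better], f[better]
          gbest = pbest[pbest_f.argmax()]

      print("best log10(C), log10(gamma):", gbest, "| CV -RMSE:", pbest_f.max())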

  4. A Simple Plasma Retinol Isotope Ratio Method for Estimating β-Carotene Relative Bioefficacy in Humans: Validation with the Use of Model-Based Compartmental Analysis.

    PubMed

    Ford, Jennifer Lynn; Green, Joanne Balmer; Lietz, Georg; Oxley, Anthony; Green, Michael H

    2017-09-01

    Background: Provitamin A carotenoids are an important source of dietary vitamin A for many populations. Thus, accurate and simple methods for estimating carotenoid bioefficacy are needed to evaluate the vitamin A value of test solutions and plant sources. β-Carotene bioefficacy is often estimated from the ratio of the areas under plasma isotope response curves after subjects ingest labeled β-carotene and a labeled retinyl acetate reference dose [isotope reference method (IRM)], but to our knowledge, the method has not yet been evaluated for accuracy. Objectives: Our objectives were to develop and test a physiologically based compartmental model that includes both absorptive and postabsorptive β-carotene bioconversion and to use the model to evaluate the accuracy of the IRM and a simple plasma retinol isotope ratio [(RIR), labeled β-carotene-derived retinol/labeled reference-dose-derived retinol in one plasma sample] for estimating relative bioefficacy. Methods: We used model-based compartmental analysis (Simulation, Analysis and Modeling software) to develop and apply a model that provided known values for β-carotene bioefficacy. Theoretical data for 10 subjects were generated by the model and used to determine bioefficacy by RIR and IRM; predictions were compared with known values. We also applied RIR and IRM to previously published data. Results: Plasma RIR accurately predicted β-carotene relative bioefficacy at 14 d or later. IRM also accurately predicted bioefficacy by 14 d, except that, when there was substantial postabsorptive bioconversion, IRM underestimated bioefficacy. Based on our model, 1-d predictions of relative bioefficacy include absorptive plus a portion of early postabsorptive conversion. Conclusion: The plasma RIR is a simple tracer method that accurately predicts β-carotene relative bioefficacy based on analysis of one blood sample obtained at ≥14 d after co-ingestion of labeled β-carotene and retinyl acetate. The method also provides information about the contributions of absorptive and postabsorptive conversion to total bioefficacy if an additional sample is taken at 1 d. © 2017 American Society for Nutrition.

  5. Electro-oculography-based detection of sleep-wake in sleep apnea patients.

    PubMed

    Virkkala, Jussi; Toppila, Jussi; Maasilta, Paula; Bachour, Adel

    2015-09-01

    Recently, we have developed a simple method that uses two electro-oculography (EOG) electrodes for the automatic scoring of sleep-wake in normal subjects. In this study, we investigated the usefulness of this method on 284 consecutive patients referred for a suspicion of sleep apnea who underwent polysomnography (PSG). We applied the AASM 2007 scoring rules. A simple automatic sleep-wake classification algorithm based on 18-45 Hz beta power was applied to the calculated bipolar EOG channel and was compared to standard polysomnography. Epoch-by-epoch agreement was evaluated. Eighteen patients were excluded due to poor EOG quality. One hundred fifty-eight males and 108 females were studied; their mean age was 48 (range 17-89) years, apnea-hypopnea index 13 (range 0-96) /h, BMI 29 (range 17-52) kg/m(2), and sleep efficiency 78 (range 0-98) %. The mean agreement in sleep-wake states between EOG and PSG was 85% and Cohen's kappa was 0.56. Overall epoch-by-epoch agreement was 85%, and Cohen's kappa was 0.57, with a positive predictive value of 91% and a negative predictive value of 65%. The EOG method can be applied to patients referred for suspicion of sleep apnea to indicate the sleep-wake state.
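
    The scoring rule described above reduces to computing the 18-45 Hz power of the bipolar EOG channel per 30-second epoch and thresholding it; a sketch on a synthetic trace follows, with the sampling rate, threshold, and signal all illustrative assumptions.

      import numpy as np
      from scipy.signal import welch

      fs = 200                                   # Hz (assumed sampling rate)
      epoch_len = 30 * fs
      rng = np.random.default_rng(0)
      eog = rng.normal(0, 1, 10 * epoch_len)     # stand-in bipolar EOG trace
      eog[:3 * epoch_len] += rng.normal(0, 3, 3 * epoch_len)  # "wake" epochs

      def beta_power(epoch):
          f, pxx = welch(epoch, fs=fs, nperseg=4 * fs)
          band = (f >= 18) & (f <= 45)
          return np.trapz(pxx[band], f[band])    # integrate 18-45 Hz power

      powers = [beta_power(eog[i:i + epoch_len])
                for i in range(0, len(eog), epoch_len)]
      threshold = np.median(powers)              # illustrative cut-off
      print(["wake" if p > threshold else "sleep" for p in powers])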

  6. Prediction and analysis of beta-turns in proteins by support vector machine.

    PubMed

    Pham, Tho Hoan; Satou, Kenji; Ho, Tu Bao

    2003-01-01

    The tight turn has long been recognized as one of the three important features of proteins, after the alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% of tight turns are beta-turns. Analysis and prediction of beta-turns in particular, and tight turns in general, are very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper, we introduce a support vector machine (SVM) approach to the prediction and analysis of beta-turns. We have investigated two aspects of applying SVMs to the prediction and analysis of beta-turns. First, we developed a new SVM method, called BTSVM, which predicts the beta-turns of a protein from its sequence. The prediction results on a dataset of 426 non-homologous protein chains, obtained by a sevenfold cross-validation technique, showed that our method is superior to the previous methods. Second, we analyzed how amino acid positions support (or prevent) the formation of beta-turns based on the "multivariable" classification model of a linear SVM. This model is more general than those of previous statistical methods. Our analysis results are more comprehensive and easier to use than previously published analysis results.

  7. Predictive performance of the Vitrigel-eye irritancy test method using 118 chemicals.

    PubMed

    Yamaguchi, Hiroyuki; Kojima, Hajime; Takezawa, Toshiaki

    2016-08-01

    We recently developed a novel Vitrigel-eye irritancy test (EIT) method. The Vitrigel-EIT method is composed of two parts, i.e., the construction of a human corneal epithelium (HCE) model in a collagen vitrigel membrane chamber and the prediction of eye irritancy by analyzing the time-dependent profile of transepithelial electrical resistance values for 3 min after exposing a chemical to the HCE model. In this study, we estimated the predictive performance of the Vitrigel-EIT method by testing a total of 118 chemicals. Comparison of the category determined by the Vitrigel-EIT method with the globally harmonized system classification revealed that the sensitivity, specificity and accuracy were 90.1%, 65.9% and 80.5%, respectively. Here, five of seven false-negative chemicals were acidic chemicals inducing an irregular rise in transepithelial electrical resistance values. When test chemical solutions of pH 5 or lower were excluded, the sensitivity, specificity and accuracy improved to 96.8%, 67.4% and 84.4%, respectively. Meanwhile, nine of 16 false-positive chemicals were classified as irritants by the US Environmental Protection Agency. In addition, the disappearance of ZO-1, a tight junction-associated protein, and MUC1, a cell membrane-spanning mucin, was immunohistologically confirmed in the HCE models after exposure not only to eye irritant chemicals but also to false-positive chemicals, suggesting that such false-positive chemicals have an eye irritant potential. These data demonstrated that the Vitrigel-EIT method could provide excellent predictive performance for judging widespread eye irritancy, including very mild irritant chemicals. We hope that the Vitrigel-EIT method contributes to the development of safe commodity chemicals. Copyright © 2015 The Authors. Journal of Applied Toxicology published by John Wiley & Sons Ltd.

  8. Soil erosion model predictions using parent material/soil texture-based parameters compared to using site-specific parameters

    Treesearch

    R. B. Foltz; W. J. Elliot; N. S. Wagenbrenner

    2011-01-01

    Forested areas disturbed by access roads produce large amounts of sediment. One method to predict erosion and, hence, manage forest roads is the use of physically based soil erosion models. A perceived advantage of a physically based model is that it can be parameterized at one location and applied at another location with similar soil texture or geological parent...

  9. Growth and Yield Predictions for Thinned and Unthinned Slash Pine Plantations on Cutover Sites in the West Gulf Region

    Treesearch

    Stanley J. Zarnoch; Donald P. Feduccia; V. Clark Baldwin; Tommy R. Dell

    1991-01-01

    A growth and yield model has been developed for slash pine plantations on problem-free cutover sites in the west gulf region. The model was based on the moment-percentile method, using the Weibull distribution for tree diameters. This technique was applied to unthinned and thinned stand projections and, subsequently, to the prediction of residual stands immediately...
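
    The percentile side of such a Weibull diameter-distribution method can be illustrated in a few lines: given two diameter percentiles, the shape and scale follow in closed form from q_p = scale * (-ln(1-p))^(1/shape). The percentile choices and values below are hypothetical, not taken from the paper:

```python
# Recover Weibull shape and scale from two diameter percentiles.
import math

def weibull_from_percentiles(q1, p1, q2, p2):
    """Solve q_p = scale * (-ln(1-p))**(1/shape) for shape and scale."""
    shape = math.log(math.log(1 - p2) / math.log(1 - p1)) / math.log(q2 / q1)
    scale = q1 / (-math.log(1 - p1)) ** (1 / shape)
    return shape, scale

# e.g. 25% of trees below 15 cm DBH, 95% below 30 cm (illustrative values)
shape, scale = weibull_from_percentiles(15.0, 0.25, 30.0, 0.95)
print(f"shape={shape:.2f}, scale={scale:.2f} cm")
```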

  10. An Alternative Procedure for Estimating Unit Learning Curves,

    DTIC Science & Technology

    1985-09-01

    the model accurately describes the real-life situation, i.e., when the model is properly applied to the data, it can be a powerful tool for...predicting unit production costs. There are, however, some unique estimation problems inherent in the model. The usual method of generating predicted unit...production costs attempts to extend properties of least squares estimators to nonlinear functions of these estimators. The result is biased estimates of

  11. Uncertainty analysis of an inflow forecasting model: extension of the UNEEC machine learning-based method

    NASA Astrophysics Data System (ADS)

    Pianosi, Francesca; Lal Shrestha, Durga; Solomatine, Dimitri

    2010-05-01

    This research presents an extension of the UNEEC (Uncertainty Estimation based on Local Errors and Clustering; Shrestha and Solomatine, 2006, 2008; Solomatine and Shrestha, 2009) method in the direction of explicit inclusion of parameter uncertainty. The UNEEC method assumes that there is an optimal model and that its residuals can be used to assess the uncertainty of the model prediction; all sources of uncertainty, including input, parameter and model structure uncertainty, are assumed to be manifested in the model residuals. In this research, these assumptions are relaxed and the UNEEC method is extended to consider parameter uncertainty explicitly (abbreviated UNEEC-P). In UNEEC-P, we first use Monte Carlo (MC) sampling in parameter space to generate N model realizations (each of which is a time series), estimate the prediction quantiles from the empirical distribution functions of the model residuals over all realizations, and only then apply the standard UNEEC method, which encapsulates the uncertainty of a hydrologic model (expressed by quantiles of the error distribution) in a machine learning model (e.g., an ANN). UNEEC-P is applied first to a linear regression model of synthetic data, and then to a real case study of forecasting inflow to Lake Lugano in northern Italy. The inflow forecasting model is a stochastic heteroscedastic model (Pianosi and Soncini-Sessa, 2009). Preliminary results show that the UNEEC-P method produces wider uncertainty bounds, consistent with the fact that it also considers the parameter uncertainty of the optimal model. In the future, the UNEEC method will be further extended to consider input and structure uncertainty, which should provide more realistic estimates of model prediction uncertainty.
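
    A minimal sketch of the UNEEC-P recipe, under strong simplifying assumptions (a one-parameter linear model, a Gaussian parameter distribution, and a plain MLP in place of the paper's configuration), might look like this:

```python
# UNEEC-P sketch: Monte Carlo over parameter uncertainty, pooled residual
# realizations, local empirical quantiles, and a regressor for the bounds.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
y_obs = 2.0 * x + rng.normal(0, 1 + 0.2 * x)       # synthetic, heteroscedastic

# Step 1: sample the parameter (slope ~ N(2.0, 0.1)) and pool residuals
residuals = np.concatenate([y_obs - rng.normal(2.0, 0.1) * x
                            for _ in range(100)])
xs = np.tile(x, 100)

# Step 2: empirical 5%/95% residual quantiles, estimated locally in x
def local_quantile(x0, q, k=2000):
    nearest = np.argsort(np.abs(xs - x0))[:k]
    return np.quantile(residuals[nearest], q)

grid = np.linspace(0, 10, 25).reshape(-1, 1)
lo = [local_quantile(g, 0.05) for g in grid.ravel()]
hi = [local_quantile(g, 0.95) for g in grid.ravel()]

# Step 3: encapsulate the bounds in a machine learning model (ANN here)
band_lo = MLPRegressor(max_iter=5000, random_state=0).fit(grid, lo)
band_hi = MLPRegressor(max_iter=5000, random_state=0).fit(grid, hi)
print(band_lo.predict([[5.0]]), band_hi.predict([[5.0]]))
```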

  12. An Investigation into the Roles of Theory of Mind, Emotion Regulation, and Attachment Styles in Predicting the Traits of Borderline Personality Disorder

    PubMed Central

    Ghiasi, Hamed; Mohammadi, Abolalfazl; Zarrinfar, Pouria

    2016-01-01

    Objective: Borderline personality disorder is one of the most complex and prevalent personality disorders. Many variables have so far been studied in relation to this disorder. This study aimed to investigate the role of emotion regulation, attachment styles, and theory of mind in predicting the traits of borderline personality disorder. Method: In this study, 85 patients with borderline personality disorder were selected using a convenience sampling method. To measure the desired variables, the Gross emotion regulation questionnaire, the Collins and Read attachment styles questionnaire, and Baron-Cohen's Reading the Mind in the Eyes Test were applied. The data were analyzed using a multivariate stepwise regression technique. Results: Emotion regulation, attachment styles, and theory of mind together predicted 41.2% of the variance of the criterion; the shares of emotion regulation, attachment styles and theory of mind in the distribution of the traits of borderline personality disorder were 27.5%, 9.8%, and 3.9%, respectively. Conclusion: The results of the study revealed that emotion regulation, attachment styles, and theory of mind are important variables in predicting the traits of borderline personality disorder and that these variables can be well applied to both the treatment and identification of this disorder. PMID:28050180

  13. Pediatric Emergency Care Applied Research Network head injury prediction rules: on the basis of cost and effectiveness

    PubMed

    Gökharman, Fatma Dilek; Aydın, Sonay; Fatihoğlu, Erdem; Koşar, Pınar Nercis

    2017-12-19

    Background/aim: Head injuries are commonly seen in the pediatric population. Noncontrast-enhanced cranial CT is the method of choice to detect possible traumatic brain injury (TBI), but concerns about ionizing radiation exposure make the evaluation more challenging. The aim of this study was to evaluate the effectiveness of the Pediatric Emergency Care Applied Research Network (PECARN) rules in predicting clinically important TBI and to determine the amount of medical resource waste and unnecessary radiation exposure. Materials and methods: This retrospective study included 1041 pediatric patients who presented to the emergency department. The patients were divided into subgroups of "appropriate for cranial CT", "not appropriate for cranial CT" and "cranial CT/observation of patient; both are appropriate". To determine the effectiveness of the PECARN rules, data were analyzed according to the presence of pathological findings. Results: "Appropriate for cranial CT" results predicted the presence of pathology 118,056-fold compared with "not appropriate for cranial CT" results. With "cranial CT/observation of patient; both are appropriate" results, pathology presence was predicted 11,457-fold compared with "not appropriate for cranial CT" results. Conclusion: The PECARN rules can successfully predict the presence of pathology in pediatric TBI. Using PECARN can decrease resource waste and exposure to ionizing radiation.

  14. An Investigation into the Roles of Theory of Mind, Emotion Regulation, and Attachment Styles in Predicting the Traits of Borderline Personality Disorder.

    PubMed

    Ghiasi, Hamed; Mohammadi, Abolalfazl; Zarrinfar, Pouria

    2016-10-01

    Objective: Borderline personality disorder is one of the most complex and prevalent personality disorders. Many variables have so far been studied in relation to this disorder. This study aimed to investigate the role of emotion regulation, attachment styles, and theory of mind in predicting the traits of borderline personality disorder. Method: In this study, 85 patients with borderline personality disorder were selected using a convenience sampling method. To measure the desired variables, the Gross emotion regulation questionnaire, the Collins and Read attachment styles questionnaire, and Baron-Cohen's Reading the Mind in the Eyes Test were applied. The data were analyzed using a multivariate stepwise regression technique. Results: Emotion regulation, attachment styles, and theory of mind together predicted 41.2% of the variance of the criterion; the shares of emotion regulation, attachment styles and theory of mind in the distribution of the traits of borderline personality disorder were 27.5%, 9.8%, and 3.9%, respectively. Conclusion: The results of the study revealed that emotion regulation, attachment styles, and theory of mind are important variables in predicting the traits of borderline personality disorder and that these variables can be well applied to both the treatment and identification of this disorder.

  15. Linearized blade row compression component model. Stability and frequency response analysis of a J85-13 compressor

    NASA Technical Reports Server (NTRS)

    Tesch, W. A.; Moszee, R. H.; Steenken, W. G.

    1976-01-01

    NASA-developed stability and frequency response analysis techniques were applied to a dynamic blade row compression component stability model to provide a more economical approach to surge line and frequency response determination than that provided by time-dependent methods. The blade row model was linearized and the Jacobian matrix was formed. The clean-inlet-flow stability characteristics of the compressors of two J85-13 engines were predicted by applying the alternate Routh-Hurwitz stability criterion to the Jacobian matrix. The predicted surge line agreed closely with the clean-inlet-flow surge line predicted by the time-dependent method, except for one engine at 94% corrected speed; no satisfactory explanation of this discrepancy was found. The frequency response of the linearized system was determined by evaluating its Laplace transfer function. The results of the linearized frequency-response analysis agree with the time-dependent results when the time-dependent inlet total-pressure and exit-flow function amplitude boundary conditions are less than 1 percent and 3 percent, respectively. The stability analysis technique was extended to a two-sector parallel compressor model, with and without interstage crossflow, and predictions were carried out for total-pressure distortion extents of 180 deg, 90 deg, 60 deg, and 30 deg.

  16. Likelihood of achieving air quality targets under model uncertainties.

    PubMed

    Digar, Antara; Cohan, Daniel S; Cox, Dennis D; Kim, Byeong-Uk; Boylan, James W

    2011-01-01

    Regulatory attainment demonstrations in the United States typically apply a bright-line test to predict whether a control strategy is sufficient to attain an air quality standard. Photochemical models are the best tools available to project future pollutant levels and are a critical part of regulatory attainment demonstrations. However, because photochemical models are uncertain and future meteorology is unknowable, future pollutant levels cannot be predicted perfectly and attainment cannot be guaranteed. This paper introduces a computationally efficient methodology for estimating the likelihood that an emission control strategy will achieve an air quality objective in light of uncertainties in photochemical model input parameters (e.g., uncertain emission and reaction rates, deposition velocities, and boundary conditions). The method incorporates Monte Carlo simulations of a reduced form model representing pollutant-precursor response under parametric uncertainty to probabilistically predict the improvement in air quality due to emission control. The method is applied to recent 8-h ozone attainment modeling for Atlanta, Georgia, to assess the likelihood that additional controls would achieve fixed (well-defined) or flexible (due to meteorological variability and uncertain emission trends) targets of air pollution reduction. The results show that in certain instances ranking of the predicted effectiveness of control strategies may differ between probabilistic and deterministic analyses.
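
    The core of such a probabilistic demonstration is easy to sketch: sample the uncertain response of ozone to emission cuts, project the design value for each sample, and count how often the standard is met. All numbers below are illustrative, not from the Atlanta application:

```python
# Toy reduced-form Monte Carlo, assuming a linear ozone response to NOx
# cuts with uncertain sensitivity; every value here is illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
baseline = 88.0                        # ppb, projected level without controls
sensitivity = rng.normal(2.5, 0.8, n)  # ppb reduced per 10% NOx cut (uncertain)
cut = 3.0                              # control strategy: 30% NOx reduction

projected = baseline - sensitivity * cut
p_attain = np.mean(projected <= 84.0)  # bright-line test against an 84 ppb target
print(f"Probability of attainment: {p_attain:.0%}")
```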

  17. Thermal barrier coating life-prediction model development

    NASA Technical Reports Server (NTRS)

    Strangman, T. E.; Neumann, J.; Liu, A.

    1986-01-01

    The program focuses on predicting the lives of two types of strain-tolerant and oxidation-resistant thermal barrier coating (TBC) systems that are produced by commercial coating suppliers to the gas turbine industry. The plasma-sprayed TBC system, composed of a low-pressure plasma-spray (LPPS) or argon-shrouded plasma-spray (ASPS) applied oxidation-resistant NiCrAlY (or CoNiCrAlY) bond coating and an air-plasma-sprayed yttria partially stabilized zirconia insulative layer, is applied by Chromalloy, Klock, and Union Carbide. The second type of TBC is applied by the electron beam-physical vapor deposition (EB-PVD) process by Temescal. The second year of the program focused on specimen procurement, TBC system characterization, nondestructive evaluation methods, life prediction model development, and TFE731 engine testing of thermal-barrier-coated blades. Materials testing is approaching completion. Thermomechanical characterization of the TBC systems, including toughness and spalling strain tests, was completed. Thermochemical testing is approximately two-thirds complete. Preliminary materials life models for the bond coating oxidation and zirconia sintering failure modes were developed, and integration of these life models with airfoil component analysis methods is in progress. Testing of high-pressure turbine blades coated with the program TBC systems is in progress in a TFE731 turbofan engine. The feasibility of eddy current technology for nondestructively measuring the zirconia layer thickness of a TBC system was established.

  18. An auxiliary optimization method for complex public transit route network based on link prediction

    NASA Astrophysics Data System (ADS)

    Zhang, Lin; Lu, Jian; Yue, Xianfei; Zhou, Jialin; Li, Yunxuan; Wan, Qian

    2018-02-01

    Inspired by missing (new) link prediction and spurious existing link identification in link prediction theory, this paper establishes an auxiliary optimization method for public transit route networks (PTRN) based on link prediction. First, link prediction applied to PTRN is described and, based on a review of previous studies, a summary set of indices and their algorithms is collected for the link prediction experiment. Second, through analyzing the topological properties of Jinan's PTRN established by the Space R method, we found that it is a typical small-world network with a relatively large average clustering coefficient, which indicates that structural similarity-based link prediction will perform well on this network. Then, based on the link prediction experiment over the summary indices set, the three indices with maximum accuracy are selected for auxiliary optimization of Jinan's PTRN. The link prediction results show that the overall layout of Jinan's PTRN is stable and orderly, except for a partial area that requires optimization and reconstruction; this conforms to the general pattern of the optimal development stage of PTRNs in China. Finally, based on missing (new) link prediction and spurious existing link identification, we propose optimization schemes that can be used not only to optimize the current PTRN but also to evaluate PTRN planning.
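
    A minimal sketch of the structural-similarity experiment follows, using three indices that are commonly strong performers in such studies; the choice of indices and the stand-in graph are assumptions of this example, not the paper's selection or Jinan's PTRN:

```python
# Structural-similarity link prediction on a small toy graph.
import networkx as nx

G = nx.karate_club_graph()  # stand-in for a transit network

# High-scoring non-edges are candidate "missing" links, i.e. places
# where a new transit connection would be structurally plausible.
for name, scores in [
    ("resource allocation", nx.resource_allocation_index(G)),
    ("Jaccard", nx.jaccard_coefficient(G)),
    ("Adamic-Adar", nx.adamic_adar_index(G)),
]:
    best = max(scores, key=lambda t: t[2])
    print(f"{name}: top candidate link {best[:2]}, score {best[2]:.3f}")
```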

  19. Nonlinear-drifted Brownian motion with multiple hidden states for remaining useful life prediction of rechargeable batteries

    NASA Astrophysics Data System (ADS)

    Wang, Dong; Zhao, Yang; Yang, Fangfang; Tsui, Kwok-Leung

    2017-09-01

    Brownian motion with adaptive drift has attracted much attention in prognostics because its first hitting time is highly relevant to remaining useful life prediction and follows the inverse Gaussian distribution. Besides linear degradation modeling, nonlinear-drifted Brownian motion has been developed to model nonlinear degradation, and its first hitting time distribution has been approximated by time-space transformation. In previous studies, the drift coefficient is the only hidden state used in state space modeling of the nonlinear-drifted Brownian motion. Besides the drift coefficient, the parameters of the nonlinear function used in the nonlinear-drifted Brownian motion should be treated as additional hidden states to make the model more flexible. In this paper, a prognostic method based on nonlinear-drifted Brownian motion with multiple hidden states is proposed and applied to predicting the remaining useful life of rechargeable batteries. Twenty-six sets of rechargeable battery degradation samples are analyzed to validate the effectiveness of the proposed prognostic method. Moreover, comparisons with a standard particle filter based prognostic method, a spherical cubature particle filter based prognostic method and two classic Bayesian prognostic methods are conducted to highlight the superiority of the proposed method. Results show that the proposed prognostic method has lower average prediction errors than the particle filter based and classic Bayesian prognostic methods for battery remaining useful life prediction.
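
    The first-hitting-time idea is straightforward to sketch by Monte Carlo: simulate degradation paths with a nonlinear drift (a power-law form and all parameter values are assumptions of this sketch) plus Brownian noise, and record when each path first crosses the failure threshold:

```python
# Monte Carlo first hitting times of a nonlinear-drifted Brownian model.
import numpy as np

rng = np.random.default_rng(2)
a, b, sigma = 2e-4, 1.3, 0.005   # drift scale, nonlinearity, diffusion
threshold, n_paths = 0.8, 5000   # failure level; Monte Carlo paths
dt = 1.0                         # one charge-discharge cycle per step

t = np.arange(1, 1001) * dt
drift = a * b * t ** (b - 1)                   # d/dt of the drift a * t**b
noise = sigma * np.sqrt(dt) * rng.normal(size=(n_paths, t.size))
degradation = 1.0 - np.cumsum(drift * dt + noise, axis=1)  # capacity from 1.0

# First cycle at which each path crosses the failure threshold
hit = np.argmax(degradation <= threshold, axis=1).astype(float)
hit[degradation.min(axis=1) > threshold] = np.nan   # paths that never fail
print("median first hitting time (cycles):", np.nanmedian(hit))
```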

  20. Predicting β-Turns in Protein Using Kernel Logistic Regression

    PubMed Central

    Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, FangXiang; Li, Min

    2013-01-01

    A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average, 25% of amino acids in protein structures are located in β-turns, so it is very important to develop an accurate and efficient method for β-turn prediction. Most current successful β-turn prediction methods use support vector machines (SVMs) or neural networks (NNs). Kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully to many classification problems, but it is seldom used for β-turn classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turn predictions in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved a Qtotal of 80.7% and an MCC of 50% on the BT426 dataset. These results show that the KLR method, with the right algorithm, can yield performance equivalent to or better than NNs and SVMs in β-turn prediction. In addition, KLR yields probabilistic outcomes and has a well-defined extension to the multiclass case. PMID:23509793
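
    Scikit-learn has no native KLR, but a probabilistic kernel classifier in the same spirit can be sketched with an explicit kernel approximation feeding ordinary logistic regression; this Nystroem-based stand-in is an assumption of this sketch, not the authors' algorithm, which addresses the computational cost differently:

```python
# Approximate kernel logistic regression: an RBF kernel map (Nystroem)
# followed by logistic regression, keeping KLR's probabilistic output.
# Synthetic data stands in for the PSSM/secondary-structure features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=400, n_features=40, random_state=0)

klr = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.05, n_components=100, random_state=0),
    LogisticRegression(max_iter=1000),
)
klr.fit(X, y)
print("P(beta-turn) for the first window:", klr.predict_proba(X[:1])[0, 1])
```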

  1. Predicting β-turns in protein using kernel logistic regression.

    PubMed

    Elbashir, Murtada Khalafallah; Sheng, Yu; Wang, Jianxin; Wu, Fangxiang; Li, Min

    2013-01-01

    A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average, 25% of amino acids in protein structures are located in β-turns, so it is very important to develop an accurate and efficient method for β-turn prediction. Most current successful β-turn prediction methods use support vector machines (SVMs) or neural networks (NNs). Kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully to many classification problems, but it is seldom used for β-turn classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turn predictions in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved a Qtotal of 80.7% and an MCC of 50% on the BT426 dataset. These results show that the KLR method, with the right algorithm, can yield performance equivalent to or better than NNs and SVMs in β-turn prediction. In addition, KLR yields probabilistic outcomes and has a well-defined extension to the multiclass case.

  2. Elucidating dynamic metabolic physiology through network integration of quantitative time-course metabolomics

    DOE PAGES

    Bordbar, Aarash; Yurkovich, James T.; Paglia, Giuseppe; ...

    2017-04-07

    In this study, the increasing availability of metabolomics data necessitates novel methods for deeper data analysis and interpretation. We present a flux balance analysis method that allows for the computation of dynamic intracellular metabolic changes at the cellular scale through integration of time-course absolute quantitative metabolomics. This approach, termed "unsteady-state flux balance analysis" (uFBA), is applied to four cellular systems: three dynamic and one steady-state as a negative control. uFBA and FBA predictions are contrasted, and uFBA is found to be more accurate in predicting dynamic metabolic flux states for red blood cells, platelets, and Saccharomyces cerevisiae. Notably, only uFBA predicts that stored red blood cells metabolize TCA intermediates to regenerate important cofactors, such as ATP, NADH, and NADPH. These pathway usage predictions were subsequently validated through 13C isotopic labeling and metabolic flux analysis in stored red blood cells. Utilizing time-course metabolomics data, uFBA provides an accurate method to predict metabolic physiology at the cellular scale for dynamic systems.
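
    For readers new to flux balance analysis, the steady-state core that uFBA generalizes reduces to a linear program. The three-reaction network below is a hypothetical toy; uFBA's contribution (not shown here) is to relax the steady-state constraint using measured metabolite concentration changes:

```python
# Minimal FBA sketch: maximize flux through a target reaction subject to
# steady-state mass balance S v = 0 and flux bounds. Toy network only.
import numpy as np
from scipy.optimize import linprog

# Metabolite-by-reaction stoichiometry: uptake -> A -> B -> biomass
S = np.array([
    [1, -1,  0],   # A: produced by uptake, consumed by conversion
    [0,  1, -1],   # B: produced by conversion, consumed by biomass
])
bounds = [(0, 10), (0, None), (0, None)]  # uptake capped at 10 units

# linprog minimizes, so negate the biomass coefficient to maximize it
res = linprog(c=[0, 0, -1], A_eq=S, b_eq=[0, 0], bounds=bounds)
print("optimal biomass flux:", -res.fun)   # -> 10.0, limited by uptake
```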

  3. Elucidating dynamic metabolic physiology through network integration of quantitative time-course metabolomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bordbar, Aarash; Yurkovich, James T.; Paglia, Giuseppe

    In this study, the increasing availability of metabolomics data necessitates novel methods for deeper data analysis and interpretation. We present a flux balance analysis method that allows for the computation of dynamic intracellular metabolic changes at the cellular scale through integration of time-course absolute quantitative metabolomics. This approach, termed "unsteady-state flux balance analysis" (uFBA), is applied to four cellular systems: three dynamic and one steady-state as a negative control. uFBA and FBA predictions are contrasted, and uFBA is found to be more accurate in predicting dynamic metabolic flux states for red blood cells, platelets, and Saccharomyces cerevisiae. Notably, only uFBA predicts that stored red blood cells metabolize TCA intermediates to regenerate important cofactors, such as ATP, NADH, and NADPH. These pathway usage predictions were subsequently validated through 13C isotopic labeling and metabolic flux analysis in stored red blood cells. Utilizing time-course metabolomics data, uFBA provides an accurate method to predict metabolic physiology at the cellular scale for dynamic systems.

  4. Improving consensus contact prediction via server correlation reduction.

    PubMed

    Gao, Xin; Bu, Dongbo; Xu, Jinbo; Li, Ming

    2009-05-06

    Protein inter-residue contacts play a crucial role in the determination and prediction of protein structures. Previous studies on contact prediction indicate that although template-based consensus methods outperform sequence-based methods on targets with typical templates, such consensus methods perform poorly on new fold targets. However, we find that even for new fold targets, the models generated by threading programs can contain many true contacts; the challenge is how to identify them. In this paper, we develop an integer linear programming model for consensus contact prediction. In contrast to the simple majority voting method, which assumes that all the individual servers are equally important and independent, the newly developed method evaluates their correlation by maximum likelihood estimation and extracts independent latent servers from them by principal component analysis. An integer linear programming method is then applied to assign a weight to each latent server so as to maximize the difference between true contacts and false ones. The proposed method is tested on the CASP7 data set. If the top L/5 predicted contacts are evaluated, where L is the protein size, the average accuracy is 73%, which is much higher than that of any previously reported study. Moreover, if only the 15 new fold CASP7 targets are considered, our method achieves an average accuracy of 37%, much better than the majority voting method, SVM-LOMETS, SVM-SEQ, and SAM-T06, which demonstrate average accuracies of 13.0%, 10.8%, 25.8% and 21.2%, respectively. Reducing server correlation and optimally combining independent latent servers thus show a significant improvement over traditional consensus methods. This approach can hopefully provide a powerful tool for protein structure refinement and prediction.

  5. A Gaussian Processes Technique for Short-term Load Forecasting with Considerations of Uncertainty

    NASA Astrophysics Data System (ADS)

    Ohmi, Masataro; Mori, Hiroyuki

    In this paper, an efficient method is proposed to deal with short-term load forecasting using Gaussian Processes. Short-term load forecasting plays a key role in smooth power system operation, including economic load dispatching, unit commitment, etc. Recently, the deregulated and competitive power market has increased the degree of uncertainty, making better prediction results more important for saving costs. In particular, power system operators need the upper and lower bounds of the predicted load to deal with this uncertainty, while also requiring more accurate predicted values. The proposed method is based on the Bayes model, in which the output is expressed as a distribution rather than a point. To realize the model efficiently, this paper proposes Gaussian Processes, which combine the Bayes linear model with a kernel machine to obtain the distribution of the predicted value. The proposed method is successfully applied to real data for daily maximum load forecasting.
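
    A minimal sketch of the central idea, prediction as a distribution whose spread yields the operator's upper and lower bounds, using scikit-learn's GP regressor on synthetic load data (the kernel and features are assumptions of this sketch, not the paper's model):

```python
# GP load forecasting sketch: predictive mean plus uncertainty bounds.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

hours = np.arange(0, 72).reshape(-1, 1)                 # past 3 days
load = 100 + 20 * np.sin(2 * np.pi * hours.ravel() / 24) \
       + np.random.default_rng(3).normal(0, 2, 72)      # synthetic MW

gp = GaussianProcessRegressor(kernel=RBF(10.0) + WhiteKernel(1.0))
gp.fit(hours, load)

# Predict the next day as a distribution, not a point: mean +/- 2 sigma
future = np.arange(72, 96).reshape(-1, 1)
mean, std = gp.predict(future, return_std=True)
print("hour 84 forecast: %.1f MW in [%.1f, %.1f]"
      % (mean[12], mean[12] - 2 * std[12], mean[12] + 2 * std[12]))
```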

  6. Land Covers Classification Based on Random Forest Method Using Features from Full-Waveform LIDAR Data

    NASA Astrophysics Data System (ADS)

    Ma, L.; Zhou, M.; Li, C.

    2017-09-01

    In this study, a Random Forest (RF) based land cover classification method is presented to predict the types of land covers in the Miyun area. The returned full waveforms, which were acquired by a LiteMapper 5600 airborne LiDAR system, were processed through waveform filtering, waveform decomposition and feature extraction. The commonly used features of distance, intensity, Full Width at Half Maximum (FWHM), skewness and kurtosis were extracted and used as attributes of the training data for generating the RF prediction model. The RF prediction model was applied to predict four types of land covers in the Miyun area: trees, buildings, farmland and ground. The classification results for these four land cover types were evaluated against ground truth information acquired from CCD image data of the same region. The RF classification results were compared with those of an SVM method and showed better performance: the RF classification accuracy reached 89.73% and the classification kappa was 0.8631.
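
    The RF stage is compact to sketch; the snippet below wires the five waveform features into a classifier and reports the same two scores used in the paper (accuracy and kappa), with randomly generated stand-in data rather than real LiteMapper returns:

```python
# Random Forest on the five waveform features, with stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 1000
X = np.column_stack([
    rng.normal(size=n),   # distance
    rng.normal(size=n),   # intensity
    rng.normal(size=n),   # FWHM
    rng.normal(size=n),   # skewness
    rng.normal(size=n),   # kurtosis
])
y = rng.integers(0, 4, n)  # trees / buildings / farmland / ground

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred),
      "kappa:", cohen_kappa_score(y_te, pred))
```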

  7. Cox-nnet: An artificial neural network method for prognosis prediction of high-throughput omics data

    PubMed Central

    Ching, Travers; Zhu, Xun

    2018-01-01

    Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high-throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox proportional hazards regression (with LASSO, ridge, and minimax concave penalties), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer nodes provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high-throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet. PMID:29634719

  8. Effect of missing data on multitask prediction methods.

    PubMed

    de la Vega de León, Antonio; Chen, Beining; Gillet, Valerie J

    2018-05-22

    There has been a growing interest in multitask prediction in chemoinformatics, helped by the increasing use of deep neural networks in this field. This technique is applied to multitarget data sets, where compounds have been tested against different targets, with the aim of developing models to predict a profile of biological activities for a given compound. However, multitarget data sets tend to be sparse; i.e., not all compound-target combinations have experimental values, and there has been little research on the effect of missing data on the performance of multitask methods. We used two complete data sets to simulate sparseness by removing data from the training set, comparing different schemes for removing the data. These sparse sets were used to train two different multitask methods: deep neural networks and Macau, a Bayesian probabilistic matrix factorization technique. Results from both methods were remarkably similar and showed that the performance decrease caused by missing data is at first small, accelerating only after large amounts of data are removed. This work provides a first approximation for assessing how much data is required to produce good performance in multitask prediction exercises.
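
    The standard device for training on sparse multitarget data is a masked loss that simply skips missing compound-target entries, as in this minimal sketch (the NaN encoding and toy values are assumptions of this example, not the paper's setup):

```python
# Masked loss for sparse multitarget data: score observed entries only.
import numpy as np

def masked_mse(y_true, y_pred):
    """Mean squared error over observed (non-NaN) entries only."""
    mask = ~np.isnan(y_true)
    return np.mean((y_true[mask] - y_pred[mask]) ** 2)

y_true = np.array([[5.2, np.nan], [np.nan, 7.1], [6.0, 6.5]])  # activities
y_pred = np.array([[5.0, 6.0], [6.5, 7.0], [6.2, 6.4]])
print(masked_mse(y_true, y_pred))  # ignores the two missing entries
```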

  9. Predicting protein amidation sites by orchestrating amino acid sequence features

    NASA Astrophysics Data System (ADS)

    Zhao, Shuqiu; Yu, Hua; Gong, Xiujun

    2017-08-01

    Amidation is the fourth major category of post-translational modifications and plays an important role in physiological and pathological processes. Identifying amidation sites can help us understand amidation and recognize the underlying causes of many diseases, but traditional experimental methods for identifying amidation sites are often time-consuming and expensive. In this study, we propose a computational method for predicting amidation sites by orchestrating amino acid sequence features. Three kinds of feature extraction methods are used to build a feature vector that captures not only the physicochemical properties but also position-related information of the amino acids. An extremely randomized trees algorithm is applied to choose the optimal features in a supervised fashion, removing redundancy and dependence among components of the feature vector. Finally, a support vector machine classifier is used to label the amidation sites. When tested on an independent data set, the proposed method performs better than all previous ones, with a prediction accuracy of 0.962, a Matthews correlation coefficient of 0.89 and an area under the curve of 0.964.
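
    The two-stage design maps naturally onto a short pipeline: supervised feature selection by extremely randomized trees, then an SVM on the surviving features. The snippet below uses synthetic stand-in data; the feature dimensions and selection threshold are assumptions of this sketch:

```python
# Feature selection via extremely randomized trees, then an SVM.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=200,
                           n_informative=15, random_state=0)

model = make_pipeline(
    SelectFromModel(ExtraTreesClassifier(n_estimators=200, random_state=0)),
    SVC(kernel="rbf", probability=True),
)
model.fit(X, y)
print("selected features:", model[0].get_support().sum(), "of", X.shape[1])
```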

  10. Modeling ready biodegradability of fragrance materials.

    PubMed

    Ceriani, Lidia; Papa, Ester; Kovarich, Simona; Boethling, Robert; Gramatica, Paola

    2015-06-01

    In the present study, quantitative structure-activity relationships were developed for predicting the ready biodegradability of approximately 200 heterogeneous fragrance materials. Two classification methods, classification and regression tree (CART) and k-nearest neighbors (kNN), were applied to perform the modeling. The models were validated with multiple external prediction sets, and the structural applicability domain was verified by the leverage approach. The best models had good sensitivity (internal ≥80%; external ≥68%), specificity (internal ≥80%; external 73%), and overall accuracy (≥75%). Results from the comparison with the BIOWIN global models, based on the group contribution method, show that the specific models developed in the present study perform better in prediction than BIOWIN6, in particular for the correct classification of not readily biodegradable fragrance materials. © 2015 SETAC.

  11. Phase Transition between Black and Blue Phosphorenes: A Quantum Monte Carlo Study

    NASA Astrophysics Data System (ADS)

    Li, Lesheng; Yao, Yi; Reeves, Kyle; Kanai, Yosuke

    The phase transition of the more common black phosphorene to blue phosphorene is of great interest because blue phosphorene is predicted to exhibit unique electronic and optical properties. However, these two phases are predicted to be separated by a rather large energy barrier. In this work, we study the transition pathway between black and blue phosphorene by using the variable-cell nudged elastic band method combined with density functional theory calculations. We show how the diffusion quantum Monte Carlo method can be used to determine the energetics of the phase transition and demonstrate the use of two approaches for removing finite-size errors. Finally, we predict how applied stress can be used to control the energetic balance between these two phases of phosphorene.

  12. Analysis of concrete beams using applied element method

    NASA Astrophysics Data System (ADS)

    Lincy Christy, D.; Madhavan Pillai, T. M.; Nagarajan, Praveen

    2018-03-01

    The Applied Element Method (AEM) is a displacement-based method of structural analysis, with some features similar to those of the Finite Element Method (FEM). In AEM, as in FEM, the structure is analysed by dividing it into several elements; but in AEM the elements are connected by springs instead of nodes. In this paper, the background to AEM is discussed and the necessary equations are derived. To illustrate its application, AEM is used to analyse a plain concrete beam with fixed supports. The analysis is limited to 2-dimensional structures. It was found that the number of springs has little influence on the results, and that AEM can predict deflections and reactions with a reasonable degree of accuracy.
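
    The spring-connection idea is easy to demonstrate in one dimension: adjacent elements are joined by springs of stiffness k = EA/d, the spring stiffnesses are assembled into a global matrix, and displacements are solved for under a load. Real AEM uses pairs of normal and shear springs distributed over 2-D element faces; the chain below is only a toy analogue with illustrative values:

```python
# Toy 1-D analogue of AEM: rigid elements linked by axial springs.
import numpy as np

n_el = 5                     # elements along the member
E, A, d = 30e9, 0.01, 0.1    # modulus (Pa), area (m^2), element spacing (m)
k = E * A / d                # stiffness of each connecting spring

# Assemble the global stiffness matrix from the element springs
K = np.zeros((n_el + 1, n_el + 1))
for i in range(n_el):
    K[i:i + 2, i:i + 2] += k * np.array([[1, -1], [-1, 1]])

# Fix the left end, pull the right end with 1 kN, solve for displacements
F = np.zeros(n_el + 1); F[-1] = 1e3
u = np.zeros(n_el + 1)
u[1:] = np.linalg.solve(K[1:, 1:], F[1:])
print("tip displacement (m):", u[-1])   # matches F*L/(E*A) for the chain
```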

  13. Pdf prediction of supersonic hydrogen flames

    NASA Technical Reports Server (NTRS)

    Eifler, P.; Kollmann, W.

    1993-01-01

    A hybrid method for the prediction of supersonic turbulent flows with combustion is developed, consisting of a second-order closure for the velocity field and a multi-scalar pdf method for the local thermodynamic state. It is shown that, for non-premixed flames in chemical equilibrium, the scalar set consisting of mixture fraction, the logarithm of the (dimensionless) density, internal energy per unit mass and the divergence of the velocity has several advantages over other sets of scalars. The closure model is applied to a supersonic non-premixed flame burning hydrogen with air supplied by a supersonic coflow, and the results are compared with a limited set of experimental data.

  14. Higher Order Corrections in the CoLoRFulNNLO Framework

    NASA Astrophysics Data System (ADS)

    Somogyi, G.; Kardos, A.; Szőr, Z.; Trócsányi, Z.

    We discuss the CoLoRFulNNLO method for computing higher order radiative corrections to jet cross sections in perturbative QCD. We apply our method to the calculation of event shapes and jet rates in three-jet production in electron-positron annihilation. We validate our code by comparing our predictions to previous results in the literature and present the jet cone energy fraction distribution at NNLO accuracy. We also present preliminary NNLO results for the three-jet rate using the Durham jet clustering algorithm matched to resummed predictions at NLL accuracy, and a comparison to LEP data.

  15. Head-target tracking control of well drilling

    NASA Astrophysics Data System (ADS)

    Agzamov, Z. V.

    2018-05-01

    The method of directional drilling trajectory control for oil and gas wells using predictive models is considered in this paper. The developed method does not rely on optimization, so there is no need for high-performance computing; nevertheless, it follows the well-plan with high precision while taking process input saturation into account. The controller output is calculated both from the present target reference point of the well-plan and from a prediction of the well trajectory using an analytical model. This approach allows a well-plan to be followed not only in angular but also in Cartesian coordinates. Simulation of the control system confirmed high precision and good operational performance over a wide range of random disturbances.

  16. An evaluation of HEMT potential for millimeter-wave signal sources using interpolation and harmonic balance techniques

    NASA Technical Reports Server (NTRS)

    Kwon, Youngwoo; Pavlidis, Dimitris; Tutt, Marcel N.

    1991-01-01

    A large-signal analysis method based on a harmonic balance technique and a 2-D cubic spline interpolation function has been developed and applied to the prediction of InP-based HEMT oscillator performance for frequencies extending up to the submillimeter-wave range. The large-signal analysis method uses a limited number of DC and small-signal S-parameter data points and allows accurate characterization of HEMT large-signal behavior. The method has been validated experimentally using load-pull measurements. Oscillation frequency, power performance, and load requirements are discussed, with an operation capability of 300 GHz predicted using state-of-the-art devices (fmax approximately equal to 450 GHz).
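
    The interpolation ingredient is standard: fit a 2-D cubic spline to measured DC I-V data over a bias grid so that a harmonic-balance loop can evaluate the drain current continuously between measured points. The grid and the I-V surface below are illustrative, not measured HEMT data:

```python
# 2-D cubic spline over a (Vgs, Vds) bias grid, with a toy I-V surface.
import numpy as np
from scipy.interpolate import RectBivariateSpline

vgs = np.linspace(-1.0, 0.0, 11)          # gate bias sweep (V)
vds = np.linspace(0.0, 2.0, 21)           # drain bias sweep (V)
# Toy I-V surface with saturation-like behavior (mA)
ids = np.outer(vgs + 1.2, np.tanh(3 * vds)) * 40

spline = RectBivariateSpline(vgs, vds, ids, kx=3, ky=3)
print("Ids at Vgs=-0.35 V, Vds=1.3 V: %.2f mA" % spline(-0.35, 1.3)[0, 0])
```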

  17. Dynamic and thermal analysis of high speed tapered roller bearings under combined loading

    NASA Technical Reports Server (NTRS)

    Crecelius, W. J.; Milke, D. R.

    1973-01-01

    The development of a computer program capable of predicting the thermal and kinetic performance of high-speed tapered roller bearings operating with fluid lubrication under applied axial, radial and moment loading (five degrees of freedom) is detailed. Various methods of applying lubrication can be considered as well as changes in bearing internal geometry which occur as the bearing is brought to operating speeds, loads and temperatures.

  18. Properties of a Formal Method for Prediction of Emergent Behaviors in Swarm-based Systems

    NASA Technical Reports Server (NTRS)

    Rouff, Christopher; Vanderbilt, Amy; Hinchey, Mike; Truszkowski, Walt; Rash, James

    2004-01-01

    Autonomous intelligent swarms of satellites, with complex behaviors and interactions, are being proposed for NASA missions. The emergent properties of swarms make these missions powerful but, at the same time, more difficult to design and to assure that proper behaviors will emerge. This paper gives the results of research into formal methods techniques for verification and validation of NASA swarm-based missions. Multiple formal methods were evaluated to determine their effectiveness in modeling and assuring the behavior of swarms of spacecraft, using the NASA ANTS mission as an example of swarm intelligence. We present partial specifications of the ANTS mission using four selected methods, evaluate the methods themselves, and identify the properties a formal method needs for effective specification and prediction of emergent behavior in swarm-based systems.

  19. New approach to predict photoallergic potentials of chemicals based on murine local lymph node assay.

    PubMed

    Maeda, Yosuke; Hirosaki, Haruka; Yamanaka, Hidenori; Takeyoshi, Masahiro

    2018-05-23

    Photoallergic dermatitis caused by pharmaceuticals and other consumer products is a very important issue in human health. However, the S10 guidelines of the International Conference on Harmonization do not recommend the existing photoallergy prediction methods because of their low predictability for human cases. We applied the local lymph node assay (LLNA), a reliable, quantitative skin sensitization test, to develop a new photoallergy prediction method. This method involves a three-step approach: (1) ultraviolet (UV) absorption analysis; (2) determination of a no-observed-adverse-effect level for skin phototoxicity based on the LLNA; and (3) photoallergy evaluation based on the LLNA. The photoallergic potential of chemicals was evaluated by comparing lymph node cell proliferation among groups treated with chemicals at minimal effect levels for skin sensitization and skin phototoxicity, under UV irradiation (UV+) or non-UV irradiation (UV-). A significant difference (P < .05) in lymph node cell proliferation rates between the UV- and UV+ groups was considered a positive photoallergic reaction. Of the 13 chemicals tested, seven human photoallergens tested positive, and the other six, with no evidence of causing photoallergic dermatitis or UV absorption, tested negative. Among these chemicals, doxycycline hydrochloride and minocycline hydrochloride are both tetracycline antibiotics but have different photoallergic properties, and the new method clearly distinguished between them. These findings suggest high predictability; the method is therefore promising and effective for predicting human photoallergens. Copyright © 2018 John Wiley & Sons, Ltd.

  20. PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.

    PubMed

    Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi

    2014-01-01

    Protein structure prediction is critical to the functional annotation of massively accumulating biological sequences, prompting an imperative need for high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction has become an increasingly challenging task. Amongst homology-based approaches, the accuracies of protein structural class prediction are sufficiently high for high-similarity datasets, but still far from satisfactory for low-similarity datasets, i.e., those below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high- and low-similarity datasets. This method is based on a Support Vector Machine (SVM) in conjunction with integrated features from the position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors by recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low-similarity datasets.
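
    Recursive feature elimination with a linear SVM is available off the shelf; the sketch below ranks synthetic stand-in features the same way SVM-RFE ranks the integrated PSSM/PROFEAT/GO vectors (the dimensions and number of retained features are assumptions of this sketch):

```python
# SVM-RFE sketch: recursively drop the feature with the smallest weight
# in a linear SVM until the requested number of features remains.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=10, random_state=0)

rfe = RFE(SVC(kernel="linear"), n_features_to_select=10, step=1)
rfe.fit(X, y)
print("top-ranked feature indices:", list(rfe.get_support(indices=True)))
```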
